Deploying and Researching
Hadoop in Virtual Machines
Hadoop:
• Hadoop is an open-source software platform.
• It is derived from Google's MapReduce and GFS (Google File System) designs.
• Hadoop provides an open-source implementation of MapReduce.
• The Apache Hadoop project develops open-source software for reliable,
scalable distributed computing.
Definition:
• Basically, it is a way of storing enormous data sets across clusters of
computers.
• It is designed to be robust and efficient.
• The Apache Hadoop software library is a framework for the distributed
processing of large data sets.
• It is designed to scale up from single servers to thousands of
machines.
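To make "storing enormous data sets across clusters of computers" concrete, the following is a minimal sketch of how a file might be cut into fixed-size blocks and each block copied to several machines. The 128 MB block size, the node names and the round-robin placement are assumptions chosen for illustration; real HDFS uses a configurable block size and its own placement policy. The replication factor of 3 matches the HDFS default.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    # All blocks are full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a simplification, not the placement policy HDFS really uses.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Hypothetical 300 MB file stored on four virtual machines.
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["vm1", "vm2", "vm3", "vm4"])
for block, holders in placement.items():
    print(block, blocks[block] // MB, "MB ->", holders)
```

Because every block lives on three different machines, losing a single virtual machine does not lose any data in this sketch.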
Who uses Hadoop?
Abstract:
• The emergence of Hadoop and the maturity of virtualization make it
feasible to run Hadoop in virtual machines.
• The project introduces the technologies used: CloudStack,
MapReduce and Hadoop.
• It shows how to deploy Hadoop in virtual machines obtained
from CloudStack.
• We run some Hadoop programs on the resulting virtual cluster.
Introduction:
• Nowadays, the most frequently used programs are
Internet-based services.
• At Google, MapReduce has been reported to process more than 20 PB of data per day.
• Hadoop offers the ability to read and write large volumes of data.
• It provides reliable shared storage (HDFS) and an analysis system
(MapReduce).
• It enables applications to work on clusters of many machines.
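As an illustration of the MapReduce analysis model mentioned above, here is a small single-process sketch of the classic word count job. It only mimics the map, shuffle and reduce steps in plain Python; it does not use the Hadoop API, and the function names are chosen for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map calls run in parallel on the machines that hold the input, and the shuffle moves data over the network; the sketch keeps only the data flow.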
Literature survey:
• Ignoring data locality in different types of
environments can easily reduce MapReduce
performance.
• Experimental results on two real data-intensive
applications show that their data placement strategy.
• The first generation of Hadoop had two single points of
failure: the NameNode and the JobTracker processes.
• Hadoop MapReduce has two main services: the
JobTracker and the TaskTracker.
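The data locality point can be sketched as a scheduling decision: a JobTracker-like component hands each map task to a TaskTracker-like node, preferring a node that already holds the task's input block. The greedy strategy, the task names and the slot counts below are assumptions for illustration, not the scheduler Hadoop actually ships.

```python
def assign_tasks(block_locations, free_slots):
    """Greedily assign map tasks to nodes, preferring data-local nodes.

    block_locations: task name -> list of nodes holding its input block.
    free_slots: node name -> number of free map slots.
    Returns task name -> (chosen node, True if the input is local).
    """
    slots = dict(free_slots)  # work on a copy, leave the input untouched
    assignment = {}
    for task, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                break  # no capacity left; remaining tasks would wait
            node = candidates[0]
        slots[node] -= 1
        assignment[task] = (node, node in holders)
    return assignment

# Hypothetical cluster: three VMs with one map slot each.
assignment = assign_tasks(
    {"map0": ["vm1", "vm2"], "map1": ["vm1"], "map2": ["vm1"]},
    {"vm1": 1, "vm2": 1, "vm3": 1},
)
for task, (node, is_local) in assignment.items():
    print(task, node, "local" if is_local else "remote read")
```

Here only one of the three tasks reads its input locally; the other two must fetch their block over the network, which is the kind of cost that ignoring locality adds.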
Existing System:
• Terabytes of data need to be processed efficiently on a daily
basis.
• The existing system uses a single virtual machine.
• The disadvantage is the potential for poor performance
under heavy load, which is the problem to be solved.
Proposed System:
• The proposed system uses CloudStack infrastructure.
• MapReduce is designed to run on a cluster, and managing thousands
of commodity PCs is a big job.
• Hadoop applications are deployed on virtual machines.
• Perhaps the biggest problem is power consumption.
Modules:
• Module 1: The user starts the NameNode, DataNode,
JobTracker and TaskTracker nodes on the virtual
machines.
• Module 2: The user observes the virtual machines running on
the cluster infrastructure.
• Module 3: The user can connect to any virtual machine
running on the cluster by providing the required details.
• Module 4: The user can deploy files on the
connected virtual machine and do research on any virtual
machine.
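Module 1 can be sketched as a helper that builds the start commands for each virtual machine. The sketch assumes a first-generation Hadoop installation, where each daemon is started with bin/hadoop-daemon.sh; the install path /opt/hadoop and the master/worker split are assumptions for illustration. The commands are only built and printed, not executed.

```python
import posixpath

# Assumed split of daemons between the master VM and the worker VMs.
MASTER_DAEMONS = ["namenode", "jobtracker"]
WORKER_DAEMONS = ["datanode", "tasktracker"]

def start_commands(hadoop_home, role):
    """Return the command lines that start the daemons for one VM role."""
    if role == "master":
        daemons = MASTER_DAEMONS
    elif role == "worker":
        daemons = WORKER_DAEMONS
    else:
        raise ValueError("unknown role: " + role)
    script = posixpath.join(hadoop_home, "bin", "hadoop-daemon.sh")
    return [[script, "start", daemon] for daemon in daemons]

# /opt/hadoop is a hypothetical install location.
for role in ("master", "worker"):
    for command in start_commands("/opt/hadoop", role):
        print(role, " ".join(command))
```

Each returned list could be passed to subprocess.run on the corresponding virtual machine once the paths have been checked against the actual installation.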
Hardware Requirements
• Pentium 4 Processor
• 8GB RAM
• 64-bit OS (Ubuntu)
• 200 GB HDD
Software Requirements
• Java 6
• Eclipse Indigo (With Hadoop Configuration)
• Hadoop Appliance
• Cygwin
• CloudStack
ARCHITECTURE
3-Tier Architecture
Master/Slave Architecture
HDFS Architecture
DESIGNING
CLASS DIAGRAM
USE CASE DIAGRAM
Actor: user
Use cases: name node, data node, start job tracker, connect to VM, deploy files, research on files, logout
SEQUENCE DIAGRAM
Participants: user, HDFS
Messages from user to HDFS, each answered by a response: start name node, data node, job tracker, deploy files, research on files, logout
COLLABORATION DIAGRAM
user HDFS
1: start name node
2: response
3: data node
4: response
5: job tracker
6: response
7: deploy files
8: response
9: research on files
10: response
11: logout
12: response
TESTING
• Black Box Testing
• White Box Testing
• Grey Box Testing
• Regression Testing
Test Cases
Name | Input | Output
Activate Root Account | Username and password | Successfully Enabled
Starting Management Server | Management Server Details | Successfully Started
Adding Pod | Pod Details | Successfully Added
Adding Zone | Zone Details | Successfully Added
Adding Cluster | Cluster Details | Successfully Added
Primary Storage | Primary Storage Details | Successfully Added
Secondary Storage | Secondary Storage Details | Successfully Added
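The infrastructure test cases depend on one another: in CloudStack a zone contains pods, a pod contains clusters, and primary storage is typically attached to a cluster, while secondary storage belongs to the zone. The sketch below checks whether a given sequence of steps respects that nesting. The step names and the checking function are hypothetical and do not call the CloudStack API.

```python
# Parent that must already exist before each resource can be added.
PARENT = {
    "zone": None,
    "pod": "zone",
    "cluster": "pod",
    "primary_storage": "cluster",
    "secondary_storage": "zone",
}

def check_order(steps):
    """Return the steps that were attempted before their parent existed."""
    created = set()
    errors = []
    for step in steps:
        parent = PARENT[step]
        if parent is not None and parent not in created:
            errors.append(step)
        else:
            created.add(step)
    return errors

# Order in which the rows appear in the table, read as a sequence.
table_order = ["pod", "zone", "cluster", "primary_storage", "secondary_storage"]
# Order that follows the nesting, zone first.
nested_order = ["zone", "pod", "cluster", "primary_storage", "secondary_storage"]

print(check_order(table_order))
print(check_order(nested_order))
```

If the table rows are meant as an execution order, the check suggests running the zone test case before the pod test case.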
OUTPUT SCREENS
Home Page
Dashboard
Instances
Network
Events
Accounts
Domains
Infrastructure
Projects
Global Settings
Service Settings
Conclusion:
• This project brings together CloudStack, the MapReduce programming
model and Hadoop, which allows distributed parallel
execution, and shows that it is feasible to deploy and
research Hadoop in virtual machines. The advantages are
that virtualization eases management, makes fuller use of
computing resources, makes Hadoop more reliable and
saves power. Some methods to optimize
Hadoop in virtual machines are also discussed.
Future Enhancements
• Rights Management:
For example, a test administrator can be made
responsible for an experimental course; the
teachers of that course can then only view and count
information related to it, and have no permission
for other courses.
• Experimental Control and Report Submission:
The instructor can specify the experimental projects
to be carried out, and the system keeps experimental records, saving the
information on the experimental projects that students have
taken in the pilot project, which facilitates faculty management.