Hadoop 2.7.2 Introduction
and Installation
-Santosh G. Nage
 Big Data: The 5 Vs Everyone Must Know
1. Volume : the vast amounts of data generated every second
2. Velocity : the speed at which new data is generated
3. Variety : the different types of data
4. Veracity : the messiness or trustworthiness of the data
5. Value : having access to big data is all well and good, but it is useless
unless we can turn it into value
Problems with Traditional Large-Scale Systems
 Traditionally, computation has been processor-bound
• Relatively small amounts of data
• Significant amount of complex processing performed on that data
 For decades, the primary push was to increase the computing power of a single
machine
 Faster processor, more RAM
 Moore’s Law: roughly stated, processing power doubles every two years
 Even that hasn’t always proved adequate for very CPU-intensive jobs
Distributed Systems: Problems
 Programming for traditional distributed systems is complex
 Data exchange requires synchronization
 Finite bandwidth is available
 Temporal dependencies are complicated
 It is difficult to deal with partial failures of the system
Distributed Systems: Data Storage
 Typically, data for a distributed system is stored on a SAN
 At compute time, data is copied to the compute nodes
 Fine for relatively limited amounts of data
The Data-Driven World
 Modern systems have to deal with far more data than was the case in the past
 Organizations are generating huge amounts of data
 That data has inherent value, and cannot be discarded
 E.g :
Facebook – over 15 Petabytes
eBay – over 5 Petabytes
 Many organizations are generating data at a rate of terabytes per day
Data Becomes the Bottleneck
 Getting the data to the processors becomes the bottleneck
 Quick calculation
-Typical disk data transfer rate: 75 MB/sec
-Time taken to transfer 100 GB of data to the processor: approx 22 minutes,
assuming sustained reads
 Actual time will be worse, since most servers have less than 100 GB of RAM
available
----- A new approach is needed -----
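The quick calculation above can be reproduced in a few lines of shell. This is only a sketch using the slide's figures (75 MB/sec, 100 GB); decimal units and the variable names are assumptions made for illustration.

```shell
# Figures taken from the slide; decimal units (1 GB = 1000 MB) are an assumption.
rate_mb_per_s=75
data_gb=100

seconds=$(( data_gb * 1000 / rate_mb_per_s ))  # integer division
minutes=$(( seconds / 60 ))

echo "approx ${minutes} minutes (${seconds} seconds)"
```

Spreading the same data over many disks that are read in parallel divides this time by roughly the number of disks, which is the motivation for the approach described next.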
Requirements for a New Approach
 The system must support partial failure
 Data Recoverability
--If a component of the system fails, its workload should be assumed by still
functioning units in the system
--Failure should not result in the loss of any data
 Component Recovery
--If a component of the system fails and then recovers, it should be able to rejoin
the system
 Consistency
--Component failures during execution of a job should not affect the outcome of the job
 Scalability
--Adding load to the system should result in a graceful decline in performance
of individual jobs
--Increasing resources should support a proportional increase in load capacity
Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing environment.
 It is an Apache project sponsored by the Apache Software Foundation.
The Hadoop Project
 Hadoop is an open source project overseen by the Apache Software Foundation
 Originally based on papers published by Google in 2003 and 2004
 Hadoop committers work at several different organizations
 Including Cloudera, Yahoo!, Facebook
Hadoop Versions…
Hadoop Distributions(providers)
Hadoop Components
Hadoop consists of two core components
 MapReduce
 HDFS (Hadoop Distributed File System)
 There are many other projects based around core Hadoop, often referred to as the
‘Hadoop Ecosystem’
 Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.
 A set of machines running HDFS and MapReduce is known as a cluster
 Individual machines are known as nodes
 A cluster can have as few as one node or as many as several thousand
HDFS
 HDFS, the Hadoop Distributed File System, is responsible for storing data on the cluster
 Data is split into blocks and distributed across multiple nodes in the cluster
 Each block is typically 64 MB or 128 MB in size (128 MB is the default in Hadoop 2.x)
 Each block is replicated multiple times
 Default is to replicate each block three times
 Replicas are stored on different nodes
 This ensures both reliability and availability
 HDFS is a file system written in Java
 Based on Google’s GFS
 Provides redundant storage for massive amounts of data
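To make the block and replica arithmetic concrete, here is a small sketch. The file size is a made-up example, and the block size and replication factor are simply the typical values quoted above, not values read from a cluster.

```shell
# Hypothetical example values: a 1000 MB file, 128 MB blocks, replication factor 3.
file_mb=1000
block_mb=128
replication=3

# Ceiling division: the last block may be only partly filled.
blocks=$(( (file_mb + block_mb - 1) / block_mb ))
# Every block is stored once per replica somewhere in the cluster.
stored_blocks=$(( blocks * replication ))

echo "${blocks} blocks, ${stored_blocks} block replicas in total"
```

Raw disk usage is therefore about three times the file size at the default replication factor.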
MapReduce
 MapReduce is the system used to process data in the Hadoop cluster
 Consists of two phases: Map, and then Reduce
 Between the two is a stage known as the shuffle and sort
 Each Map task operates on a discrete portion of the overall dataset
 After all Maps are complete, the MapReduce system distributes the
intermediate data to nodes which perform the Reduce phase
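The three stages can be imitated on one machine with an ordinary shell pipeline. This is an analogy only, not Hadoop code: the function name wordcount and the sample input are made up for illustration, and nothing here is distributed.

```shell
# Word count as a pipeline; each stage mirrors one MapReduce phase.
wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' |  # map: emit one word (key) per line
    LC_ALL=C sort |                         # shuffle and sort: bring equal keys together
    uniq -c |                               # reduce: count the records for each key
    awk '{ print $2, $1 }'                  # print as "word count"
}

result=$(printf 'big data big cluster\n' | wordcount)
echo "$result"
```

In a real cluster the map stage runs as many parallel tasks, one per portion of the input, and the sort stage moves data between nodes.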
Hadoop Ecosystem
Master Slave Architecture
Name Node
 HDFS is one of the primary components of a Hadoop cluster and is designed with a
master-slave architecture
 Master: NameNode
 The master (NameNode) manages file system namespace operations such as
opening, closing, and renaming files and directories. It also determines the
mapping of blocks to DataNodes and regulates access to files by
clients
Data Node
 Slave: DataNode
 Slaves (DataNodes) are responsible for serving read and write requests from
the file system’s clients and for performing block creation, deletion, and
replication upon instruction from the master (NameNode).
Job Tracker
 MapReduce is also a primary component of Hadoop and also has a master-slave
architecture (in Hadoop 1.x; in Hadoop 2.x, YARN takes over the roles of the
JobTracker and TaskTrackers)
 Master: JobTracker
 The JobTracker is the point of interaction between users and the MapReduce framework.
When a MapReduce job is submitted, the JobTracker puts it in a queue of pending jobs,
executes jobs on a first-come, first-served basis, and manages the
assignment of map and reduce tasks to the TaskTrackers.
Task Tracker
 Slave: TaskTracker
 TaskTrackers execute tasks upon instruction from the JobTracker and
also handle data motion between the map and reduce phases.
YARN
 Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management
technology.
 YARN allows multiple access engines to use Hadoop as the common standard for
batch, interactive and real-time engines that can simultaneously access the same
data set.
 YARN’s ResourceManager focuses exclusively on scheduling and keeps pace as
clusters expand to thousands of nodes managing petabytes of data.
Installation of Hadoop 2.7.2
 Hadoop is supported on GNU/Linux and its flavors.
 If you have an OS other than Linux, you can install VirtualBox
on it and run Linux inside a virtual machine.
 Hadoop Operation Modes
1. Local/Standalone Mode
2. Pseudo Distributed Mode
3. Fully Distributed Mode
Prerequisites
 Environment :
 Ubuntu 14.04
 JDK 8 (Oracle release; Hadoop 2.7.x supports Java 7 and 8)
 Hadoop 2.7.2 (or any later release)
Installing Java manually
 Download latest Java from
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
 Move downloaded [*.tar.gz] file to “/usr/local/java”
 Extract [*.tar.gz]
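 Set the Java path in the “~/.bashrc” file as follows:
sant@ULTP-453:~$ gedit ~/.bashrc
 Append the following lines to it (JAVA_HOME must point at the directory that contains bin/java; adjust the path if the archive extracted into a subdirectory):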
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
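A quick way to confirm the setting is to check that the directory really contains an executable bin/java. The helper below is a hypothetical sketch (the name check_java_home is made up); it falls back to the /usr/local/java path used above when JAVA_HOME is not set.

```shell
# Report whether a candidate JAVA_HOME contains an executable bin/java.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

check_java_home "${JAVA_HOME:-/usr/local/java}"
```

Open a new terminal (or run source ~/.bashrc) first so that the exported variables are visible.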
SSH configuration
 Install SSH using the command
sant@ULTP-453:~$ sudo apt-get install ssh
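 Generate a passwordless key pair (the command below creates a DSA key; newer OpenSSH releases disable DSA by default, in which case use -t rsa and id_rsa instead)
sant@ULTP-453:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
 Append the public key to this machine's authorized_keys file
sant@ULTP-453:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys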
 Permission
sant@ULTP-453:~$ chmod 0600 ~/.ssh/authorized_keys
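The permission step can be verified with a small check. This sketch assumes GNU coreutils stat (as shipped with Ubuntu); the helper name file_mode is made up for illustration.

```shell
# Print the octal permission bits of a file (GNU coreutils stat, as on Ubuntu).
file_mode() {
  stat -c '%a' "$1"
}

keys="$HOME/.ssh/authorized_keys"
if [ -f "$keys" ]; then
  echo "authorized_keys mode: $(file_mode "$keys")"
else
  echo "authorized_keys not found"
fi
```

Once the key and permissions are in place, ssh localhost should log in without asking for a password.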
Installing Hadoop 2.7.2
 Download the latest Hadoop from
https://archive.apache.org/dist/hadoop/common/
 Move the downloaded [*.tar.gz] file to “/usr/local/hadoop”
 Extract [*.tar.gz]
 Change the permissions of the hadoop directory (777 is convenient on a single-user test machine but too permissive elsewhere)
sant@ULTP-453:~$ chmod -R 777 /usr/local/hadoop
 Open ‘hadoop-env.sh’ located in ‘/usr/local/hadoop/etc/hadoop/’ and set the Java path
export JAVA_HOME=/usr/local/java
Changes in configuration files
 etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
 etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
 etc/hadoop/mapred-site.xml (in Hadoop 2.7.2, create it by copying etc/hadoop/mapred-site.xml.template)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
 etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
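After editing, it can help to read a property back out of a file to catch typos. The helper below is a hypothetical sketch, not a Hadoop tool: get_prop is a made-up name, and it assumes each name and value element sits on its own line, as in the snippets above.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on their own line.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")   # strip the tags and any indentation
      print
      exit
    }
  ' "$1"
}

# Path assumed from the installation steps above.
conf=/usr/local/hadoop/etc/hadoop/core-site.xml
if [ -f "$conf" ]; then
  get_prop "$conf" fs.defaultFS
fi
```

For anything beyond a quick check, a real XML parser is the safer choice.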
Start Hadoop in pseudo-distributed mode
 Format the NameNode
$ bin/hdfs namenode -format
 Start all daemons
$ sbin/start-all.sh OR $ sbin/start-dfs.sh followed by $ sbin/start-yarn.sh
Check that the daemons are running (for example with the JDK's jps tool)
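One way to automate the check is to compare the jps listing against the daemons a pseudo-distributed Hadoop 2.x node with YARN normally runs. This is a sketch: missing_daemons is a made-up helper, and the sample listing is hypothetical rather than captured from a real run.

```shell
# Read jps-style output on stdin and print each expected daemon that is absent.
missing_daemons() {
  running=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so SecondaryNameNode does not count as NameNode
    printf '%s\n' "$running" | grep -qw "$daemon" || echo "$daemon"
  done
}

# Hypothetical jps output in which only the HDFS daemons came up.
sample='4301 NameNode
4432 DataNode
4610 SecondaryNameNode
4755 Jps'
printf '%s\n' "$sample" | missing_daemons
```

On a live node the function can be fed directly with jps | missing_daemons; empty output means all five daemons are up.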
Stop all services
$ sbin/stop-all.sh OR $ sbin/stop-yarn.sh followed by $ sbin/stop-dfs.sh