SlideShare a Scribd company logo
1 of 15
AN INTRODUCTION
TO
HADOOP
SudarshanPant
MOKPONATIONALUNIVERSITY
2017.06.01
HADOOP
An open-sourceimplementationof frameworks for reliable,scalable,
distributedcomputingand data storage.
A flexibleand highly-availablearchitecturefor large scalecomputation
and dataprocessing on a networkof commodity hardware.
Redundantand reliable.
Batch processing centric.
HADOOP
2003
2004
2006
2005: Doug Cutting and Michael J. Cafarella developed Hadoop with funding from
Yahoo.
2006: Yahoo gave the project to Apache Software Foundation.
Google File system
Google MapReduce
BigTable
HADOOP
Hadoopis a system for large scale data processing.
It has two components
HDFS – Hadoop distributed file system(for storage)
 Distributed across nodes
 Natively redundant
Map Reduce(forprocessing)
 Splits a taskacross processors
 “near” the data and assembles results
 Clustered storage
Machine
HDFS
MapReduce
HADOOP
 The Mapreduce server on a typical machineis called TaskTracker
 The HDFSserver on a typical machineis called a DataNode
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
HADOOP
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
JobTracker
JOBTRACKER
 JobTracker keeps track of jobs being run.
 Keeps track of the nodes bydetectingthe nodefailureand reassigning
the taskto anothernode
 NameNodekeeps informationon data location
 Keeps noticesthenode failureand automaticallyre-replicate thedata
NAME NODE
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
Machine
DataNode
TaskTracker
NameNode
Pig
Hive
MapReduce
HDFS
HBase
Hadoop System
ZOOKEEPER
HADOOP ECOSYSTEM
Pig : High level languagethattranslatesdown into MapReduce Job.
Allows to write job description in highlevel languageinsteadof writing all
MapReduce job code in Java.
Hive: SQL likeinterface for computations. TakesSQL-like language and converts
it intoMapReduce Jobs
Hbase: provides simple interface to distribute datathat allowsthe realtime data
availabilityto the data.
ZooKeeper: provides scalable, highly availablecoordination service for servers
by keeping the metadata about different servers.
HADOOP ECOSYSTEM
 Mahout– for machinelearning
 Sqoop – helps tomove datato and from RelationationalSQL
databases.
 Oozie - workflow coordinationtool (definingscheduledtasks)
 Flume–for importingstreamingdatato Hadoop
 Protobuff,Avro, Thrift- for data serialization
OTHER USEFUL TOOLS
Client NameNode
 Allocationofblockstofile
 MonitoringdataNodeforfailure
 NewNodeAddition
 ReplicationManagement
 ManagingUser Requests(read/write)
DataNode1 DataNode2 DataNode3 DataNoden
Metadata operations
Read/Write
Block
NameNode
Commands
 Write/Readblocks to/from Local File
 Perform operation as directedbynameNode
 Register/Heartbeat itself with namenodeand provideblock report to name
node
FUNCTIONOFHDFSCONPONENTS
File A
70MB
File B
8MB
BlockA
Allocated64MB
Consumed64MB
SpaceLeft0MB
BlockB
Allocated64MB
Consumed8MB
SpaceLeft56MB
BlockC
Allocated64MB
Consumed8MB
SpaceLeft56MB
 Blocks arecreatedbasedonpredefined block size (default : 64MB)
 Blocks arethen replicated as perthe defined Replication Factor (default : 3)
 1 file –1 block rule :Oneblock does not contain the contents from two different files.
FILEHANDLING
 Parallel programming modelfor largeclusters
 Co-Designed, Co-developed andCo-deployed with HDFS
 Processing takes place where data is located.
 Conducted in two phases
 Map
 Reduce
Map: Processakey/valuepairtogenerateintermediatekey/valuepairs.
Reduce:Mergeall intermediatevaluesassociatedwiththesamekey.
Input
Data
Map
(Thread 1)
Map
(Thread P)
Sort
Reduce
(Thread 1)
Reduce
(Thread Q)
Output
DataSplit Split
Merge
MAPREDUCE
TheDriverconfigures how the program will run in Hadoop
TheMapperis fed input data from Hadoopand produces intermediate output
TheReduceris fed the intermediate output from the mappers
and produces the final output.
When data moves between the mappers andreducers,
it might cross the network so all the values mustbeserializable.
STRUCTUREOF
MAPREDUCEPROGRAM

More Related Content

What's hot

HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemSteve Loughran
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an exampleNikita Kesharwani
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfsdatabloginfo
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file systemAnshul Bhatnagar
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of HadoopKnoldus Inc.
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopGetInData
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010Cloudera, Inc.
 
report on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hivereport on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hivesiddharthboora
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache HadoopSufi Nawaz
 

What's hot (20)

HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
An Introduction to Hadoop
An Introduction to HadoopAn Introduction to Hadoop
An Introduction to Hadoop
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of Hadoop
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010
Apache Hadoop an Introduction - Todd Lipcon - Gluecon 2010
 
report on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hivereport on aadhaar anlysis using bid data hadoop and hive
report on aadhaar anlysis using bid data hadoop and hive
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 

Similar to Introduction to Hadoop

Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...AyeeshaParveen
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionChirag Ahuja
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...AyeeshaParveen
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoopManoj Jangalva
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATarak Tar
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxUttara University
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxAltafKhadim
 

Similar to Introduction to Hadoop (20)

Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...Hadoop ecosystem  J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
Hadoop ecosystem J.AYEESHA PARVEEN II-M.SC.,COMPUTER SCIENCE, BON SECOURS CO...
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Unit 1
Unit 1Unit 1
Unit 1
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
THE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATATHE SOLUTION FOR BIG DATA
THE SOLUTION FOR BIG DATA
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Big data
Big dataBig data
Big data
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 

Recently uploaded

一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书
一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书
一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书c6eb683559b3
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样ayvbos
 
一比一原版美国北卡罗莱纳大学毕业证如何办理
一比一原版美国北卡罗莱纳大学毕业证如何办理一比一原版美国北卡罗莱纳大学毕业证如何办理
一比一原版美国北卡罗莱纳大学毕业证如何办理A
 
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样ayvbos
 
Lowongan Kerja LC Yogyakarta Terbaru 085746015303
Lowongan Kerja LC Yogyakarta Terbaru 085746015303Lowongan Kerja LC Yogyakarta Terbaru 085746015303
Lowongan Kerja LC Yogyakarta Terbaru 085746015303Dewi Agency
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样ayvbos
 
一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理F
 
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0APNIC
 
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...mikehavy0
 
一比一原版澳大利亚迪肯大学毕业证如何办理
一比一原版澳大利亚迪肯大学毕业证如何办理一比一原版澳大利亚迪肯大学毕业证如何办理
一比一原版澳大利亚迪肯大学毕业证如何办理SS
 
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理AS
 
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformonhackersuli
 
Down bad crying at the gym t shirtsDown bad crying at the gym t shirts
Down bad crying at the gym t shirtsDown bad crying at the gym t shirtsDown bad crying at the gym t shirtsDown bad crying at the gym t shirts
Down bad crying at the gym t shirtsDown bad crying at the gym t shirtsrahman018755
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC
 
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样AS
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
Loker Pemandu Lagu LC Semarang 085746015303
Loker Pemandu Lagu LC Semarang 085746015303Loker Pemandu Lagu LC Semarang 085746015303
Loker Pemandu Lagu LC Semarang 085746015303Dewi Agency
 
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理apekaom
 
原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样AS
 

Recently uploaded (20)

一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书
一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书
一比一原版(NYU毕业证书)美国纽约大学毕业证学位证书
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
一比一原版美国北卡罗莱纳大学毕业证如何办理
一比一原版美国北卡罗莱纳大学毕业证如何办理一比一原版美国北卡罗莱纳大学毕业证如何办理
一比一原版美国北卡罗莱纳大学毕业证如何办理
 
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样
一比一原版(USYD毕业证书)悉尼大学毕业证原件一模一样
 
Lowongan Kerja LC Yogyakarta Terbaru 085746015303
Lowongan Kerja LC Yogyakarta Terbaru 085746015303Lowongan Kerja LC Yogyakarta Terbaru 085746015303
Lowongan Kerja LC Yogyakarta Terbaru 085746015303
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理
 
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0
APNIC Policy Roundup presented by Sunny Chendi at TWNOG 5.0
 
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...
Abortion Clinic in Germiston +27791653574 WhatsApp Abortion Clinic Services i...
 
一比一原版澳大利亚迪肯大学毕业证如何办理
一比一原版澳大利亚迪肯大学毕业证如何办理一比一原版澳大利亚迪肯大学毕业证如何办理
一比一原版澳大利亚迪肯大学毕业证如何办理
 
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理
一比一原版(Dundee毕业证书)英国爱丁堡龙比亚大学毕业证如何办理
 
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon
[Hackersuli] Élő szövet a fémvázon: Python és gépi tanulás a Zeek platformon
 
Down bad crying at the gym t shirtsDown bad crying at the gym t shirts
Down bad crying at the gym t shirtsDown bad crying at the gym t shirtsDown bad crying at the gym t shirtsDown bad crying at the gym t shirts
Down bad crying at the gym t shirtsDown bad crying at the gym t shirts
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Loker Pemandu Lagu LC Semarang 085746015303
Loker Pemandu Lagu LC Semarang 085746015303Loker Pemandu Lagu LC Semarang 085746015303
Loker Pemandu Lagu LC Semarang 085746015303
 
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理
一比一原版桑佛德大学毕业证成绩单申请学校Offer快速办理
 
原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样
 

Introduction to Hadoop

  • 2. HADOOP An open-sourceimplementationof frameworks for reliable,scalable, distributedcomputingand data storage. A flexibleand highly-availablearchitecturefor large scalecomputation and dataprocessing on a networkof commodity hardware. Redundantand reliable. Batch processing centric.
  • 4. 2005: Doug Cutting and Michael J. Cafarella developed Hadoop with funding from Yahoo. 2006: Yahoo gave the project to Apache Software Foundation. Google File system Google MapReduce BigTable HADOOP
  • 5. Hadoopis a system for large scale data processing. It has two components HDFS – Hadoop distributed file system(for storage)  Distributed across nodes  Natively redundant Map Reduce(forprocessing)  Splits a taskacross processors  “near” the data and assembles results  Clustered storage Machine HDFS MapReduce HADOOP
  • 6.  The Mapreduce server on a typical machineis called TaskTracker  The HDFSserver on a typical machineis called a DataNode Machine DataNode TaskTracker Machine DataNode TaskTracker Machine DataNode TaskTracker HADOOP
  • 7. Machine DataNode TaskTracker Machine DataNode TaskTracker Machine DataNode TaskTracker JobTracker JOBTRACKER  JobTracker keeps track of jobs being run.  Keeps track of the nodes bydetectingthe nodefailureand reassigning the taskto anothernode
  • 8.  NameNodekeeps informationon data location  Keeps noticesthenode failureand automaticallyre-replicate thedata NAME NODE Machine DataNode TaskTracker Machine DataNode TaskTracker Machine DataNode TaskTracker NameNode
  • 10. Pig : High level languagethattranslatesdown into MapReduce Job. Allows to write job description in highlevel languageinsteadof writing all MapReduce job code in Java. Hive: SQL likeinterface for computations. TakesSQL-like language and converts it intoMapReduce Jobs Hbase: provides simple interface to distribute datathat allowsthe realtime data availabilityto the data. ZooKeeper: provides scalable, highly availablecoordination service for servers by keeping the metadata about different servers. HADOOP ECOSYSTEM
  • 11.  Mahout– for machinelearning  Sqoop – helps tomove datato and from RelationationalSQL databases.  Oozie - workflow coordinationtool (definingscheduledtasks)  Flume–for importingstreamingdatato Hadoop  Protobuff,Avro, Thrift- for data serialization OTHER USEFUL TOOLS
  • 12. Client NameNode  Allocationofblockstofile  MonitoringdataNodeforfailure  NewNodeAddition  ReplicationManagement  ManagingUser Requests(read/write) DataNode1 DataNode2 DataNode3 DataNoden Metadata operations Read/Write Block NameNode Commands  Write/Readblocks to/from Local File  Perform operation as directedbynameNode  Register/Heartbeat itself with namenodeand provideblock report to name node FUNCTIONOFHDFSCONPONENTS
  • 13. File A 70MB File B 8MB BlockA Allocated64MB Consumed64MB SpaceLeft0MB BlockB Allocated64MB Consumed8MB SpaceLeft56MB BlockC Allocated64MB Consumed8MB SpaceLeft56MB  Blocks arecreatedbasedonpredefined block size (default : 64MB)  Blocks arethen replicated as perthe defined Replication Factor (default : 3)  1 file –1 block rule :Oneblock does not contain the contents from two different files. FILEHANDLING
  • 14.  Parallel programming modelfor largeclusters  Co-Designed, Co-developed andCo-deployed with HDFS  Processing takes place where data is located.  Conducted in two phases  Map  Reduce Map: Processakey/valuepairtogenerateintermediatekey/valuepairs. Reduce:Mergeall intermediatevaluesassociatedwiththesamekey. Input Data Map (Thread 1) Map (Thread P) Sort Reduce (Thread 1) Reduce (Thread Q) Output DataSplit Split Merge MAPREDUCE
  • 15. TheDriverconfigures how the program will run in Hadoop TheMapperis fed input data from Hadoopand produces intermediate output TheReduceris fed the intermediate output from the mappers and produces the final output. When data moves between the mappers andreducers, it might cross the network so all the values mustbeserializable. STRUCTUREOF MAPREDUCEPROGRAM