SlideShare a Scribd company logo
Presented By
 Framework for running applications on large clusters of commodity
hardware
 Scale: petabytes of data on thousands of nodes
 Include
 Storage: HDFS
 Processing: MapReduce
 Support the Map/Reduce programming model
 Requirements
 Economy: use cluster of comodity computers
 Easy to use
 Users: no need to deal with the complexity of distributed computing
 Reliable: can handle node failures automatically
http://www.kellytechno.com
 Implemented in Java
 Apache Top Level Project
 http://hadoop.apache.org/core/
 Core (15 Committers)
 HDFS
 MapReduce
 Community of contributors is growing
 Though mostly Yahoo for HDFS and MapReduce
 You can contribute too!
http://www.kellytechno.com
 Commodity HW
 Add inexpensive servers
 Storage servers and their disks are not assumed to be highly reliable and
available
 Use replication across servers to deal with unreliable storage/servers
 Metadata-data separation - simple design
 Namenode maintains metadata
 Datanodes manage storage
 Slightly Restricted file semantics
 Focus is mostly sequential access
 Single writers
 No file locking features
 Support for moving computation close to data
 Servers have 2 purposes: data storage and computation
 Single ‘storage + compute’ cluster vs. Separate clusters
http://www.kellytechno.com
Data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Data data data data data
Results
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Data data data data
Hadoop Cluster
DFS Block 1
DFS Block 1
DFS Block 2
DFS Block 2
DFS Block 2
DFS Block 1
DFS Block 3
DFS Block 3
DFS Block 3
MAP
MAP
MAP
Reduce
http://www.kellytechno.com
 Data is organized into files and directories
 Files are divided into uniform sized blocks and
distributed across cluster nodes
 Blocks are replicated to handle hardware
failure
 Filesystem keeps checksums of data for
corruption detection and recovery
 HDFS exposes block placement so that
computation can be migrated to data
http://www.kellytechno.com
NameNode(Filename, replicationFactor, block-ids, …)
/users/user1/data/part-0, r:2, {1,3}, …
/users/user1/data/part-1, r:3, {2,4,5}, …
Datanodes
1 1
3
3
2
2
2
4
4
4
55
5
http://www.kellytechno.com
 Master-Slave architecture
 DFS Master “Namenode”
 Manages the filesystem namespace
 Maintain file name to list blocks + location mapping
 Manages block allocation/replication
 Checkpoints namespace and journals namespace changes
for reliability
 Control access to namespace
 DFS Slaves “Datanodes” handle block storage
 Stores blocks using the underlying OS’s files
 Clients access the blocks directly from datanodes
 Periodically sends block reports to Namenode
 Periodically check block integrity
http://www.kellytechno.com
Read
Metadata ops
Client
Metadata (Name, #replicas, …):
/users/foo/data, 3, …
Namenode
Client
Datanodes
Rack 1 Rack 2
Replication
Block ops
Datanodes
Write
Blocks
http://www.kellytechno.com
 A file’s replication factor can be set per file (default 3)
 Block placement is rack aware
 Guarantee placement on two racks
 1st replica is on local node, 2rd/3rd replicas are on remote rack
 Avoid hot spots: balance I/O traffic
 Writes are pipelined to block replicas
 Minimize bandwidth usage
 Overlapping disk writes and network writes
 Reads are from the nearest replica
 Block under-replication & over-replication is detected by Namenode
 Balancer application rebalances blocks to balance DN utilization
http://www.kellytechno.com
 Scale cluster size
 Scale number of clients
 Scale namespace size (total number of files,
amount of data)
 Possible solutions
 Multiple namenodes
 Read-only secondary namenode
 Separate cluster management and namespace management
 Dynamic Partition namespace
 Mounting
http://www.kellytechno.com
 Map/Reduce is a programming model for efficient
distributed computing
 It works like a Unix pipeline:
 cat input | grep | sort
| uniq -c | cat > output
 Input | Map | Shuffle & Sort | Reduce |
Output
 A simple model but good for a lot of applications
 Log processing
 Web index building
http://www.kellytechno.com
http://www.kellytechno.com
 Mapper
 Input: value: lines of text of input
 Output: key: word, value: 1
 Reducer
 Input: key: word, value: set of counts
 Output: key: word, value: sum
 Launching program
 Defines the job
 Submits job to cluster
http://www.kellytechno.com
 Fine grained Map and Reduce tasks
 Improved load balancing
 Faster recovery from failed tasks
 Automatic re-execution on failure
 In a large cluster, some nodes are always slow or flaky
 Framework re-executes failed tasks
 Locality optimizations
 With large data, bandwidth to data is a problem
 Map-Reduce + HDFS is a very effective solution
 Map-Reduce queries HDFS for locations of input data
 Map tasks are scheduled close to the inputs when
possible
http://www.kellytechno.com
• Hadoop Wiki
– Introduction
• http://hadoop.apache.org/core/
– Getting Started
• http://wiki.apache.org/hadoop/GettingStartedWithHadoop
– Map/Reduce Overview
• http://wiki.apache.org/hadoop/HadoopMapReduce
– DFS
• http://hadoop.apache.org/core/docs/current/hdfs_design.html
• Javadoc
– http://hadoop.apache.org/core/docs/current/api/index.html
http://www.kellytechno.com
Thank you!
http://www.kellytechno.com
Presented
By

More Related Content

What's hot

2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
databloginfo
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
vinayiqbusiness
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
Aisha Siddiqa
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
Abhishek Mukherjee
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemAnand Kulkarni
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
Rutvik Bapat
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
Jazan University
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
Vibrant Technologies & Computers
 
Snapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File SystemSnapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File System
Bhavesh Padharia
 
Cppt
CpptCppt
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
JigsawAcademy2014
 
Sector Vs Hadoop
Sector Vs HadoopSector Vs Hadoop
Sector Vs Hadoop
lilyco
 
Hadoop
HadoopHadoop
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 

What's hot (18)

Unit 1
Unit 1Unit 1
Unit 1
 
2.introduction to hdfs
2.introduction to hdfs2.introduction to hdfs
2.introduction to hdfs
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop ppt2
Hadoop ppt2Hadoop ppt2
Hadoop ppt2
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Snapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File SystemSnapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File System
 
Cppt
CpptCppt
Cppt
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
Sector Vs Hadoop
Sector Vs HadoopSector Vs Hadoop
Sector Vs Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 

Viewers also liked

RewardTrax Overview
RewardTrax OverviewRewardTrax Overview
RewardTrax Overview
joshuathomasbrown
 
Bab1
Bab1Bab1
Guía ahorro energetico
Guía ahorro energeticoGuía ahorro energetico
Guía ahorro energetico
AMPAGoya
 
Debere prezi
Debere preziDebere prezi
Spain E Book
Spain E BookSpain E Book
Spain E Book
Aatish Raj
 
Importancia de la internet en la educacion
Importancia de la internet en la educacionImportancia de la internet en la educacion
Importancia de la internet en la educacion
Fernando Marmolejos Mesa
 
Thesis Presentation01
Thesis Presentation01Thesis Presentation01
Thesis Presentation01
Oylum Boran
 
Trabajo grupal de fisica
Trabajo grupal de fisicaTrabajo grupal de fisica
Trabajo grupal de fisica
mishel villegas
 
Protocolos
ProtocolosProtocolos
Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017
AMPAGoya
 
Yoga Meditation in 9 steps
Yoga Meditation in 9 stepsYoga Meditation in 9 steps
Yoga Meditation in 9 steps
Dada Rainjitananda
 
Generation of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dtsGeneration of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dts
Kamal Kaushik
 
Contes bojos presentacio power
Contes bojos presentacio powerContes bojos presentacio power
Contes bojos presentacio power
orcatac
 
Research Method for Business chapter 6
Research Method for Business chapter  6Research Method for Business chapter  6
Research Method for Business chapter 6
Mazhar Poohlah
 
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Osservatorio Fedeltà Università di Parma
 
Virtual Reality & Marketing Part 3 of 3
Virtual Reality &  Marketing Part 3 of 3Virtual Reality &  Marketing Part 3 of 3
Virtual Reality & Marketing Part 3 of 3
Jennifer Irene Guevara
 
Research Method for Business chapter 3
Research Method for Business chapter 3Research Method for Business chapter 3
Research Method for Business chapter 3
Mazhar Poohlah
 
Overview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-caOverview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-ca
GlobalCompliancePanel
 

Viewers also liked (20)

RewardTrax Overview
RewardTrax OverviewRewardTrax Overview
RewardTrax Overview
 
Bab1
Bab1Bab1
Bab1
 
CartaBAT
CartaBATCartaBAT
CartaBAT
 
Guía ahorro energetico
Guía ahorro energeticoGuía ahorro energetico
Guía ahorro energetico
 
Debere prezi
Debere preziDebere prezi
Debere prezi
 
Spain E Book
Spain E BookSpain E Book
Spain E Book
 
Importancia de la internet en la educacion
Importancia de la internet en la educacionImportancia de la internet en la educacion
Importancia de la internet en la educacion
 
Furacões
FuracõesFuracões
Furacões
 
Thesis Presentation01
Thesis Presentation01Thesis Presentation01
Thesis Presentation01
 
Trabajo grupal de fisica
Trabajo grupal de fisicaTrabajo grupal de fisica
Trabajo grupal de fisica
 
Protocolos
ProtocolosProtocolos
Protocolos
 
Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017Bases + inscripciones concurso yo no lo compro ygualarte 2017
Bases + inscripciones concurso yo no lo compro ygualarte 2017
 
Yoga Meditation in 9 steps
Yoga Meditation in 9 stepsYoga Meditation in 9 steps
Yoga Meditation in 9 steps
 
Generation of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dtsGeneration of electricity_from_coal_&_dts
Generation of electricity_from_coal_&_dts
 
Contes bojos presentacio power
Contes bojos presentacio powerContes bojos presentacio power
Contes bojos presentacio power
 
Research Method for Business chapter 6
Research Method for Business chapter  6Research Method for Business chapter  6
Research Method for Business chapter 6
 
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
Lo scenario del loyalty marketing: retail e industria a confronto (C.Ziliani)
 
Virtual Reality & Marketing Part 3 of 3
Virtual Reality &  Marketing Part 3 of 3Virtual Reality &  Marketing Part 3 of 3
Virtual Reality & Marketing Part 3 of 3
 
Research Method for Business chapter 3
Research Method for Business chapter 3Research Method for Business chapter 3
Research Method for Business chapter 3
 
Overview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-caOverview of-device-regulation-fda-san-diego-ca
Overview of-device-regulation-fda-san-diego-ca
 

Similar to Hadoop training in bangalore-kellytechnologies

Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
shrey mehrotra
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
Sunil D Patil
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
srikanthhadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
 
HADOOP
HADOOPHADOOP
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
harithakannan
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Hadoop
HadoopHadoop
Hadoop
Dinakar nk
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
Unmesh Baile
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
Unmesh Baile
 
HadoopDB
HadoopDBHadoopDB
HadoopDB
Miguel Pastor
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
ssuserec53e73
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
datastack
 
assignment3
assignment3assignment3
assignment3Kirti J
 

Similar to Hadoop training in bangalore-kellytechnologies (20)

Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
HADOOP
HADOOPHADOOP
HADOOP
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Hadoop bigdata overview
Hadoop bigdata overviewHadoop bigdata overview
Hadoop bigdata overview
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
HadoopDB
HadoopDBHadoopDB
HadoopDB
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
assignment3
assignment3assignment3
assignment3
 

Recently uploaded

Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
gb193092
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 

Recently uploaded (20)

Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 

Hadoop training in bangalore-kellytechnologies

  • 2.  Framework for running applications on large clusters of commodity hardware  Scale: petabytes of data on thousands of nodes  Include  Storage: HDFS  Processing: MapReduce  Support the Map/Reduce programming model  Requirements  Economy: use cluster of comodity computers  Easy to use  Users: no need to deal with the complexity of distributed computing  Reliable: can handle node failures automatically http://www.kellytechno.com
  • 3.  Implemented in Java  Apache Top Level Project  http://hadoop.apache.org/core/  Core (15 Committers)  HDFS  MapReduce  Community of contributors is growing  Though mostly Yahoo for HDFS and MapReduce  You can contribute too! http://www.kellytechno.com
  • 4.  Commodity HW  Add inexpensive servers  Storage servers and their disks are not assumed to be highly reliable and available  Use replication across servers to deal with unreliable storage/servers  Metadata-data separation - simple design  Namenode maintains metadata  Datanodes manage storage  Slightly Restricted file semantics  Focus is mostly sequential access  Single writers  No file locking features  Support for moving computation close to data  Servers have 2 purposes: data storage and computation  Single ‘storage + compute’ cluster vs. Separate clusters http://www.kellytechno.com
  • 5. Data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Data data data data data Results Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Data data data data Hadoop Cluster DFS Block 1 DFS Block 1 DFS Block 2 DFS Block 2 DFS Block 2 DFS Block 1 DFS Block 3 DFS Block 3 DFS Block 3 MAP MAP MAP Reduce http://www.kellytechno.com
  • 6.  Data is organized into files and directories  Files are divided into uniform sized blocks and distributed across cluster nodes  Blocks are replicated to handle hardware failure  Filesystem keeps checksums of data for corruption detection and recovery  HDFS exposes block placement so that computation can be migrated to data http://www.kellytechno.com
  • 7. NameNode(Filename, replicationFactor, block-ids, …) /users/user1/data/part-0, r:2, {1,3}, … /users/user1/data/part-1, r:3, {2,4,5}, … Datanodes 1 1 3 3 2 2 2 4 4 4 55 5 http://www.kellytechno.com
  • 8.  Master-Slave architecture  DFS Master “Namenode”  Manages the filesystem namespace  Maintain file name to list blocks + location mapping  Manages block allocation/replication  Checkpoints namespace and journals namespace changes for reliability  Control access to namespace  DFS Slaves “Datanodes” handle block storage  Stores blocks using the underlying OS’s files  Clients access the blocks directly from datanodes  Periodically sends block reports to Namenode  Periodically check block integrity http://www.kellytechno.com
  • 9. Read Metadata ops Client Metadata (Name, #replicas, …): /users/foo/data, 3, … Namenode Client Datanodes Rack 1 Rack 2 Replication Block ops Datanodes Write Blocks http://www.kellytechno.com
  • 10.  A file’s replication factor can be set per file (default 3)  Block placement is rack aware  Guarantee placement on two racks  1st replica is on local node, 2rd/3rd replicas are on remote rack  Avoid hot spots: balance I/O traffic  Writes are pipelined to block replicas  Minimize bandwidth usage  Overlapping disk writes and network writes  Reads are from the nearest replica  Block under-replication & over-replication is detected by Namenode  Balancer application rebalances blocks to balance DN utilization http://www.kellytechno.com
  • 11.  Scale cluster size  Scale number of clients  Scale namespace size (total number of files, amount of data)  Possible solutions  Multiple namenodes  Read-only secondary namenode  Separate cluster management and namespace management  Dynamic Partition namespace  Mounting http://www.kellytechno.com
  • 12.  Map/Reduce is a programming model for efficient distributed computing  It works like a Unix pipeline:  cat input | grep | sort | uniq -c | cat > output  Input | Map | Shuffle & Sort | Reduce | Output  A simple model but good for a lot of applications  Log processing  Web index building http://www.kellytechno.com
  • 14.  Mapper  Input: value: lines of text of input  Output: key: word, value: 1  Reducer  Input: key: word, value: set of counts  Output: key: word, value: sum  Launching program  Defines the job  Submits job to cluster http://www.kellytechno.com
  • 15.  Fine grained Map and Reduce tasks  Improved load balancing  Faster recovery from failed tasks  Automatic re-execution on failure  In a large cluster, some nodes are always slow or flaky  Framework re-executes failed tasks  Locality optimizations  With large data, bandwidth to data is a problem  Map-Reduce + HDFS is a very effective solution  Map-Reduce queries HDFS for locations of input data  Map tasks are scheduled close to the inputs when possible http://www.kellytechno.com
  • 16. • Hadoop Wiki – Introduction • http://hadoop.apache.org/core/ – Getting Started • http://wiki.apache.org/hadoop/GettingStartedWithHadoop – Map/Reduce Overview • http://wiki.apache.org/hadoop/HadoopMapReduce – DFS • http://hadoop.apache.org/core/docs/current/hdfs_design.html • Javadoc – http://hadoop.apache.org/core/docs/current/api/index.html http://www.kellytechno.com

Editor's Notes

  1. 60M objects on 16G machine (e.g. 20M files with 2 blocks each)