SlideShare a Scribd company logo
1 of 40
HDFS and MapReduce
Derek Chen
2018.09.17
Profile
Image
Derek Chen
Server Side Engineer, GLN
LINE Corporation
Agenda
• Big Data
• Hadoop
• HDFS
• MapReduce
Big Data
Growth of Big Data
90%
OF THE
WORLD‘S DATA
WAS CREATED IN
THE LAST
2
YEARS
50X
GROWTH
FROM
2010
TO 2020
Who Generates Big Data?
Have you ever wondered how Google, Facebook, or LinkedIn manages to store and utilize the huge data?
Today, it is becoming a problem for all of us to manage such BIG DATA…
Hadoop
Hadoop History
What is Hadoop?
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using simple programming models
• It is designed to scale out from single servers to thousands of machines,
each offering local computation and storage
• Rather than rely on single server, the library itself is designed to detect
and handle failures at the application layer, so delivering a high-available
service on top of a cluster of computers, each of which may be prone to
failures
Hadoop and Its Characteristics
• It is an open source data management technology with scale-
out storage and distributed processing
Hadoop Ecosystem
HDFS
What Is HDFS
• Hadoop Distributed File System
• Stores files in blocks across many nodes in a cluster
• Replicates the blocks across nodes for durability
• Master/Slave architecture
HDFS Master
• Name Node
–Runs on a single node as a master process
–Holds file metadata (which blocks are where)
–Directs client access to files in HDFS
–Receives heartbeat and block report from all data nodes
• Secondary Name Node
–Not a hot failover
–Maintains a copy of the Name Node metadata
HDFS Slaves
• Data Node
–Generally runs on all nodes in the cluster
–Block creation / replication / deletion / reads
–Takes orders from the Name Node
HDFS Architecture
HDFS File Read
HDFS File Write
Replication and Rack Awareness
Design of HDFS
• HDFS is designed for storing very large files with streaming data access
patterns, running on clusters of commodity hardware
• Very Large Files
–Files in the sizes of hundreds of gigabytes (GB) or terabytes (TB)
• Streaming Data Access
–HDFS is built around the idea that most efficient data processing pattern is a write-
once, read-many-times pattern. A dataset is typically generated or coped from source,
and then various analyses are performed on that dataset over time
• Commodity Hardware
–Commonly available hardware which is cheap (not enterprise level)
When Not to Use HDFS
• When you need low-latency access to data
–Applications that require faster data access will not work well with HDFS. HDFS is
optimized for high throughput of data
• When you have lots of small files
–Name node holds the filesystem metadata in memory. Every block, file, directory
occupies around 150 bytes. Hence having large number of small files will cause
burden on the name node
• When random writes in the file is needed
–Files in the HDFS may be written to by a single writer. Writers are always made at the
end of file in append-only fashion. There is no support for multiple writers or for
modifications at random offsets (positions)
HDFS Shell
• Easy to use command line interface
• Create, copy, move, and delete files
• Administrative duties – chmod, chown, chgrp
• Set replication factor for a file
• Head, tail, cat to view files
MapReduce
MapReduce Scenario
Let us consider a real life scenario to understand the importance of “Map Reduce” in hadoop
MapReduce Scenario
Similarly
MapReduce Components
• Job Tracker
–Coordinates all the jobs run on the system by scheduling tasks
–Keeps a record of overall progress of each job
–If a jobs fails, reschedules the job on a different task tracker
• Task Tracker
–Slave daemon which accepts tasks to be run a block of data
–Sends progress reports as heart beat signals to the job tracker at
regular intervals
MapReduce Job
Let’s look at how MapReduce actually works in
Hadoop, using Word Count
PageRank
PageRank in MapReduce
Reference
• Slides
–HDFS & MapReduce
–Intro to HDFS and MapReduce
–Hadoop and HDFS
• Articles
–Apache Hadoop Official Site
–HDFS Tutorial: Introduction to HDFS & its Features
Any Questions?

More Related Content

What's hot

Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSBouquet
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...AyeeshaParveen
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
Introduction to hadoop V2
Introduction to hadoop V2Introduction to hadoop V2
Introduction to hadoop V2TarjeiRomtveit
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 

What's hot (20)

Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Hadoop – big deal
Hadoop – big dealHadoop – big deal
Hadoop – big deal
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Introduction to hadoop V2
Introduction to hadoop V2Introduction to hadoop V2
Introduction to hadoop V2
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 

Similar to Introduction to HDFS and MapReduce

Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
Hadoop, Map Reduce and Apache Pig tutorial
Hadoop, Map Reduce and Apache Pig tutorialHadoop, Map Reduce and Apache Pig tutorial
Hadoop, Map Reduce and Apache Pig tutorialPranamesh Chakraborty
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxLecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxNIKHILGR3
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015 clairvoyantllc
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete informationbhargavi804095
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 

Similar to Introduction to HDFS and MapReduce (20)

Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Hadoop, Map Reduce and Apache Pig tutorial
Hadoop, Map Reduce and Apache Pig tutorialHadoop, Map Reduce and Apache Pig tutorial
Hadoop, Map Reduce and Apache Pig tutorial
 
Hadoop
HadoopHadoop
Hadoop
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Hadoop
HadoopHadoop
Hadoop
 
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxLecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
 
Presentation
PresentationPresentation
Presentation
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
 
Anju
AnjuAnju
Anju
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete information
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 

Recently uploaded

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

Introduction to HDFS and MapReduce

  • 1. HDFS and MapReduce Derek Chen 2018.09.17
  • 2. Profile Image Derek Chen Server Side Engineer, GLN LINE Corporation
  • 3. Agenda • Big Data • Hadoop • HDFS • MapReduce
  • 5. Growth of Big Data 90% OF THE WORLD‘S DATA WAS CREATED IN THE LAST 2 YEARS 50X GROWTH FROM 2010 TO 2020
  • 6. Who Generates Big Data? Have you ever wondered how Google, Facebook, or LinkedIn manages to store and utilize the huge data? Today, it is becoming a problem for all of us to manage such BIG DATA…
  • 9. What is Hadoop? • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models • It is designed to scale out from single servers to thousands of machines, each offering local computation and storage • Rather than rely on single server, the library itself is designed to detect and handle failures at the application layer, so delivering a high-available service on top of a cluster of computers, each of which may be prone to failures
  • 10. Hadoop and Its Characteristics • It is an open source data management technology with scale- out storage and distributed processing
  • 12. HDFS
  • 13. What Is HDFS • Hadoop Distributed File System • Stores files in blocks across many nodes in a cluster • Replicates the blocks across nodes for durability • Master/Slave architecture
  • 14. HDFS Master • Name Node –Runs on a single node as a master process –Holds file metadata (which blocks are where) –Directs client access to files in HDFS –Receives heartbeat and block report from all data nodes • Secondary Name Node –Not a hot failover –Maintains a copy of the Name Node metadata
  • 15. HDFS Slaves • Data Node –Generally runs on all nodes in the cluster –Block creation / replication / deletion / reads –Takes orders from the Name Node
  • 19. Replication and Rack Awareness
  • 20. Design of HDFS • HDFS is designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware • Very Large Files –Files in the sizes of hundreds of gigabytes (GB) or terabytes (TB) • Streaming Data Access –HDFS is built around the idea that most efficient data processing pattern is a write- once, read-many-times pattern. A dataset is typically generated or coped from source, and then various analyses are performed on that dataset over time • Commodity Hardware –Commonly available hardware which is cheap (not enterprise level)
  • 21. When Not to Use HDFS • When you need low-latency access to data –Applications that require faster data access will not work well with HDFS. HDFS is optimized for high throughput of data • When you have lots of small files –Name node holds the filesystem metadata in memory. Every block, file, directory occupies around 150 bytes. Hence having large number of small files will cause burden on the name node • When random writes in the file is needed –Files in the HDFS may be written to by a single writer. Writers are always made at the end of file in append-only fashion. There is no support for multiple writers or for modifications at random offsets (positions)
  • 22. HDFS Shell • Easy to use command line interface • Create, copy, move, and delete files • Administrative duties – chmod, chown, chgrp • Set replication factor for a file • Head, tail, cat to view files
  • 24. MapReduce Scenario Let us consider a real life scenario to understand the importance of “Map Reduce” in hadoop
  • 26. MapReduce Components • Job Tracker –Coordinates all the jobs run on the system by scheduling tasks –Keeps a record of overall progress of each job –If a jobs fails, reschedules the job on a different task tracker • Task Tracker –Slave daemon which accepts tasks to be run a block of data –Sends progress reports as heart beat signals to the job tracker at regular intervals
  • 28. Let’s look at how MapReduce actually works in Hadoop, using Word Count
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 39. Reference • Slides –HDFS & MapReduce –Intro to HDFS and MapReduce –Hadoop and HDFS • Articles –Apache Hadoop Official Site –HDFS Tutorial: Introduction to HDFS & its Features