Md. Hasan Basri
Technology Enthusiast
linkedin.com/in/pothiq
twitter.com/pothiq
pothiq@gmail.com
"The name my kid gave a stuffed yellow
elephant. Short, relatively easy to spell and
pronounce, meaningless and not used
elsewhere: those are my naming criteria.
Kids are good at generating such."
- Doug Cutting, Creator of Hadoop
“Hadoop is the popular open
source implementation of
MapReduce, a powerful tool
designed for deep analysis
and transformation of very
large data sets.”
https://hadoop.apache.org/
When to Use Hadoop?
1. For Processing Really BIG Data.
2. For Storing a Diverse Set of Data.
3. For Parallel Data Processing.
When NOT to Use Hadoop?
1. For Real-Time Data Analysis.
2. For a Relational Database System.
3. For a General Network File System.
4. For Non-Parallel Data Processing.
Hadoop feature releases
Map-Reduce vs YARN Architecture
Hadoop Core Components:
What is JobTracker?
JobTracker is the master daemon of
Hadoop's classic MapReduce engine
(Hadoop 1.x).
It farms out MapReduce tasks to the
nodes in the cluster, ideally to nodes
that already hold the input data, or
failing that to nodes in the same rack
as the data.
Under YARN, its duties are split between
the ResourceManager and per-application
ApplicationMasters.
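The locality preference described above can be sketched in a few lines. This is only an illustration, not Hadoop's scheduler: the node names, the `RACK_OF` map and the `pick_node` function are all hypothetical.

```python
# Hypothetical cluster layout: node name -> rack name.
RACK_OF = {"n1": "rack-a", "n2": "rack-a", "n3": "rack-b", "n4": "rack-b"}


def pick_node(free_nodes, block_holders, rack_of=RACK_OF):
    """Pick a node for a task whose input block is stored on block_holders.

    Returns a (node, locality) pair.
    """
    # 1. Data-local: a free node that already stores the block.
    for node in free_nodes:
        if node in block_holders:
            return node, "data-local"
    # 2. Rack-local: a free node in the same rack as one of the holders.
    holder_racks = {rack_of[h] for h in block_holders}
    for node in free_nodes:
        if rack_of[node] in holder_racks:
            return node, "rack-local"
    # 3. Otherwise take any free node; the block must cross racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "none"


print(pick_node(["n2", "n3"], {"n3"}))  # ('n3', 'data-local')
print(pick_node(["n4"], {"n3"}))        # ('n4', 'rack-local')
print(pick_node(["n1"], {"n3"}))        # ('n1', 'off-rack')
```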
What is NameNode?
NameNode: also known as the master in a Hadoop cluster.
The main functions performed by the NameNode are listed below:
• Stores the metadata of the actual data, e.g. filename,
path, number of blocks, block IDs, block locations, number of
replicas, and slave-related configuration.
• Manages the filesystem namespace.
• Regulates client access to files.
• Assigns work to the slaves (DataNodes).
• Executes filesystem namespace operations such as
opening and closing files and renaming files and directories.
• Keeps metadata in memory for fast retrieval,
so it requires a large amount of memory.
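As a rough picture of the metadata listed above, the toy model below keeps a namespace (path to block IDs) and a block-location map in memory. The class and field names (`ToyNameNode`, `FileMeta`) are invented for this sketch and do not mirror Hadoop's internal classes.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    replication: int
    block_ids: list = field(default_factory=list)


class ToyNameNode:
    """Holds only metadata in memory; file contents never pass through it."""

    def __init__(self):
        self.namespace = {}        # path -> FileMeta
        self.block_locations = {}  # block id -> set of DataNode names

    def create(self, path, block_ids, replication=3):
        self.namespace[path] = FileMeta(replication, list(block_ids))

    def rename(self, old_path, new_path):
        # A rename touches only the namespace; no block is moved.
        self.namespace[new_path] = self.namespace.pop(old_path)

    def locate(self, path):
        """Return (block id, sorted holders) for each block of the file."""
        meta = self.namespace[path]
        return [(b, sorted(self.block_locations.get(b, ()))) for b in meta.block_ids]


nn = ToyNameNode()
nn.create("/logs/a.txt", ["blk_1", "blk_2"])
nn.block_locations["blk_1"] = {"dn2", "dn1"}
nn.rename("/logs/a.txt", "/logs/b.txt")
print(nn.locate("/logs/b.txt"))  # [('blk_1', ['dn1', 'dn2']), ('blk_2', [])]
```

A client would use the result of `locate` to contact DataNodes directly, which is why the NameNode needs memory rather than disk bandwidth.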
What is Secondary NameNode?
Despite its name, the Secondary NameNode is not a backup NameNode.
To see what it does, first recall how the NameNode works.
The NameNode holds the metadata for HDFS, such as block information
and file sizes. This information is kept in main memory and also on disk
for persistence.
On disk it is stored in two files:
Edit log: records every change made to HDFS.
Fsimage: stores a snapshot of the file system.
The Secondary NameNode periodically merges the edit log into the fsimage
(a checkpoint), so the edit log stays small and a NameNode restart is faster.
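One way to see how the two files relate: replaying the edit log on top of the last snapshot yields the current namespace. Producing a fresh snapshot this way is called a checkpoint, and doing it periodically is the Secondary NameNode's job. The sketch below uses a made-up in-memory format (a dict for the fsimage, tuples for edits); the real on-disk formats are different.

```python
def checkpoint(fsimage, edit_log):
    """Replay edit_log on a copy of fsimage and return the new snapshot."""
    image = dict(fsimage)  # leave the old snapshot untouched
    for op, *args in edit_log:
        if op == "create":
            path, blocks = args
            image[path] = list(blocks)
        elif op == "delete":
            (path,) = args
            image.pop(path, None)
        elif op == "rename":
            old, new = args
            image[new] = image.pop(old)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


fsimage = {"/a": ["blk_1"]}
edits = [
    ("create", "/b", ["blk_2"]),
    ("rename", "/a", "/c"),
    ("delete", "/b"),
]
new_image = checkpoint(fsimage, edits)
print(new_image)  # {'/c': ['blk_1']}
```

After a checkpoint the edit log can start again from empty, which keeps it short.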
What is DataNode?
• A DataNode is also known as a slave node.
• In the HDFS architecture, DataNodes store the
actual data.
• DataNodes serve read and write
requests from clients.
• DataNodes can be deployed on commodity hardware.
• Each DataNode sends the NameNode information
about the blocks it stores and
responds to the NameNode's requests for filesystem
operations.
• When a DataNode starts up, it announces itself to
the NameNode along with the list of blocks it is
responsible for.
• A DataNode is usually configured with a lot of hard
disk space, because the actual data is stored
there.
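The start-up announcement and block list described above can be pictured as a block report that updates the NameNode's block-location map. This is a simplified sketch: the function names and the target of three replicas are assumptions for the example, not Hadoop's API.

```python
from collections import defaultdict


def apply_block_report(block_locations, datanode, reported_blocks):
    """Replace what the NameNode believes datanode holds with its report."""
    # Forget stale entries for this DataNode first.
    for holders in block_locations.values():
        holders.discard(datanode)
    # Then record every block it says it stores.
    for block_id in reported_blocks:
        block_locations[block_id].add(datanode)


def under_replicated(block_locations, target=3):
    """Blocks with fewer live replicas than the target."""
    return sorted(b for b, holders in block_locations.items() if len(holders) < target)


locations = defaultdict(set)  # block id -> set of DataNode names
apply_block_report(locations, "dn1", ["blk_1", "blk_2"])
apply_block_report(locations, "dn2", ["blk_1"])
apply_block_report(locations, "dn3", ["blk_1"])
print(under_replicated(locations))  # ['blk_2']
```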
What is HDFS?
HDFS (Hadoop Distributed File System) is a distributed file system that lets many files be stored and
retrieved in parallel with high throughput. It is one of the basic
components of the Hadoop framework.
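HDFS gets its parallelism by splitting each file into fixed-size blocks and replicating every block on several DataNodes. A back-of-the-envelope sketch, assuming the stock defaults of recent Hadoop releases (128 MiB for `dfs.blocksize`, 3 for `dfs.replication`); the `block_layout` helper is hypothetical:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MiB)
REPLICATION = 3                 # assumed default dfs.replication


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return the block count, replica count and raw bytes for one file."""
    blocks = -(-file_size // block_size)  # ceiling division, 0 for an empty file
    return {
        "blocks": blocks,
        "replicas": blocks * replication,
        # The last block is not padded, so raw usage is size * replication.
        "raw_bytes": file_size * replication,
    }


layout = block_layout(1024 ** 3)  # a 1 GiB file
print(layout)  # {'blocks': 8, 'replicas': 24, 'raw_bytes': 3221225472}
```

Each of the eight blocks can be read by a different task at the same time, which is where the throughput comes from.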
Sequence Diagram for Hadoop-MapReduce
Programming Model
Big Data Hadoop Real Life Use Cases:
1. Healthcare
2. Wildlife
3. Retail Industry
4. Income Tax to scrutinize bank accounts
5. Fraud Detection
6. Sentiment Analysis
7. Networking Security
8. Education etc.
Companies Using Hadoop:
Why Hadoop?
1. Ability to store and process huge amounts of any kind of data, quickly.
2. Computing model processes big data fast
3. Fault tolerance
4. Flexibility
5. Low Cost
6. Scalability
   • Vertical scaling doesn’t cut it
   • Disk seek times
   • Hardware failures
   • Processing times
   • Horizontal scaling is linear
7. It’s not just for batch processing anymore
Hadoop Timeline
• Google published the GFS and MapReduce papers in 2003-2004.
• Doug Cutting and Mike Cafarella were building “Nutch”, an open source web search engine, at the same time.
• Hadoop was split out of Nutch in 2006, driven primarily by Doug Cutting, who joined Yahoo! that year.
• It has been evolving ever since.
What is BIG-DATA?
Big data is a term that describes the
large volume of data – both
structured and unstructured – that
inundates a business on a day-to-day
basis. But it’s not the amount of data
that’s important. It’s what
organizations do with the data that
matters. Big data can be analyzed for
insights that lead to better decisions
and strategic business moves.
Big Data Current Considerations
Volume. Organizations collect data from a variety of sources, including business transactions, social media
and information from sensor or machine-to-machine data.
Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags,
sensors and smart metering are driving the need to deal with torrents of data in near-real time.
Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to
unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Variability. In addition to the increasing velocities and varieties of data, data flows can be highly
inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered
peak data loads can be challenging to manage. Even more so with unstructured data.
Complexity. Today's data comes from multiple sources, which makes it difficult to link, match, cleanse and
transform data across systems. However, it’s necessary to connect and correlate relationships, hierarchies
and multiple data linkages or your data can quickly spiral out of control.
What is MapReduce?
MapReduce is a programming
model within the Hadoop framework
for processing big data stored in the
Hadoop Distributed File System (HDFS).
A map step turns input records into
key-value pairs, and a reduce step
aggregates the values for each key.
It is a core component, integral to the
functioning of the Hadoop framework.
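The model is easiest to see on the classic word-count example. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process; it illustrates the programming model only and does not use Hadoop's Java API. All function names are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


print(word_count(["big data", "Big elephant"]))
# {'big': 2, 'data': 1, 'elephant': 1}
```

On a cluster, many `map_phase` calls run in parallel on different blocks, and each reducer receives only the keys assigned to it.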
MapReduce is a programming model
Major Components of Hadoop
Core Hadoop Ecosystem, Query Engines, External Data Storage
Core Hadoop Ecosystem
Query Engines
Real World Application Architecture
External Data Storage
Useful URLs
https://data-flair.training/blogs/hadoop-ecosystem-components/
https://www.quora.com/What-is-a-Hadoop-ecosystem
https://www.geeksforgeeks.org/hadoop-ecosystem/
https://www.edureka.co/blog/hadoop-ecosystem
https://www.simplilearn.com/big-data-and-hadoop-ecosystem-tutorial
Introduction to Apache Hadoop Eco-System