Big Data and Hadoop
Data Facts:-
• The New York Stock Exchange generates about 1 TB of trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• Twitter generates about 8 TB of data per day.
• The Internet Archive stores around 2 petabytes of data and is growing at a rate of 20
terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland will produce about 15 petabytes of
data per year.
Big Data:-
Big data is commonly summarized by the 3 Vs of data, though a fourth V is equally
important. They are as follows:-
Volume: - The total size of the data, which may run to terabytes, petabytes or even
zettabytes, and which is often semi-structured or multi-structured.
Variety: - Most generated data is messy: diverse data sources do not provide a fixed
structure, so a traditional RDBMS cannot manage it in a timely way.
Velocity: - The speed at which data is collected, i.e. the rate at which data becomes
available to the organization, and the need to analyse streaming data so that decisions
can be made within a very short time frame.
Veracity: - The uncertainty about the genuineness of the huge volumes of data being
generated.
Pic: - Different levels of data generation
Market trends are raising a new set of questions, such as:-
Social and Web Analytics:-
• What is the social sentiment of my brand or products?
• How effective is our online campaign?
• How can I optimize my traffic to reach the target audience?
Live data feeds:-
• How can we optimize the fleet based on weather and traffic patterns?
Advanced Analytics:-
• How can we better predict our future outcomes?
Hadoop:-
• Big data processing platform.
• Uses the "MapReduce" processing paradigm.
• Characteristics:
i> Highly scalable (scales out).
ii> Commodity hardware-based.
iii> Open source -> very low acquisition and storage costs.
Hadoop consists of two parts: the Hadoop Distributed File System (HDFS) and the
MapReduce framework.
Hadoop Eco-System:-
HDFS Architecture:-
In HDFS, the NameNode is the master node: it receives client requests for file-system
metadata and manages all the DataNodes in the cluster (DataNodes are the commodity machines
that store the data and also run the computation). When a file is written, it is split into
multiple blocks, and the NameNode decides which DataNodes hold each block so that blocks are
spread across the cluster; the file data itself goes to the DataNodes rather than passing
through the NameNode. Each block is replicated (for high availability) according to the
replication policy (the default factor is 3), i.e. every block is copied N times and the
copies are stored on different DataNodes.
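The block splitting and replication described above can be sketched in a few lines of Python. This is only an illustration: the function names, the DataNode names and the round-robin placement are assumptions made for the example (real HDFS placement is rack-aware), and the 128 MB block size is the Hadoop 2.x default (Hadoop 1.x used 64 MB).

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file of this size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    # The last block only holds whatever data is left over.
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign every block to `replication` distinct DataNodes.

    Hypothetical round-robin placement, used here only to show that
    each copy of a block lands on a different node.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(300)                # a 300 MB file
nodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical DataNode names
placement = place_replicas(len(blocks), nodes)
raw_storage_mb = sum(blocks) * 3               # every block is stored 3 times

print(blocks)          # [128, 128, 44]
print(placement)
print(raw_storage_mb)  # 900
```

With a replication factor of 3, a 300 MB file therefore consumes about 900 MB of raw cluster storage, and losing any single DataNode still leaves two copies of every block.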
The Secondary NameNode keeps a periodic checkpoint of the primary NameNode's metadata, so if
the primary goes down, that checkpoint can be used to bring up a replacement. Automatic
failover is not supported, so the switch to a new primary NameNode has to be done manually,
and metadata changes made after the last checkpoint may be lost.
MapReduce Framework:-
A MapReduce job consists of several functions (stages) that are applied in sequence to
produce the final result set. The diagram below depicts this flow-
Pic:- Flow of MapReduce
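As a rough illustration of this flow, here is a minimal single-process word-count sketch in Python. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for the example and only mimic the map, shuffle/group and reduce stages that Hadoop runs in parallel across DataNodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    # Keys are sorted before reducing, as they are within a reducer's input.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["big data and hadoop", "hadoop stores big data"])
print(counts)
# {'and': 1, 'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1}
```

Because each map call looks at one line only and each reduce call looks at one key only, both stages can be spread over many machines, which is what makes the paradigm scale out.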
Hadoop 1.x- In Summary:-
Limitations of Hadoop 1.x:-
• No horizontal scalability of the NameNode:-
Challenges:-
i. Metadata is stored in NameNode memory, i.e. RAM.
ii. Bottleneck after ~4000 nodes.
iii. Results in cascading failures of DataNodes.
• Does not support NameNode High Availability:-
Challenges:-
i. The Secondary NameNode is not a hot standby for the NameNode.
• Overburdened JobTracker:-
Challenges:-
i. The CPU spends a very significant portion of its time and effort managing the life cycle of
applications.
ii. A single network listener thread communicates with thousands of Map and Reduce
tasks.
• Not possible to run non-MapReduce big data applications on HDFS:-
Challenges:-
i. Only MapReduce processing is possible.
ii. Alternative data storage is needed for other kinds of processing, such as real-time and
graph analysis.
• Does not support multi-tenancy.
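To see why keeping all metadata in NameNode RAM limits scalability, a back-of-the-envelope estimate helps. The sketch below assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object (file, directory or block); that figure, the function name and the example workload are assumptions for illustration, not measurements.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value


def namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0,
                        bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the NameNode heap needed to hold the namespace in RAM."""
    # One object per file, one per block, one per directory.
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * bytes_per_object


# Hypothetical workload: 100 million small files, one block each.
estimate = namenode_heap_bytes(100_000_000, blocks_per_file=1)
print(f"{estimate / 2**30:.1f} GiB")  # 27.9 GiB
```

Since a single NameNode has to hold the entire namespace, the number of files a Hadoop 1.x cluster can store is capped by the memory of one machine, no matter how many DataNodes are added.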
Hadoop 2.x:- Enhanced features are as follows-
• HDFS Federation.
• Support for NameNode High Availability.
• YARN - Yet Another Resource Negotiator.
i. Better processing control.
ii. Support for non-MapReduce types of processing.
iii. Support for multi-tenancy.
Hadoop 2.x- In Summary:-
Pic:- Structure of Hadoop 2.x
Yet Another Resource Negotiator (YARN):- It enables multiple types of workloads to run on the same cluster.
Multi-tenancy - Capacity Scheduler:-
Structural differences between Hadoop 1.x and 2.x:-
