SlideShare a Scribd company logo
1 of 25
BigDataProcessing
usingHadoop
What is Big Data?
•Data which is very large in size and yet growing
exponentially with time is called as Big data.
Why we need Big Data?
• For any application that contains limited amount of data we normally
use SQL / PostgreSQL / Oracle / MySQL.
• What in case of large applications like Facebook, Google, YouTube?
• This data is so large and complex that none of the traditional data
management system is able to store and process it.
• Facebook generates 500+ TB data per day.
Where does Big Data come from?
•Social data
•Share Market
•E-commerce site
•Airplane
What is the need for storing such
huge amount of data?
• The main reason behind storing data is analysis.
• Data analysis
• More accurate analyses
Example
• When we search anything on e-commerce websites(eBay,
Amazon), we get some recommendations of product that we
search.
• Similarly, why Facebook stores our images ,videos?
• a) Global marketing
• b) Target marketing
Categories Of Big Data
• Structured Data:
• Example :Relational database management system.
• Unstructured Data:
• Example Text files ,images ,videos ,email ,customer service
interactions ,webpages ,PDF files ,PPT ,social media data etc.
What is Hadoop?
• Apache Hadoop is an open-source java framework that is used to
store ,analyze and process Big data in a distributed systems
environment across clusters of computers.
• It is used by Google, Facebook, Yahoo, YouTube, Twitter, LinkedIn and
many more.
• It is inspired by google MapReduce algorithm for running distributed
application.
• Hadoop is written in java programming language with some code in C
and some commands in shell-scripts.
Continued...
•Hadoop is not a database:
Architecture of Hadoop:
• Hadoop follows master-slave architecture
• Two important components of Hadoop are:
• 1.HDFS (Data storage)
• 2.Map-Reduce (Analyzing and Processing)
Hadoop Distributed File System
(HDFS)
• Hadoop Distributed File System is distributed file system
used to store very huge amount of data.
• HDFS follows master-slave architecture means there is
one master machine(Name Node) and multiple slave
machines(Data Node).
• The data that you give to hadoop is stored across these
machines in the cluster.
Various components of HDFS
• Blocks:
• Name Node:
• Data Node:
• Secondary Name Node:
Map reduce
• MapReduce is a framework and processing technique using which we can
write applications to process huge amounts of data, in parallel, on large
clusters of commodity hardware in a reliable manner.
• Map reduce program is written by default in java but we can use other
language also like pig, apache pig.
• The Map Reduce algorithm contains two important tasks:
• 1) Map
• 2) Reduce
Map
• Mapper takes a set of data and converts it into another set
of data, where individual elements are broken down into
tuples (key/value pairs).
Reduce
• Reducer takes the output from a map as input and
combines those data tuples into a smaller set of tuples.
• The reduce job is always performed after the map job.
Map Reduce Example
Components of MapReduce
• JobTracker:
• TaskTracker:
JobTracker process
• JobTracker receives the requests from the client.
• JobTracker talks to the NameNode to determine the location of the
data.
• JobTracker finds the best TaskTracker nodes to execute tasks.
• The JobTracker submits the work to the chosen TaskTracker nodes.
• The TaskTracker nodes are monitored. If they do not submit heartbeat
signals then work is scheduled on a different TaskTracker.
• When the work is completed, the JobTracker updates its status and
submit back the overall status of job back to the client.
• The JobTracker is a point of failure for the Hadoop MapReduce service.
If it goes down, all running jobs are halted.
TaskTracker process
• The JobTracker submits the work to the TaskTracker nodes.
• TaskTracker run the tasks and report the status of task to
JobTracker.
• It has function of following the orders of the job tracker and
updating the job tracker with its progress status periodically.
• TaskTracker will be in constant communication with the
JobTracker.
• TaskTracker failure is not considered fatal. When a TaskTracker
becomes unresponsive, JobTracker will assign the task to another
node.
Sqoop
• Sqoop is an open source framework provided by Apache.
• Used to import/export data to/from relational databases.
Why Sqoop is used?
• For Hadoop developers, the work starts after data is loaded
into HDFS.
• The data residing in the RDBMS need to be transferred to
HDFS and might need to transfer back to RDBMS.
• Sqoop uses MapReduce framework to import and export
the data, which provides parallel mechanism as well as fault
tolerance.
• Sqoop makes developers life easy by providing command
line interface.
• Developers just need to provide basic information like
source, destination and database authentication details in
the sqoop command.

More Related Content

What's hot

Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleSriram Krishnan
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformDatabricks
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveMike Frampton
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...Databricks
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataDatabricks
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Databricks
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkItai Yaffe
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? nakshatraL
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & RedshiftDataKitchen
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesQubole
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache GiraphAvery Ching
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Databricks
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 

What's hot (20)

Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop Hive
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining w...
 
Intro to Big Data - Spark
Intro to Big Data - SparkIntro to Big Data - Spark
Intro to Big Data - Spark
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Building an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using SparkBuilding an ETL pipeline for Elasticsearch using Spark
Building an ETL pipeline for Elasticsearch using Spark
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS? WHAT IS HADOOP AND ITS COMPONENTS?
WHAT IS HADOOP AND ITS COMPONENTS?
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 

Similar to Big Data and Hadoop

Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHarsha Siva Sai
 
Data Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTData Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTAmmarHassan80
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With HadoopUmair Shafique
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop EcosystemUnveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystemmashoodsyed66
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game ChangerCaserta
 

Similar to Big Data and Hadoop (20)

Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
 
Data Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTData Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoT
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Big data technology
Big data technology Big data technology
Big data technology
 
Big data applications
Big data applicationsBig data applications
Big data applications
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop EcosystemUnveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
Unveiling Hive: A Comprehensive Exploration of Hive in Hadoop Ecosystem
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big Data
Big DataBig Data
Big Data
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 

Recently uploaded

Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

Big Data and Hadoop

  • 2. What is Big Data? •Data which is very large in size and yet growing exponentially with time is called as Big data.
  • 3. Why we need Big Data? • For any application that contains limited amount of data we normally use SQL / PostgreSQL / Oracle / MySQL. • What in case of large applications like Facebook, Google, YouTube? • This data is so large and complex that none of the traditional data management system is able to store and process it. • Facebook generates 500+ TB data per day.
  • 4. Where does Big Data come from? •Social data •Share Market •E-commerce site •Airplane
  • 5. What is the need for storing such huge amount of data? • The main reason behind storing data is analysis. • Data analysis • More accurate analyses
  • 6. Example • When we search anything on e-commerce websites(eBay, Amazon), we get some recommendations of product that we search. • Similarly, why Facebook stores our images ,videos? • a) Global marketing • b) Target marketing
  • 7. Categories Of Big Data • Structured Data: • Example :Relational database management system. • Unstructured Data: • Example Text files ,images ,videos ,email ,customer service interactions ,webpages ,PDF files ,PPT ,social media data etc.
  • 8.
  • 9. What is Hadoop? • Apache Hadoop is an open-source java framework that is used to store ,analyze and process Big data in a distributed systems environment across clusters of computers. • It is used by Google, Facebook, Yahoo, YouTube, Twitter, LinkedIn and many more. • It is inspired by google MapReduce algorithm for running distributed application. • Hadoop is written in java programming language with some code in C and some commands in shell-scripts.
  • 11. Architecture of Hadoop: • Hadoop follows master-slave architecture • Two important components of Hadoop are: • 1.HDFS (Data storage) • 2.Map-Reduce (Analyzing and Processing)
  • 12.
  • 13. Hadoop Distributed File System (HDFS) • Hadoop Distributed File System is distributed file system used to store very huge amount of data. • HDFS follows master-slave architecture means there is one master machine(Name Node) and multiple slave machines(Data Node). • The data that you give to hadoop is stored across these machines in the cluster.
  • 14.
  • 15. Various components of HDFS • Blocks: • Name Node: • Data Node: • Secondary Name Node:
  • 16. Map reduce • MapReduce is a framework and processing technique using which we can write applications to process huge amounts of data, in parallel, on large clusters of commodity hardware in a reliable manner. • Map reduce program is written by default in java but we can use other language also like pig, apache pig. • The Map Reduce algorithm contains two important tasks: • 1) Map • 2) Reduce
  • 17. Map • Mapper takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
  • 18. Reduce • Reducer takes the output from a map as input and combines those data tuples into a smaller set of tuples. • The reduce job is always performed after the map job.
  • 20. Components of MapReduce • JobTracker: • TaskTracker:
  • 21. JobTracker process • JobTracker receives the requests from the client. • JobTracker talks to the NameNode to determine the location of the data. • JobTracker finds the best TaskTracker nodes to execute tasks. • The JobTracker submits the work to the chosen TaskTracker nodes. • The TaskTracker nodes are monitored. If they do not submit heartbeat signals then work is scheduled on a different TaskTracker. • When the work is completed, the JobTracker updates its status and submit back the overall status of job back to the client. • The JobTracker is a point of failure for the Hadoop MapReduce service. If it goes down, all running jobs are halted.
  • 22. TaskTracker process • The JobTracker submits the work to the TaskTracker nodes. • TaskTracker run the tasks and report the status of task to JobTracker. • It has function of following the orders of the job tracker and updating the job tracker with its progress status periodically. • TaskTracker will be in constant communication with the JobTracker. • TaskTracker failure is not considered fatal. When a TaskTracker becomes unresponsive, JobTracker will assign the task to another node.
  • 23.
  • 24. Sqoop • Sqoop is an open source framework provided by Apache. • Used to import/export data to/from relational databases.
  • 25. Why Sqoop is used? • For Hadoop developers, the work starts after data is loaded into HDFS. • The data residing in the RDBMS need to be transferred to HDFS and might need to transfer back to RDBMS. • Sqoop uses MapReduce framework to import and export the data, which provides parallel mechanism as well as fault tolerance. • Sqoop makes developers life easy by providing command line interface. • Developers just need to provide basic information like source, destination and database authentication details in the sqoop command.