Support of Hadoop for Big data
Apache Hadoop
• Apache Hadoop is an open-source software
framework that supports data-intensive distributed
applications, licensed under the Apache License 2.0.
• Hadoop was created by Doug Cutting and Mike
Cafarella in 2005.
• Cutting named it after his son's toy elephant. It was
originally developed to support distribution for the
Nutch search engine project.
5/18/2022 2
Hadoop is an open-source project of the Apache Software
Foundation that enables distributed processing of large
data sets. The core of Hadoop consists mainly of two
components:
• MapReduce
• HDFS (Hadoop Distributed File System)
• MapReduce is a programming model and framework for running
tasks in parallel across a large cluster of computers. It consists
mainly of two functions: Map and Reduce.
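To make the two functions concrete, here is a minimal, single-process Python sketch of the classic word-count example. It only imitates the data flow (map, group by key, reduce); it does not use Hadoop's Java API, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative, not part of Hadoop.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between Map and Reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence; keys come out sorted."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the framework runs many map tasks in parallel on different input splits and performs the shuffle over the network; here everything happens in one process.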
Map-Reduce Architecture
• Client: UI for submitting jobs; polls status information.
• JobTracker (master): accepts MR jobs, assigns tasks to slaves,
monitors tasks, and handles failures.
• TaskTracker (slaves): runs Map and Reduce tasks and manages
intermediate output.
• Task: runs the Map and Reduce functions and reports progress.
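The JobTracker's "monitors tasks / handles failures" role can be sketched as follows. This is a hypothetical, in-memory illustration, not Hadoop code: the class name, the round-robin reassignment policy, and the timeout value are assumptions chosen for clarity. Times are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class JobTracker:
    """Toy master that tracks TaskTracker heartbeats and reassigns tasks on failure."""
    timeout: float  # seconds without a heartbeat before a tracker is declared dead
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    assignments: dict[str, list[str]] = field(default_factory=dict)

    def heartbeat(self, tracker: str, now: float) -> None:
        """Record that a TaskTracker reported in at time `now`."""
        self.last_heartbeat[tracker] = now
        self.assignments.setdefault(tracker, [])

    def assign(self, tracker: str, task: str) -> None:
        """Give a task to a tracker (it must have sent a heartbeat first)."""
        self.assignments[tracker].append(task)

    def check_failures(self, now: float) -> list[str]:
        """Declare silent trackers dead and move their tasks to live ones, round-robin."""
        dead = [t for t, seen in self.last_heartbeat.items() if now - seen > self.timeout]
        live = [t for t in self.last_heartbeat if t not in dead]
        if dead and not live:
            raise RuntimeError("no live TaskTrackers left to reassign tasks to")
        orphaned = [task for t in dead for task in self.assignments.pop(t)]
        for t in dead:
            del self.last_heartbeat[t]
        for i, task in enumerate(orphaned):
            self.assignments[live[i % len(live)]].append(task)
        return dead


if __name__ == "__main__":
    jt = JobTracker(timeout=10.0)
    for name in ("tt1", "tt2", "tt3"):
        jt.heartbeat(name, now=0.0)
    jt.assign("tt1", "map_0")
    jt.assign("tt2", "map_1")
    jt.assign("tt3", "reduce_0")
    jt.heartbeat("tt1", now=8.0)  # tt2 goes silent after t=0
    jt.heartbeat("tt3", now=9.0)
    print(jt.check_failures(now=15.0))  # ['tt2']
    print(jt.assignments)  # {'tt1': ['map_0', 'map_1'], 'tt3': ['reduce_0']}
```

Real Hadoop also retries individual failed task attempts and blacklists unreliable trackers; the sketch only shows the heartbeat-timeout idea.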
1. The client submits the job to the JobTracker, which runs on the cluster's master node (in small clusters,
often the same machine as the NameNode).
2. The JobTracker generates a job ID for the submitted MapReduce job and returns it to the client. The
client (or an administrator) uses this ID to track the job and to stop or kill it if needed.
3. The job resources, such as the required JAR files, the job configuration, and the input split metadata,
are copied from the client to HDFS, where the JobTracker and the TaskTrackers on the DataNodes can
access them.
4. The JobTracker then schedules the job's tasks on TaskTrackers running on different DataNodes.
5. Each TaskTracker runs the Map or Reduce tasks assigned to it by the JobTracker and reports each task's
progress and completion back to the JobTracker. Throughout, it sends periodic heartbeat messages to the
JobTracker to indicate that its node is up and running.
6. When all tasks have finished, the JobTracker marks the job as complete. The final output is written to
HDFS in the format the job specifies, and the client learns of completion by polling the JobTracker.
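Steps 1 to 6 can be traced with a small simulation. `MiniJobTracker` and its methods are hypothetical names invented for this sketch; the job ID only mimics the general `job_<jobtracker-start>_<sequence>` shape seen in Hadoop 1, the state names echo Hadoop's job states, and a plain dictionary stands in for the HDFS staging area. Scheduling is simple round-robin; real schedulers also take free task slots and data locality into account.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    splits: list[str]
    status: str = "PREP"
    completed: set[int] = field(default_factory=set)


class MiniJobTracker:
    """Hypothetical, in-memory stand-in for the submission flow in steps 1-6."""

    def __init__(self, start_stamp: str, trackers: list[str]) -> None:
        self.start_stamp = start_stamp      # identifies this JobTracker instance
        self.trackers = trackers            # names of TaskTrackers on DataNodes
        self.seq = itertools.count(1)
        self.jobs: dict[str, Job] = {}
        self.staging: dict[str, dict] = {}  # stands in for the HDFS staging directory

    def new_job_id(self) -> str:
        # Step 2: a fresh, unique ID for each submitted job.
        return f"job_{self.start_stamp}_{next(self.seq):04d}"

    def submit(self, jar: str, splits: list[str]) -> str:
        # Steps 1-3: accept the job, assign an ID, stage its resources.
        job_id = self.new_job_id()
        self.staging[job_id] = {"jar": jar, "splits": list(splits)}
        self.jobs[job_id] = Job(job_id, list(splits))
        return job_id

    def schedule(self, job_id: str) -> dict[str, list[int]]:
        # Step 4: one map task per split, handed out round-robin.
        job = self.jobs[job_id]
        job.status = "RUNNING"
        plan: dict[str, list[int]] = {t: [] for t in self.trackers}
        for i in range(len(job.splits)):
            plan[self.trackers[i % len(self.trackers)]].append(i)
        return plan

    def report_done(self, job_id: str, task_index: int) -> None:
        # Step 5: a TaskTracker reports that one of its tasks finished.
        job = self.jobs[job_id]
        job.completed.add(task_index)
        if len(job.completed) == len(job.splits):
            job.status = "SUCCEEDED"  # Step 6: visible to the polling client

    def status(self, job_id: str) -> str:
        return self.jobs[job_id].status


if __name__ == "__main__":
    jt = MiniJobTracker("202205181200", ["tt1", "tt2"])
    jid = jt.submit("wordcount.jar", ["split-0", "split-1", "split-2"])
    plan = jt.schedule(jid)
    for tasks in plan.values():
        for t in tasks:
            jt.report_done(jid, t)
    print(jid, jt.status(jid))
```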
Support of Hadoop for Big data
• HBase: A distributed, column-oriented database that runs on
top of HDFS. Its data model groups columns into column
families, which lets it scale beyond what traditional
relational database systems can handle.
• Hive: A data warehouse system that lets users query large
datasets stored in HDFS through a SQL-like language called
HiveQL. Hive is used for ad-hoc queries and for summarizing
and analyzing large data sets stored in HDFS.
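The column-family model can be pictured as nested maps: row key, then column family, then column qualifier, then value. The toy class below is a simplified illustration (no cell versions or timestamps, no regions, everything in memory), not the HBase client API; `ToyHBaseTable` and its methods are hypothetical. It does follow two real HBase conventions: columns are addressed as `family:qualifier`, and scans return rows in sorted key order with an exclusive stop row.

```python
from collections import defaultdict


class ToyHBaseTable:
    """In-memory model of HBase's layout: row key -> column family -> qualifier -> value.

    Column families are fixed when the table is created; qualifiers inside a
    family can be added freely per row, so rows can be sparse.
    """

    def __init__(self, families: set[str]) -> None:
        self.families = frozenset(families)
        self.rows: dict[str, dict[str, dict[str, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, column: str, value: bytes) -> None:
        family, qualifier = column.split(":", 1)  # columns are named "family:qualifier"
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row][family][qualifier] = value

    def get(self, row: str, family: str) -> dict[str, bytes]:
        """Return every column stored for `row` in one family (empty if none)."""
        return dict(self.rows.get(row, {}).get(family, {}))

    def scan(self, start: str, stop: str) -> list[str]:
        """Row keys in [start, stop), in sorted order."""
        return [r for r in sorted(self.rows) if start <= r < stop]


if __name__ == "__main__":
    t = ToyHBaseTable({"info", "stats"})
    t.put("user#001", "info:name", b"Alice")
    t.put("user#001", "stats:logins", b"42")
    t.put("user#002", "info:name", b"Bob")
    print(t.get("user#001", "info"), t.scan("user#000", "user#999"))
```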
• Pig: Apache Pig is a Hadoop platform for analyzing large
data sets stored in the Hadoop file system. Its high-level
procedural language, Pig Latin, lets Hadoop users query data
sets without knowing MapReduce, using simple statements
similar to SQL.
• Sqoop: A command-line tool that transfers data between
structured (relational) databases and Hadoop, where the
target may be HDFS, Hive, or HBase. It can also export data
from Hadoop back to relational databases.
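Sqoop's import/export round trip can be imitated in a few lines. In this sketch, Python's built-in `sqlite3` stands in for the relational database and an in-memory string stands in for a file in HDFS; `import_table` and `export_text` are hypothetical helpers, not Sqoop commands, and comma-delimited text is assumed as the intermediate format.

```python
import csv
import io
import sqlite3


def import_table(conn: sqlite3.Connection, table: str) -> str:
    """'Import': dump every row of `table` as comma-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Table name is interpolated directly, so it is assumed to be trusted.
    for row in conn.execute(f"SELECT * FROM {table}"):
        writer.writerow(row)
    return out.getvalue()


def export_text(conn: sqlite3.Connection, table: str, text: str) -> int:
    """'Export': insert comma-delimited lines back into `table`; returns rows written."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return 0
    placeholders = ", ".join("?" * len(rows[0]))  # e.g. "?, ?" for two columns
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
    src.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
    text = import_table(src, "emp")
    print(text, end="")
```

Real Sqoop parallelizes imports by splitting the table across several map tasks; this sketch copies everything in one pass.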
• Mahout: A library whose primary goal is to provide
scalable machine learning algorithms.
• Ambari: A web-based tool for provisioning, managing,
and monitoring Hadoop clusters, with RESTful APIs. The
Hadoop components supported by Ambari include HDFS,
MapReduce, Hive, Sqoop, Pig, Oozie, ZooKeeper, and
HCatalog.
• Chukwa: An open-source data collection system for
monitoring large distributed systems. It is built on top of
HDFS and MapReduce and thus inherits their robustness and
scalability.
• Avro: A framework for data serialization and remote
procedure calls. It is well suited to scripting languages
such as Pig because it makes it easy to pass data from one
program or language to another (for example, from C to
Pig).
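Part of what makes Avro compact and language-neutral is its binary encoding, defined in the Avro specification: `int` and `long` values are zig-zag encoded and then written as variable-length base-128 integers, strings are a length followed by UTF-8 bytes, and a record is simply its fields concatenated in schema order, with no field names or tags. The sketch below implements only those three pieces; the list-of-tuples schema format is a simplification assumed here (real Avro schemas are JSON).

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then base-128 varint, low 7 bits first."""
    z = zigzag(n)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length (as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: list[tuple[str, str]], record: dict) -> bytes:
    """Concatenate fields in schema order; readers need the schema to decode."""
    encoders = {"int": encode_long, "long": encode_long, "string": encode_string}
    return b"".join(encoders[ftype](record[fname]) for fname, ftype in schema)


if __name__ == "__main__":
    schema = [("name", "string"), ("age", "int")]
    print(encode_record(schema, {"name": "foo", "age": 2}))  # b'\x06foo\x04'
```

Because field names are not written, the same schema (or a compatible one) must be available when reading, which is why Avro files store the writer's schema alongside the data.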
• Cassandra: A multi-master database with high
availability, scalability, and performance. It can serve both as
a real-time operational data store and as a read-intensive
database for business intelligence applications. It supports
replication across multiple data centers, which makes it well
suited to mission-critical data.
• ZooKeeper: A centralized service that maintains
configuration information for distributed applications and
provides naming and synchronization services for them.
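Cassandra's replication across nodes rests on a token ring. Below is a simplified sketch in the spirit of Cassandra's SimpleStrategy: each node owns a token, a key is stored on the first node whose token is at or after the key's token, and the remaining replicas go to the next nodes clockwise. The class and method names, the MD5-based token function, and the 2**32 token space are illustrative assumptions; multi-data-center placement is not modeled.

```python
import bisect
import hashlib


class TokenRing:
    """Toy replica placement: first node at or after the key's token,
    plus the next `rf - 1` nodes clockwise around the ring."""

    def __init__(self, node_tokens: dict[str, int], rf: int) -> None:
        if not 1 <= rf <= len(node_tokens):
            raise ValueError("replication factor must be between 1 and the node count")
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]
        self.rf = rf

    @staticmethod
    def token_for(key: str, space: int = 2**32) -> int:
        # Hash the partition key onto the ring (MD5 chosen only for illustration).
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % space

    def replicas_for_token(self, token: int) -> list[str]:
        # First node whose token >= key token; wrap to the start past the end.
        start = bisect.bisect_left(self.tokens, token) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def replicas(self, key: str) -> list[str]:
        return self.replicas_for_token(self.token_for(key))


if __name__ == "__main__":
    ring = TokenRing({"A": 0, "B": 2**30, "C": 2**31, "D": 3 * 2**30}, rf=3)
    print(ring.replicas("user:42"))
```

Because every replica can accept writes, losing one node still leaves `rf - 1` copies available, which is the basis of the high availability described above.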
• Oozie: A workflow scheduler for managing Hadoop jobs.
An application may need several MapReduce jobs to run.
Oozie manages the workflow between these jobs, including
workflow instances, their variables, and the control
dependencies among them.
• Flume: A distributed service for collecting and
aggregating large amounts of log data. It resembles
Chukwa, but Flume targets near-real-time analytics,
while Chukwa targets batch-oriented or periodic
analytics.
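The control dependencies Oozie manages form a directed acyclic graph of jobs. This sketch uses Python's standard `graphlib` module (Python 3.9+) to compute which jobs can start together once their predecessors have succeeded. Oozie itself defines workflows in XML with action, fork, and join nodes; the job names and the dictionary format here are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(workflow: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves: every job in a wave has all its dependencies finished.

    `workflow` maps each job to the set of jobs that must succeed before it starts.
    """
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises CycleError if the dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel now
        waves.append(ready)
        sorter.done(*ready)  # pretend each job succeeded
    return waves


if __name__ == "__main__":
    wf = {
        "ingest_logs": set(),
        "ingest_users": set(),
        "clean": {"ingest_logs"},
        "join": {"clean", "ingest_users"},
        "report": {"join"},
    }
    for i, wave in enumerate(run_order(wf), start=1):
        print(i, wave)
```

A wave with more than one job corresponds to what an Oozie fork/join would run in parallel.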
• Big SQL: A SQL interface developed by IBM for its Hadoop
platform, InfoSphere BigInsights. It does not turn Hadoop
into a relational database; rather, it lets developers
with SQL knowledge create tables for data stored in
Hive, HBase, and the distributed file system.
• Stinger: An initiative from Hortonworks and
Microsoft to improve Hive's SQL interface and to
make Hive queries execute much faster.
• Apache Drill: Aims to provide real-time query
execution on data stored in Hadoop. The project's goal
is to return query results over petabytes of data and
trillions of records in less than a second.
