Booking Hotel, Flight, Train, Event & Rental Car
Intro to Hadoop and MapReduce
Outline
• Big Data
• Hadoop
• Hadoop Cluster
• Hadoop Ecosystem
• HDFS
• MapReduce
• Demo
Big Data
• There is no single definition of ‘big data’; it is a very subjective term.
Big Data
• Most people would consider a data set of terabytes or more to be ‘big data’,
but some people use Hadoop with great success on smaller data sets than
that.
• One reasonable definition is that it’s data which can’t comfortably be
processed on a single machine.
The 3 V’s of Big Data
• Volume refers to the size of data that you’re dealing with.
• Variety refers to the fact that the data often comes from many different
sources and in many different formats.
• Velocity refers to the speed at which the data is generated.
Hadoop
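• The name and the logo come from a toy elephant belonging to Doug Cutting’s son.
• Grew out of Nutch, an open-source web search engine project started by Doug
Cutting and Mike Cafarella in 2002.
• Implemented the design described in Google’s paper on its distributed file
system (GFS, 2003), and later Google’s MapReduce paper (2004).
• Yahoo! invested in the project in 2006, the same year Hadoop was split out of
Nutch as an open-source project in its own right.
• Hadoop 0.1.0 was also released in 2006.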
Hadoop Cluster
The core Hadoop project consists of a way to store data, known as the
Hadoop Distributed File System, or HDFS, and a way to process the data,
called MapReduce. The key concept is that we split the data up and store it
across a collection of machines, known as a cluster. Then, when we want to
process the data, we process it where it’s actually stored: rather than
retrieving the data from a central server, we process it in place on the
machines of the cluster that already hold it.
Store in HDFS → Process with MapReduce
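To make the "split the data, then process it in place" idea concrete, here is a minimal, single-process Python sketch of the classic MapReduce word count. It only simulates the map, shuffle and reduce phases in memory; the input strings and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not Hadoop's API, which is Java-based (`Mapper` and `Reducer` classes).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each string stands for one block stored on a different
# DataNode. On a real cluster, each block would be mapped on the node holding it.
splits = [
    "the quick brown fox",
    "the lazy dog",
    "the fox",
]


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in split.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(input_splits):
    # Each split is mapped independently, so these calls could run in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in input_splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    print(word_count(splits))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In real Hadoop, map tasks are scheduled, where possible, on the nodes that hold each block, and the framework performs the shuffle over the network before the reducers run.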
Hadoop Ecosystem
• HDFS and MapReduce (the core)
• Pig, Hive and Impala (Hive and Impala support SQL-style queries such as SELECT * FROM …)
• HBase
• Sqoop
• Flume
• Hue
• Oozie
• Mahout
• And many more!
HDFS
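• A 150 MB file, mydata.txt, is split into three blocks: 64 MB, 64 MB and 22 MB.
• The HDFS block size is much larger than that of common file systems (64 MB in
Hadoop 1.x; 128 MB by default in Hadoop 2.x and later).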
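The split of mydata.txt follows from simple arithmetic: with a 64 MB block size, a 150 MB file yields two full blocks and a 22 MB remainder. A minimal Python sketch of that calculation, assuming sizes in whole megabytes; `split_into_blocks` is a hypothetical helper, not an HDFS function.

```python
# Assumed block size: 64 MB, the value used on this slide (Hadoop 1.x default).
BLOCK_SIZE_MB = 64


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file of file_size_mb occupies."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data.
        blocks.append(remainder)
    return blocks


if __name__ == "__main__":
    print(split_into_blocks(150))       # [64, 64, 22]
    print(split_into_blocks(150, 128))  # [128, 22]
```

HDFS does not pad the final block: the 22 MB block uses only 22 MB of storage on each DataNode that holds it.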
HDFS
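• The blocks are stored on six DataNodes, and each block is replicated three
times: three copies of each 64 MB block and three copies of the 22 MB block,
each copy on a different DataNode.
• The NameNode keeps the metadata: which blocks make up each file and which
DataNodes hold them.
• A standby NameNode can take over if the active NameNode fails.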
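Storing every block three times matches HDFS's default replication factor of 3. The sketch below is a deliberately simplified model: it assigns replicas round-robin to distinct DataNodes and records the result in a dictionary, standing in for the block-location metadata the NameNode keeps. Real HDFS uses a rack-aware placement policy instead, and the node and block names (`datanode1`, `blk_1`, ...) and helper functions are hypothetical.

```python
from itertools import cycle

REPLICATION = 3  # HDFS's default replication factor


def place_replicas(block_ids, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Returns a dict block_id -> list of DataNodes: a toy version of the
    block-location metadata held by the NameNode.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    ring = cycle(datanodes)
    # Consecutive picks from the ring are distinct while replication <= len(datanodes).
    return {block_id: [next(ring) for _ in range(replication)] for block_id in block_ids}


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed_node for node in nodes) for nodes in placement.values())


if __name__ == "__main__":
    nodes = [f"datanode{i}" for i in range(1, 7)]
    blocks = ["blk_1", "blk_2", "blk_3"]  # 64 MB, 64 MB and 22 MB blocks of mydata.txt
    layout = place_replicas(blocks, nodes)
    for block, locations in layout.items():
        print(block, "->", locations)
    print(readable_after_failure(layout, "datanode1"))  # True
```

Losing any single DataNode still leaves two copies of every block, which is why HDFS can tolerate routine hardware failures.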
MapReduce
Live ‘cooking’ demo