SlideShare a Scribd company logo
Big Data and Hadoop Essentials
2
Hadoop Ecosystem
Agenda
Map Reduce Algorithm Exemplified
Hadoop Architecture
Brief History in time
Why Hadoop?
How Big is Big Data?
Demo
3
Brief History in time
In pioneer days they used oxen for heavy pulling, and when one ox couldn’t
budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for
bigger computers, but more systems of computers.
—Grace Hopper, American Computer Scientist
4
How Big is Big Data?
5
How Big is Big Data?
6
How Big is Big Data?
7
Why Hadoop?
8
The Problem
9
BIG
DATA
Volume
Big Data comes in on large
scale. Its on TB and even PB
Records, Transaction,
Tables , Files
Veracity
Quality, consistency,
reliability and provenance of
data
Good, bad, undefined,
inconsistency, incomplete.
Variety
Big Data extends structured,
including semi- structured and
unstructured data of all variety
text, log, xml, audio, video, stream,
flat files
Velocity
Data flown continues, time
sensitive, streaming flow
Batch, Real time, Streams,
Historic
Challenges in managing Big Data
10
To overcome Big Data challenges Hadoop evolves
• Cost Effective – Commodity HW
• Big Cluster – (1000 Nodes) --- Provides Storage &
Processing
• Parallel Processing – Map reduce
• Big Storage – Memory per node * no of Nodes / RF
• Fail over mechanism – Automatic Failover
• Data Distribution
• Moving Code to data
• Heterogeneous Hardware System
(IBM,HP,AIX,Oracle Machine of any
memory and CPU configuration)
• Scalable
11
What Exactly is Hadoop?
12
What’s in a name?
13
Hadoop Vendors
14
Who uses Hadoop?
15
Why Hadoop is used for?
16
Stop and Ponder
• Is Hadoop an alternative for RDBMS?
• Hadoop is not replacing the traditional data systems used for building
analytic applications – the RDBMS, EDW and MPP systems – but rather is a
complement. & Works fine together with RDBMs.
• Hadoop is being used to distill large quantities of data into something more
manageable
17
Stop and Ponder
• But Don’t we know Coherence to be distributed too? Why Hadoop?
Coherence is the market leading In-Memory Data Grid. While Hadoop works fine
for large processing operations, i.e. requiring many TB of data, that can be
processed in a batch like way, there are use cases where the processing
requirements are more real-time and the data volumes are smaller, where
Coherence is a better choice than HDFS for storing the data
18
Hadoop vs. RDBMS
RDBMS MapReduce
Data size Gigabytes Petabytes
Access Interactive and batch Batch
Structure Fixed schema Unstructured schema
Language SQL Procedural (Java, C++, Ruby, etc)
Integrity High Low
Scaling Nonlinear Linear
Updates Read and write Write once, read many times
Latency Low High
19
Using Hadoop in Enterprise
20
Hadoop Architecture
• Hadoop Distributed File System (HDFS™): A distributed file system that
provides high-throughput access to application data.
• Hadoop MapReduce: A software framework for distributed processing of
large data sets on compute clusters.
HDFS
Map
Reduce
Hadoop
21
Hadoop Distributed File System(HDFS)
22
HDFS Architecture(Master-Slave)
Secondary
Name Node
Master
Book Keeper
Slave(s)
Periodic checkpoint
Data Block
23
The CORE
CLIENT
Data Analytics Jobs
Map Reduce
Data Storage Jobs
HDFS
MASTER
SLAVE
= HDFS
24
Hadoop Ecosystem
25
MAP REDUCE Algorithm exemplified!
Calculate the yearly average per state.
26
Group the city average temperatures by state
1
27
We don’t really care about the city names, so we will
discard those and keep only the state names and cities
Temperatures.
2
28
3
We’re going to get a list of temperatures averages for each
state.
29
That was Map/Reduce!
4
All we have to do is to calculate the average
temperature for each state.
30
Let’s do it again…
• Map/Reduce has 3 stages : Map/Shuffle/Reduce
• The Shuffle part is done automatically by Hadoop, you just need to
implement the Map and Reduce parts.
• You get input data as <Key,Value> for the Map part.
• In this example, the Key is the City name, and the Value is the set
of attributes : State and City yearly average temperature.
31
• Since you want to regroup your temperatures by state, you’re going to get
rid of the city name, and the State will become the Key, while the
Temperature will become the Value.
32
Shuffle
• Now, the shuffle task will run on the output of the Map task. It is going to
group all the values by Key, and you’ll get a List<Value>
33
Reduce
• The Reduce task is the one that does the logic on the data, in our case this
is the calculation of the State yearly average temperature.
• And that’s what we will get as final output
34
Hadoop AppStore
35
Ecosystem Matrix
36
Pig and HIVE in the Hadoop Ecosystem
37
Hadoop Ecosystem Development
38
Demo
39
References
• http://hadoop.apache.org/
• http://hadoop.apache.org/hive/
• Hadoop in Action
(http://www.manning.com/lam/)
• Definitive Guide to Hadoop, 2nd ed.
(http://oreilly.com/catalog/0636920010388)
• Yahoo! Hadoop blog
(http://developer.yahoo.net/blogs/hadoop/)
• Cloudera
(http://www.cloudera.com/)
40
Q & A
41
Thank You

More Related Content

What's hot

Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
Abbas Maazallahi
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
Md. Hasan Basri (Angel)
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
Mark Kromer
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
Asis Mohanty
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
Tomy Rhymond
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
Ahmed Salman
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
Mahmoud Yassin
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
Apache Apex
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
Roman Nikitchenko
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
Mishika Bharadwaj
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
Savvycom Savvycom
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
Flavio Vit
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
BIG DATA
BIG DATABIG DATA
BIG DATA
Shashank Shetty
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
hybrid cloud
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
Sonal Tiwari
 

What's hot (20)

Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 

Similar to Hadoop and big data

Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupCsaba Toth
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache HadoopChristopher Pezza
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at mylife
responseteam
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
Kunal Khanna
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
Atul Kushwaha
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
Mr. Ankit
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
Csaba Toth
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
Abhishek Roy
 
Big Data
Big DataBig Data
Big Data
Mahesh Bmn
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
Cloudera, Inc.
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and Mapreduce
Krishna Sangeeth KS
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
Cisco Canada
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
Bhavya Gulati
 
Big data
Big dataBig data
Big data
revathireddyb
 
Big data
Big dataBig data
Big data
revathireddyb
 

Similar to Hadoop and big data (20)

Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
Hadoop by sunitha
Hadoop by sunithaHadoop by sunitha
Hadoop by sunitha
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Map reduce and hadoop at mylife
Map reduce and hadoop at mylifeMap reduce and hadoop at mylife
Map reduce and hadoop at mylife
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Big Data
Big DataBig Data
Big Data
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Intro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and MapreduceIntro to BigData , Hadoop and Mapreduce
Intro to BigData , Hadoop and Mapreduce
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Hadoop
HadoopHadoop
Hadoop
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 

More from Yukti Kaura

Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
Yukti Kaura
 
Cloud computing saas
Cloud computing   saasCloud computing   saas
Cloud computing saas
Yukti Kaura
 
Cloud computing - Basics and Beyond
Cloud computing - Basics and BeyondCloud computing - Basics and Beyond
Cloud computing - Basics and Beyond
Yukti Kaura
 
NodeJS ecosystem
NodeJS ecosystemNodeJS ecosystem
NodeJS ecosystem
Yukti Kaura
 
Web services for Laymen
Web services for LaymenWeb services for Laymen
Web services for Laymen
Yukti Kaura
 
Clean code - Agile Software Craftsmanship
Clean code - Agile Software CraftsmanshipClean code - Agile Software Craftsmanship
Clean code - Agile Software Craftsmanship
Yukti Kaura
 
Maven overview
Maven overviewMaven overview
Maven overview
Yukti Kaura
 
Basics of Flex Components, Skinning
Basics of Flex Components, SkinningBasics of Flex Components, Skinning
Basics of Flex Components, Skinning
Yukti Kaura
 

More from Yukti Kaura (9)

Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
 
Cloud computing saas
Cloud computing   saasCloud computing   saas
Cloud computing saas
 
Cloud computing - Basics and Beyond
Cloud computing - Basics and BeyondCloud computing - Basics and Beyond
Cloud computing - Basics and Beyond
 
NodeJS ecosystem
NodeJS ecosystemNodeJS ecosystem
NodeJS ecosystem
 
Web services for Laymen
Web services for LaymenWeb services for Laymen
Web services for Laymen
 
Spring batch
Spring batch Spring batch
Spring batch
 
Clean code - Agile Software Craftsmanship
Clean code - Agile Software CraftsmanshipClean code - Agile Software Craftsmanship
Clean code - Agile Software Craftsmanship
 
Maven overview
Maven overviewMaven overview
Maven overview
 
Basics of Flex Components, Skinning
Basics of Flex Components, SkinningBasics of Flex Components, Skinning
Basics of Flex Components, Skinning
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 

Hadoop and big data

  • 1. Big Data and Hadoop Essentials
  • 2. 2 Hadoop Ecosystem Agenda Map Reduce Algorithm Exemplified Hadoop Architecture Brief History in time Why Hadoop? How Big is Big Data? Demo
  • 3. 3 Brief History in time In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but more systems of computers. —Grace Hopper, American Computer Scientist
  • 4. 4 How Big is Big Data?
  • 5. 5 How Big is Big Data?
  • 6. 6 How Big is Big Data?
  • 9. 9 BIG DATA Volume Big Data comes in on large scale. Its on TB and even PB Records, Transaction, Tables , Files Veracity Quality, consistency, reliability and provenance of data Good, bad, undefined, inconsistency, incomplete. Variety Big Data extends structured, including semi- structured and unstructured data of all variety text, log, xml, audio, video, stream, flat files Velocity Data flown continues, time sensitive, streaming flow Batch, Real time, Streams, Historic Challenges in managing Big Data
  • 10. 10 To overcome Big Data challenges Hadoop evolves • Cost Effective – Commodity HW • Big Cluster – (1000 Nodes) --- Provides Storage & Processing • Parallel Processing – Map reduce • Big Storage – Memory per node * no of Nodes / RF • Fail over mechanism – Automatic Failover • Data Distribution • Moving Code to data • Heterogeneous Hardware System (IBM,HP,AIX,Oracle Machine of any memory and CPU configuration) • Scalable
  • 15. 15 Why Hadoop is used for?
  • 16. 16 Stop and Ponder • Is Hadoop an alternative for RDBMS? • Hadoop is not replacing the traditional data systems used for building analytic applications – the RDBMS, EDW and MPP systems – but rather is a complement. & Works fine together with RDBMs. • Hadoop is being used to distill large quantities of data into something more manageable
  • 17. 17 Stop and Ponder • But Don’t we know Coherence to be distributed too? Why Hadoop? Coherence is the market leading In-Memory Data Grid. While Hadoop works fine for large processing operations, i.e. requiring many TB of data, that can be processed in a batch like way, there are use cases where the processing requirements are more real-time and the data volumes are smaller, where Coherence is a better choice than HDFS for storing the data
  • 18. 18 Hadoop vs. RDBMS RDBMS MapReduce Data size Gigabytes Petabytes Access Interactive and batch Batch Structure Fixed schema Unstructured schema Language SQL Procedural (Java, C++, Ruby, etc) Integrity High Low Scaling Nonlinear Linear Updates Read and write Write once, read many times Latency Low High
  • 19. 19 Using Hadoop in Enterprise
  • 20. 20 Hadoop Architecture • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. • Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters. HDFS Map Reduce Hadoop
  • 22. 22 HDFS Architecture(Master-Slave) Secondary Name Node Master Book Keeper Slave(s) Periodic checkpoint Data Block
  • 23. 23 The CORE CLIENT Data Analytics Jobs Map Reduce Data Storage Jobs HDFS MASTER SLAVE = HDFS
  • 25. 25 MAP REDUCE Algorithm exemplified! Calculate the yearly average per state.
  • 26. 26 Group the city average temperatures by state 1
  • 27. 27 We don’t really care about the city names, so we will discard those and keep only the state names and cities Temperatures. 2
  • 28. 28 3 We’re going to get a list of temperatures averages for each state.
  • 29. 29 That was Map/Reduce! 4 All we have to do is to calculate the average temperature for each state.
  • 30. 30 Let’s do it again… • Map/Reduce has 3 stages : Map/Shuffle/Reduce • The Shuffle part is done automatically by Hadoop, you just need to implement the Map and Reduce parts. • You get input data as <Key,Value> for the Map part. • In this example, the Key is the City name, and the Value is the set of attributes : State and City yearly average temperature.
  • 31. 31 • Since you want to regroup your temperatures by state, you’re going to get rid of the city name, and the State will become the Key, while the Temperature will become the Value.
  • 32. 32 Shuffle • Now, the shuffle task will run on the output of the Map task. It is going to group all the values by Key, and you’ll get a List<Value>
  • 33. 33 Reduce • The Reduce task is the one that does the logic on the data, in our case this is the calculation of the State yearly average temperature. • And that’s what we will get as final output
  • 36. 36 Pig and HIVE in the Hadoop Ecosystem
  • 39. 39 References • http://hadoop.apache.org/ • http://hadoop.apache.org/hive/ • Hadoop in Action (http://www.manning.com/lam/) • Definitive Guide to Hadoop, 2nd ed. (http://oreilly.com/catalog/0636920010388) • Yahoo! Hadoop blog (http://developer.yahoo.net/blogs/hadoop/) • Cloudera (http://www.cloudera.com/)