SlideShare a Scribd company logo
What mapreduce is ?
• Origin from Google (Operating Systems
Design and Implementation 04)
• A sample programming model for data
processing
• For large dataset processing
MapReduce feature
• Parallel
• Run on commodity hardware
• Fault Tolerance
Three phase of MR
• Map
• Shuffle
• Reduce
Example for map
• Let map(k, v) =
•
foreach char c in v:
•
emit(k, c)
• (“A”, “cats”) -> (“A”, “c”), (“A”, “a”),
(“A”, “t”), (“A”, “s”)
Double example
• Let map(k, v) =
•
emit(k.toUpper(), v.toUpper())
• (“foo”, “bar”) -> (“FOO”, “BAR”)
• (“Foo”, “other”) -> (“FOO”, “OTHER”)
Triple example
• Let map(k, v) =
•
if (isPrime(v)) then emit(k, v)
• (“foo”, 7) -> (“foo”, 7)
• (“test”, 10) -> (nothing)
Reduce example
let reduce(k, vals) =
sum = 0
foreach int v in vals:
sum +=
emit(k, sum)
(“A”, [42, 100, 312]) -> (“A”, 454)
(“B”, [12, 6, -2]) -> (“B”, 16)
Interface InputFormat
•
•

Two methods

getSplits
How to split the input data
• getRecordReader
How to read the input data
Caculate the map tasks we need
• Goalsize = Totalsize/mapred.map.tasks
• Mapred.map.tasks(defined in job
configuration ,just a hint)
Reduce number
• 0.95 ? 1.75 ?
• At 0.95 all of the reduces can launch
immediately and start transfering map
outputs as the maps finish.
• At 1.75 the faster nodes will finish their
first round of reduces and launch a
second round of reduces doing a much
better job of load balancing.
What HDFS is ?
• Origin from Google again [SOSP’03]
Symposium on Operating Systems
Principles
• Redundant storage of massive amounts of
data on cheap and unreliable computers
HDFS feature
• Files stored as blocks
• Reliability through replication
• Single master(NN) coordinates
access,metadata
• No data caching
• Familiar interface ,
NN SPOF and failure resistance
• Store metadata in different place
(local disk / share storage)
Secondary NN
Merge edit log with Fsimage
Reduce recovery time
NN HA
Resource & Event
• http://class10e.com/Cloudera/
• http://blog.cloudera.com/blog/
• Hadoop Summit
http://hadoopsummit.org/
• Hadoop World
http://www.hadoopworld.com/
hadoop introduce

More Related Content

What's hot

Bolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarrayBolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarray
Jen Aman
 
Python grass
Python grassPython grass
Python grass
Margherita Di Leo
 
peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysisVyacheslav Arbuzov
 
Ggmap Packages in R
Ggmap Packages in RGgmap Packages in R
Ggmap Packages in R
Dr. Volkan OBAN
 
7B_3_Matterhorn on the horizon
7B_3_Matterhorn on the horizon7B_3_Matterhorn on the horizon
7B_3_Matterhorn on the horizonGISRUK conference
 
確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward
Yuta Kashino
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
Zalando adtech lab
 
800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理Tatsuya Sasaki
 
Maximum flow
Maximum flowMaximum flow
Maximum flow
Md. Shafiuzzaman Hira
 
L2 binomial operations
L2 binomial operationsL2 binomial operations
L2 binomial operations
Mohammad Umar Rehman
 
AIP seminar - ARCHES WP6
AIP seminar - ARCHES WP6AIP seminar - ARCHES WP6
AIP seminar - ARCHES WP6
Alexey Mints
 
Minimum cost maximum flow
Minimum cost maximum flowMinimum cost maximum flow
Minimum cost maximum flow
SaruarChowdhury
 
Num Integration
Num IntegrationNum Integration
Num Integrationmuhdisys
 
DCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant SpaceDCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant Space
Yasuo Tabei
 
Max Flow Problem
Max Flow ProblemMax Flow Problem
Max Flow Problem
Guillaume Guérard
 
Open source adobe lightroom like
Open source adobe lightroom likeOpen source adobe lightroom like
Open source adobe lightroom like
Pierre Loic Chevillot
 
OpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool WeltOpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool Welt
Digicomp Academy AG
 

What's hot (19)

Bolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarrayBolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarray
 
Python grass
Python grassPython grass
Python grass
 
peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysis
 
Ggmap Packages in R
Ggmap Packages in RGgmap Packages in R
Ggmap Packages in R
 
7B_3_Matterhorn on the horizon
7B_3_Matterhorn on the horizon7B_3_Matterhorn on the horizon
7B_3_Matterhorn on the horizon
 
確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward確率的プログラミングライブラリEdward
確率的プログラミングライブラリEdward
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
 
800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理800万人の"食べたい"をHadoopで分散処理
800万人の"食べたい"をHadoopで分散処理
 
Raster package jacob
Raster package jacobRaster package jacob
Raster package jacob
 
2
22
2
 
Maximum flow
Maximum flowMaximum flow
Maximum flow
 
L2 binomial operations
L2 binomial operationsL2 binomial operations
L2 binomial operations
 
AIP seminar - ARCHES WP6
AIP seminar - ARCHES WP6AIP seminar - ARCHES WP6
AIP seminar - ARCHES WP6
 
Minimum cost maximum flow
Minimum cost maximum flowMinimum cost maximum flow
Minimum cost maximum flow
 
Num Integration
Num IntegrationNum Integration
Num Integration
 
DCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant SpaceDCC2014 - Fully Online Grammar Compression in Constant Space
DCC2014 - Fully Online Grammar Compression in Constant Space
 
Max Flow Problem
Max Flow ProblemMax Flow Problem
Max Flow Problem
 
Open source adobe lightroom like
Open source adobe lightroom likeOpen source adobe lightroom like
Open source adobe lightroom like
 
OpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool WeltOpenTuesday: Neues aus der RRDtool Welt
OpenTuesday: Neues aus der RRDtool Welt
 

Similar to hadoop introduce

Clojure
ClojureClojure
Clojure
Yiguang Hu
 
Presentation: Plotting Systems in R
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in R
Ilya Zhbannikov
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in R
mickey24
 
Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
Nimrita Koul
 
Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
Nimrita Koul
 
楽々Scalaプログラミング
楽々Scalaプログラミング楽々Scalaプログラミング
楽々ScalaプログラミングTomoharu ASAMI
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
Wanbok Choi
 
Programming the cloud with Skywriting
Programming the cloud with SkywritingProgramming the cloud with Skywriting
Programming the cloud with Skywriting
Derek Murray
 
Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013
Samir Bessalah
 
Hadoop london
Hadoop londonHadoop london
Coq to Rubyによる証明駆動開発@名古屋ruby会議02
Coq to Rubyによる証明駆動開発@名古屋ruby会議02Coq to Rubyによる証明駆動開発@名古屋ruby会議02
Coq to Rubyによる証明駆動開発@名古屋ruby会議02Hiroki Mizuno
 
Polimorfismo cosa?
Polimorfismo cosa?Polimorfismo cosa?
Polimorfismo cosa?
Filippo Vitale
 
A Survey Of R Graphics
A Survey Of R GraphicsA Survey Of R Graphics
A Survey Of R Graphics
Dataspora
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Lucidworks
 
Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
confluent
 
Type Checking Scala Spark Datasets: Dataset Transforms
Type Checking Scala Spark Datasets: Dataset TransformsType Checking Scala Spark Datasets: Dataset Transforms
Type Checking Scala Spark Datasets: Dataset Transforms
John Nestor
 
R training5
R training5R training5
R training5
Hellen Gakuruh
 
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Databricks
 

Similar to hadoop introduce (20)

Clojure
ClojureClojure
Clojure
 
Presentation: Plotting Systems in R
Presentation: Plotting Systems in RPresentation: Plotting Systems in R
Presentation: Plotting Systems in R
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in R
 
Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
 
Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
 
楽々Scalaプログラミング
楽々Scalaプログラミング楽々Scalaプログラミング
楽々Scalaプログラミング
 
Lec2
Lec2Lec2
Lec2
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
 
Programming the cloud with Skywriting
Programming the cloud with SkywritingProgramming the cloud with Skywriting
Programming the cloud with Skywriting
 
Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013Big Data Analytics with Scala at SCALA.IO 2013
Big Data Analytics with Scala at SCALA.IO 2013
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Coq to Rubyによる証明駆動開発@名古屋ruby会議02
Coq to Rubyによる証明駆動開発@名古屋ruby会議02Coq to Rubyによる証明駆動開発@名古屋ruby会議02
Coq to Rubyによる証明駆動開発@名古屋ruby会議02
 
Polimorfismo cosa?
Polimorfismo cosa?Polimorfismo cosa?
Polimorfismo cosa?
 
Joclad 2010 d
Joclad 2010 dJoclad 2010 d
Joclad 2010 d
 
A Survey Of R Graphics
A Survey Of R GraphicsA Survey Of R Graphics
A Survey Of R Graphics
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Robust Operations of Kafka Streams
Robust Operations of Kafka StreamsRobust Operations of Kafka Streams
Robust Operations of Kafka Streams
 
Type Checking Scala Spark Datasets: Dataset Transforms
Type Checking Scala Spark Datasets: Dataset TransformsType Checking Scala Spark Datasets: Dataset Transforms
Type Checking Scala Spark Datasets: Dataset Transforms
 
R training5
R training5R training5
R training5
 
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

hadoop introduce

  • 1. What mapreduce is ? • Origin from Google (Operating Systems Design and Implementation 04) • A sample programming model for data processing • For large dataset processing
  • 2. MapReduce feature • Parallel • Run on commodity hardware • Fault Tolerance
  • 3. Three phase of MR • Map • Shuffle • Reduce
  • 4. Example for map • Let map(k, v) = • foreach char c in v: • emit(k, c) • (“A”, “cats”) -> (“A”, “c”), (“A”, “a”), (“A”, “t”), (“A”, “s”)
  • 5. Double example • Let map(k, v) = • emit(k.toUpper(), v.toUpper()) • (“foo”, “bar”) -> (“FOO”, “BAR”) • (“Foo”, “other”) -> (“FOO”, “OTHER”)
  • 6. Triple example • Let map(k, v) = • if (isPrime(v)) then emit(k, v) • (“foo”, 7) -> (“foo”, 7) • (“test”, 10) -> (nothing)
  • 7. Reduce example let reduce(k, vals) = sum = 0 foreach int v in vals: sum += emit(k, sum) (“A”, [42, 100, 312]) -> (“A”, 454) (“B”, [12, 6, -2]) -> (“B”, 16)
  • 8. Interface InputFormat • • Two methods getSplits How to split the input data • getRecordReader How to read the input data
  • 9. Caculate the map tasks we need • Goalsize = Totalsize/mapred.map.tasks • Mapred.map.tasks(defined in job configuration ,just a hint)
  • 10.
  • 11.
  • 12. Reduce number • 0.95 ? 1.75 ? • At 0.95 all of the reduces can launch immediately and start transfering map outputs as the maps finish. • At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
  • 13.
  • 14.
  • 15. What HDFS is ? • Origin from Google again [SOSP’03] Symposium on Operating Systems Principles • Redundant storage of massive amounts of data on cheap and unreliable computers
  • 16. HDFS feature • Files stored as blocks • Reliability through replication • Single master(NN) coordinates access,metadata • No data caching • Familiar interface ,
  • 17. NN SPOF and failure resistance • Store metadata in different place (local disk / share storage) Secondary NN Merge edit log with Fsimage Reduce recovery time NN HA
  • 18. Resource & Event • http://class10e.com/Cloudera/ • http://blog.cloudera.com/blog/ • Hadoop Summit http://hadoopsummit.org/ • Hadoop World http://www.hadoopworld.com/