Apache Hadoop 0.23
What it takes and what it means…
Page 1
Arun C. Murthy
Founder/Architect, Hortonworks
@acmurthy (@hortonworks)
Hello! I’m Arun
Page 2
• Founder/Architect at Hortonworks Inc.
– Formerly, Architect Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MR as a service for all of Yahoo (50k-node footprint)
– Yes, I took the 3am calls!
• Apache Hadoop, ASF
– VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
– Long-term Committer/PMC member (full time ~6 years)
– Release Manager - hadoop-0.23
Releases so far…
Page 3
• Started for Nutch… Yahoo picked it up in early 2006, hired Doug Cutting
• Initially, we did monthly releases (0.1, 0.2 …)
• Quarterly after hadoop-0.15 until hadoop-0.20 in 04/2009…
• hadoop-0.20 is still the basis of all current, stable, Hadoop distributions
– Apache Hadoop 0.20.2xx
– CDH3.*
– HDP1.*
• hadoop-0.20.203 (security) – 05/2011
• hadoop-0.20.205 (security + append -> hbase) – 10/2011
[Timeline figure, 2006 to 2012: hadoop-0.1.0, hadoop-0.10.0, hadoop-0.20.0, hadoop-0.20.205, hadoop-0.23.0]
hadoop-0.23
Page 4
• First stable release off Apache Hadoop trunk in over 30 months…
• The alpha (hadoop-0.23.0) is currently being voted on by the Hadoop PMC
• Significant major features
• Several, several enhancements
HDFS - Federation
Page 5
• Significant scaling…
• Separation of Namespace mgmt and Block mgmt
• Suresh Srinivas (Hortonworks) – Wed 11am
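A minimal sketch of the idea behind separating namespace management from block management. It is a toy model, not the HDFS implementation: the class names, the `route` helper, the mount table and the block-pool ids are all hypothetical. It assumes that each namenode owns one slice of the namespace plus its own block pool, while datanodes act as shared storage for every pool.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Owns one slice of the namespace and the block pool that backs it."""
    name: str
    block_pool_id: str
    files: dict = field(default_factory=dict)  # path -> list of block ids

    def create(self, path, num_blocks):
        # Block ids are prefixed with the pool id so pools never collide.
        blocks = [f"{self.block_pool_id}:blk_{len(self.files)}_{i}"
                  for i in range(num_blocks)]
        self.files[path] = blocks
        return blocks


@dataclass
class DataNode:
    """Shared storage: holds blocks for every block pool, keyed by pool id."""
    pools: dict = field(default_factory=dict)  # block_pool_id -> set of block ids

    def store(self, block_id):
        pool_id = block_id.split(":", 1)[0]
        self.pools.setdefault(pool_id, set()).add(block_id)


def route(mount_table, path):
    """Pick the namenode whose mount prefix is the longest match for path."""
    prefix = max((p for p in mount_table if path.startswith(p)), key=len)
    return mount_table[prefix]


# Hypothetical two-namenode setup sharing a single datanode.
nn_user = NameNode("nn1", "BP-1")
nn_data = NameNode("nn2", "BP-2")
mount_table = {"/user": nn_user, "/data": nn_data}
dn = DataNode()

for path, num_blocks in [("/user/arun/a.txt", 2), ("/data/logs/b.log", 3)]:
    for block in route(mount_table, path).create(path, num_blocks):
        dn.store(block)
```

In this model, adding a namenode adds namespace capacity without touching the storage layer, which is the scaling argument the slide points at.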
MapReduce - YARN
Page 6
• NextGen Hadoop Data Processing Framework
• Support MR and other paradigms
• Mahadev Konar (Hortonworks) – Tue 4.30pm
[Figure: YARN architecture diagram. Components: Client (x2), Resource Manager, Node Manager (x3), App Mstr, Container. Arrow labels: Job Submission, MapReduce Status, Node Status, Resource Request.]
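The roles named on this slide can be sketched as a toy simulation. This is an illustration only and not the YARN API: the class and method names are hypothetical, node capacity is reduced to a simple slot count, and heartbeats (node status, resource requests) are collapsed into direct method calls.

```python
class NodeManager:
    """One worker node that hosts containers."""

    def __init__(self, name, slots):
        self.name = name
        self.free = slots      # containers this node can still host
        self.containers = []   # labels of containers running here

    def launch(self, label):
        self.free -= 1
        self.containers.append(label)


class ResourceManager:
    """Cluster-wide arbiter: hands out containers, knows nothing about MapReduce."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, label):
        """Grant one container on the node with the most free slots, or None."""
        node = max(self.nodes, key=lambda n: n.free)
        if node.free == 0:
            return None
        node.launch(label)
        return node.name

    def submit(self, job_name, num_tasks):
        """Job submission: start an App Master, which then requests task containers."""
        if self.allocate(f"{job_name}/AppMaster") is None:
            raise RuntimeError("no capacity for the App Master")
        return AppMaster(job_name, self).run(num_tasks)


class AppMaster:
    """Per-job coordinator: the only part that understands the job's paradigm."""

    def __init__(self, job_name, rm):
        self.job_name = job_name
        self.rm = rm

    def run(self, num_tasks):
        placed = []
        for i in range(num_tasks):
            node = self.rm.allocate(f"{self.job_name}/task-{i}")
            if node is None:
                break  # cluster is full; a real system would wait and retry
            placed.append(node)
        return placed


nodes = [NodeManager("nm1", 2), NodeManager("nm2", 2), NodeManager("nm3", 2)]
rm = ResourceManager(nodes)
placed = rm.submit("wordcount", 4)
```

Because the Resource Manager only deals in containers, a different App Master could run a non-MapReduce paradigm on the same cluster, which is the "MR and other paradigms" point above.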
Performance
Page 7
• 2x+ across the board
• HDFS read/write
– CRC32
– fadvise
– Shortcut for local reads
• MapReduce
– Unlock lots of improvements from Terasort record (Owen/Arun, 2009)
– Shuffle 30%+
– Small Jobs – Uber AM
• Todd Lipcon (Cloudera) – Wed 10am
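The slide lists CRC32 among the HDFS read/write items without saying what changed. As background, here is a minimal sketch of per-chunk checksumming, the general technique a filesystem uses to detect corruption on read. It uses Python's `zlib.crc32` rather than any Hadoop code, and the 512-byte chunk size and the function names are assumptions made for the example.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch


def chunk_checksums(data, chunk_size=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per fixed-size chunk of data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def first_corrupt_chunk(data, expected, chunk_size=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk whose CRC32 differs, or None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    return None


payload = bytes(range(256)) * 6           # 1536 bytes, i.e. three chunks
sums = chunk_checksums(payload)

damaged = bytearray(payload)
damaged[700] ^= 0xFF                      # flip one byte inside the second chunk
bad_chunk = first_corrupt_chunk(bytes(damaged), sums)
```

Every byte read or written passes through this kind of loop, so a faster checksum routine shows up directly in read/write throughput.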
HDFS NameNode HA
Page 8
• The famous SPOF
• https://issues.apache.org/jira/browse/HDFS-1623
• Well on the way to a fix in hadoop-0.23.1/2
• Suresh Srinivas (Hortonworks), Aaron Myers (Cloudera) – Tue 2.15pm
More…
Page 9
• HDFS write pipeline improvements for HBase
– Append/flush etc.
• Build - Full Mavenization
• EditLogs re-write
– https://issues.apache.org/jira/browse/HDFS-1073
• Tonnes more …
Deployment goals
Page 10
• Clusters of 6,000 machines
– Each machine with 16+ cores, 48G/96G RAM, 24TB/36TB disks
– 200+ PB (raw) per cluster
– 100,000+ concurrent tasks
– 10,000 concurrent jobs
• Yahoo: 50,000+ machines
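A quick arithmetic check of how these targets relate to each other, using only the figures on this slide. Decimal units (1 PB = 1,000 TB) are an assumption, and the variable names are illustrative.

```python
MACHINES = 6_000
DISK_TB_OPTIONS = (24, 36)      # per-machine disk, from the slide
CONCURRENT_TASKS = 100_000
TB_PER_PB = 1_000               # assumption: decimal units

# Raw capacity per cluster for each disk configuration.
raw_pb = [MACHINES * tb / TB_PER_PB for tb in DISK_TB_OPTIONS]

# Average number of tasks each machine runs at the target concurrency.
tasks_per_machine = CONCURRENT_TASKS / MACHINES

print(raw_pb)                       # [144.0, 216.0]
print(round(tasks_per_machine, 1))  # 16.7
```

Under these assumptions the 200+ PB figure corresponds to the 36TB configuration (the 24TB one gives 144 PB), and 100,000 concurrent tasks works out to roughly one task per core on 16-core machines.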
What does it take to get there?
Page 11
• Testing, *lots* of it
• Benchmarks – At least as good as the last one
• Integration testing
– HBase
– Pig
– Hive
– Oozie
• Deployment discipline
Testing
Page 12
• Why is it hard?
– MapReduce is, effectively, a very wide API
– Add Streaming
– Add Pipes
– Oh, Pig/Hive etc. etc.
• Functional tests
– Nightly
– Nearly 1000 functional tests for MapReduce alone
– Several hundred for Pig/Hive etc.
• Scale tests
– Simulation
• Longevity tests
• Stress tests
Benchmarks
Page 13
• Benchmark every part of the HDFS & MR pipeline
– HDFS read/write throughput
– NN operations
– Scan, Shuffle, Sort
• GridMixv3
– Run production traces in test clusters
– Thousands of jobs
– Stress mode vs. Replay mode
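One way to picture the difference between the two modes is as two submission schedules built from the same trace. This is a toy sketch and not GridMix's actual algorithm: it assumes replay preserves the recorded gaps between job submissions, while stress ignores them and submits as soon as the cluster has room. The function names, the fixed job duration and the `max_in_flight` limit are all hypothetical.

```python
import heapq


def replay_schedule(trace):
    """Replay: keep the original gaps between submissions, starting at t=0."""
    ordered = sorted(trace, key=lambda entry: entry[1])
    start = ordered[0][1]
    return [(job, t - start) for job, t in ordered]


def stress_schedule(trace, max_in_flight, job_duration):
    """Stress: ignore recorded gaps; submit whenever a slot is free."""
    running = []   # min-heap of finish times for in-flight jobs
    now = 0
    schedule = []
    for job, _ in sorted(trace, key=lambda entry: entry[1]):
        if len(running) >= max_in_flight:
            # Wait for the earliest in-flight job to finish.
            now = max(now, heapq.heappop(running))
        schedule.append((job, now))
        heapq.heappush(running, now + job_duration)
    return schedule


# Hypothetical trace: (job name, submit time in seconds).
trace = [("j1", 100), ("j2", 160), ("j3", 400), ("j4", 405)]

replayed = replay_schedule(trace)
stressed = stress_schedule(trace, max_in_flight=2, job_duration=50)
```

With this trace the replayed schedule spans 305 seconds, while the stressed one packs the same four jobs into the first 50 seconds.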
Integration Testing
Page 14
• Several projects in the ecosystem
– HBase
– Pig
– Hive
– Oozie
• Cycle
– Functional
– Scale
– Rinse, repeat
Deployment
Page 15
• Alpha/Test (early UAT)
– Starting Nov, 2011
– Small scale (500-800 nodes)
• Alpha
– Jan, 2012
– Majority of users
– 2000 nodes per cluster, > 10,000 nodes in all
• Beta
– Misnomer: 100s of PB, Millions of user applications
– Significantly wide variety of applications and load
– 4000+ nodes per cluster, > 20,000 nodes in all
– Late Q1, 2012
• Production
– Well, it’s production
– Mid-to-late Q2 2012
Questions?
Page 16
Thank You.
@acmurthy
Release Candidate:
http://people.apache.org/~acmurthy/hadoop-0.23.0-rc2
Release Documentation:
http://people.apache.org/~acmurthy/hadoop-0.23
