SlideShare a Scribd company logo
1 of 16
Apache Hadoop 0.23
What it takes and what it means…
Page 1
Arun C. Murthy
Founder/Architect, Hortonworks
@acmurthy (@hortonworks)
Hello! I’m Arun
Page 2
• Founder/Architect at Hortonworks Inc.
– Formerly, Architect Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MR as a service for all of Yahoo (50k nodes
footprint)
– Yes, I took the 3am calls! 
• Apache Hadoop, ASF
– VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
– Long-term Committer/PMC member (full time ~6 years)
– Release Manager - hadoop-0.23
Releases so far…
Page 3
• Started for Nutch… Yahoo picked it up in early 2006, hired Doug Cutting
• Initially, we did monthly releases (0.1, 0.2 …)
• Quarterly after hadoop-0.15 until hadoop-0.20 in 04/2009…
• hadoop-0.20 is still the basis of all current, stable, Hadoop distributions
– Apache Hadoop 0.20.2xx
– CDH3.*
– HDP1.*
• hadoop-0.20.203 (security) – 05/2011
• hadoop-0.20.205 (security + append -> hbase) – 10/2011
2006 2009 2012
hadoop-0.1.0 hadoop-0.10.0 hadoop-0.20.0 hadoop-0.23.0hadoop-0.20.205
hadoop-0.23
Page 4
• First stable release off Apache Hadoop trunk in over 30 months…
• Currently alpha (hadoop-0.23.0) is under voting by the Hadoop PMC
• Significant major features
• Several, several enhancements
HDFS - Federation
Page 5
• Significant scaling…
• Separation of Namespace mgmt and Block mgmt
• Suresh Srinivas (Hortonworks) – Wed 11am
MapReduce - YARN
Page 6
• NextGen Hadoop Data Processing Framework
• Support MR and other paradigms
• Mahadev Konar (Hortonworks) – Tue 4.30pm
Resource
Manager
Client
MapReduce Status
Job Submission
Client
Node
Manager
Container Container
Node
Manager
App Mstr Container
Node
Manager
Container App Mstr
Node Status
Resource Request
Performance
Page 7
• 2x+ across the board
• HDFS read/write
– CRC32
– fadvise
– Shortcut for local reads
• MapReduce
– Unlock lots of improvements from Terasort record (Owen/Arun, 2009)
– Shuffle 30%+
– Small Jobs – Uber AM
• Todd Lipcon (Cloudera) – Wed 10am
HDFS NameNode HA
Page 8
• The famous SPOF
• https://issues.apache.org/jira/browse/HDFS-1623
• Well on the way to fix in hadoop-0.23.½
• Suresh Srinivas (Hortonworks), Aaron Myers (Cloudera) – Tue 2.15pm
More…
Page 9
• HDFS Write pipeline improvements for Hbase
– Append/flush etc.
• Build - Full Mavenization
• EditLogs re-write
– https://issues.apache.org/jira/browse/HDFS-1073
• Tonnes more …
Deployment goals
Page 10
• Clusters of 6,000 machines
– Each machine with 16+ cores, 48G/96G RAM, 24TB/36TB disks
– 200+ PB (raw) per cluster
– 100,000+ concurrent tasks
– 10,000 concurrent jobs
• Yahoo: 50,000+ machines
What does it take to get there?
Page 11
• Testing, *lots* of it
• Benchmarks – At least as good as the last one
• Integration testing
– HBase
– Pig
– Hive
– Oozie
• Deployment discipline
Testing
Page 12
• Why is it hard?
– MapReduce is, effectively, very wide api
– Add Streaming
– Add Pipes
– Oh, Pig/Hive etc. etc.
• Functional tests
– Nightly
– Nearly 1000 functional tests for MapReduce alone
– Several hundred for Pig/Hive etc.
• Scale tests
– Simulation
• Longevity tests
• Stress tests
Benchmarks
Page 13
• Benchmark every part of the HDFS & MR pipeline
– HDFS read/write throughput
– NN operations
– Scan, Shuffle, Sort
• GridMixv3
– Run production traces in test clusters
– Thousands of jobs
– Stress mode v/s Replay mode
Integration Testing
Page 14
• Several projects in the ecosystem
– HBase
– Pig
– Hive
– Oozie
• Cycle
– Functional
– Scale
– Rinse, repeat
Deployment
Page 15
• Alpha/Test (early UAT)
– Starting Nov, 2011
– Small scale (500-800 nodes)
• Alpha
– Jan, 2012
– Majority of users
– 2000 nodes per cluster, > 10,000 nodes in all
• Beta
– Misnomer: 100s of PB, Millions of user applications
– Significantly wide variety of applications and load
– 4000+ nodes per cluster, > 20000 nodes in all
– Late Q1, 2012
• Production
– Well, it’s production
– Mid-to-late Q2 2012
Questions?
Page 16
Thank You.
@acmurthy
Release Candidate:
http://people.apache.org/~acmurthy/hadoop-0.23.0-rc2
Release Documentation:
http://people.apache.org/~acmurthy/hadoop-0.23

More Related Content

What's hot

Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks
 
What can-be-done-around-mesos
What can-be-done-around-mesosWhat can-be-done-around-mesos
What can-be-done-around-mesosZhou Weitao
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Mark Kerzner
 
Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Gruter
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaCloudera, Inc.
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Cosmin Lehene
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBaseHBaseCon
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!DataWorks Summit
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Hadoop hbase introduction
Hadoop hbase introductionHadoop hbase introduction
Hadoop hbase introductionJakub Stransky
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 

What's hot (19)

Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup Presentation
 
What can-be-done-around-mesos
What can-be-done-around-mesosWhat can-be-done-around-mesos
What can-be-done-around-mesos
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
Nutch + Hadoop scaled, for crawling protected web sites (hint: Selenium)
 
Apache Tajo - BWC 2014
Apache Tajo - BWC 2014Apache Tajo - BWC 2014
Apache Tajo - BWC 2014
 
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
 
Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015Elastic HBase on Mesos - HBaseCon 2015
Elastic HBase on Mesos - HBaseCon 2015
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Hadoop description
Hadoop descriptionHadoop description
Hadoop description
 
Hadoop hbase introduction
Hadoop hbase introductionHadoop hbase introduction
Hadoop hbase introduction
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
Hadoop
HadoopHadoop
Hadoop
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 

Viewers also liked

земля героев доклад
земля героев докладземля героев доклад
земля героев докладcenter-si
 
портфолио Центра СИ
портфолио Центра СИпортфолио Центра СИ
портфолио Центра СИcenter-si
 
соц.опрос
соц.опроссоц.опрос
соц.опросcenter-si
 
доклад города воинской славы
доклад города воинской славыдоклад города воинской славы
доклад города воинской славыcenter-si
 
Journal TSM n°3
Journal TSM n°3Journal TSM n°3
Journal TSM n°3tsmassy
 
Communiqué de presse : le Trophée de la Parisienne
Communiqué de presse : le Trophée de la Parisienne Communiqué de presse : le Trophée de la Parisienne
Communiqué de presse : le Trophée de la Parisienne Montaigne Golf Communication
 
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)gr0Jmmd7Q_qarobo gr0Jmmd7Q_qarobo
 
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01Narayanicollegeforprofessionalstudies2 131219231751-phpapp01
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01archana sawant
 
ATMA Session 2. discovering the root
ATMA Session 2. discovering the root ATMA Session 2. discovering the root
ATMA Session 2. discovering the root Deepak Chandani
 
Composantes psychologiques des troubles sphinctériens
Composantes psychologiques des troubles sphinctériensComposantes psychologiques des troubles sphinctériens
Composantes psychologiques des troubles sphinctériensFrançoise Soros
 
Process Efficiency Tour 2012 - France
Process Efficiency Tour 2012 - FranceProcess Efficiency Tour 2012 - France
Process Efficiency Tour 2012 - FranceBonitasoft
 
Introducción a la historia y las Ciencias Sociales (I Medio)
Introducción a la historia y las Ciencias Sociales (I Medio)Introducción a la historia y las Ciencias Sociales (I Medio)
Introducción a la historia y las Ciencias Sociales (I Medio)Ignacio Muñoz Muñoz
 
BonitaSoft at Solutions Linux 2010
BonitaSoft  at Solutions Linux 2010BonitaSoft  at Solutions Linux 2010
BonitaSoft at Solutions Linux 2010Bonitasoft
 
Bonitasoft - Process Efficiency World Tour 2013 - Paris
Bonitasoft - Process Efficiency World Tour 2013 - ParisBonitasoft - Process Efficiency World Tour 2013 - Paris
Bonitasoft - Process Efficiency World Tour 2013 - ParisBonitasoft
 

Viewers also liked (19)

земля героев доклад
земля героев докладземля героев доклад
земля героев доклад
 
CSI
CSICSI
CSI
 
портфолио Центра СИ
портфолио Центра СИпортфолио Центра СИ
портфолио Центра СИ
 
соц.опрос
соц.опроссоц.опрос
соц.опрос
 
доклад города воинской славы
доклад города воинской славыдоклад города воинской славы
доклад города воинской славы
 
1. introduction to ATMA
1. introduction to ATMA1. introduction to ATMA
1. introduction to ATMA
 
Journal TSM n°3
Journal TSM n°3Journal TSM n°3
Journal TSM n°3
 
Communiqué de presse : le Trophée de la Parisienne
Communiqué de presse : le Trophée de la Parisienne Communiqué de presse : le Trophée de la Parisienne
Communiqué de presse : le Trophée de la Parisienne
 
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)
adrianalimabikinibodyworkoutdietplan-140813113633-phpapp01 (1)
 
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01Narayanicollegeforprofessionalstudies2 131219231751-phpapp01
Narayanicollegeforprofessionalstudies2 131219231751-phpapp01
 
Enuresie
EnuresieEnuresie
Enuresie
 
News 3
News 3News 3
News 3
 
ATMA Session 2. discovering the root
ATMA Session 2. discovering the root ATMA Session 2. discovering the root
ATMA Session 2. discovering the root
 
Composantes psychologiques des troubles sphinctériens
Composantes psychologiques des troubles sphinctériensComposantes psychologiques des troubles sphinctériens
Composantes psychologiques des troubles sphinctériens
 
Process Efficiency Tour 2012 - France
Process Efficiency Tour 2012 - FranceProcess Efficiency Tour 2012 - France
Process Efficiency Tour 2012 - France
 
Introducción a la historia y las Ciencias Sociales (I Medio)
Introducción a la historia y las Ciencias Sociales (I Medio)Introducción a la historia y las Ciencias Sociales (I Medio)
Introducción a la historia y las Ciencias Sociales (I Medio)
 
La grèce
La grèceLa grèce
La grèce
 
BonitaSoft at Solutions Linux 2010
BonitaSoft  at Solutions Linux 2010BonitaSoft  at Solutions Linux 2010
BonitaSoft at Solutions Linux 2010
 
Bonitasoft - Process Efficiency World Tour 2013 - Paris
Bonitasoft - Process Efficiency World Tour 2013 - ParisBonitasoft - Process Efficiency World Tour 2013 - Paris
Bonitasoft - Process Efficiency World Tour 2013 - Paris
 

Similar to 4apachehadoop-0-23hadoopworld2011-111110151810-phpapp02

Apache Hadoop 0.23
Apache Hadoop 0.23Apache Hadoop 0.23
Apache Hadoop 0.23Hortonworks
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcominghuguk
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxraghavanand36
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environmentDelhi/NCR HUG
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerhdhappy001
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureVinod Kumar Vavilapalli
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureVinod Kumar Vavilapalli
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureDataWorks Summit
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...Chris Mattmann
 

Similar to 4apachehadoop-0-23hadoopworld2011-111110151810-phpapp02 (20)

Apache Hadoop 0.23
Apache Hadoop 0.23Apache Hadoop 0.23
Apache Hadoop 0.23
 
Apache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's UpcomingApache HBase: Where We've Been and What's Upcoming
Apache HBase: Where We've Been and What's Upcoming
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
 
Apache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other VersionsApache Hadoop 0.22 and Other Versions
Apache Hadoop 0.22 and Other Versions
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Hadoop ecosystem framework n hadoop in live environment
Hadoop ecosystem framework  n hadoop in live environmentHadoop ecosystem framework  n hadoop in live environment
Hadoop ecosystem framework n hadoop in live environment
 
Gunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stingerGunther hagleitner:apache hive & stinger
Gunther hagleitner:apache hive & stinger
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Hadoop Summit 2010 Keynote
Hadoop Summit 2010 KeynoteHadoop Summit 2010 Keynote
Hadoop Summit 2010 Keynote
 
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...
Lessons Learned in the Development of a Web-scale Search Engine: Nutch2 and b...
 
Hadoop and friends
Hadoop and friendsHadoop and friends
Hadoop and friends
 

More from gr0Jmmd7Q_qarobo gr0Jmmd7Q_qarobo (11)

2 page doc
2 page doc2 page doc
2 page doc
 
sample upload file
sample upload filesample upload file
sample upload file
 
2 slide deck
2 slide deck2 slide deck
2 slide deck
 
Do Not Delete
Do Not DeleteDo Not Delete
Do Not Delete
 
exploratory-testing - Read only not hidden
exploratory-testing - Read only not hiddenexploratory-testing - Read only not hidden
exploratory-testing - Read only not hidden
 
getting-started-1-638
getting-started-1-638getting-started-1-638
getting-started-1-638
 
Environment costs & Benifits
Environment costs & BenifitsEnvironment costs & Benifits
Environment costs & Benifits
 
Environment costs & Benifits
Environment costs & BenifitsEnvironment costs & Benifits
Environment costs & Benifits
 
Plain text presentation for slideshare
Plain text presentation for slidesharePlain text presentation for slideshare
Plain text presentation for slideshare
 
4apachehadoop 0-23hadoopworld2011-111110151810-phpapp02
4apachehadoop 0-23hadoopworld2011-111110151810-phpapp024apachehadoop 0-23hadoopworld2011-111110151810-phpapp02
4apachehadoop 0-23hadoopworld2011-111110151810-phpapp02
 
Chinese hans
Chinese hansChinese hans
Chinese hans
 

4apachehadoop-0-23hadoopworld2011-111110151810-phpapp02

  • 1. Apache Hadoop 0.23 What it takes and what it means… Page 1 Arun C. Murthy Founder/Architect, Hortonworks @acmurthy (@hortonworks)
  • 2. Hello! I’m Arun Page 2 • Founder/Architect at Hortonworks Inc. – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MR as a service for all of Yahoo (50k nodes footprint) – Yes, I took the 3am calls!  • Apache Hadoop, ASF – VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time ~6 years) – Release Manager - hadoop-0.23
  • 3. Releases so far… Page 3 • Started for Nutch… Yahoo picked it up in early 2006, hired Doug Cutting • Initially, we did monthly releases (0.1, 0.2 …) • Quarterly after hadoop-0.15 until hadoop-0.20 in 04/2009… • hadoop-0.20 is still the basis of all current, stable, Hadoop distributions – Apache Hadoop 0.20.2xx – CDH3.* – HDP1.* • hadoop-0.20.203 (security) – 05/2011 • hadoop-0.20.205 (security + append -> hbase) – 10/2011 2006 2009 2012 hadoop-0.1.0 hadoop-0.10.0 hadoop-0.20.0 hadoop-0.23.0hadoop-0.20.205
  • 4. hadoop-0.23 Page 4 • First stable release off Apache Hadoop trunk in over 30 months… • Currently alpha (hadoop-0.23.0) is under voting by the Hadoop PMC • Significant major features • Several, several enhancements
  • 5. HDFS - Federation Page 5 • Significant scaling… • Separation of Namespace mgmt and Block mgmt • Suresh Srinivas (Hortonworks) – Wed 11am
  • 6. MapReduce - YARN Page 6 • NextGen Hadoop Data Processing Framework • Support MR and other paradigms • Mahadev Konar (Hortonworks) – Tue 4.30pm Resource Manager Client MapReduce Status Job Submission Client Node Manager Container Container Node Manager App Mstr Container Node Manager Container App Mstr Node Status Resource Request
  • 7. Performance Page 7 • 2x+ across the board • HDFS read/write – CRC32 – fadvise – Shortcut for local reads • MapReduce – Unlock lots of improvements from Terasort record (Owen/Arun, 2009) – Shuffle 30%+ – Small Jobs – Uber AM • Todd Lipcon (Cloudera) – Wed 10am
  • 8. HDFS NameNode HA Page 8 • The famous SPOF • https://issues.apache.org/jira/browse/HDFS-1623 • Well on the way to fix in hadoop-0.23.½ • Suresh Srinivas (Hortonworks), Aaron Myers (Cloudera) – Tue 2.15pm
  • 9. More… Page 9 • HDFS Write pipeline improvements for Hbase – Append/flush etc. • Build - Full Mavenization • EditLogs re-write – https://issues.apache.org/jira/browse/HDFS-1073 • Tonnes more …
  • 10. Deployment goals Page 10 • Clusters of 6,000 machines – Each machine with 16+ cores, 48G/96G RAM, 24TB/36TB disks – 200+ PB (raw) per cluster – 100,000+ concurrent tasks – 10,000 concurrent jobs • Yahoo: 50,000+ machines
  • 11. What does it take to get there? Page 11 • Testing, *lots* of it • Benchmarks – At least as good as the last one • Integration testing – HBase – Pig – Hive – Oozie • Deployment discipline
  • 12. Testing Page 12 • Why is it hard? – MapReduce is, effectively, very wide api – Add Streaming – Add Pipes – Oh, Pig/Hive etc. etc. • Functional tests – Nightly – Nearly 1000 functional tests for MapReduce alone – Several hundred for Pig/Hive etc. • Scale tests – Simulation • Longevity tests • Stress tests
  • 13. Benchmarks Page 13 • Benchmark every part of the HDFS & MR pipeline – HDFS read/write throughput – NN operations – Scan, Shuffle, Sort • GridMixv3 – Run production traces in test clusters – Thousands of jobs – Stress mode v/s Replay mode
  • 14. Integration Testing Page 14 • Several projects in the ecosystem – HBase – Pig – Hive – Oozie • Cycle – Functional – Scale – Rinse, repeat
  • 15. Deployment Page 15 • Alpha/Test (early UAT) – Starting Nov, 2011 – Small scale (500-800 nodes) • Alpha – Jan, 2012 – Majority of users – 2000 nodes per cluster, > 10,000 nodes in all • Beta – Misnomer: 100s of PB, Millions of user applications – Significantly wide variety of applications and load – 4000+ nodes per cluster, > 20000 nodes in all – Late Q1, 2012 • Production – Well, it’s production – Mid-to-late Q2 2012
  • 16. Questions? Page 16 Thank You. @acmurthy Release Candidate: http://people.apache.org/~acmurthy/hadoop-0.23.0-rc2 Release Documentation: http://people.apache.org/~acmurthy/hadoop-0.23