The Next Generation of
 Hadoop Map-Reduce
        Sharad Agarwal
     sharadag@yahoo-inc.com
        sharad@apache.org
About Me

   Hadoop Committer and PMC member
   Architect at Yahoo!
Hadoop Map-Reduce Today
   JobTracker
    - Manages cluster resources
      and job scheduling
   TaskTracker
    - Per-node agent
    - Manages tasks
Current Limitations
   Scalability
    - Maximum Cluster size – 4,000 nodes
    - Maximum concurrent tasks – 40,000
    - Coarse synchronization in JobTracker
   Single point of failure
    - Failure kills all queued and running jobs
    - Jobs need to be re-submitted by users
   Restart is very tricky due to complex state
   Hard partition of resources into map and reduce
    slots
Current Limitations

   Lacks support for alternate paradigms
    - Iterative applications implemented using Map-Reduce
      are 10x slower.
    - Example: K-Means, PageRank
   Lack of wire-compatible protocols
    - Client and cluster must be of same version
    - Applications and workflows cannot migrate to
      different clusters
Next Generation Map-Reduce Requirements
   Reliability
   Availability
   Scalability - Clusters of 6,000 machines
    - Each machine with 16 cores, 48G RAM, 24TB disks
    - 100,000 concurrent tasks
    - 10,000 concurrent jobs
   Wire Compatibility
   Agility & Evolution – Ability for customers to
    control upgrades to the grid software stack.
Next Generation Map-Reduce – Design Centre

   Split up the two major functions of JobTracker
    - Cluster resource management
    - Application life-cycle management
   Map-Reduce becomes user-land library
Architecture
   Resource Manager
    - Global resource scheduler
    - Hierarchical queues
   Node Manager
    - Per-machine agent
    - Manages the life-cycle of containers
    - Container resource monitoring
   Application Master
    - Per-application
    - Manages application scheduling and task execution
    - E.g. Map-Reduce Application Master
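
The division of labour between the three components can be sketched in a few lines of Python. This is a toy model, not the Hadoop API: the class names mirror the slide, but every method, the first-fit placement rule, and the node sizes are assumptions made for illustration. Only memory is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks free memory and the containers it runs."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str, mb: int) -> str:
        # Reserve the memory and record the new container.
        self.free_mb -= mb
        container_id = f"{self.name}/c{len(self.containers)}"
        self.containers.append((container_id, app_id, mb))
        return container_id


class ResourceManager:
    """Global scheduler: hands out containers, knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id: str, mb: int):
        # Hypothetical first-fit placement; queues are not modelled here.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node.launch(app_id, mb)
        return None  # no node can fit this request right now


class ApplicationMaster:
    """Per-application: decides what to run and asks the RM for containers."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks, mb_per_task: int):
        placed, pending = {}, []
        for task in tasks:
            container = self.rm.allocate(self.app_id, mb_per_task)
            if container is None:
                pending.append(task)
            else:
                placed[task] = container
        return placed, pending


nodes = [NodeManager("node1", 4096), NodeManager("node2", 2048)]
rm = ResourceManager(nodes)
am = ApplicationMaster("mr-job-1", rm)
placed, pending = am.run(["map-0", "map-1", "map-2", "reduce-0"], mb_per_task=2048)
print(placed)   # {'map-0': 'node1/c0', 'map-1': 'node1/c1', 'map-2': 'node2/c0'}
print(pending)  # ['reduce-0']
```

Note that the Resource Manager never sees the words "map" or "reduce": task semantics live entirely in the Application Master, which is what lets Map-Reduce become a user-land library.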
Improvements vis-à-vis current Map-Reduce
     Scalability
      - Application life-cycle management is very
        expensive
      - Partition resource management and application
        life-cycle management
      - Application management is distributed
      - Hardware trends - Currently run clusters of 4,000
        machines
          • 6,000 2012 machines > 12,000 2009 machines
          • 2009: <8 cores, 16G, 4TB> vs. 2012: <16+ cores, 48/96G, 24TB>
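
The hardware comparison can be checked with simple arithmetic. The sketch below uses the per-machine figures from the slide; for the 2012 machines it assumes the lower bound of each range (16 cores, 48G), so the result is a conservative estimate.

```python
# Per-machine specs from the slide: (cores, RAM in GB, disk in TB).
SPEC_2009 = (8, 16, 4)
SPEC_2012 = (16, 48, 24)  # assumption: lower bound of "16+ cores, 48/96G"


def cluster_totals(machines, spec):
    """Aggregate capacity of a cluster of identical machines."""
    cores, ram_gb, disk_tb = spec
    return {
        "cores": machines * cores,
        "ram_gb": machines * ram_gb,
        "disk_tb": machines * disk_tb,
    }


old = cluster_totals(12_000, SPEC_2009)
new = cluster_totals(6_000, SPEC_2012)

for key in old:
    ratio = new[key] / old[key]
    print(f"{key}: {old[key]:,} -> {new[key]:,} ({ratio:.1f}x)")
# cores: 96,000 -> 96,000 (1.0x)
# ram_gb: 192,000 -> 288,000 (1.5x)
# disk_tb: 48,000 -> 144,000 (3.0x)
```

Even at the lower bound, 6,000 of the 2012 machines match 12,000 of the 2009 machines on cores and exceed them on memory and disk.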
Improvements vis-à-vis current Map-Reduce
     Availability
      - Application Master
          • Optional failover via application-specific checkpoint
          • Map-Reduce applications pick up where they left off
      - Resource Manager
          • No single point of failure - failover via ZooKeeper
          • Application Masters are restarted automatically
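
A minimal sketch of "pick up where they left off", assuming the Application Master records completed tasks in a checkpoint and skips them after a restart. The class, the JSON file format, and the `fail_after` crash switch are all hypothetical; the slides do not describe the actual checkpoint mechanism.

```python
import json
import os
import tempfile


class CheckpointingMaster:
    """Toy Application Master that persists the set of finished tasks."""

    def __init__(self, tasks, checkpoint_path):
        self.tasks = list(tasks)
        self.path = checkpoint_path
        self.done = self._load()

    def _load(self):
        # A restarted master recovers its state from the checkpoint, if any.
        if os.path.exists(self.path):
            with open(self.path) as fh:
                return set(json.load(fh))
        return set()

    def _save(self):
        with open(self.path, "w") as fh:
            json.dump(sorted(self.done), fh)

    def run(self, fail_after=None):
        executed = []
        for task in self.tasks:
            if task in self.done:
                continue  # finished before the crash, do not redo
            if fail_after is not None and len(executed) == fail_after:
                raise RuntimeError("simulated Application Master crash")
            executed.append(task)  # stand-in for actually running the task
            self.done.add(task)
            self._save()
        return executed


tasks = ["map-0", "map-1", "map-2", "reduce-0"]
path = os.path.join(tempfile.mkdtemp(), "am-checkpoint.json")

try:
    CheckpointingMaster(tasks, path).run(fail_after=2)  # crashes after 2 tasks
except RuntimeError:
    pass

resumed = CheckpointingMaster(tasks, path).run()  # fresh master, same checkpoint
print(resumed)  # ['map-2', 'reduce-0']
```

The second master instance runs only the two tasks that had not completed, which is the behaviour the slide describes for restarted Map-Reduce applications.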
Improvements vis-à-vis current Map-Reduce
     Wire Compatibility
      - Protocols are wire-compatible
      - Old clients can talk to new servers
      - Rolling upgrades
Improvements vis-à-vis current Map-Reduce
     Agility / Evolution
      - Map-Reduce now becomes a user-land library
      - Multiple versions of Map-Reduce can run in the
        same cluster (à la Apache Pig)
          • Faster deployment cycles for improvements
      - Customers upgrade Map-Reduce versions on their
        schedule
Improvements vis-à-vis current Map-Reduce
     Utilization
      - Generic resource model
          •   Memory
          •   CPU
          •   Disk b/w
          •   Network b/w
      - Remove fixed partition of map and reduce slots
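
To see why removing the fixed partition helps utilization, compare the two models on a map-heavy workload. The sketch below is illustrative only: the slot counts, node size, and per-task requests are made-up numbers, and only memory and CPU are modelled (disk and network bandwidth are left out).

```python
def run_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fixed partition: a map task can only use a map slot, and vice versa."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def run_with_containers(node_mb, node_cores, requests):
    """Generic model: admit any task whose memory and CPU request still fits."""
    running = 0
    for mb, cores in requests:
        if mb <= node_mb and cores <= node_cores:
            node_mb -= mb
            node_cores -= cores
            running += 1
    return running


# Map-heavy phase on one node: 8 maps waiting, no reduces yet.
slots = run_with_slots(map_slots=4, reduce_slots=4,
                       pending_maps=8, pending_reduces=0)
containers = run_with_containers(node_mb=16384, node_cores=8,
                                 requests=[(2048, 1)] * 8)

print(slots)       # 4  (the 4 reduce slots sit idle)
print(containers)  # 8  (the whole node is used)
```

With slots, half the node is reserved for reduces that do not exist yet; with a generic resource model the same node runs all eight maps.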
Improvements vis-à-vis current Map-Reduce
     Support for programming paradigms other
      than Map-Reduce
      - MPI
      - Master-Worker
      - Machine Learning
      - Iterative processing
      - Enabled by allowing use of paradigm-specific
        Application Master
      - Run all on the same Hadoop cluster
Summary
   The next generation of Map-Reduce takes
    Hadoop to the next level
    -   Scale-out even further
    -   High availability
    -   Cluster Utilization
    -   Support for paradigms other than Map-Reduce
Questions?

 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

YARN Hadoop Summit Bangalore 2011

  • 1. The Next Generation of Hadoop Map-Reduce Sharad Agarwal sharadag@yahoo-inc.com sharad@apache.org
  • 2. About Me
      - Hadoop Committer and PMC member
      - Architect at Yahoo!
  • 3. Hadoop Map-Reduce Today
      - JobTracker
          - Manages cluster resources and job scheduling
      - TaskTracker
          - Per-node agent
          - Manages tasks
  • 4. Current Limitations
      - Scalability
          - Maximum cluster size – 4,000 nodes
          - Maximum concurrent tasks – 40,000
          - Coarse synchronization in JobTracker
      - Single point of failure
          - Failure kills all queued and running jobs
          - Jobs need to be re-submitted by users
      - Restart is very tricky due to complex state
      - Hard partition of resources into map and reduce slots
  • 5. Current Limitations
      - Lacks support for alternate paradigms
          - Iterative applications implemented using Map-Reduce are 10x slower
          - Example: K-Means, PageRank
      - Lack of wire-compatible protocols
          - Client and cluster must be of same version
          - Applications and workflows cannot migrate to different clusters
  • 6. Next Generation Map-Reduce Requirements
      - Reliability
      - Availability
      - Scalability: clusters of 6,000 machines
          - Each machine with 16 cores, 48G RAM, 24TB disks
          - 100,000 concurrent tasks
          - 10,000 concurrent jobs
      - Wire Compatibility
      - Agility & Evolution – Ability for customers to control upgrades to the grid software stack
  • 7. Next Generation Map-Reduce – Design Centre
      - Split up the two major functions of JobTracker
          - Cluster resource management
          - Application life-cycle management
      - Map-Reduce becomes user-land library
  • 9. Architecture
      - Resource Manager
          - Global resource scheduler
          - Hierarchical queues
      - Node Manager
          - Per-machine agent
          - Manages the life-cycle of container
          - Container resource monitoring
      - Application Master
          - Per-application
          - Manages application scheduling and task execution
          - E.g. Map-Reduce Application Master
  • 10. Improvements vis-à-vis current Map-Reduce
      - Scalability
          - Application life-cycle management is very expensive
          - Partition resource management and application life-cycle management
          - Application management is distributed
          - Hardware trends
          - Currently run clusters of 4,000 machines
              - 6,000 2012 machines > 12,000 2009 machines
              - <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
  • 11. Improvements vis-à-vis current Map-Reduce
      - Availability
          - Application Master
              - Optional failover via application-specific checkpoint
              - Map-Reduce applications pick up where they left off
          - Resource Manager
              - No single point of failure: failover via ZooKeeper
              - Application Masters are restarted automatically
  • 12. Improvements vis-à-vis current Map-Reduce
      - Wire Compatibility
          - Protocols are wire-compatible
          - Old clients can talk to new servers
          - Rolling upgrades
  • 13. Improvements vis-à-vis current Map-Reduce
      - Agility / Evolution
          - Map-Reduce now becomes a user-land library
          - Multiple versions of Map-Reduce can run in the same cluster (à la Apache Pig)
              - Faster deployment cycles for improvements
          - Customers upgrade Map-Reduce versions on their schedule
  • 14. Improvements vis-à-vis current Map-Reduce
      - Utilization
          - Generic resource model
              - Memory
              - CPU
              - Disk b/w
              - Network b/w
          - Remove fixed partition of map and reduce slots
  • 15. Improvements vis-à-vis current Map-Reduce
      - Support for programming paradigms other than Map-Reduce
          - MPI
          - Master-Worker
          - Machine Learning
          - Iterative processing
          - Enabled by allowing use of paradigm-specific Application Master
          - Run all on the same Hadoop cluster
  • 16. Summary
      - The next generation of Map-Reduce takes Hadoop to the next level
          - Scale-out even further
          - High availability
          - Cluster Utilization
          - Support for paradigms other than Map-Reduce
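
The utilization point from slide 14 (replacing the fixed partition of map and reduce slots with a generic resource model) can be illustrated with a small sketch. This is a toy model, not Hadoop's actual scheduler: the `Resources` class, the two `schedule_*` functions, the slot counts, and the per-task requests are all assumptions made up for illustration, and only memory and CPU are modeled (disk and network bandwidth are left out). The node size reuses the per-machine figures from slide 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; only memory and CPU are modeled."""
    memory_gb: int
    cores: int

    def fits_in(self, other):
        # A request fits only if every dimension fits.
        return self.memory_gb <= other.memory_gb and self.cores <= other.cores

    def minus(self, other):
        return Resources(self.memory_gb - other.memory_gb,
                         self.cores - other.cores)


def schedule_with_slots(tasks, map_slots, reduce_slots):
    """Fixed partition: a task only runs in a free slot of its own kind."""
    free = {"map": map_slots, "reduce": reduce_slots}
    running = []
    for kind, _request in tasks:
        if free[kind] > 0:
            free[kind] -= 1
            running.append(kind)
    return running


def schedule_with_resources(tasks, capacity):
    """Generic model: a task runs if its request fits what is left."""
    remaining = capacity
    running = []
    for kind, request in tasks:
        if request.fits_in(remaining):
            remaining = remaining.minus(request)
            running.append(kind)
    return running


# Assumed node: 16 cores and 48 GB, as in the slide 6 requirements.
node = Resources(memory_gb=48, cores=16)

# Assumed workload: a map-heavy phase with eight identical map tasks.
tasks = [("map", Resources(memory_gb=4, cores=1))] * 8

slot_run = schedule_with_slots(tasks, map_slots=4, reduce_slots=4)
generic_run = schedule_with_resources(tasks, node)

print(len(slot_run), "tasks run with fixed slots")
print(len(generic_run), "tasks run with the generic resource model")
```

In this toy run the fixed partition leaves its four reduce slots idle while four map tasks wait, whereas the generic model admits all eight tasks because their combined request (32 GB, 8 cores) fits on the node.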