Apache Hadoop MapReduce
What is next?


Arun C. Murthy
Founder & Architect
@acmurthy (@hortonworks)




Hello! I’m Arun
• Founder/Architect at Hortonworks Inc.
  – Lead, Map-Reduce
  – Formerly, Architect Hadoop MapReduce, Yahoo
  – Responsible for running Hadoop MR as a service for all of Yahoo
    (50k nodes footprint)

• Apache Hadoop, ASF
  – VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC)
  – Long-term Committer/PMC member (full time >6 years)
  – Release Manager for hadoop-2




Agenda

• Yesterday: Hadoop MapReduce, circa 2011
• Today: Hadoop YARN
   – Overview
   – State of the art

•  Art of the possible
   – YARN Runtime
   – MapReduce Framework

• Q&A



Hadoop MapReduce
circa 2011




Hadoop MapReduce Classic
•  JobTracker
   –  Manages cluster resources and job scheduling
•  TaskTracker
   –  Per-node agent

   –  Manages tasks
Current Limitations
•  Utilization

•  Scalability
   –  Maximum Cluster size – 4,000 nodes
   –  Maximum concurrent tasks – 40,000
   –  Coarse synchronization in JobTracker

•  Single point of failure
   –  Failure kills all queued and running jobs
   –  Jobs restarted on bounce




Current Limitations
•  Hard partition of resources into map and reduce slots
   –  Low resource utilization

•  Lacks support for alternate paradigms
   –  Iterative applications implemented using MapReduce are
      10x slower
   –  Hacks for the likes of MPI/Graph Processing

•  Lack of wire-compatible protocols
   –  Client and cluster must be of same version
   –  Applications and workflows cannot migrate to different
      clusters



Hadoop YARN
Overview




Requirements
•  Reliability
•  Availability
•  Utilization
•  Wire Compatibility
•  Agility & Evolution – Ability for customers to control
   upgrades to the grid software stack.
•  Scalability - Clusters of 6,000-10,000 machines
   –  Each machine with 16 cores, 48G/96G RAM, 24TB/36TB
      disks
   –  100,000+ concurrent tasks
   –  10,000 concurrent jobs

Design Centre
•  Split up the two major functions of JobTracker
   –  Cluster resource management
   –  Application life-cycle management

•  MapReduce becomes user-land library




Architecture
•  Application
   –  Application is a job submitted to the framework
   –  Example – Map Reduce Job

•  Container
   –  Basic unit of allocation
   –  Example – container A = 2GB, 1CPU
   –  Replaces the fixed map/reduce slots
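
To make the slot-versus-container point concrete, here is a minimal sketch. The node size, the slot counts, and the names `Resource`, `slots_in_use`, and `containers_in_use` are assumptions made up for illustration; only the "2GB, 1CPU" container comes from the slide. It compares how many tasks a single node can run during a map-heavy phase under each model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int


def slots_in_use(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Classic model: a map task can only use a map slot, a reduce task a reduce slot."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def containers_in_use(node, container, pending_tasks):
    """Container model: any task can use any free capacity on the node."""
    fit = min(node.memory_mb // container.memory_mb, node.vcores // container.vcores)
    return min(fit, pending_tasks)


# Hypothetical node, sized only for illustration.
node = Resource(memory_mb=16 * 1024, vcores=8)
container_a = Resource(memory_mb=2 * 1024, vcores=1)  # "container A = 2GB, 1CPU"

# Map-heavy phase: 8 maps waiting, no reduces yet.
classic = slots_in_use(map_slots=4, reduce_slots=4, pending_maps=8, pending_reduces=0)
yarn = containers_in_use(node, container_a, pending_tasks=8)
print(classic, yarn)  # prints: 4 8
```

In this toy setup the four reduce slots sit idle while maps queue up, which is the low utilization the fixed partition causes; with containers the same node runs all eight tasks.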




Architecture
•  Resource Manager
   –  Global resource scheduler
   –  Hierarchical queues

•  Node Manager
   –  Per-machine agent
   –  Manages the life-cycle of containers
   –  Container resource monitoring

•  Application Master
   –  Per-application
   –  Manages application scheduling and task execution
   –  E.g. MapReduce Application Master
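
The division of labour above can be sketched in a few lines. This is a toy model, not the YARN API: the class and method names (`ResourceManager.node_status`, `resource_request`, `ApplicationMaster.run`) and the node sizes are assumptions chosen to mirror the terms on the slides. The Resource Manager tracks only free capacity per node; the Application Master decides which task goes into which granted container.

```python
class ResourceManager:
    """Global scheduler: knows free capacity per node, nothing about tasks."""

    def __init__(self):
        self.free_mb = {}  # node name -> free memory in MB

    def node_status(self, node, free_mb):
        # Stand-in for a Node Manager reporting its capacity.
        self.free_mb[node] = free_mb

    def resource_request(self, memory_mb, count):
        """Grant up to `count` containers of `memory_mb` each; return the hosting nodes."""
        granted = []
        for node in sorted(self.free_mb):
            while len(granted) < count and self.free_mb[node] >= memory_mb:
                self.free_mb[node] -= memory_mb
                granted.append(node)
        return granted


class ApplicationMaster:
    """Per-application: maps its own tasks onto the containers it was granted."""

    def __init__(self, rm, tasks):
        self.rm = rm
        self.tasks = list(tasks)

    def run(self, memory_mb):
        nodes = self.rm.resource_request(memory_mb, len(self.tasks))
        # Tasks beyond the granted containers are left out, i.e. still pending.
        return dict(zip(self.tasks, nodes))


rm = ResourceManager()
rm.node_status("node1", 4096)  # hypothetical nodes
rm.node_status("node2", 2048)
am = ApplicationMaster(rm, ["map-0", "map-1", "map-2", "reduce-0"])
placement = am.run(memory_mb=2048)
print(placement)
```

Because task-level logic lives only in `ApplicationMaster`, a different framework could bring its own master without touching the scheduler, which is the sense in which MapReduce becomes a user-land library.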

Architecture

[Figure: YARN architecture. Clients, one Resource Manager, and three Node Managers hosting App Mstr and Container processes. Arrow legend: MapReduce Status, Job Submission, Node Status, Resource Request.]

How do I get it?

•  Available in hadoop-2.0.0-alpha release




Performance

• 2x+ across the board
• MapReduce
    – Unlock lots of improvements from Terasort record (Owen/Arun,
      2009)
          – Shuffle 30%+
          – Merge improvements
    – Small Jobs – Uber AM
    – Re-use task slots (containers)
More details: http://hortonworks.com/delivering-on-hadoop-next-benchmarking-performance/




Resources

hadoop-2.0.0 (alpha release):
http://hadoop.apache.org/common/releases.html

Release Documentation:
http://hadoop.apache.org/common/docs/r2.0.0-alpha/




Art of the possible
YARN Runtime
MapReduce Framework




Looking ahead

• YARN
  – Runtime Improvements
  – Alternate programming models
  – Long(er) running services

• MapReduce
  – Framework enhancements
  – Unpack!




YARN - Roadmap

• Scheduler
  – Multi-dimensional resource scheduling (MAPREDUCE-4327)
  – Preemption (MAPREDUCE-3938)
  – Gang scheduling



• Runtime improvements
  – Container Isolation (MAPREDUCE-4334)
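
As an illustration of two of the scheduler items above, the sketch below checks a request against every resource dimension (memory and CPU) and places a gang of containers all-or-nothing. It is a toy under stated assumptions, not the design tracked in MAPREDUCE-4327 or any existing YARN scheduler; `fits`, `gang_allocate`, and the node sizes are hypothetical.

```python
def fits(free, ask):
    """A request fits only if every resource dimension fits."""
    return all(free.get(dim, 0) >= amount for dim, amount in ask.items())


def gang_allocate(nodes, ask, gang_size):
    """All-or-nothing: return `gang_size` placements, or [] leaving `nodes` untouched."""
    trial = {name: dict(free) for name, free in nodes.items()}  # work on a copy
    placed = []
    for name in sorted(trial):
        while len(placed) < gang_size and fits(trial[name], ask):
            for dim, amount in ask.items():
                trial[name][dim] -= amount
            placed.append(name)
    if len(placed) < gang_size:
        return []  # partial gangs are never committed
    nodes.update(trial)
    return placed


# Hypothetical cluster: n1 is CPU-bound, n2 is memory-bound.
nodes = {
    "n1": {"memory_mb": 8192, "vcores": 2},
    "n2": {"memory_mb": 4096, "vcores": 4},
}
ask = {"memory_mb": 2048, "vcores": 1}

too_big = gang_allocate(nodes, ask, gang_size=5)  # only 4 containers fit
ok = gang_allocate(nodes, ask, gang_size=4)
print(too_big, ok)
```

A memory-only scheduler would count six containers here (four on n1, two on n2); checking both dimensions shows only four actually fit.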




YARN - Data Processing Applications

• OpenMPI on Hadoop
• Spark (UC Berkeley)
   – Shark is Hive-on-Spark
• Real-time data processing
   – Storm (Twitter)
   – Apache S4
•  Graph processing – Apache Giraph




YARN - Beyond Data Processing Apps

• Apache HBase
  – Deployment via YARN (HBASE-4329)
  – Co-processors via YARN (HBASE-4047)
• Simple deployment for cluster services




MapReduce – Way Forward

• MapReduce Framework Runtime
  – Monolithic software
• MR Runtime?
  – Sort, Merge, Shuffle et al
• Unpack into smaller building blocks!
  – Allow applications and Pig/Hive to ‘plug-n-play’
  – MR framework, as we know today, becomes a particular
    configuration of the building blocks




MapReduce – Pluggable Sort

• Pig & Hive benefit from hash-based aggregation
  – Several queries don’t need full-sort of map-outputs
  – Aggregation suffices
  – Allow for pluggable MapOutputBuffer in MapTask
  – Sort Avoidance - MAPREDUCE-4039
  – External sort plugin – MAPREDUCE-2454
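
A small sketch of why aggregation does not need a full sort. This is plain Python rather than the `MapOutputBuffer` interface; the function names and the sample records are assumptions for illustration. Both paths compute the same per-key totals, but the hash-based one never orders the map output.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_then_aggregate(pairs):
    """Classic path: fully sort map output by key, then sum each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def hash_aggregate(pairs):
    """Sort-free path: one pass, one hash-table update per record."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical map output for a GROUP BY ... COUNT style query.
map_output = [("pig", 1), ("hive", 1), ("pig", 1), ("yarn", 1), ("hive", 1), ("pig", 1)]

by_sort = sort_then_aggregate(map_output)
by_hash = hash_aggregate(map_output)
print(by_sort == by_hash, by_hash)
```

The sorted path costs O(n log n) comparisons for n records, while the hash path does O(n) expected work, provided the distinct keys fit in memory.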




MapReduce – Pluggable Shuffle

• Push vs. pull shuffle
• Plug shuffle implementation (already in hadoop-2)
   – E.g. RDMA for shuffle
   – MAPREDUCE-4049
• Collation tasks
   – Sailfish - Yahoo Research (includes auto-tuning of reduces)




MapReduce – More ideas

• Allow for Map-Reduce-Reduce
  – Allow for reduce output to be sorted/shuffled
  – JOIN followed by ORDER BY
  – Really big deal for Pig/Hive
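
The JOIN-then-ORDER-BY case can be sketched as one map stage followed by two reduce stages. This is an in-memory toy, not a Hadoop job: the `shuffle` helper, the tables, and their contents are assumptions for illustration. The point is that the output of the first reduce is shuffled again on a new key, with no second map stage in between.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key and deliver the groups in key order, as a sort/shuffle stage would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


# Hypothetical input tables.
users = [(1, "ann"), (2, "bob"), (3, "cy")]    # (user_id, name)
orders = [(1, 30), (2, 75), (1, 20), (3, 5)]   # (user_id, amount)

# Map: tag each record with its source table, keyed by the join key.
mapped = [(uid, ("user", name)) for uid, name in users]
mapped += [(uid, ("order", amount)) for uid, amount in orders]

# Reduce 1 (JOIN): per user, pair the name with the total order amount,
# and emit the result keyed by that total.
joined = []
for uid, tagged in shuffle(mapped):
    name = next(value for tag, value in tagged if tag == "user")
    total = sum(value for tag, value in tagged if tag == "order")
    joined.append((total, name))

# Reduce 2 (ORDER BY): the reduce-1 output is shuffled again, this time on the sort key.
ordered = [(name, total) for total, names in shuffle(joined) for name in names]
print(ordered)  # prints: [('cy', 5), ('ann', 50), ('bob', 75)]
```

With plain MapReduce the second stage would be a separate job, with the first job's output written out and read back through an identity map.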




MapReduce – How do we get there?

• Multiple, concurrent implementations of MapReduce
  – YARN is a really big deal…
  – Allows for safe experiments, much less risky!
  – Exposure surface is highly limited




Questions?




Thank You.
@acmurthy





More Related Content

What's hot

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Rohit Agrawal
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupRommel Garcia
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Cloudera, Inc.
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Cloudera, Inc.
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache TezGetInData
 
Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0Big Data Joe™ Rossi
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Sumeet Singh
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformTsuyoshi OZAWA
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 

What's hot (20)

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9Hadoop 2.0, MRv2 and YARN - Module 9
Hadoop 2.0, MRv2 and YARN - Module 9
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User Group
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
 
Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0Hadoop - Past, Present and Future - v2.0
Hadoop - Past, Present and Future - v2.0
 
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platform
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Anatomy of Hadoop YARN
Anatomy of Hadoop YARNAnatomy of Hadoop YARN
Anatomy of Hadoop YARN
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 

Viewers also liked

*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue
*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue
*7* Marketing Basic Fundamentals on How to INCREASE Your RevenueJohn Di Lemme
 
5. ejercicios de respiración programada
5.  ejercicios de respiración programada5.  ejercicios de respiración programada
5. ejercicios de respiración programadaRossyPalmaM Palma M
 
Pirmais kredīts bezmaksas
Pirmais kredīts bezmaksasPirmais kredīts bezmaksas
Pirmais kredīts bezmaksasComparolv
 
Ātrie kredīti no 20 gadiem
Ātrie kredīti no 20 gadiemĀtrie kredīti no 20 gadiem
Ātrie kredīti no 20 gadiemMaruts Lapsiņš
 

Viewers also liked (10)

Harmony eng(1)
Harmony eng(1)Harmony eng(1)
Harmony eng(1)
 
My Career Profile
My Career ProfileMy Career Profile
My Career Profile
 
*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue
*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue
*7* Marketing Basic Fundamentals on How to INCREASE Your Revenue
 
5. ejercicios de respiración programada
5.  ejercicios de respiración programada5.  ejercicios de respiración programada
5. ejercicios de respiración programada
 
Augueries of rhapsodies
Augueries of rhapsodiesAugueries of rhapsodies
Augueries of rhapsodies
 
Pirmais kredīts bezmaksas
Pirmais kredīts bezmaksasPirmais kredīts bezmaksas
Pirmais kredīts bezmaksas
 
Humanity weeps
Humanity weepsHumanity weeps
Humanity weeps
 
Ātrie kredīti no 20 gadiem
Ātrie kredīti no 20 gadiemĀtrie kredīti no 20 gadiem
Ātrie kredīti no 20 gadiem
 
Fiestario
FiestarioFiestario
Fiestario
 
2. fonema r preparatorio
2.  fonema r preparatorio2.  fonema r preparatorio
2. fonema r preparatorio
 

Similar to Apache Hadoop MapReduce: What's Next

Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHortonworks
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenHadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenInMobi
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton WorksHadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton WorksCloudera, Inc.
 
Apache Hadoop 0.23 at Hadoop World 2011
Apache Hadoop 0.23 at Hadoop World 2011Apache Hadoop 0.23 at Hadoop World 2011
Apache Hadoop 0.23 at Hadoop World 2011Hortonworks
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReducehuguk
 
YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011Sharad Agarwal
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfWasyihunSema2
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Hakka Labs
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0Adam Muise
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史Insight Technology, Inc.
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Steve Min
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureVinod Kumar Vavilapalli
 

Similar to Apache Hadoop MapReduce: What's Next (20)

Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgenHadoop bangalore-meetup-dec-2011-hadoop nextgen
Hadoop bangalore-meetup-dec-2011-hadoop nextgen
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton WorksHadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
Hadoop World 2011: Apache Hadoop 0.23 - Arun Murthy, Horton Works
 
Apache Hadoop 0.23 at Hadoop World 2011
Apache Hadoop 0.23 at Hadoop World 2011Apache Hadoop 0.23 at Hadoop World 2011
Apache Hadoop 0.23 at Hadoop World 2011
 
Next Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduceNext Generation of Hadoop MapReduce
Next Generation of Hadoop MapReduce
 
MHUG - YARN
MHUG - YARNMHUG - YARN
MHUG - YARN
 
YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011YARN Hadoop Summit Bangalore 2011
YARN Hadoop Summit Bangalore 2011
 
Yarnthug2014
Yarnthug2014Yarnthug2014
Yarnthug2014
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Apache Hadoop MapReduce: What's Next

  • 1. Apache Hadoop MapReduce: What is next? Arun C. Murthy, Founder & Architect, @acmurthy (@hortonworks)
  • 2. Hello! I’m Arun • Founder/Architect at Hortonworks Inc. – Lead, Map-Reduce – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MR as a service for all of Yahoo (50k nodes footprint) • Apache Hadoop, ASF – VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time >6 years) – Release Manager for hadoop-2
  • 3. Agenda • Yesterday: Hadoop MapReduce, circa 2011 • Today: Hadoop YARN – Overview – State of the art • Art of the possible – YARN Runtime – MapReduce Framework • Q&A
  • 5. Hadoop MapReduce Classic • JobTracker – Manages cluster resources and job scheduling • TaskTracker – Per-node agent – Manages tasks
  • 6. Current Limitations • Utilization • Scalability – Maximum cluster size: 4,000 nodes – Maximum concurrent tasks: 40,000 – Coarse synchronization in JobTracker • Single point of failure – Failure kills all queued and running jobs – Jobs restarted on bounce
  • 7. Current Limitations • Hard partition of resources into map and reduce slots – Low resource utilization • Lacks support for alternate paradigms – Iterative applications implemented using MapReduce are 10x slower – Hacks for the likes of MPI/Graph Processing • Lack of wire-compatible protocols – Client and cluster must be of same version – Applications and workflows cannot migrate to different clusters
  • 9. Requirements • Reliability • Availability • Utilization • Wire Compatibility • Agility & Evolution: ability for customers to control upgrades to the grid software stack • Scalability: clusters of 6,000-10,000 machines – Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks – 100,000+ concurrent tasks – 10,000 concurrent jobs
  • 10. Design Centre • Split up the two major functions of JobTracker – Cluster resource management – Application life-cycle management • MapReduce becomes user-land library
  • 11. Architecture • Application – Application is a job submitted to the framework – Example: MapReduce job • Container – Basic unit of allocation – Example: container A = 2GB, 1 CPU – Replaces the fixed map/reduce slots
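
To make the container idea on slide 11 concrete, here is a minimal sketch of allocating variable-sized containers against a node's capacity instead of counting fixed map/reduce slots. The `Resource` and `Node` classes and `count_allocations` are hypothetical names invented for this illustration and are not the YARN API; the node size borrows the 16-core, 48G figure from the Requirements slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A resource request or capacity (hypothetical type, not a YARN class)."""
    memory_mb: int
    vcores: int


@dataclass
class Node:
    """Tracks how much of one machine's capacity is already handed out."""
    capacity: Resource
    used_memory_mb: int = 0
    used_vcores: int = 0

    def try_allocate(self, request: Resource) -> bool:
        """Grant a container only if both memory and CPU still fit."""
        fits = (
            self.used_memory_mb + request.memory_mb <= self.capacity.memory_mb
            and self.used_vcores + request.vcores <= self.capacity.vcores
        )
        if fits:
            self.used_memory_mb += request.memory_mb
            self.used_vcores += request.vcores
        return fits


def count_allocations(node: Node, request: Resource) -> int:
    """Allocate identical containers until the node is full; return the count."""
    granted = 0
    while node.try_allocate(request):
        granted += 1
    return granted


# A 16-core, 48 GB machine and the slide's example container (2GB, 1 CPU).
small = count_allocations(Node(Resource(48 * 1024, 16)), Resource(2 * 1024, 1))
large = count_allocations(Node(Resource(48 * 1024, 16)), Resource(4 * 1024, 1))
```

In this sketch the small containers are limited by CPU and the larger ones by memory, which is the flexibility a fixed split into map and reduce slots cannot express.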
  • 12. Architecture • Resource Manager – Global resource scheduler – Hierarchical queues • Node Manager – Per-machine agent – Manages the life-cycle of containers – Container resource monitoring • Application Master – Per-application – Manages application scheduling and task execution – E.g. MapReduce Application Master
  • 13. Architecture [Diagram. Components: Client, Resource Manager, Node Manager, App Mstr, Container. Arrow legend: MapReduce Status, Job Submission, Node Status, Resource Request]
  • 14. How do I get it? • Available in hadoop-2.0.0-alpha release
  • 15. Performance • 2x+ across the board • MapReduce – Unlock lots of improvements from Terasort record (Owen/Arun, 2009) – Shuffle 30%+ – Merge improvements – Small Jobs – Uber AM – Re-use task slots (containers) • More details: http://hortonworks.com/delivering-on-hadoop-next-benchmarking-performance/
  • 16. Resources • hadoop-2.0.0 (alpha release): http://hadoop.apache.org/common/releases.html • Release Documentation: http://hadoop.apache.org/common/docs/r2.0.0-alpha/
  • 17. Art of the possible • YARN Runtime • MapReduce Framework
  • 18. Looking ahead • YARN – Runtime Improvements – Alternate programming models – Long(er) running services • MapReduce – Framework enhancements – Unpack!
  • 19. YARN - Roadmap • Scheduler – Multi-dimensional resource scheduling (MAPREDUCE-4327) – Preemption (MAPREDUCE-3938) – Gang scheduling • Runtime improvements – Container Isolation (MAPREDUCE-4334)
  • 20. YARN - Data Processing Applications • OpenMPI on Hadoop • Spark (UC Berkeley) – Shark is Hive-on-Spark • Real-time data processing – Storm (Twitter) – Apache S4 • Graph processing – Apache Giraph
  • 21. YARN - Beyond Data Processing Apps • Apache HBase – Deployment via YARN (HBASE-4329) – Co-processors via YARN (HBASE-4047) • Simple deployment for cluster services
  • 22. MapReduce – Way Forward • MapReduce Framework Runtime – Monolithic software • MR Runtime? – Sort, Merge, Shuffle et al • Unpack into smaller building blocks! – Allow applications and Pig/Hive to ‘plug-n-play’ – MR framework, as we know it today, becomes a particular configuration of the building blocks
  • 23. MapReduce – Pluggable Sort • Pig & Hive benefit from hash-based aggregation – Several queries don’t need full-sort of map-outputs – Aggregation suffices – Allow for pluggable MapOutputBuffer in MapTask – Sort Avoidance: MAPREDUCE-4039 – External sort plugin: MAPREDUCE-2454
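
The point on slide 23, that aggregation does not need a full sort, can be sketched in a few lines. This is a toy in-memory comparison under the assumption that map output is a list of `(key, value)` pairs; the function names are hypothetical and this is not the MapOutputBuffer interface.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def aggregate_with_sort(records):
    """Sort-based path: order all records by key, then sum each run of equal keys."""
    ordered = sorted(records, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def aggregate_with_hash(records):
    """Hash-based path: one pass over the records, no ordering required."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


map_output = [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]
sorted_totals = aggregate_with_sort(map_output)
hashed_totals = aggregate_with_hash(map_output)
```

Both paths produce the same totals; the hash-based one simply skips the ordering step, which is the work a query like a SUM or COUNT per key never needed.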
  • 24. MapReduce – Pluggable Shuffle • Push vs. pull shuffle • Plug shuffle implementation (already in hadoop-2) – E.g. RDMA for shuffle: MAPREDUCE-4049 • Collation tasks – Sailfish, Yahoo Research (includes auto-tuning of reduces)
  • 25. MapReduce – More ideas • Allow for Map-Reduce-Reduce – Allow for reduce output to be sorted/shuffled – JOIN followed by ORDER BY – Really big deal for Pig/Hive
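
As a rough illustration of the Map-Reduce-Reduce idea on slide 25, the sketch below runs a JOIN in a first reduce stage and feeds its output straight into a second shuffle and reduce that performs the ORDER BY, without an intermediate map pass. Everything here is a hypothetical in-memory model (the `shuffle` helper, the tagged-tuple record layout, and the sample tables are assumptions for illustration), not Hadoop code.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key and hand the groups over in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def join_reduce(user_id, tagged_values):
    """First reduce: join users with orders, re-keying output on the sort column."""
    names = [payload for tag, payload in tagged_values if tag == "user"]
    amounts = [payload for tag, payload in tagged_values if tag == "order"]
    for name in names:
        for amount in amounts:
            yield amount, name


def order_reduce(amount, names):
    """Second reduce: keys arrive sorted, so emitting them yields ORDER BY amount."""
    for name in sorted(names):
        yield amount, name


def map_reduce_reduce(users, orders):
    # Map: tag each record with its source table and key it by the join column.
    mapped = [(uid, ("user", name)) for uid, name in users]
    mapped += [(uid, ("order", amount)) for uid, amount in orders]
    # Reduce 1 (JOIN), then shuffle its output again into Reduce 2 (ORDER BY).
    joined = [out for key, values in shuffle(mapped) for out in join_reduce(key, values)]
    return [out for key, values in shuffle(joined) for out in order_reduce(key, values)]


result = map_reduce_reduce(
    users=[(1, "ann"), (2, "bob")],
    orders=[(2, 30), (1, 10), (1, 20)],
)
```

With classic MapReduce the second stage would be a separate job with its own identity map phase; chaining the two reduces is the saving the slide has in mind for Pig and Hive.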
  • 26. MapReduce – How do we get there? • Multiple, concurrent implementations of MapReduce – YARN is a really big deal… – Allows for safe experiments, much less risky! – Exposure surface is highly limited