NextGen Apache Hadoop MapReduce

Hortonworks

Next Generation of Apache Hadoop MapReduce
  • Arun C. Murthy - Hortonworks Founder and Architect
  • @acmurthy (@hortonworks)
  • Formerly Architect, MapReduce @ Yahoo!
  • 8 years @ Yahoo!
  • June 29, 2011
© Hortonworks Inc. 2011

Hello! I’m Arun…
  • Architect & Lead, Apache Hadoop MapReduce Development Team at Hortonworks (formerly at Yahoo!)
  • Apache Hadoop Committer and Member of PMC
  • Full-time contributor to Apache Hadoop since early 2006

Hadoop MapReduce Today
  • JobTracker
      • Manages cluster resources and job scheduling
  • TaskTracker
      • Per-node agent
      • Manages tasks

Current Limitations
  • Scalability
      • Maximum cluster size – 4,000 nodes
      • Maximum concurrent tasks – 40,000
      • Coarse synchronization in JobTracker
  • Single point of failure
      • Failure kills all queued and running jobs
      • Jobs need to be re-submitted by users
  • Restart is very tricky due to complex state
  • Hard partition of resources into map and reduce slots

Current Limitations
  • Lacks support for alternate paradigms
      • Iterative applications implemented using MapReduce are 10x slower
      • Example: K-Means, PageRank
  • Lack of wire-compatible protocols
      • Client and cluster must be of same version
      • Applications and workflows cannot migrate to different clusters

Requirements
  • Reliability
  • Availability
  • Scalability - Clusters of 6,000-10,000 machines
      • Each machine with 16 cores, 48G/96G RAM, 24TB/36TB disks
      • 100,000+ concurrent tasks
      • 10,000 concurrent jobs
  • Wire Compatibility
  • Agility & Evolution – Ability for customers to control upgrades to the grid software stack

Design Centre
  • Split up the two major functions of JobTracker
      • Cluster resource management
      • Application life-cycle management
  • MapReduce becomes user-land library

Architecture
  [architecture diagram]

Architecture
  • Resource Manager
      • Global resource scheduler
      • Hierarchical queues
  • Node Manager
      • Per-machine agent
      • Manages the life-cycle of containers
      • Container resource monitoring
  • Application Master
      • Per-application
      • Manages application scheduling and task execution
      • E.g. MapReduce Application Master
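
The division of labour between the three components can be sketched as a toy model. This is only an illustration under stated assumptions: the class and method names (`NodeManager.launch`, `ResourceManager.allocate`, `ApplicationMaster.run`) are hypothetical and are not the actual Hadoop API, memory is the only resource tracked, and scheduling is a simple first-fit scan rather than hierarchical queues.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-machine agent: tracks free memory and the containers it hosts."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str, mb: int) -> str:
        # Reserve memory and record a new container for the application.
        self.free_mb -= mb
        container_id = f"{self.name}/{app_id}/{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class ResourceManager:
    """Global scheduler: hands out containers, knows nothing about tasks."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id: str, mb: int):
        # First-fit: pick the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node.launch(app_id, mb)
        return None  # no capacity right now


class ApplicationMaster:
    """Per-application: decides which task runs in which container."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm
        self.assignments = {}

    def run(self, tasks, mb_per_task: int):
        # Ask the ResourceManager for one container per task;
        # tasks that cannot be placed are returned as pending.
        pending = []
        for task in tasks:
            container = self.rm.allocate(self.app_id, mb_per_task)
            if container is None:
                pending.append(task)
            else:
                self.assignments[task] = container
        return pending


# Hypothetical two-node cluster and a four-task application.
nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
rm = ResourceManager(nodes)
am = ApplicationMaster("app_1", rm)
pending = am.run(["map_0", "map_1", "map_2", "reduce_0"], mb_per_task=1024)
```

In this sketch the ResourceManager never sees task names; only the ApplicationMaster maps tasks to containers, which is the split the slide describes.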

Improvements vis-à-vis current MapReduce
  • Scalability
      • Application life-cycle management is very expensive
      • Partition resource management and application life-cycle management
      • Application management is distributed
      • Hardware trends - Currently run clusters of 4,000 machines
          • 6,000 2012 machines > 12,000 2009 machines
          • <16+ cores, 48/96G, 24TB> v/s <8 cores, 16G, 4TB>
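
The hardware comparison can be checked with simple arithmetic. The sketch below takes the lower bound of each 2012 figure from the slide (16 cores, 48G RAM, 24TB disk); the dictionary layout and the `totals` helper are only illustrative.

```python
# Per-machine specs taken from the slide; lower bounds used for 2012.
machines_2012 = {"count": 6000, "cores": 16, "ram_gb": 48, "disk_tb": 24}
machines_2009 = {"count": 12000, "cores": 8, "ram_gb": 16, "disk_tb": 4}


def totals(spec):
    """Aggregate capacity of a fleet: machine count times each per-machine figure."""
    return {k: spec["count"] * v for k, v in spec.items() if k != "count"}


t2012 = totals(machines_2012)  # cores 96,000; RAM 288,000G; disk 144,000TB
t2009 = totals(machines_2009)  # cores 96,000; RAM 192,000G; disk 48,000TB
```

With these lower bounds the two fleets have the same core count, while the 2012 fleet has 1.5x the memory and 3x the disk, so 6,000 newer machines match or exceed 12,000 older ones on every dimension listed.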

Improvements vis-à-vis current MapReduce
  • Fault Tolerance and Availability
      • Resource Manager
          • No single point of failure – state saved in ZooKeeper
          • Application Masters are restarted automatically on RM restart
          • Applications continue to progress with existing resources during restart; new resources aren’t allocated
      • Application Master
          • Optional failover via application-specific checkpoint
          • MapReduce applications pick up where they left off via state saved in HDFS
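
The idea of an Application Master resuming from a checkpoint can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `run_job` function is hypothetical, the checkpoint is a JSON file on local disk standing in for state saved in HDFS, and the crash is simulated with an exception.

```python
import json
import os
import tempfile


def run_job(tasks, checkpoint_path, fail_after=None):
    """Run tasks in order, recording each completed task in a checkpoint file.

    On restart, tasks already listed in the checkpoint are skipped.
    Returns the list of tasks executed during this attempt.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for task in tasks:
        if task in done:
            continue  # completed by a previous attempt
        if fail_after is not None and len(executed) == fail_after:
            raise RuntimeError("simulated Application Master crash")
        executed.append(task)
        done.append(task)
        # Persist progress after every task so a restart loses no work.
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    return executed


path = os.path.join(tempfile.mkdtemp(), "am_checkpoint.json")
tasks = ["map_0", "map_1", "map_2"]

try:
    run_job(tasks, path, fail_after=2)  # first attempt crashes before map_2
except RuntimeError:
    pass

resumed = run_job(tasks, path)  # second attempt only runs what is left
```

Only the task that had not completed before the simulated crash is executed on the second attempt.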

Improvements vis-à-vis current MapReduce
  • Wire Compatibility
      • Protocols are wire-compatible
      • Old clients can talk to new servers
      • Rolling upgrades

Improvements vis-à-vis current MapReduce
  • Innovation and Agility
      • MapReduce now becomes a user-land library
      • Multiple versions of MapReduce can run in the same cluster (a la Apache Pig)
      • Faster deployment cycles for improvements
      • Customers upgrade MapReduce versions on their schedule
      • Users can customize MapReduce, e.g. HOP, without affecting everyone!

Improvements vis-à-vis current MapReduce
  • Utilization
      • Generic resource model
          • Memory
          • CPU
          • Disk b/w
          • Network b/w
      • Remove fixed partition of map and reduce slots
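
The utilization gain from dropping fixed slots can be sketched by comparing the two models on one node. The numbers below (4 map slots, 2 reduce slots, 6 GB of memory, 6 cores, six pending map tasks) are hypothetical, and both functions are simplified illustrations rather than the scheduler's actual logic.

```python
def run_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fixed partition: a map task can only use a map slot, a reduce task a reduce slot."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def run_with_resources(capacity, requests):
    """Generic model: admit each request whose every resource dimension still fits."""
    free = dict(capacity)
    admitted = 0
    for req in requests:
        if all(free[k] >= v for k, v in req.items()):
            for k, v in req.items():
                free[k] -= v
            admitted += 1
    return admitted


# Six map tasks are waiting and no reduces are ready yet.
slot_model = run_with_slots(map_slots=4, reduce_slots=2,
                            pending_maps=6, pending_reduces=0)

# The same node described by resources instead of slots.
node_capacity = {"memory_mb": 6144, "cpu": 6}
task_request = {"memory_mb": 1024, "cpu": 1}
resource_model = run_with_resources(node_capacity, [task_request] * 6)
```

Under the slot model the two reduce slots sit idle and only four tasks run; under the resource model all six fit, because capacity is no longer reserved for a task type that has nothing to run.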

Improvements vis-à-vis current MapReduce
  • Support for programming paradigms other than MapReduce
      • MPI
      • Master-Worker
      • Machine Learning
      • Iterative processing
  • Enabled by allowing use of paradigm-specific Application Master
  • Run all on the same Hadoop cluster

Summary
  • MapReduce .Next takes Hadoop to the next level
      • Scale-out even further
      • High availability
      • Cluster utilization
      • Support for paradigms other than MapReduce

Status – June, 2011
  • Feature complete
  • Rigorous testing cycle underway
      • Scale testing at ~500 nodes
      • Sort/Scan/Shuffle benchmarks
      • GridMixV3!
      • Integration testing
          • Pig integration complete!
  • Coming in the next release of Apache Hadoop!
  • Beta deployments of next release of Apache Hadoop at Yahoo! in Q4, 2011

Questions?
  • http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/

Thank You.