The State of the Apache
  Hadoop Ecosystem

         Doug Cutting
      Cloudera & Apache
Outline
● the ecosystem
    ○   why we need it
    ○   what it is
    ○   why it's strong
    ○   how it can evolve
●   highlights
    ○ current
    ○ next
●   wrap up
Why are we here?

Hardware has improved
  ● exponentially for decades
  ● both storage and compute

We can now store and process much more!
  ○ yet have been slow to leverage


Analyzing more data makes us smarter.
  ○ Norvig's Unreasonable Effectiveness of Data
The Ecosystem is the System
● Hadoop has become the kernel
  ○ of the distributed operating system for Big Data
  ○ a de facto industry standard


● No one uses the kernel alone

● A collection of projects at Apache
Strengths of Apache
Mandates diversity & transparency
  ○ you control your fate

Insures against vendor lock-in
   ○ can't buy the ASF

Allows competing projects
    ○ survival of the fittest

Ecosystem as loose federation
   ○ lets platform evolve
What's new?
● Apache Hadoop 0.20.205
    ○ append
    ○ security


●   CDH3
    ○ Mahout included
    ○ Avro support across components
What's next?
● Apache Hadoop 0.23
   ○ HDFS
     ■ performance
     ■ scalability (federation)
     ■ availability (HA)
   ○ MR2


● CDH4
   ○ includes Hadoop 0.23
   ○ BigTop-based


● S4, Giraph, Crunch, Blur, ...
Apache BigTop (incubating)
Ecosystem as a project
  ○ integration tests
  ○ compatible versions
  ○ common packaging
  ○ release is a set

Basis for CDH
  ○ like Fedora is for RHEL

Community driven

Includes:
  ● Hadoop
  ● HBase
  ● ZooKeeper
  ● Avro
  ● Hive
  ● Pig
  ● Oozie
  ● Flume
  ● Mahout
  ● ...
Join the community
Hadoop and Big Data are still young.
  Hardware trends will continue.

Hadoop started with just two developers.
  Now it has hundreds.
  You can be the next.
  What do you need?
Thanks!
Questions?

Hadoop World 2011 Keynote: The State of the Apache Hadoop Ecosystem
