Hadoop World 2011

Hadoop Stack: Then, Now and Future
In the beginning

Hadoop was a platform for data storage and processing that is…
 • Scalable
 • Fault tolerant
 • Open source

CORE HADOOP COMPONENTS
 • Hadoop Distributed File System (HDFS): file sharing & data protection across physical servers
 • MapReduce: distributed computing across physical servers

Flexibility
 • A single repository for storing, processing & analyzing any type of data
 • Not bound by a single schema

Scalability
 • Scale-out architecture divides workloads across multiple nodes
 • Flexible file system eliminates ETL bottlenecks

Low Cost
 • Can be deployed on commodity hardware
 • Open source platform guards against vendor lock-in
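To make the MapReduce model concrete, here is a single-process Python sketch of its three phases (map, shuffle/sort, reduce) applied to word counting. It only simulates the data flow; it is not Hadoop's Java MapReduce API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group the values emitted for each key, ordered by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["hdfs stores data", "mapreduce processes data"]))
```

On a real cluster, map tasks run in parallel (where possible, near the HDFS blocks they read) and the framework performs the shuffle across the network; here every phase runs in one process.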




©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
A good start

Apache Hadoop, stack layers from top to bottom:
 • Shell / CLI
 • Data Processing; Resource Management
 • File storage
 • Formats; RPC; Compression




Core use cases

    • Data processing
      – Search index building
      – Click sessionization
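Click sessionization groups each user's clicks into sessions separated by periods of inactivity. The sketch below assumes a 30-minute inactivity timeout and a hypothetical `Click` record; neither comes from the slides. Sorting by user and timestamp stands in for what a MapReduce job would do by partitioning on user and secondary-sorting each user's clicks on time.

```python
from dataclasses import dataclass

# Assumed inactivity timeout; the slides do not specify one.
SESSION_GAP_SECONDS = 30 * 60


@dataclass(frozen=True)
class Click:
    """Hypothetical click record."""
    user: str
    timestamp: int  # seconds since the epoch
    url: str


def sessionize(clicks, gap=SESSION_GAP_SECONDS):
    """Return {user: [session, ...]}, where each session is a list of Clicks."""
    sessions = {}
    # Ordering by (user, timestamp) mirrors a MapReduce job that partitions by
    # user and secondary-sorts each user's clicks by time.
    for click in sorted(clicks, key=lambda c: (c.user, c.timestamp)):
        user_sessions = sessions.setdefault(click.user, [])
        last_session = user_sessions[-1] if user_sessions else None
        if last_session and click.timestamp - last_session[-1].timestamp <= gap:
            last_session.append(click)  # still within the inactivity window
        else:
            user_sessions.append([click])  # gap exceeded: start a new session
    return sessions
```

For example, clicks at 0 s, 600 s and 4000 s from the same user form two sessions, because the 3400-second pause exceeds the assumed 1800-second gap.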




We were here

Core Hadoop as % of new patches, with the relevant projects in each year:

Year   Core Hadoop share   Relevant projects
2006   100%                Core Hadoop
2007   100%                Core Hadoop
2008   37%                 Core Hadoop, HBase, ZooKeeper, Mahout
2009   58%                 Core Hadoop, HBase, Pig, ZooKeeper, Mahout, Hive
2010   37%                 Core Hadoop, HBase, Pig, ZooKeeper, Mahout, Hive, Avro, Whirr, Sqoop
2011   31%                 Core Hadoop, HBase, Pig, ZooKeeper, Mahout, Hive, Avro, Whirr, Sqoop, HCatalog, MRUnit, Bigtop, Oozie




First cut at the system

 • Shell / CLI
 • Languages; Libraries; Workflow
 • Data Processing; Resource Management
 • Metadata storage
 • Record storage
 • File storage
 • Coordination
 • Formats; RPC; Compression




Underlying projects & communities

 • Core platform: Apache Hadoop
 • Languages and libraries: Apache Pig, Hive, Mahout
 • Metadata storage: Apache Hive
 • Record storage: Apache HBase
 • Coordination: Apache ZooKeeper
 • Formats and RPC: Apache Avro

Core use cases

    • Data processing
      – Search index building
      – Click sessionization
      – Data processing pipelines
    • Analytics
      – Machine learning
      – Batch reporting
    • Live content serving (for the braver folks)
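Search index building, the first data-processing use case listed, is commonly expressed as an inverted-index job: map each document to (term, document id) pairs, then reduce each term's pairs into a posting list. The single-process sketch below shows that shape; the tokenizer and function names are hypothetical, and a production indexer would also record term positions and frequencies.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(docs):
    """docs: {doc_id: text} -> {term: sorted list of doc_ids containing it}."""
    postings = defaultdict(set)
    # "Map": emit (term, doc_id) once for each distinct term in a document.
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            postings[term].add(doc_id)
    # "Reduce": turn each term's document ids into a sorted posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, query):
    """Return ids of documents containing every term in the query."""
    term_sets = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*term_sets)) if term_sets else []
```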

Where we are today

 • Web; Shell / CLI; Drivers
 • Languages; Libraries; Workflow; Scheduling
 • Data Processing; Resource Management
 • Integration: files, RDBMS, logs & events
 • Metadata storage
 • Record storage
 • File storage
 • Coordination
 • Formats; RPC; Authentication; Compression




Where we are today

 • Core platform: Apache Hadoop
 • Web: Hue
 • Languages and libraries: Apache Pig, Hive, Mahout
 • Workflow and scheduling: Apache Oozie
 • Drivers: JDBC / ODBC
 • RDBMS integration: Apache Sqoop
 • Logs & events integration: Apache Flume
 • Metadata storage: Apache Hive, HCatalog
 • Record storage: Apache HBase
 • Coordination: Apache ZooKeeper
 • Formats and RPC: Apache Avro

Core use cases

 • Data processing
     – Search index building
     – Click sessionization
     – Data processing pipelines
 • Analytics
     – Machine learning
     – Batch reporting
 • Real time applications
     – Content serving
     – System management
     – Real-time aggregates & counters
 • Storage
     – Enterprise data warehouse (EDW) archive
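Real-time aggregates and counters are often kept as time-bucketed counter cells; HBase, for instance, supports atomic increments on a cell. The sketch below does not use HBase's client API: `BucketedCounters` is a hypothetical in-memory class that shows the row-key design (metric name plus bucket start time) and the increment/range-sum pattern. The 60-second bucket size is an assumption.

```python
import threading
from collections import defaultdict


class BucketedCounters:
    """Hypothetical in-memory stand-in for a counter table.

    Row keys combine a metric name and the start of a fixed time bucket
    (e.g. "clicks#120"), so each bucket is one independently incremented cell.
    """

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self._counts = defaultdict(int)
        self._lock = threading.Lock()  # makes each increment atomic within this process

    def _row_key(self, metric, timestamp):
        bucket_start = timestamp - timestamp % self.bucket_seconds
        return f"{metric}#{bucket_start}"

    def increment(self, metric, timestamp, amount=1):
        """Add `amount` to the bucket containing `timestamp`; return the new count."""
        key = self._row_key(metric, timestamp)
        with self._lock:
            self._counts[key] += amount
            return self._counts[key]

    def total(self, metric, start, end):
        """Sum the buckets of `metric` whose start time lies in [start, end)."""
        prefix = f"{metric}#"
        with self._lock:
            return sum(
                count
                for key, count in self._counts.items()
                if key.startswith(prefix) and start <= int(key[len(prefix):]) < end
            )
```

Bucketing keeps each write to a single cell, so reads for a time range only have to sum a handful of pre-aggregated values instead of scanning raw events.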



Current state

In 2011, core Hadoop accounted for 31% of new patches; the relevant projects were HBase, Pig, ZooKeeper, Mahout, Hive, Avro, Whirr, Sqoop, HCatalog, MRUnit, Bigtop and Oozie.




Limitations

     Redundancy - DAG execution, RPC, serialization, integration, etc. are duplicated across components.

     Uniformity - different components require different databases, management interfaces, etc.

     Ease of use - improving, but still an obstacle; e.g., non-native file formats require integration work.

     Multi-datacenter - cross-datacenter replication exists for HBase but not for HDFS.

     Interoperability - requires data conversions and end-user integration.




Ongoing work


 Metadata repositories - shared schemas and data types; table abstraction via Apache HCatalog (incubating) and Apache Hive.
 Self-describing data via Apache Avro.
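"Self-describing" means the schema travels with the data, so a reader needs no out-of-band knowledge to interpret it; Avro's data files do this by storing the writer's schema in the file header. The sketch below illustrates only that idea with a toy JSON-lines container: the schema is written in Avro's JSON schema notation, but the container format, the `SCHEMA` record and the functions are hypothetical and are NOT Avro's binary format or API.

```python
import io
import json

# Hypothetical record schema, written in Avro's JSON schema notation.
SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}

# Toy mapping from schema primitive types to Python types used for checking.
_PY_TYPES = {"string": str, "long": int}


def write_container(out, schema, records):
    """Line 1 is the schema; each later line is one record's values in field order."""
    out.write(json.dumps(schema) + "\n")
    names = [field["name"] for field in schema["fields"]]
    for record in records:
        out.write(json.dumps([record[name] for name in names]) + "\n")


def read_container(inp):
    """Yield records as dicts, using only the schema stored in the stream itself."""
    schema = json.loads(inp.readline())
    fields = schema["fields"]
    for line in inp:
        values = json.loads(line)
        if len(values) != len(fields):
            raise ValueError("record does not match the embedded schema")
        record = {}
        for field, value in zip(fields, values):
            if not isinstance(value, _PY_TYPES[field["type"]]):
                raise TypeError(f"{field['name']}: expected {field['type']}")
            record[field["name"]] = value
        yield record


if __name__ == "__main__":
    buffer = io.StringIO()
    write_container(buffer, SCHEMA, [{"user": "u1", "url": "/home", "timestamp": 1318000000}])
    buffer.seek(0)
    print(list(read_container(buffer)))
```

Because records are stored positionally, field names appear only once, in the schema; the reader recovers them from the header rather than from each record.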




Ongoing work: Apache Bigtop

 Dedicated to Hadoop stack integration and testing.

 Integration - between projects, dependencies, hosts.

 Testing - interoperability, multi-component use cases.

 100% Apache projects, using upstream releases.
 Participants across the ecosystem - join us!
 http://incubator.apache.org/bigtop
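Integrating and testing a stack means building and exercising components in dependency order, so that, for example, HBase is only tested once Hadoop and ZooKeeper are in place. The sketch below computes such an order with Python's standard `graphlib` (Python 3.9+). The `STACK_DEPENDENCIES` map is a simplified, hypothetical example, not Bigtop's actual build metadata.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Simplified, hypothetical dependency map: component -> components it needs.
# It is NOT Bigtop's actual build metadata.
STACK_DEPENDENCIES = {
    "hadoop": set(),
    "zookeeper": set(),
    "hbase": {"hadoop", "zookeeper"},
    "hive": {"hadoop"},
    "hcatalog": {"hive"},
    "pig": {"hadoop"},
    "sqoop": {"hadoop"},
    "oozie": {"hadoop"},
}


def integration_order(dependencies):
    """Order components so each one follows everything it depends on.

    Raises graphlib.CycleError if the dependency map contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


def smoke_test_plan(dependencies):
    """Pair each component with the already-verified components it builds on."""
    return [(name, sorted(dependencies[name])) for name in integration_order(dependencies)]
```

A multi-component test for a given entry can then assume every component listed beside it has already passed its own checks.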




Technical trends - software

 • Moving more forms of computation to Hadoop storage
 • Frameworks to make HBase more application- and developer-friendly
 • Taking advantage of pluggability to provide more optimized formats, schedulers, codecs, etc.
 • More granular security models
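Pluggability means callers select an implementation by name while new implementations can be registered without changing those callers. The sketch below shows that pattern for compression codecs using Python's standard `zlib`, `bz2` and `lzma` modules. The registry and function names are hypothetical; Hadoop configures its own codec classes differently, and this is not their API.

```python
import bz2
import lzma
import zlib
from typing import Callable

Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical registry: codec name -> (compress, decompress).
CODECS: dict[str, Codec] = {
    "deflate": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def register_codec(name: str, compress: Callable[[bytes], bytes],
                   decompress: Callable[[bytes], bytes]) -> None:
    """Plug in a new codec; code calling encode/decode does not change."""
    CODECS[name] = (compress, decompress)


def encode(data: bytes, codec: str) -> bytes:
    compress, _ = CODECS[codec]
    return compress(data)


def decode(data: bytes, codec: str) -> bytes:
    _, decompress = CODECS[codec]
    return decompress(data)
```

The choice of codec then becomes configuration (a name) rather than code, which is what lets a faster or more compact codec be swapped in later.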


Technical trends - hardware

 • Increasingly powerful hosts
    – More cores and memory
    – Network: 10/40 GigE
    – Storage: 48/60 TB hosts; flash
 • Cloud: multi-tenancy and virtualization
 • Low-power CPUs
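Denser storage hosts have a direct planning consequence: with HDFS's default replication factor of 3, every logical byte occupies three bytes of raw disk. The function below is a back-of-the-envelope sketch; `reserved_fraction` (disk kept back for non-HDFS use such as intermediate output) is a hypothetical planning parameter of this function, not a Hadoop setting.

```python
def usable_hdfs_capacity_tb(raw_tb_per_host, hosts, replication=3, reserved_fraction=0.0):
    """Estimate how much logical data (TB) a cluster can store.

    usable = raw_per_host * hosts * (1 - reserved_fraction) / replication
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    raw_total_tb = raw_tb_per_host * hosts
    return raw_total_tb * (1.0 - reserved_fraction) / replication
```

At 3x replication, a single 48 TB host holds about 16 TB of unique data and a 60 TB host about 20 TB; reserving a quarter of each disk on a 20-host cluster of 48 TB hosts leaves 960 × 0.75 / 3 = 240 TB.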




Enable future use cases pt 1

 More valuable data
 • Cost = gravity: data flows downhill to the cheapest store.
 • High-value data is not just generated but also consumed by the platform, i.e., more processing is done within the system before data leaves it.

 Richer end-user applications
 • Apps built directly on the platform (eBay’s Cassini, Facebook Messages, etc.)
 • Web 3.0: data-centric apps. Apps run over common data sources rather than being tightly coupled to their own data.



Enable future use cases pt 2

 Lower latency / higher interactivity

 • Low-latency response times for applications
 • Interactive: human-driven, correlated access, e.g., analytics
 • Low-latency query execution and in-memory datasets
 • Resource management across batch and interactive workloads
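Sharing one cluster between batch and interactive workloads is a resource-management problem. The toy scheduler below dispatches interactive jobs ahead of batch jobs and keeps submission order within each class; the class names, priorities and job names are hypothetical, and this is not how any Hadoop scheduler is implemented.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority classes: a lower number is dispatched first.
PRIORITY = {"interactive": 0, "batch": 1}


@dataclass(order=True)
class _Entry:
    priority: int
    seq: int  # submission counter: FIFO order within a priority class
    job: str = field(compare=False)


class TwoClassScheduler:
    """Toy scheduler: interactive jobs before batch jobs, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, job, kind):
        heapq.heappush(self._heap, _Entry(PRIORITY[kind], next(self._counter), job))

    def next_job(self):
        """Return the next job to run, or None if nothing is queued."""
        return heapq.heappop(self._heap).job if self._heap else None
```

Strict priority like this can starve batch work under a steady interactive load, which is why shared-cluster schedulers such as Hadoop's Fair Scheduler and Capacity Scheduler allocate shares or capacities rather than absolute priority.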



Enable future use cases pt 3

 Hadoop meets ILM (information lifecycle management)

 Policy - access control, standard management interfaces, SLAs, master data management (MDM), etc.

 Operation - disaster recovery, archive, etc.

 Traditional features - availability, snapshots, mirroring,
 ACLs, integration via standard protocols.
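Lifecycle policies of this kind are often expressed as rules that map a dataset's age to a storage tier, for example moving cold data to an archive. The sketch below shows that rule evaluation; the thresholds, tier names and example paths are hypothetical and not taken from the slides.

```python
# Hypothetical lifecycle rules: (maximum days since last access, tier),
# checked in ascending order of age. Thresholds and tier names are assumptions.
POLICY = [
    (30, "hot"),    # tier for recently used data
    (365, "warm"),  # tier for occasionally used data
]
DEFAULT_TIER = "archive"  # tier for data older than every rule


def assign_tier(days_since_access, policy=POLICY, default=DEFAULT_TIER):
    """Return the first tier whose age threshold covers the dataset."""
    for max_age_days, tier in policy:
        if days_since_access <= max_age_days:
            return tier
    return default


def plan(datasets):
    """Map {path: days_since_access} to {path: tier}."""
    return {path: assign_tier(age) for path, age in datasets.items()}
```

Keeping the rules as data rather than code lets operators change retention thresholds without redeploying the job that applies them.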




Things to look forward to

 • Web; Shell / CLI; Drivers
 • Languages; Libraries; Workflow; Scheduling
 • Data processing frameworks: MapReduce; Stream; Graph; MPI; Other
 • Resource Management
 • Integration: files, RDBMS, logs & events
 • Metadata storage
 • Time Series; ORM; OLAP; OLTP
 • Record storage
 • File storage
 • Coordination
 • Formats; RPC; Authentication; Compression




Getting crowded…

 • Core platform: Apache Hadoop
 • Web: Hue
 • Languages and libraries: Apache Pig, Hive, Mahout
 • Stream processing: Apache S4, Storm
 • Graph processing: X-Rime, Giraph
 • Workflow and scheduling: Apache Oozie
 • Drivers: JDBC / ODBC
 • RDBMS integration: Apache Sqoop
 • Logs & events integration: Apache Flume
 • Metadata storage: Apache Hive, HCatalog
 • Time series: OpenTSDB
 • ORM: Apache Gora
 • OLTP: Omid
 • Record storage: Apache HBase
 • Coordination: Apache ZooKeeper
 • Formats and RPC: Apache Avro


We appreciate your time and interest.

For Additional Information:
 +1 (888) 789-1488
 sales@cloudera.com
 cloudera.com
 twitter.com/cloudera
 facebook.com/cloudera




A complete hadoop stack
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
 
Big Data and Hadoop - An Introduction
Big Data and Hadoop - An IntroductionBig Data and Hadoop - An Introduction
Big Data and Hadoop - An Introduction
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Amazon Elastic Computing 2
Amazon Elastic Computing 2Amazon Elastic Computing 2
Amazon Elastic Computing 2
 
Taller hadoop
Taller hadoopTaller hadoop
Taller hadoop
 
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3 Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
 
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...
 
Hadoop administration
Hadoop administrationHadoop administration
Hadoop administration
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Hadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduceHadoop, HDFS and MapReduce
Hadoop, HDFS and MapReduce
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop MeetupIntegrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
 

Similar to Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Collins & Charles Zedlewski, Cloudera

Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01eimhee
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Big Data Spain
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopCloudera, Inc.
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Deep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueTimothy Spann
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Cloudera, Inc.
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Strata feb2013
Strata feb2013Strata feb2013
Strata feb2013alanfgates
 
NYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemNYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemAL500745425
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoopDataWorks Summit
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use casesJoey Echeverria
 

Similar to Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Collins & Charles Zedlewski, Cloudera (20)

Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01Commonanduniqueusecases 110831113310-phpapp01
Commonanduniqueusecases 110831113310-phpapp01
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
Coordinating the Many Tools of Big Data - Apache HCatalog, Apache Pig and Apa...
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
Deep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 Prague
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Strata feb2013
Strata feb2013Strata feb2013
Strata feb2013
 
NYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop EchosystemNYC-Meetup- Introduction to Hadoop Echosystem
NYC-Meetup- Introduction to Hadoop Echosystem
 
Searching conversations with hadoop
Searching conversations with hadoopSearching conversations with hadoop
Searching conversations with hadoop
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use cases
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Collins & Charles Zedlewski, Cloudera

  • 1. Hadoop World 2011 Hadoop Stack: Then, Now and Future
  • 2. In the beginning
      Hadoop was a platform for data storage and processing that is:
      - Scalable
      - Fault tolerant
      - Open source
      Core Hadoop components:
      - Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
      - MapReduce: distributed computing across physical servers
      Flexibility:
      - A single repository for storing, processing and analyzing any type of data
      - Not bound by a single schema
      Scalability:
      - Scale-out architecture divides workloads across multiple nodes
      - Flexible file system eliminates ETL bottlenecks
      Low cost:
      - Can be deployed on commodity hardware
      - Open-source platform guards against vendor lock-in
      ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. A good start. Apache Hadoop: Shell / CLI; Data Processing; Resource Management; File storage; Formats, RPC, Compression.
  • 4. Core use cases • Data processing – Search index building – Click sessionization
  • 5. We were here. [Chart: Core Hadoop as % of new patches, by year: 2006 100%, 2007 100%, 2008 58%, 2009 37%, 2010 37%, 2011 31%. Relevant projects shown across the period: Core Hadoop, HBase, ZooKeeper, Pig, Mahout, Hive, Avro, Whirr, Sqoop, HCatalog, MRUnit, Bigtop, Oozie.]
  • 6. First cut at the system. Layers: Shell / CLI, Languages, Libraries, Workflow; Data Processing; Resource Management; Metadata storage, Record storage, File storage, Coordination; Formats, RPC, Compression.
  • 7. Underlying projects & communities. Apache Hadoop (the core layers); Apache Pig, Apache Hive and Mahout (languages and libraries); Apache Hive (metadata storage); Apache HBase (record storage); Apache ZooKeeper (coordination); Apache Avro (formats).
  • 8. Core use cases • Data processing – Search index building – Click sessionization – Data processing pipelines • Analytics – Machine learning – Batch reporting • Live content serving (for the braver folks)
  • 9. We were here (same patch-share chart as slide 5).
  • 10. Where we are today. Layers: Web, Shell / CLI, Drivers; Languages, Libraries, Workflow, Scheduling; Data Processing; Resource Management; Integration (Files, RDBMS, Logs & events); Metadata storage, Record storage, File storage, Coordination; Formats, RPC, Authentication, Compression.
  • 11. Where we are today. Hue (web); Apache Hadoop (Shell / CLI and the core layers); Apache Pig, Apache Hive and Mahout (languages and libraries); Apache Oozie (workflow and scheduling); JDBC / ODBC (drivers); Apache Sqoop (RDBMS integration); Apache Flume (logs & events integration); Apache Hive and HCatalog (metadata storage); Apache HBase (record storage); Apache ZooKeeper (coordination); Apache Avro (formats).
  • 12. Core use cases • Data processing – Search index building – Click sessionization – Data processing pipelines • Analytics – Machine learning – Batch reporting • Real-time applications – Content serving – System management – Real-time aggregates & counters • Storage – EDW archive
  • 13. Current state (same patch-share chart as slide 5).
  • 14. Limitations. Redundancy: DAGs, RPC, serialization, integration, etc. are duplicated across components. Uniformity: different components require different databases, management interfaces, etc. Ease of use: improving, but still an obstacle; e.g., non-native file formats require integration work. Multi-datacenter: cross-datacenter replication exists for HBase but not for HDFS. Interoperability: requires conversions and end-user integration.
  • 15. Ongoing work. Metadata repositories: shared schema and data types, and a table abstraction via Apache HCatalog (incubating) and Apache Hive. Self-describing data via Apache Avro.
  • 16. Ongoing work: Apache Bigtop. Dedicated to Hadoop stack integration and testing. Integration: between projects, dependencies and hosts. Testing: interoperability and multi-component use cases. 100% Apache projects, using upstream releases. Participants across the ecosystem: join us! http://incubator.apache.org/bigtop
  • 17. Technical trends - software • Moving more forms of computation to where the data is stored in Hadoop • Frameworks to make HBase more application- and developer-friendly • Taking advantage of pluggability to provide more optimized formats, schedulers, codecs, etc. • More granular security models
  • 18. Technical trends - hardware • Increasingly powerful hosts: more cores and memory; 10/40 GigE networking; 48/60 TB of storage per host; flash • Cloud: multi-tenancy and virtualization • Low-power CPUs
  • 19. Enable future use cases (part 1). More valuable data • Cost = gravity: data flows downhill to the cheapest store. • High-value data is not just generated but also consumed by the platform, i.e., more processing is done within the system before data leaves it. Richer end-user applications • Apps built directly on the platform (eBay's Cassini, Facebook Messages, etc.) • Web 3.0: data-centric apps, built over common data sources rather than tightly coupled to their own data.
  • 20. Enable future use cases (part 2). Lower latency / higher interactivity • Low-latency response times for applications • Interactive: human-driven, correlated access, e.g., analytics • Low-latency query execution and in-memory datasets • Resource management for both batch and interactive workloads
  • 21. Enable future use cases (part 3): Hadoop meets ILM (information lifecycle management). Policy: access control, standard management interfaces, SLAs, MDM, etc. Operations: disaster recovery, archiving, etc. Traditional features: availability, snapshots, mirroring, ACLs, integration via standard protocols.
  • 22. Things to look forward to. Layers: Web, Shell / CLI, Drivers; Languages, Libraries, Workflow, Scheduling; processing models: MapReduce, Stream, Graph, MPI, Other; Resource Management; Integration (Files, RDBMS, Logs & events); Metadata storage; data access: Time Series, ORM, OLAP, OLTP; Record storage, File storage, Coordination; Formats, RPC, Authentication, Compression.
  • 23. Getting crowded… Hue (web); Apache Hadoop (Shell / CLI and the core layers); Apache Pig, Apache Hive and Mahout (languages and libraries); Apache Oozie (workflow and scheduling); JDBC / ODBC (drivers); Apache Sqoop (RDBMS integration); Apache Flume (logs & events); Apache S4 and Storm (stream processing); X-Rime and Giraph (graph processing); Apache Hive and HCatalog (metadata storage); OpenTSDB (time series); Apache Gora (ORM); Omid (OLTP); Apache HBase (record storage); Apache ZooKeeper (coordination); Apache Avro (formats).
  • 24. We appreciate your time and interest in Cloudera. For additional information: +1 (888) 789-1488 · twitter.com/cloudera · sales@cloudera.com · cloudera.com · facebook.com/cloudera
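Slide 2 credits HDFS with file sharing and data protection across physical servers. The sketch below illustrates the two mechanisms behind that claim: a file is cut into fixed-size blocks, and each block is copied to several DataNodes on different racks. It is a simplified illustration, not HDFS code. The 64 MB block size is an assumed Hadoop 1.x-era default, the node and rack names are hypothetical, and the placement rule (writer's node, then a node on a remote rack, then another node on that same remote rack) follows the common three-replica case described in the HDFS architecture guide, with deterministic choices where real HDFS picks randomly and also weighs node load.

```python
# Simplified sketch of HDFS-style block splitting and rack-aware replica
# placement. The block size and topology are assumptions, not HDFS code.
BLOCK_SIZE = 64 * 1024 * 1024  # assumed 64 MB default (Hadoop 1.x era)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block; the last one may be short."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(writer, topology):
    """Pick three distinct DataNodes for one block.

    `topology` maps a (hypothetical) node name to its rack name. Real HDFS
    chooses randomly among candidates and checks node load; this sketch
    deterministically takes the first candidate in sorted order.
    """
    # Replica 1: the writer's own node.
    local_rack = topology[writer]
    # Replica 2: a node on another rack, so losing one rack loses no data.
    remote = sorted(n for n, rack in topology.items() if rack != local_rack)
    if not remote:
        raise ValueError("rack-aware placement needs at least two racks")
    second = remote[0]
    # Replica 3: another node on that same remote rack, so only one copy
    # crosses racks while the write pipeline runs.
    peers = sorted(n for n, rack in topology.items()
                   if rack == topology[second] and n != second)
    if not peers:
        raise ValueError("the remote rack needs at least two nodes")
    return [writer, second, peers[0]]


if __name__ == "__main__":
    racks = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
    for size in split_into_blocks(150 * 1024 * 1024):
        print(size, place_replicas("dn1", racks))
```

Putting two of the three copies on one remote rack is a trade-off: the data still survives the loss of a whole rack, while write traffic between racks stays at a single copy.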
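Slides 4 and 8 list search index building as a core data processing use case. The sketch below simulates the map, shuffle/sort and reduce phases in memory to build a simple inverted index (each term mapped to the documents that contain it). It is an illustration only: `run_mapreduce`, `index_mapper` and `index_reducer` are hypothetical names, and a real job would use Hadoop's Java MapReduce API or a higher-level tool such as Pig or Hive, with the shuffle spread across many nodes.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce job (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each input record may emit any number of (key, value) pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle: collect every value emitted for the same key.
            groups[out_key].append(out_value)
    # Sort + reduce phase: a reducer sees its keys in sorted order.
    return {k: reducer(k, groups[k]) for k in sorted(groups)}


def index_mapper(doc_id, text):
    # Emit each distinct lower-cased term once per document.
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reducer(term, doc_ids):
    # Posting list: sorted ids of the documents containing the term.
    return sorted(doc_ids)


if __name__ == "__main__":
    docs = [("d1", "Hadoop stores data"), ("d2", "Hive queries data in Hadoop")]
    index = run_mapreduce(docs, index_mapper, index_reducer)
    print(index["hadoop"])  # ['d1', 'd2']
```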
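Click sessionization, also listed on slides 4 and 8, fits the same pattern: map each click to its user, let the shuffle bring a user's clicks together, then have the reducer order them by time and start a new session after a gap in activity. The sketch below folds those phases into one function for readability. The 30-minute inactivity timeout and the `(user_id, timestamp_seconds, url)` record layout are assumptions; the slides do not specify either.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed 30-minute inactivity gap, in seconds


def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Split each user's clicks into sessions.

    clicks: iterable of (user_id, timestamp_seconds, url) tuples (assumed layout).
    Returns {user_id: [[(timestamp, url), ...], ...]}, one inner list per session.
    """
    # "Map" + shuffle: group every click under its user id.
    by_user = defaultdict(list)
    for user, ts, url in clicks:
        by_user[user].append((ts, url))

    # "Reduce": per user, order clicks by time and cut a new session
    # whenever the gap since the previous click exceeds the timeout.
    sessions = {}
    for user, events in by_user.items():
        events.sort()
        user_sessions = [[events[0]]]
        for prev, cur in zip(events, events[1:]):
            if cur[0] - prev[0] > timeout:
                user_sessions.append([])
            user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions


if __name__ == "__main__":
    log = [("u1", 0, "/home"), ("u1", 600, "/search"),
           ("u1", 4000, "/home"), ("u2", 100, "/about")]
    print(sessionize(log)["u1"])
```

In a real job the per-user sort is usually pushed into the framework (a secondary sort on timestamp) rather than done in reducer memory, which matters for very active users.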
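Slide 15 points to Apache Avro for self-describing data: the writer's schema travels with the data, so a reader needs no out-of-band metadata to interpret the records. The sketch below imitates that idea with JSON. It is not Avro's format (Avro object container files use a binary encoding with the schema in the file header), and `write_container` / `read_container` are hypothetical helpers; only the schema itself is written in Avro's JSON schema syntax.

```python
import io
import json

# Toy "self-describing" container: line 1 holds the schema, every later
# line holds one record. A sketch of the idea only, not Avro's file format.


def write_container(out, schema, records):
    """Write the schema first, then each record as a positional JSON array."""
    out.write(json.dumps(schema) + "\n")
    names = [field["name"] for field in schema["fields"]]
    for record in records:
        missing = [n for n in names if n not in record]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
        # Like Avro's binary encoding, values carry no field names;
        # they are meaningless without the schema at the top of the file.
        out.write(json.dumps([record[n] for n in names]) + "\n")


def read_container(inp):
    """Recover the schema from the stream itself, then decode the records."""
    schema = json.loads(inp.readline())
    names = [field["name"] for field in schema["fields"]]
    records = [dict(zip(names, json.loads(line))) for line in inp if line.strip()]
    return schema, records


if __name__ == "__main__":
    click_schema = {
        "type": "record",
        "name": "Click",
        "fields": [{"name": "user", "type": "string"},
                   {"name": "ts", "type": "long"}],
    }
    buf = io.StringIO()
    write_container(buf, click_schema, [{"user": "u1", "ts": 0}])
    buf.seek(0)
    print(read_container(buf))
```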
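Slide 17 mentions using Hadoop's pluggability to provide more optimized formats, schedulers and codecs. The sketch below shows the general shape of a pluggable codec layer: codecs register under a file-name suffix, and readers pick one from the file name, loosely similar to how Hadoop's `CompressionCodecFactory` selects a codec from a file's suffix. It uses Python's standard `gzip`, `bz2` and `lzma` modules; the registry and its function names are hypothetical, and `.xz` simply stands in for a newly plugged-in codec.

```python
import bz2
import gzip
import lzma
from typing import Callable, NamedTuple, Optional


class Codec(NamedTuple):
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical registry: file-name suffix -> codec implementation.
_CODECS = {}


def register_codec(suffix, codec):
    """Plug in a codec; a later registration for a suffix replaces the earlier one."""
    _CODECS[suffix] = codec


def codec_for(path) -> Optional[Codec]:
    """Return the codec whose suffix matches `path`, or None for plain files."""
    for suffix, codec in _CODECS.items():
        if path.endswith(suffix):
            return codec
    return None


def read_bytes(path, raw):
    """Decompress raw file contents if the file name implies a codec."""
    codec = codec_for(path)
    return codec.decompress(raw) if codec is not None else raw


register_codec(".gz", Codec(gzip.compress, gzip.decompress))
register_codec(".bz2", Codec(bz2.compress, bz2.decompress))
# Adding a codec needs no change to read_bytes or its callers.
register_codec(".xz", Codec(lzma.compress, lzma.decompress))
```

The point of the design is that readers depend only on the lookup, so a faster or denser codec can be added without touching the code that consumes files.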
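Slide 20 calls for resource management that serves both batch and interactive workloads on one cluster. One common way to reason about sharing capacity between them is max-min fairness, the idea behind fair schedulers: split capacity evenly, cap each pool at what it asked for, and hand what capped pools leave unused to the others. The sketch below computes such shares. It is not the Hadoop Fair Scheduler's implementation; the pool names, slot counts and the absence of weights, minimum shares and preemption are simplifying assumptions.

```python
def max_min_fair_share(capacity, demands):
    """Max-min fair allocation of `capacity` slots across pools.

    `demands` maps a (hypothetical) pool name to the slots it requests.
    No pool receives more than it asked for, and capacity a satisfied
    pool does not need is redistributed among the remaining pools.
    """
    alloc = {pool: 0.0 for pool in demands}
    remaining = float(capacity)
    hungry = {pool for pool, d in demands.items() if d > 0}
    while hungry and remaining > 1e-9:
        # Offer every unsatisfied pool an equal slice of what is left.
        share = remaining / len(hungry)
        for pool in sorted(hungry):
            grant = min(share, demands[pool] - alloc[pool])
            alloc[pool] += grant
            remaining -= grant
        # Pools whose demand is now met drop out of the next round.
        hungry = {p for p in hungry if demands[p] - alloc[p] > 1e-9}
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster with 100 task slots.
    print(max_min_fair_share(100, {"batch": 200, "interactive": 30}))
    # {'batch': 70.0, 'interactive': 30.0}
```

Here the interactive pool gets everything it asked for, and the batch pool absorbs the rest instead of leaving slots idle.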