Welcome to Hadoop World: NYC 2009
Hadoop is Everywhere
                          Presents:




Christophe Bisciglia
Founder christophe@cloudera.com
Hadoop World Details and Event Updates
Too Late to Print
▪   WiFi Details
    ▪   SSID: HadoopWorld
    ▪   Password: hadoop09
▪   Twitter: #hadoopworld
▪   Break Out Sessions
    ▪   Applications (This Room)
    ▪   Dev / Admin: Terrace Ballroom (Across Lobby)
    ▪   Extensions: Vanderbilt Suite (One Floor Up)
▪   UI BOF
    ▪   Lead: Philip Zeyliger, Cloudera
    ▪   Vanderbilt Suite, Afternoon Break
▪   HBase BOF
    ▪   Lead: Michael Stack, Microsoft
    ▪   Terrace Ballroom, Afternoon Break
Hadoop World Sponsors
Thanks!
Why Hadoop World?
Time to Upgrade Your Data Management Strategy
▪   Hadoop isn’t just for Web Companies anymore
    ▪   Terabytes are commonplace
    ▪   Enables consumption of all enterprise data
    ▪   Wide adoption across verticals
▪   Hadoop is driven by the Community
    ▪   Most registrants are new to Hadoop
    ▪   Sharing experience is critical - and incredibly valuable
    ▪   Users and Developers exchanging needs and ideas
Growing Up with Hadoop
You’ve come a long way, baby...
▪   Early Days
    ▪   2004: Google Publishes MapReduce/GFS
    ▪   2005: Hadoop Prototype
        ▪   Doug Cutting and Mike Cafarella
    ▪   2006: Hadoop Running on 20 nodes
        ▪   Internet Archive and UW



[Photo: Doug Cutting. Photo Credit: New York Times]
▪   Formative Years
    ▪   2006: Yahoo! Begins Major Investment
    ▪   2007: Yahoo! Runs Hadoop on 2000 nodes
    ▪   2008: Yahoo! Uses Hadoop to Claim Terasort Benchmark
▪   5 Major Releases for Hadoop in the Last Year
    ▪   More Reliable
    ▪   More Scalable
    ▪   More Manageable
▪   New Sub-Projects Embrace New Users
    ▪   Hive: SQL Data Warehouse for Hadoop
    ▪   Pig: Data Analysis Language
▪   Sqoop: Database import for Hadoop
    ▪   Developed by Aaron Kimball, Cloudera
    ▪   Works over JDBC
    ▪   Extensible for better performance
▪   RDBMS Vendors Embrace Hadoop
    ▪   MapReduce is great for Analytics
    ▪   Hadoop is the MapReduce Standard
    ▪           integrates directly with Hadoop
▪   Adoption Spanning Globe
    ▪   HUGs outside the US
    ▪   Over 10x Companies “PoweredBy”
    ▪   Not Just for Web Companies Anymore
Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community
▪   Distribution for Hadoop
    ▪   Latest Stable Hadoop Release
    ▪   Stable Upcoming Features (by customer request)
    ▪   Source Code Powering Y!
    ▪   Improvements for EC2 and S3
    ▪   New Features from Cloudera
▪   Hadoop Community
    ▪   Cloudera Enhancements and Bug Fixes Contributed to Apache
Cloudera’s Distribution for Hadoop
Delivering Hadoop to a Larger Community



▪   Distribution for Hadoop
    ▪   Cross-Platform Packaging, Integration and Testing
    ▪   Hive, Pig, Sqoop, ...
    ▪   Support
▪   Private Cloud: Packages
▪   Public Cloud: Images
Comparing Growth Rates since March 2009
Standard Packaging Drives Adoption

▪   Consistent Downloads from Apache
▪   Cloudera Packages Drive New Usage
▪   Enables New Hadoop Applications

[Chart: downloads relative to March 2009 (= 100%), values in chart order, left to right; axis labels: March 2009, May 2009, July 09, Aug 09, Sept 09]
    Cloudera Downloads: 100%, 238%, 384%, 762%, 1,026%, 1,392%, 1,835%
    Apache Downloads:   100%, 96%, 95%, 133%, 93%, 97%, 95%
Normalized by unique users accessing hadoop.apache.org/core/releases.html and Cloudera Package Repositories in March 2009
Cloudera’s Business to Date
Support, Training and Professional Services
▪   Dozens of Support Customers
    ▪   Using Hadoop for real enterprise workloads

▪   Training and Certification
    ▪   100’s of engineers trained
    ▪   Sysadmin and Manager programs launched at Hadoop World

▪   Professional Services
