Hadoop Grows Up!

Doug Cutting – Hadoop project founder
June 29th 2010
Hadoop is Almost 5 Years Old!




A Buzzword…




A Career…




            Copyright 2010 Cloudera Inc. All rights reserved   4
An Ecosystem…




A Market…




An Emerging Platform for Applications…
Graph analysis, Machine learning, Scientific, Archive, Security,
Query & reporting, Complex ETL, Search quality, Fraud detection,
Clickstream analysis, POS analysis, Trade compliance, and more…




Hadoop Started From Humble Beginnings…


• MapReduce and HDFS only

• Good for experienced Java
  programmers

• Limited application set



Innovation: the Secret to Hadoop Success

• Projects & components develop around Hadoop
• User base grows
• More applications are made possible

“Provide common technical services”
“Provide more levels of abstraction & automation for job creation”
“Make it easier to get data in & out”
“Cover more data movements – inserts, appends, etc.”




But Innovation Isn’t Free

• For every release of MapReduce and HDFS, there are >20 releases of related projects

• Every component has its own schedule, versioning, dependencies & patch requirements

• Hadoop community likes to build 2-3 of everything

[Chart: vertical axis from 0 to 20; component labels: HBase 0.89, HDFS 0.20, Pig 0.7, Hive 0.6, Oozie 2.0]
Announcing Cloudera’s Distribution for Hadoop v3
• Open source – 100% Apache licensed
• Simplified – Cloudera manages required versions & dependencies
• Integrated – all components interoperate
• Reliable – patched with fixes from future releases to improve stability
• Easy to consume – Debian, RPM, tarball, Virtual Machine, EC2, Rackspace, Softlayer

What’s New in CDH v3?
• Updates to existing Hadoop frameworks
   • Pig 0.7
   • Sqoop 1.0
   • Hadoop 0.20S (planned)
• Support for 3 new related components
   • HBase – with durability
   • ZooKeeper
   • Oozie – run workflows + support for Hive & Sqoop actions
• Introducing 2 new components
   • Flume – collect streaming data with centralized
      configuration & guaranteed delivery
   • Hue – web UI and SDK for Hadoop web applications
Charles Zedlewski,
Cloudera Product Management




Harnessing Hadoop Has Challenges

• Skill Set – experts only
• Complexity – more than ten components
• Manageability – hard to configure, monitor & administer
• Interoperability – limited support for DBMS & analytic tools
Announcing Cloudera Enterprise
• Reduces the risks of running Hadoop in production
• Improves consistency and compliance, and reduces administrative overhead
• Management tools
  • Monitoring & config for data integration
  • Authorization mgmt & provisioning
  • Resource mgmt
• Production support for CDH & certified integrations (e.g. Oracle, Vertica)
Demo


Some Announcements

• Party at our place
  • Hackathon on CDH3 – applications, enhancements, open
    source contributions
  • July 27th, 9:30am – 7:30pm
  • For invite: hackathon@cloudera.com
  • Free food & snacks


• Or stay home and read
  • Hadoop: The Definitive Guide, second edition
  • Available on October 12th at Hadoop World
Thank You!

• Stop by our table if you have questions!





Hadoop summit cloudera keynote_v5
