Webinar: Inside Cloudera's Distribution including Apache Hadoop v3

Cloudera VP of Product Charles Zedlewski takes us through what's inside Cloudera's Distribution including Apache Hadoop, version 3.

Published in: Technology

Transcript of "Webinar: Inside Cloudera's Distribution including Apache Hadoop v3"

  1. Welcome to Inside Cloudera’s Distribution including Apache Hadoop
     Audio/Telephone: +1 (314) 627-1519
     Access Code: 380-729-510
     Audio PIN: Shown after joining the Webinar
     Presenter: Charles Zedlewski, Cloudera VP of Product
  2. Housekeeping
     Ask questions at any time using the Questions panel
     Problems? Use the Chat panel
     Slides and recording will be available
     Copyright 2011 Cloudera Inc. All rights reserved
  3. What Cloudera set out to do with CDH3
     Give organizations an integrated, complete data management system that is 100% Apache open source
     Provide a platform that the rest of the enterprise IT ecosystem could integrate with
     Continue to make Apache Hadoop even easier to adopt
     Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  4. An integrated data management system – what did Google do?
     Dremel
     Evenflow
     Evenflow
     Dremel
     Sawzall
     Bigtable
     MySQL
     Gateway
     MapReduce / GFS
     Chubby
  5. The pattern repeats…
     HiPal
     Databee
     Databee
     Hive
     Hive
     HBase
     Scribe
     Zookeeper
  6. The pattern repeats…
     Oozie
     Oozie
     Hive
     Pig & Hive
     HBase
     Data Highway
     Zookeeper
  7. The pattern repeats…
     Azkaban
     Azkaban
     Pig
     Voldemort
     Sqoop
     Kafka
     Zookeeper
  8. CDH3 assembled the best of the Apache Hadoop ecosystem into an integrated system so you don’t have to
     Cloudera’s Distribution Including Apache Hadoop
     Hue
     Hue
     Oozie
     Oozie
     Hive
     Hive / Pig
     HBase
     Sqoop
     Flume
     Zookeeper
  9. How CDH3 got created
     Enhancements written and contributed to Apache projects
     Customer and partner requirements
     Integration, testing, & backporting
     Releases selected or cut
     Stable release!
     Hadoop 0.20.2 +923
     HBase 0.90.1 +15
     Hive 0.7 +22
     Pig 0.8 +20
     Flume 0.9.3 +17
     Oozie 20.2 +31
     Hue 1.2.0 +0
     Sqoop 1.2 +24
     Zookeeper 3.3.3 +12
     HDFS
     Beta cycle, more backporting
     Prioritization based on customer value, cost and (for CDH) community readiness
     HBase
     Flume, etc
  10. CDH2 to CDH3
      Copyright 2011. Cloudera confidential and proprietary. Redistribution without permission is not permitted
  11. So what?
  12. Example 1 – clickstream sessions
      Hive
      Store table metadata
      MapReduce
      Sqoop
      Reliably collect logs
      Process into sessions
      Export in EDW for BI reporting
      Flume
      HDFS
      Store in the filesystem
  13. Example 2 – fraud analysis
      Sqoop
      Hive
      Analytics performed using HQL
      Import regularly changing dimension data
      HBase
      MapReduce
      Sqoop
      HDFS
      Import of fact data into filesystem
  14. What Cloudera set out to do with CDH3
      Give organizations an integrated, complete data management system that is 100% Apache open source
      Provide a platform that the rest of the enterprise IT ecosystem could integrate with
      Continue to make Apache Hadoop even easier to adopt
      Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  15. Investing in interfacing with the Enterprise IT ecosystem
      Drivers, language enhancements, testing
      Cloudera’s Distribution Including Apache Hadoop
      Sqoop framework, adapters
      More coming…
      Packaging, testing
  16. What Cloudera set out to do with CDH3
      Give organizations an integrated, complete data management system that is 100% Apache open source
      Provide a platform that the rest of the enterprise IT ecosystem could integrate with
      Continue to make Apache Hadoop even easier to adopt
      Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  17. Ease of adoption - making CDH a more enterprise quality artifact
      Regular, non-disruptive updates
  18. There are new features for each component too (partial list)
  19. What’s next?
      Development work for CDH4 has already begun
      Key themes:
      Improved availability
      Lower TCO through harmonization
      Improved performance
      Expand the community of users that can work with Apache Hadoop
      Improvements to release practices
      Shorter public betas
      Beefier quarterly updates for CDH3 with non-disruptive enhancements
  20. Q&A
      Ask questions using the Questions panel
      Thank you for participating!
      Learn about upcoming events: www.cloudera.com/events
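
The "Process into sessions" step in Example 1 (clickstream sessions) can be illustrated with a small, single-machine sketch. This is not the MapReduce job from the webinar, which the slides do not show. It only demonstrates the grouping logic such a job might apply after Flume has landed the logs in HDFS. The `(user_id, timestamp)` event shape, the `sessionize` function name, and the 30-minute inactivity gap are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumed inactivity threshold; the slides do not specify one.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp) click events into per-user sessions.

    A new session starts whenever the same user has been idle for
    longer than `gap`. Returns a list of (user_id, [timestamps]).
    """
    sessions = []
    # Sort by user, then by time, so each user's clicks are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user, clicks in groupby(ordered, key=itemgetter(0)):
        current = []
        for _, ts in clicks:
            if current and ts - current[-1] > gap:
                sessions.append((user, current))
                current = []
            current.append(ts)
        sessions.append((user, current))
    return sessions


if __name__ == "__main__":
    sample = [
        ("u1", datetime(2011, 4, 1, 10, 0)),
        ("u1", datetime(2011, 4, 1, 10, 10)),
        ("u1", datetime(2011, 4, 1, 11, 0)),
        ("u2", datetime(2011, 4, 1, 9, 0)),
    ]
    for user, stamps in sessionize(sample):
        print(user, len(stamps))
```

In a MapReduce setting the same idea would typically map each log line to a key of user ID and apply the gap check in the reducer. The resulting session records could then be described as a Hive table and exported with Sqoop, as the slide outlines.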
