Webinar: Inside Cloudera's Distribution including Apache Hadoop v3


VP of Product Charles Zedlewski takes us through what's inside Cloudera's Distribution including Apache Hadoop, version 3.

Transcript

  • 1. Welcome to Inside Cloudera’s Distribution including Apache Hadoop
    Audio/Telephone: +1 (314) 627-1519
    Access Code: 380-729-510
    Audio PIN: Shown after joining the Webinar
    Presenter: Charles Zedlewski, Cloudera VP of Product
  • 2. Housekeeping
    Ask questions at any time using the Questions panel
    Problems? Use the Chat panel
    Slides and recording will be available
    Copyright 2011 Cloudera Inc. All rights reserved
  • 3. What Cloudera set out to do with CDH3
    Give organizations an integrated, complete data management system that is 100% Apache open source
    Provide a platform that the rest of the enterprise IT ecosystem could integrate with
    Continue to make Apache Hadoop even easier to adopt
    Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  • 4. An integrated data management system – what did Google do?
    Dremel
    Evenflow
    Evenflow
    Dremel
    Sawzall
    Bigtable
    MySQL
    Gateway
    MapReduce / GFS
    Chubby
  • 5. The pattern repeats…
    HiPal
    Databee
    Databee
    Hive
    Hive
    HBase
    Scribe
    Zookeeper
  • 6. The pattern repeats…
    Oozie
    Oozie
    Hive
    Pig & Hive
    HBase
    Data Highway
    Zookeeper
  • 7. The pattern repeats…
    Azkaban
    Azkaban
    Pig
    Voldemort
    Sqoop
    Kafka
    Zookeeper
  • 8. CDH3 assembled the best of the Apache Hadoop ecosystem into an integrated system so you don’t have to
    Cloudera’s Distribution Including Apache Hadoop
    Hue
    Hue
    Oozie
    Oozie
    Hive
    Hive / Pig
    HBase
    Sqoop
    Flume
    Zookeeper
  • 9. How CDH3 got created
    Enhancements written and contributed to Apache projects
    Customer and partner requirements
    Integration, testing, & backporting
    Releases selected or cut
    Stable release!
    Hadoop 0.20.2 +923
    HBase 0.90.1 +15
    Hive 0.7 +22
    Pig 0.8 +20
    Flume 0.9.3 +17
    Oozie 20.2 +31
    Hue 1.2.0 +0
    Sqoop 1.2 +24
    Zookeeper 3.3.3 +12
    HDFS
    Beta cycle, more backporting
    Prioritization based on customer value, cost and (for CDH) community readiness
    HBase
    Flume, etc
  • 10. CDH2 to CDH3
    Copyright 2011. Cloudera confidential and proprietary. Redistribution without permission is not permitted
  • 11. So what?
  • 12. Example 1 – clickstream sessions
    Hive
    Store table metadata
    MapReduce
    Sqoop
    Reliably collect logs
    Process into sessions
    Export in EDW for BI reporting
    Flume
    HDFS
    Store in the filesystem
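
The "process into sessions" step above is the part of this pipeline that a MapReduce job would carry out. As a rough, single-machine illustration of that logic only (not the code shown in the webinar), the sketch below groups click events by user and starts a new session whenever the gap between consecutive clicks exceeds a timeout. The `Click` record, its field names, and the 30-minute timeout are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Assumed inactivity timeout; the webinar does not specify one.
SESSION_GAP_SECONDS = 30 * 60


class Click(NamedTuple):
    """Hypothetical clickstream record; field names are illustrative."""
    user_id: str
    timestamp: int  # seconds since epoch
    url: str


def sessionize(clicks: Iterable[Click],
               gap: int = SESSION_GAP_SECONDS) -> Dict[str, List[List[Click]]]:
    """Group clicks per user, then split each user's clicks into sessions."""
    # "Map"-like step: bucket events by user.
    by_user: Dict[str, List[Click]] = defaultdict(list)
    for click in clicks:
        by_user[click.user_id].append(click)

    # "Reduce"-like step: order each user's events and cut on long gaps.
    sessions: Dict[str, List[List[Click]]] = {}
    for user, events in by_user.items():
        events.sort(key=lambda c: c.timestamp)
        user_sessions = [[events[0]]]
        for prev, cur in zip(events, events[1:]):
            if cur.timestamp - prev.timestamp > gap:
                user_sessions.append([cur])      # gap too long: new session
            else:
                user_sessions[-1].append(cur)    # same session continues
        sessions[user] = user_sessions
    return sessions
```

In the architecture on this slide, the input would arrive in HDFS via Flume and the resulting sessions would be exposed through Hive and exported with Sqoop; none of that plumbing is modeled here.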
  • 13. Example 2 – fraud analysis
    Sqoop
    Hive
    Analytics performed using HQL
    Import regularly changing dimension data
    HBase
    MapReduce
    Sqoop
    HDFS
    Import of fact data into filesystem
  • 14. What Cloudera set out to do with CDH3
    Give organizations an integrated, complete data management system that is 100% Apache open source
    Provide a platform that the rest of the enterprise IT ecosystem could integrate with
    Continue to make Apache Hadoop even easier to adopt
    Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  • 15. Investing in interfacing with the Enterprise IT ecosystem
    Drivers, language enhancements, testing
    Cloudera’s Distribution Including Apache Hadoop
    Sqoop framework, adapters
    More coming…
    Packaging, testing
  • 16. What Cloudera set out to do with CDH3
    Give organizations an integrated, complete data management system that is 100% Apache open source
    Provide a platform that the rest of the enterprise IT ecosystem could integrate with
    Continue to make Apache Hadoop even easier to adopt
    Provide a level of release predictability that organizations can plan their maintenance and upgrade cycles on
  • 17. Ease of adoption - making CDH a more enterprise quality artifact
    Regular, non-disruptive updates
  • 18. There are new features for each component too (partial list)
  • 19. What’s next?
    • Development work for CDH4 has already begun
    • Key themes:
    • Improved availability
    • Lower TCO through harmonization
    • Improved performance
    • Expand the community of users that can work with Apache Hadoop
    • Improvements to release practices
    • Shorter public betas
    • Beefier quarterly updates for CDH3 with non-disruptive enhancements
  • Q&A
    Ask questions using the Questions panel
    Thank you for participating!
    Learn about upcoming events: www.cloudera.com/events