
Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise

This session will discuss what's new in the recently released CDH3 and Enterprise 3.5 products. We'll review how usage of Hadoop has evolved in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security and manageability.

Published in: Technology, Business

Transcript of "Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise"

  2. Cloudera’s Distribution Including Apache Hadoop & Cloudera Enterprise
  3. Who has big data? Everyone!
     Industry: Advanced Analytics / Data Processing
     - Web: Social network analysis / Clickstream sessionization
     - Media: Content optimization / Clickstream sessionization
     - Telco: Network analytics / Mediation
     - Retail: Loyalty & promotions analysis / Data factory
     - Financial: Fraud analysis / Trade reconciliation
     - Federal: Entity analysis / SIGINT
     - Biopharma: Sequence analysis / Annotation
  4. When they started to get big data, what did Google build?
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  5. When they started to get big data, what did Google build?
     Store data
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  6. When they started to get big data, what did Google build?
     Process data
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  7. When they started to get big data, what did Google build?
     Ingest data
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  8. When they started to get big data, what did Google build?
     Serve data
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  9. When they started to get big data, what did Google build?
     High level domain specific language
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  10. When they started to get big data, what did Google build?
     Chain together complex workloads
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  11. When they started to get big data, what did Google build?
     Schedule them
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  12. When they started to get big data, what did Google build?
     Columnar storage + metadata
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  13. When they started to get big data, what did Google build?
     End users query data
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  14. When they started to get big data, what did Google build?
     Coordinate within system
     Dremel, Evenflow, Evenflow, Dremel, Sawzall, Bigtable, MySQL, Gateway, MapReduce / GFS, Chubby
  15. The pattern repeats…
     HiPal, Databee, Databee, Hive, Hive, HBase, Scribe, Zookeeper
  16. The pattern repeats…
     Oozie, Oozie, Hive, Pig & Hive, HBase, Data Highway, Zookeeper
  17. The pattern repeats…
     Azkaban, Azkaban, Pig, Voldemort, Sqoop, Kafka, Zookeeper
  18. Formalized in CDH
     Cloudera’s Distribution Including Apache Hadoop
     Hue, Hue, Oozie, Oozie, Hive, Hive / Pig, HBase, Sqoop, Flume, Zookeeper
  19. Cloudera’s product strategy
     - Provide the reference distribution for the Apache Hadoop platform
       - Functionally complete
       - Performant and secure
       - Integrated & tested
       - Easy to trial & consume
       - 100% Apache licensed
       - Open to partners and the extended IT ecosystem
     - Provide a commercial solution that helps enterprises run Hadoop in production
       - Software & services
       - Increase transparency, consistency & reliability
       - Lower the cost & complexity of administration
       - Improved compliance to policies & processes
     Cloudera’s Distribution Including Apache Hadoop; Cloudera Enterprise
  20. Cloudera’s Distribution including Apache Hadoop (CDH) is, among other things, Apache Hadoop code
     - The only code Cloudera includes for MapReduce, HDFS and Hadoop Common is code committed to the Apache Hadoop project
     - Means no forking and conformance to an open standard
     - This is similarly the case with:
       - Apache Hive
       - Apache HBase
       - Apache Pig
       - and so on…
  21. CDH is: Apache Hadoop people
     * Source: Apache, Cloudera & Yahoo JIRA, Q4 2010
  22. CDH is something that works with the enterprise IT ecosystem
     - Drivers, language enhancements, testing
     - Sqoop framework, adapters
     - More coming…
     - Packaging, testing
  23. CDH keeps improving to make Apache Hadoop easier to run in trial or production
     - 1Q 2011: Known issues & limitations, Security guide, Certified integrations, Predictable updates, Integrated system, Installation guide, Availability of support, Packaging, Patching
     - 4Q 2010: Security guide, Certified integrations, Predictable updates, Integrated system, Installation guide, Availability of support, Packaging, Patching
     - 3Q 2010: Certified integrations, Predictable updates, Integrated system, Installation guide, Availability of support, Packaging, Patching
     - 2Q 2010: Integrated system, Installation guide, Availability of support, Packaging, Patching
     - 2009: Installation guide, Availability of support, Packaging, Patching
  24. CDH3 is generally available!
     - I/O performance improvements
     - Job performance improvements
     - Stability improvements
     - Durability improvements
     - Log data collection
     - Database integration
     - Web UI
     - Authentication
     - Indexing
     - Expanded platform support: RHEL 6, SUSE 11, Maven
     - Scheduling
     - Workflow
     - Replication
     Copyright 2011 Cloudera Inc. All rights reserved
  25. Why Enterprise?
     - Hadoop is a distributed system that presents unique operational challenges
     - The fixed cost of managing internal patch & release infrastructure is prohibitive
     - Hadoop skills & expertise are scarce
     - It is challenging to track community development efforts consistently
  26. Cloudera Enterprise
     - Reduces the risks of running Hadoop in production
     - Improves consistency and compliance and reduces administrative overhead
     Management Suite
     - Authorization Manager
     - Activity Monitor (new)
     - Service Monitor
     - Resource Manager
     - Service & Configuration Manager (new)
     Cloudera Management Suite
     - Production support for CDH & certified integrations (Oracle, Netezza, Teradata, Greenplum, Aster Data)
