Cloudera – One Platform to Rule Them All

Presentation about the Cloudera platform – a bit of history and the use cases.
By Nuno Barreto – Partner & Big Data Lead @XpandIT

Published in: Technology

  1. Cloudera – One Platform to Rule Them All
  2. AGENDA • A Bit of History • The Platform • The Use Cases • Summary
  3. A BIT OF HISTORY
  4. RDBMS VS CLOUDERA … according to Google Trends (since January 2010)
  5. HADOOP – THE EARLY DAYS • Hadoop Distributed Filesystem (HDFS) • Hadoop MapReduce
  6. [Diagram: a cluster of x86 nodes: NODE1, NODE2, NODE3, ... NODEN]
  7. HADOOP OVER TIME • Hive – SQL-like query • Pig – Programming model • HBase – NoSQL database (operational) • YARN – Resource Manager • Impala – Online SQL (analytics) • Spark – Streaming, Batch, ML • Kafka – Messaging … new additions
  8. HADOOP ECOSYSTEM TODAY
  9. THE PLATFORM
  10. CLOUDERA ENTERPRISE DATA HUB • Processing and Storage Core is 100% open source • Only Apache components tested at large scale get in • Value-added features (operations and governance) • Reactive, Proactive & Predictive Support • Easy, Fast & Secure • The best partners – us
  11. CLOUDERA MANAGER • Operations • Monitoring • Configuration Management • Multi-tenant Management • Backup & Disaster Recovery • Extensible Integration … for cluster operations
  12. CLOUDERA DIRECTOR … makes Cloudera cloud ready
  13. CLOUDERA NAVIGATOR • Audit & Trace • Alert • Lineage • Encryption • Optimizer … for cluster governance & security
  14. THE USE CASES
  15. HADOOP – THE ULTIMATE DATA TOOLKIT
  16. DATA LAKE / ENTERPRISE DATA HUB • Sensor Data • Blogs • Emails • Web Logs • Docs (e.g. PDF) • Images • Videos • CRM • ERP • Legacy • 3rd Party • Extract (includes File Transfer), Transform and Load • Scale-out Distributed Database • Visualization (Reporting, Exploration and Sandboxing) • Raw Data Sources • Operational Systems • DW & Data Marts
  17. DATA LAKE / ENTERPRISE DATA HUB
  18. MESSAGING
  19. MESSAGING
  20. IOT • Devices with sensors & actuators • Gateway • EDH
  21. IOT
  22. DATA SCIENCE
  23. DATA SCIENCE
  24. CLOUDERA DATA SCIENCE WORKBENCH • Use R, Python or Scala • No need to sample • Collaborative research • Bring analysis to the data • Secure by default • Flexible deployment
  25. SUMMARY
  26. SUMMARY • Hadoop is an ecosystem, not two projects • Spark will not replace Hadoop, Spark “is” Hadoop • Cloudera has a complete offering • Cloudera is for Batch & NRT • Cloudera is for Analytics & Operational … key takeaways
  27. THANK YOU
  28. Credits • Includes icons (pages 8 and 15) made by Freepik from www.flaticon.com • Cloudera images from www.cloudera.com
