Hadoop
          Data Analytics in the Cloud

          Mike Olson
          Chief Executive Officer





Rhat OSS - Cloudera - Mike Olson - Hadoop Data Analytics In The Cloud


Mike Olson's talk on Hadoop Data Analytics at the O'Reilly Open Source Convention

Published in: Technology, Education

Transcript of "Rhat OSS - Cloudera - Mike Olson - Hadoop Data Analytics In The Cloud"

  1. Hadoop: Data Analytics in the Cloud
     Mike Olson, Chief Executive Officer
     Friday, July 17, 2009
  2. Hadoop History
     ▪ Doug Cutting worked on Nutch (web-scale crawler-based search), 2002-2004
     ▪ Google published MapReduce paper in 2004
     ▪ Cutting adds DFS & MapReduce support to Nutch
     ▪ Joined by Mike Cafarella
     ▪ 2006: Yahoo! hires Cutting, Hadoop spins out of Nutch
     ▪ Web-scale deployments in 2007, 2008 at Y!, Facebook, others
     ▪ Today: 22 committers to core project
     ▪ Related projects: HBase, Hive, Pig, Mahout, Hama and others
  3. Why Hadoop?
     ▪ Large web properties invented MapReduce for large-scale, reliable, inexpensive analytics
     ▪ Enterprises generally need these techniques
       • Retail, financial services, oil and gas, health care, green technologies and more
     ▪ Hardware trends driving toward long-term retention of valuable source data
     ▪ New analytical tools are required
     ▪ Hadoop complements current-generation data warehousing and analytical products
  9. Where Does Data Come From?
     Many Sources Provide Deeper Insight
     ▪ Simulations and Scientific/Experimental Data
       • genome sequencing, medical imaging, wireless sensors
     ▪ Existing Databases
       • product catalogs, historical sales data, transaction histories
     ▪ User Data
       • web logs, clicks on website, pictures, videos, bbs, etc.
     ▪ System Generated Data
       • thousands of systems reporting status every second
     ▪ Data Comes in All Shapes, Sizes, Schemas and Structures
       • Hadoop combines many sources regardless of format and structure
  13. Hadoop Technical Overview: HDFS
      Storing Data: Distributed Over Many Machines
      Commodity Servers
      Files are broken into blocks and distributed across all servers.
      Replication protects data from hardware failure.
      HDFS: Hadoop Distributed File System
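
The storage idea on this slide (split a file into blocks, spread the blocks across servers, keep several copies of each) can be sketched as a toy model. This is not the HDFS API: the function names, the round-robin placement rule, the tiny block size and the server names are all assumptions made for illustration, and real HDFS placement is more involved.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct servers.

    Hypothetical round-robin rule, chosen only to keep the sketch short.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        block: [servers[(block + r) % len(servers)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_replicas(placement: dict[int, list[str]],
                       failed: str) -> dict[int, list[str]]:
    """Return the replica locations that remain after one server fails."""
    return {block: [s for s in locs if s != failed]
            for block, locs in placement.items()}


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["s1", "s2", "s3", "s4"], replication=3)
    for block, locations in surviving_replicas(placement, failed="s2").items():
        print(block, locations)
```

With three copies of every block, losing any single server in this model still leaves at least two copies of each block readable.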
  17. Hadoop Technical Overview: MapReduce
      Processing Data: Leveraging Data Locality
      Data elements processed locally, in parallel
      Reliable computation implicitly managed by Hadoop
      MapReduce
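
To make "processed locally, in parallel" concrete, here is a minimal word-count sketch of the map, shuffle and reduce steps in plain Python. It does not use Hadoop itself: the function names are hypothetical, each string stands in for one block of a file, and a thread pool stands in for the cluster's workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks: list[str]) -> dict[str, int]:
    """Run the map step over all blocks concurrently, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data locally"]))
```

Each call to `map_block` needs only its own block, which is why a real cluster can schedule that work on a machine that already holds the block.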
  20. Hadoop Technical Overview: Reliability
      Fault Tolerance: Handled with Software
      Data loss prevented through automatic replication and rebalancing
      Computation is restarted automatically without user intervention
      Software Fault Tolerance
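
The "restarted automatically" behaviour can be illustrated with a small retry loop. This is only a sketch under assumptions: the worker functions, the `RuntimeError` failure signal and the attempt limit are invented for the example and do not describe Hadoop's actual scheduler.

```python
from typing import Callable


def run_task(task: int, workers: list[Callable[[int], int]],
             max_attempts: int = 3) -> int:
    """Try the task on successive workers until one of them succeeds."""
    if max_attempts < 1 or not workers:
        raise ValueError("need at least one attempt and one worker")
    last_error = None
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]
        try:
            return worker(task)
        except RuntimeError as exc:
            # The caller never sees this failure; the task is simply rescheduled.
            last_error = exc
    raise RuntimeError(f"task {task!r} failed after {max_attempts} attempts") from last_error


def failed_worker(task: int) -> int:
    """Hypothetical worker on a machine that has gone down."""
    raise RuntimeError("worker unavailable")


def healthy_worker(task: int) -> int:
    """Hypothetical worker that completes the computation."""
    return task * 2


if __name__ == "__main__":
    print(run_task(21, [failed_worker, healthy_worker]))
```

The caller submits the task once; the failure of the first worker is absorbed by the loop rather than surfaced to the user.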
  21. Cloud Deployment Options for Hadoop
      ▪ In your data center
        • Acquire, provision, administer servers
        • Choose a virtualization infrastructure?
      ▪ On dedicated, hosted services
        • Scale up or down by coordinating with your MSP
      ▪ On dynamic web services (AWS and others)
        • Spin up, use, shut down a cluster
        • Issues: data persistence and location, organizational control
  22. (c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved.
