Big Data & The Cloud

Joe Ziegler's presentation at the 5th Elephant conference in Bangalore.



  1. 1. Amazon Web Services. Big Data and the Cloud: A Best Friend Story
  2. 2. Joe Ziegler, Technical Evangelist, zieglerj@amazon.com, @jiyosub
  3. 3. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
  4. 4. Characteristics of Big Data
  5. 5. BIG DATA: When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  6. 6. Bigger Data is Better Data
  7. 7. Features driven by MapReduce
  8. 8. Bigger Data is Harder Data
  9. 9. Big Data is Getting Bigger: 2.7 zettabytes in 2012; over 90% will be unstructured; data spread across a wide array of silos
  10. 10. Why is Big Data Hard (and Getting Harder)? Changing Data Requirements: • Faster response time of fresher data • Sampling is not good enough & history is important • Increasing complexity of analytics • Users demand inexpensive experimentation
  11. 11. Where is it Coming From? Computer Generated: • Application server logs (web sites, games) • Sensor data (weather, water, smart grids) • Images/videos (traffic, security cameras). Human Generated: • Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year • Blogs/Reviews/Emails/Pictures • Social Graphs: Facebook, LinkedIn, Contacts
  12. 12. The Role of Data is Changing
  13. 13. Until now, the questions you ask drove the data model. The new model is to collect as much data as possible: the “Data-First Philosophy”
  14. 14. Data is the new raw material for any business, on par with capital, people, labor
  15. 15. We Need Tools Built Specifically for Big Data
  16. 16. Hadoop • Scale out Easily • Parallel Computing • Commodity Hardware • Solves some Problems • Complex to Run • Special Skills to Maintain
  17. 17. How the Cloud Is Big Data’s Best Friend
  18. 18. How do we define the cloud? By Benefits!
  19. 19. Cloud: No CapEx; Pay Per Use; Elasticity; Fast Time to Market; Focus on core competency
  20. 20. Why is the Cloud Big Data’s Best Friend?
  21. 21. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  22. 22. The Cloud Optimizes Precious IT Resources, i.e. Skilled People
  23. 23. “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.” - 2011 IDC Digital Universe Study
  24. 24. Deploying a Hadoop cluster is hard
  25. 25. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”
  26. 26. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”. Cloud-Based Infrastructure: 70% Analyzing and Using Big Data; 30% Configuring Cloud Assets
  27. 27. Managed Services; Reusability; Scale; Innovation
  32. 32. The Cloud Optimizes Capacity Resources
  33. 33. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  34. 34. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks (chart labels: WASTE, CUSTOMER DISSATISFACTION)
  35. 35. Elastic Compute Capacity (chart of capacity over time: traditional IT capacity and elastic cloud capacity compared with your IT needs)
  36. 36. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  37. 37. The Cloud Empowers Users to Balance Cost and Time
  38. 38. 1 instance for 500 hours = 500 instances for 1 hour
  39. 39. The Cloud Reduces Cost for Experimentation
  40. 40. The Cloud Enables Collection and Storage of Big Data
  41. 41. Simple Storage Service: 1 Trillion (growth chart); 750k+ peak transactions per second
  42. 42. Global Accessibility. Region: US-WEST (N. California); US-WEST (Oregon); US-EAST (Virginia); GOV CLOUD; EU-WEST (Ireland); ASIA PAC (Tokyo); ASIA PAC (Singapore); SOUTH AMERICA (Sao Paulo)
  43. 43. Storage Costs are Declining
  44. 44. Big Data on the Cloud In the Real World
  45. 45. Big Data Verticals: Media/Advertising (Targeted Advertising; Image and Video Processing); Oil & Gas (Seismic Analysis); Retail (Recommendations; Transactions Analysis); Life Sciences (Genome Analysis); Financial Services (Monte Carlo Simulations; Risk Analysis); Security (Anti-virus; Fraud Detection; Image Recognition); Social Network/Gaming (User Demographics; Usage analysis; In-game metrics)
  46. 46. Visualizations
  47. 47. Bank – Monte Carlo Simulations: 23 Hours to 20 Minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  48. 48. Recommendations: The Taste Test, http://www.etsy.com/tastetest
  49. 49. Recommendations: Gift Ideas for Facebook Friends, etsy.com/gifts
  50. 50. Click Stream Analysis: User recently purchased a sports movie and is searching for video games. Targeted Ad (1.7 Million per day)
  51. 51. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
  52. 52. Questions?
  53. 53. Joe Ziegler, Technical Evangelist, zieglerj@amazon.com, @jiyosub
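
The equivalence on slide 38 (1 instance for 500 hours = 500 instances for 1 hour) can be checked with a short sketch. It assumes per-hour billing and a job that splits evenly across instances with no coordination overhead, which real jobs only approximate. The hourly rate and the function names are hypothetical, chosen for illustration, and do not come from the slides or from any AWS price list.

```python
# Hypothetical hourly price in USD (an assumption, not an AWS price).
HOURLY_RATE_USD = 0.10

# Total work from slide 38, measured in instance-hours.
TOTAL_INSTANCE_HOURS = 500


def wall_clock_hours(total_instance_hours: float, instances: int) -> float:
    """Elapsed time if the work splits evenly across instances (idealized)."""
    return total_instance_hours / instances


def job_cost(instances: int, hours_each: float,
             hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of running `instances` machines for `hours_each` hours each."""
    return instances * hours_each * hourly_rate


if __name__ == "__main__":
    # Same total instance-hours, different cluster sizes.
    for n in (1, 10, 500):
        hours = wall_clock_hours(TOTAL_INSTANCE_HOURS, n)
        cost = job_cost(n, hours)
        print(f"{n:>3} instance(s): {hours:>6.1f} h elapsed, ${cost:.2f}")
```

Under these assumptions the cost is the same for every cluster size and only the elapsed time changes, which is the cost/time balance slide 37 refers to.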
