Big Data on AWS

  1. BIG Data on AWS, Paul Duffy
  2. Characteristics of Big Data | How the Cloud Is Big Data’s Best Friend | Big Data on the Cloud in the Real World
  3. Characteristics of Big Data
  4. The cost of data generation is falling rapidly: a dramatic increase in the volume, velocity and variety of data
  5. Big Data: a collection of tools, techniques and technologies that allow you to work productively with data at any scale.
  6. Big Data is Getting Bigger: 2.7 zettabytes in 2012; over 90% will be unstructured; data spread across a wide array of silos
  7. Features driven by MapReduce
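MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group independently. A minimal single-process sketch of that model, assuming word count as the example workload; a Hadoop cluster distributes the same steps across many machines.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Run map, shuffle and reduce in one process (illustration only)."""
    # Map each record, then shuffle: group every emitted value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce each key's values independently; on a cluster, different
    # keys can be reduced on different machines.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word: str, counts: list[int]) -> int:
    # Total occurrences of one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data on AWS", "big data is getting bigger"]
    print(map_reduce(lines, word_mapper, sum_reducer))
```

Because the reducer only ever sees one key's values, the reduce step parallelizes without coordination, which is what makes the model a good fit for distributed hardware.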
  8. Variable data structures and sources
     Computer generated:
     • Application server logs (web sites, games)
     • Sensor data (weather, water, smart grids)
     • Images/videos (traffic, security cameras)
     Human generated:
     • Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
     • Blogs/reviews/emails/pictures
     • Social graphs: Facebook, LinkedIn, contacts
  9. The Role of Data is Changing
  10. Traditional analytics required a fixed data model, based on questions known in advance. Big Data promotes data exploration and experimentation, which leads to innovation.
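This contrast is often described as schema-on-write versus schema-on-read: rather than forcing data into a model fixed up front, raw records are kept as collected and structure is applied only when a question is asked. A minimal Python sketch, assuming hypothetical JSON event records whose fields vary from source to source:

```python
import json
from collections import Counter

# Raw, heterogeneous events kept exactly as collected (hypothetical data).
RAW_EVENTS = [
    '{"type": "page_view", "url": "/gifts", "user": "u1"}',
    '{"type": "purchase", "sku": "movie-42", "user": "u1", "price": 9.99}',
    '{"type": "page_view", "url": "/tastetest", "user": "u2", "referrer": "facebook"}',
]


def query(raw_lines, predicate, project):
    """Apply structure at read time: parse, filter, then project each record."""
    for line in raw_lines:
        record = json.loads(line)
        if predicate(record):
            yield project(record)


# A question nobody planned for when the data was stored:
# which referrers drive page views? Records lacking the field are skipped.
referrers = Counter(
    query(
        RAW_EVENTS,
        predicate=lambda r: r.get("type") == "page_view" and "referrer" in r,
        project=lambda r: r["referrer"],
    )
)
print(referrers)
```

The trade-off is that every query pays the parsing cost again; distributed engines spread that cost across many machines instead of avoiding it.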
  11. Generation → Collection & storage → Computation & analytics → Collaboration & sharing
  12. Generation (lower costs, faster throughput) → Collection & storage → Computation & analytics → Collaboration & sharing: increased pressure on traditional IT and tools
  13. This calls for tools designed for data collection and computation at any volume, velocity or format.
  14. Software
     • Designed for distribution
     • Easy programming models
     • Flexible language choice
     • Platform for abstraction and ecosystem
     • Good example: Hadoop
  15. Infrastructure
     • Designed for distribution
     • Easy programming models
     • Flexible language choice
     • Platform for abstraction and ecosystem
     • Good example: Cloud computing
  16. Software and Infrastructure
  17. How the Cloud Is Big Data’s Best Friend
  18. How do we define the cloud? By its benefits!
  19. Cloud: no CapEx | pay per use | elasticity | fast time to market | focus on core competency
  20. Why is the Cloud Big Data’s Best Friend?
  21. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  22. The Cloud Optimizes Precious IT Resources, i.e. Skilled People
  23. “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.” - 2011 IDC Digital Universe Study
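Taken together, the two IDC growth rates imply how much the load per administrator rises. Writing F for the number of files or containers and S for the number of IT staff (symbols introduced here for illustration only):

```latex
\frac{F_{\text{future}}}{S_{\text{future}}}
  = \frac{75\,F_{\text{today}}}{1.5\,S_{\text{today}}}
  = 50 \cdot \frac{F_{\text{today}}}{S_{\text{today}}}
```

That is, each staff member would be responsible for roughly 50 times as many files or containers, the gap the following slides argue managed cloud services help close.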
  24. Deploying a Hadoop cluster is hard
  25. Cloud computing. The old IT world: 70% managing all of the “undifferentiated heavy lifting”, 30% using Big Data
  26. Cloud computing. The old IT world: 70% managing all of the “undifferentiated heavy lifting”, 30% using Big Data. Cloud-based infrastructure: 30% configuring cloud assets, 70% analyzing and using Big Data
  27-31. Managed Services | Reusability | Scale | Innovation
  32. The Cloud Optimizes Capacity Resources
  33. Elastic Compute Capacity: on and off | fast growth | variable peaks | predictable peaks
  34. Elastic Compute Capacity: provisioning fixed capacity for these patterns leads to WASTE (capacity above demand) or CUSTOMER DISSATISFACTION (demand above capacity)
  35. Elastic Compute Capacity [chart: capacity over time, comparing traditional IT capacity and elastic cloud capacity with your IT needs]
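The capacity chart compares a fixed, up-front fleet with capacity that follows demand. A minimal sketch that scores both plans against an hourly demand curve; the demand numbers and the fixed fleet size are hypothetical, and elastic capacity is idealized as tracking demand exactly.

```python
def capacity_gaps(demand, capacity):
    """Return (waste, shortfall) in server-hours for a capacity plan.

    waste: capacity paid for but unused (over-provisioning).
    shortfall: demand that could not be served (under-provisioning).
    """
    waste = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return waste, shortfall


# Hypothetical "variable peaks" workload: servers needed in each hour.
demand = [2, 3, 10, 4, 2, 12, 3, 2]

# Traditional IT: a fixed fleet sized between the average and the peak.
fixed = [8] * len(demand)
# Idealized elastic cloud capacity: provision exactly what each hour needs.
elastic = list(demand)

print("fixed:", capacity_gaps(demand, fixed))
print("elastic:", capacity_gaps(demand, elastic))
```

The fixed fleet both wastes capacity in quiet hours and falls short at the two peaks; raising it to 12 removes the shortfall only by increasing the waste. Real autoscaling reacts with some delay, so the elastic figures are a best case.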
  36. Elastic Compute Capacity: on and off | fast growth | variable peaks | predictable peaks
  37. The Cloud Empowers Users to Balance Cost and Time
  38. 1 instance for 500 hours = 500 instances for 1 hour. “I like this!” “I scale.”
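The equation holds because on-demand compute is billed per instance-hour, so a job that parallelizes perfectly costs the same whether it runs long on a few instances or briefly on many; only the wall-clock time changes. A minimal sketch, assuming a hypothetical flat hourly rate and a perfectly parallel job (real jobs have coordination overhead, and AWS billing granularity has changed over time):

```python
from fractions import Fraction


def run_plan(total_instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, cost) for a perfectly parallel job.

    Assumes the work divides evenly across instances with no overhead
    and that every instance is billed at the same flat hourly rate.
    """
    wall_clock = Fraction(total_instance_hours, instances)
    cost = instances * wall_clock * hourly_rate
    return wall_clock, cost


RATE = Fraction(1, 10)  # hypothetical $0.10 per instance-hour

serial = run_plan(500, instances=1, hourly_rate=RATE)
parallel = run_plan(500, instances=500, hourly_rate=RATE)

# Same cost, 500x less waiting.
assert serial[1] == parallel[1] == 50
print(f"1 instance:    {serial[0]} h, ${float(serial[1]):.2f}")
print(f"500 instances: {parallel[0]} h, ${float(parallel[1]):.2f}")
```

`Fraction` keeps the arithmetic exact, so the equality check is not disturbed by floating-point rounding.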
  39. The Cloud Reduces Cost for Experimentation
  40. The Cloud Enables Collection and Storage of Big Data
  41. Storage Costs are Declining
  42. Simple Storage Service: 1 trillion objects [chart: growth in stored objects]; 750k+ peak transactions per second
  43. Global Accessibility. Regions: US-EAST (Virginia), US-WEST (N. California), US-WEST (Oregon), GOV CLOUD, EU-WEST (Ireland), ASIA PAC (Tokyo), ASIA PAC (Singapore), SOUTH AMERICA (Sao Paulo)
  44. Amazon DynamoDB
     • Managed NoSQL database service
     • Unlimited size
     • Unlimited scale
     • Flexible key/value store
     • Consistent, low latencies (single-digit milliseconds, SSD)
     • Robust, durable data storage
     • Integrated analytics with Elastic MapReduce
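DynamoDB addresses each item by its primary key, assumed here to be a partition key plus a sort key. A minimal sketch of the key/value access pattern using the boto3 `Table` resource methods `put_item` and `get_item`. The table name `ClickEvents` and the key attributes `user_id` and `ts` are hypothetical, and the functions take the table object as a parameter so they can be exercised without AWS credentials.

```python
from decimal import Decimal


def record_event(table, user_id: str, ts: int, action: str, price=None) -> None:
    """Write one click-stream event as a DynamoDB item.

    Assumes a hypothetical table whose partition key is "user_id" (string)
    and whose sort key is "ts" (number).
    """
    item = {"user_id": user_id, "ts": ts, "action": action}
    if price is not None:
        # boto3 rejects Python floats for DynamoDB numbers; use Decimal.
        item["price"] = Decimal(str(price))
    table.put_item(Item=item)


def fetch_event(table, user_id: str, ts: int):
    """Read one item by its full primary key; return None if absent."""
    response = table.get_item(Key={"user_id": user_id, "ts": ts})
    # get_item omits the "Item" field when no item matches the key.
    return response.get("Item")
```

Against a real table, `table` would come from `boto3.resource("dynamodb").Table("ClickEvents")`. Reading a user's whole history in time order would use a `query` on the partition key rather than repeated `get_item` calls.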
  45. Amazon Elastic MapReduce
     • On-demand, managed analytics platform
     • Powered by Hadoop
     • Integrated with Spot instances to lower costs
     • Vibrant ecosystem of tools
     • Elastic clusters
     • Flexible programming model (Java, Python, Ruby, etc.)
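One way Hadoop supports non-Java languages is Hadoop Streaming: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines to stdout, and the framework sorts the map output by key before the reduce phase. A minimal sketch of a Python mapper and reducer in that style that counts hits per URL in web server logs. The log layout (URL in the seventh whitespace-separated field, as in the Apache common log format) and the helper `run_locally` are assumptions for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit "url<TAB>1" for each log line (URL assumed at field index 6)."""
    for line in lines:
        fields = line.split()
        if len(fields) > 6:
            yield f"{fields[6]}\t1"


def reducer(sorted_lines):
    """Sum counts per URL. Input must be sorted by key, which the
    framework's sort step guarantees between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{url}\t{sum(int(count) for _, count in group)}"


def run_locally(log_lines):
    """Simulate map | sort | reduce in one process for testing."""
    return list(reducer(sorted(mapper(log_lines))))
```

On a cluster, each function would be wrapped in a small script that iterates over `sys.stdin` and prints the lines it yields, and the two scripts would be supplied to the streaming job as its mapper and reducer.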
  46. Big Data on the Cloud in the Real World
  47. Big Data Verticals
     • Social Media/Advertising: targeted advertising; image and video processing
     • Financial Services: Monte Carlo simulations; risk analysis
     • Oil & Gas: seismic analysis
     • Retail: recommendations; transactions analysis
     • Life Sciences: genome analysis
     • Security: anti-virus; fraud detection; image recognition
     • Network/Gaming: user demographics; usage analysis; in-game metrics
  48. Visualizations
  49. Bank – Monte Carlo Simulations: 23 hours → 20 minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” - Castillo, Director, Bankinter
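Monte Carlo risk simulations suit elastic clusters because the trials are independent: they can be split into batches, run on as many instances as are available, and merged at the end. A minimal single-machine sketch that estimates one-day 99% value at risk (VaR) for a single position, assuming normally distributed daily returns with hypothetical parameters. It is not Bankinter's model, and production risk models are considerably richer.

```python
import random


def simulate_batch(seed, n_trials, position, mu, sigma):
    """One independent batch of trials: simulated one-day P&L values."""
    rng = random.Random(seed)  # separate seeded generator per batch
    return [position * rng.gauss(mu, sigma) for _ in range(n_trials)]


def value_at_risk(pnl, confidence=0.99):
    """Loss level exceeded in only about (1 - confidence) of the trials."""
    losses = sorted(-x for x in pnl)
    index = int(confidence * len(losses))
    return losses[min(index, len(losses) - 1)]


# Hypothetical inputs: $1M position, 0.05% mean daily return, 2% volatility.
POSITION, MU, SIGMA = 1_000_000, 0.0005, 0.02

# Each batch could run on a different instance; here they run in a loop.
batches = [simulate_batch(seed, 10_000, POSITION, MU, SIGMA) for seed in range(8)]
pnl = [x for batch in batches for x in batch]

var_99 = value_at_risk(pnl)
print(f"one-day 99% VaR: ${var_99:,.0f}")
```

Giving each batch its own seed keeps the batches reproducible and independent, so adding instances shortens the run without changing how the results are combined.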
  50. Recommendations: The Taste Test, http://www.etsy.com/tastetest
  51. Recommendations: Gift Ideas for Facebook Friends, etsy.com/gifts
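The slides do not say how Etsy computes these recommendations. One common, simple technique is item-to-item co-occurrence ("people who favorited this also favorited..."), which fits MapReduce well because each user's item pairs can be counted in parallel. A minimal Python sketch with hypothetical user/item data; it illustrates the technique and is not Etsy's method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(user_items):
    """Count how often each pair of items is liked by the same user."""
    counts = defaultdict(Counter)
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, liked, k=3):
    """Score unseen items by co-occurrence with the items a user liked."""
    scores = Counter()
    for item in liked:
        scores.update(counts.get(item, Counter()))
    for item in liked:
        scores.pop(item, None)  # don't recommend what the user already has
    return [item for item, _ in scores.most_common(k)]


# Hypothetical favorites per user.
favorites = {
    "ana": ["scarf", "mittens", "mug"],
    "ben": ["scarf", "mittens"],
    "cai": ["mug", "teapot"],
    "dee": ["scarf", "hat"],
}
counts = co_occurrence(favorites)
print(recommend(counts, ["scarf"]))
```

Raw co-occurrence favors popular items; real systems usually normalize the counts (for example by item popularity), which this sketch omits.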
  52. Click Stream Analysis: a user recently purchased a sports movie and is searching for video games → targeted ad (1.7 million per day)
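The click-stream example combines a recent purchase with a current search to choose an ad. A minimal rule-based sketch; the event format, category names and ad rules are hypothetical, and production ad targeting typically relies on learned models rather than hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    kind: str      # "purchase" or "search" (hypothetical event types)
    category: str  # e.g. "sports_movie", "video_games"


# Hypothetical rules keyed by (last purchase category, last search category).
AD_RULES = {
    ("sports_movie", "video_games"): "sports video game",
}


def targeted_ad(events, user, default="generic ad"):
    """Pick an ad from the user's most recent purchase and search."""
    last = {}
    for e in events:  # events assumed to arrive in time order
        if e.user == user:
            last[e.kind] = e.category
    key = (last.get("purchase"), last.get("search"))
    return AD_RULES.get(key, default)


stream = [
    Event("u1", "purchase", "sports_movie"),
    Event("u2", "search", "books"),
    Event("u1", "search", "video_games"),
]
print(targeted_ad(stream, "u1"))
```

Rescanning the full stream for every decision does not scale to millions of ads a day; the per-user "last event" state could instead be kept in a low-latency key/value store and updated as events arrive.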
  53. Characteristics of Big Data | How the Cloud Is Big Data’s Best Friend | Big Data on the Cloud in the Real World
  54. Thank you…