Big Data Analysis: Powered by the Cloud


Opening Keynote at ZDNet Advanced Computing Conference by Abhishek Sinha (Business Development Manager APAC)

Published in: Technology, Business


  1. Cloud
  2. What is big data; Data analysis pipeline; How customers are using the pipeline
  3. When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  4. What does big data look like?
  5. The 3 Vs: Volume, Velocity, Variety
  6. Where is this data coming from?
  7. Human generated (vs. machine generated): Tweet, Surf the internet, Buy and sell products, Upload images and videos, Play games, Check in at restaurants, Search for cafes, Find deals, Watch content online, Look for directions, Use social media
  8. Machine generated (vs. human generated): Networks and security devices, Mobile phones, Cell phone towers, Smart grids, Smart meters, Telematics from cars, Sensors on machines, Videos from traffic and security cameras
  9. What is it used for?
  10. Data for competitive advantage
  11. Data for competitive advantage: Customer segmentation, Financial modeling, System analysis, Line-of-sight, Replacing human decisions, Business intelligence..
  12. Data for competitive advantage: Customer segmentation, Financial modeling, System analysis, Line-of-sight, Replacing human decisions, Business intelligence.. Innovating new business and revenue models
  13. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  14. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation; lower cost, increased throughput
  15. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation; lower cost, increased throughput; constraint
  16. Very high barrier to turning data into information…
  17. Very high barrier to turning data into information: Infrastructure capacity, Technical skills, Questions to ask, Cheap experimentation
  18. Amazon Web Services Cloud
  19. Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints
  20. Remove constraints = More experimentation; More experimentation = More innovation; More innovation = Competitive edge
  21. Amazon Web Services: Removes constraints; Focus on your data; Leave undifferentiated heavy lifting to us
  22. HOW
  23. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  24. 25
  25. AWS Import/Export; Corporate data center; Amazon Elastic MapReduce; Amazon Simple Storage Service (S3); BI users; Clickstream data from 500+ websites and VoD platform
  26. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  27. More than 25 million streaming members; 50 billion events per day; 30 million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; Device location, time, day, week etc.; Social data
  28. 10 TB of streaming data per day
  29. What is S3? Highly scalable data storage; Access via APIs; Fast (850K requests per sec); Highly available & durable (99.999999999% durability); Economical ($0.095 per GB)*; Web store
  30. Data consumed in multiple ways: S3, EMR, Prod Cluster (EMR), Recommendation Engine, Ad-hoc Analysis, Personalization
  31. Velocity of data: Amazon DynamoDB
  32. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  33. “Who buys video games?”
  34. Per day: 3.5 billion records; 13 TB of click stream logs; 71 million unique cookies
  35. Results: 500% return on ad spend; 17,000% reduction in procurement time
  36. “Who is using our service?”
  37. Finding signal in the noise of logs: Identified early mobile usage; Invested heavily in mobile development
  38. In January 2013: 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions.
  39. What is EMR? Map-Reduce engine; Integrated with tools; Hadoop-as-a-service; Massively parallel; Cost effective AWS wrapper; Integrated to AWS servi
  40. Source: http://nerds.airbnb.com/redshift-performance-cost
      Table size | Query type | Hive | Redshift
      3 billion rows | Simple range query | 1680 seconds (28 min) | 360 seconds (6 min)
      1 million rows | 2 complex joins | 182 seconds | 8 seconds
      $13.60/hour on Redshift versus $57/hour on Hive
  41. Every day is crucial and costly
  42. Challenge: To run a virtual screen with a higher accuracy algorithm & 21 million compounds
  43. Using Cycle Computing and Amazon Web Services
      Metric | Count
      Compute hours of work | 109,927 hours
      Compute days of work | 4,580 days
      Compute years of work | 12.55 years
      Ligand count | ~21 million ligands
  44. 3 hours for $4828.85/hr
  45. Instead of $20+ million in infrastructure
  46. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  47. Open web index. 3.4 billion records. Available to all. 1000 Genomes project
  48. Generation; Collect; Store; Collaboration & sharing; Analysis and Computation
  49. Thank you! aws.amazon.com/big-data; sinhaar@amazon.com; May 21st, COEX Auditorium, Seoul: One day free training, Walk through of services; http://aws.amazon.com/apac/awsday/seoul/
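
The figures quoted on slides 40, 43 and 44 can be cross-checked with a short calculation. The sketch below is not part of the deck: it assumes 24-hour days and 365-day years, and the helper names (`compute_time`, `speedup`) are illustrative only. It converts the compute hours into days and years, totals the cost of the 3-hour run, and expresses the Hive versus Redshift numbers as ratios.

```python
# Figures are copied from slides 40, 43 and 44 of the deck.
# Helper names and the 365-day year are assumptions made for this sketch.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: non-leap year


def compute_time(hours):
    """Convert compute hours into a (days, years) pair."""
    days = hours / HOURS_PER_DAY
    years = days / DAYS_PER_YEAR
    return days, years


def speedup(baseline, improved):
    """Return the ratio of the baseline figure to the improved figure."""
    return baseline / improved


# Slide 43: 109,927 compute hours of work.
days, years = compute_time(109_927)

# Slide 44: 3 hours at $4,828.85 per hour.
run_cost = 3 * 4828.85

print(f"Compute days:  {days:,.0f}")
print(f"Compute years: {years:.2f}")
print(f"Cost of the 3-hour run: ${run_cost:,.2f}")

# Slide 40: Hive versus Redshift query times (seconds) and hourly cost (USD).
print(f"Simple range query ratio: {speedup(1680, 360):.1f}x")
print(f"Two complex joins ratio:  {speedup(182, 8):.1f}x")
print(f"Hourly cost ratio:        {speedup(57, 13.60):.1f}x")
```

Under these assumptions the converted days and years agree with the values on slide 43, which suggests the slide used the same 24-hour and 365-day conventions.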
