Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris

Big Data presentation by Carlos Conde at AWS Summit Paris

Published in: Technology, Business

  1. 1. BIG DATA ANALYTICS WITH AWS: Carlos Conde │ Solutions Architect
  2. 2. BIG DATA: When innovation is required to collect, store, analyze, and share your data
  3. 3. YOU DON’T HAVE THE CHOICE…
  4. 4. 27 TB per day: Large Hadron Collider – CERN
  5. 5. BIGGER IS BETTER
  6. 6. The more data you collect, the more VALUE you can derive from it
  7. 7. Big Data Verticals: Media/Advertising (Targeted Advertising, Image and Video Processing); Oil & Gas (Seismic Analysis); Retail (Recommendations, Transaction Analysis); Life Sciences (Genome Analysis); Financial Services (Monte Carlo Simulations, Risk Analysis); Security (Anti-virus, Fraud Detection, Image Recognition); Social Network/Gaming (User Demographics, Usage Analysis, In-game Metrics)
  8. 8. 2.7 Zettabytes in 2012. Over 90% will be unstructured
  9. 9. VOLUME, VELOCITY, VARIETY
  10. 10. COLLECT │ STORE │ ANALYZE │ SHARE
  11. 11. COLLECT │ STORE │ ANALYZE │ SHARE
  12. 12. AWS IMPORT / EXPORT
  13. 13. COLLECT │ STORE │ ANALYZE │ SHARE
  14. 14. AMAZON S3
  15. 15. AMAZON DYNAMODB
  16. 16. HBase on AMAZON EMR
  17. 17. COLLECT │ STORE │ ANALYZE │ SHARE
  18. 18. GPU: GRAPHICS PROCESSING UNIT
  19. 19. CLUSTER GPU QUADRUPLE EXTRA LARGE: 2x Intel Xeon X5570, quad-core Nehalem architecture; 2x NVIDIA Tesla Fermi M2050 GPUs; 22 GB of memory; 1.7 TB of storage; $2.1 PER HOUR
  20. 20. PARALLELIZATION
  21. 21. ON A SINGLE INSTANCE: COST: 4h x $2.1 = $8.4; RENDERING TIME: 4h
  22. 22. ON MULTIPLE INSTANCES: COST: 2 x 2h x $2.1 = $8.4; RENDERING TIME:
  23. 23. "Hadoop is a reliable storage and data analysis system" (HDFS, MapReduce)
  24. 24. Deploying a Hadoop cluster is hard (http://eddie.niese.net/20090313/dont-pity-incompetence/)
  25. 25. AMAZON EMR: HADOOP + AWS
  26. 26. Doing analytics in Eclipse is wrong…
  27. 27. PIG
  28. 28. A real Pig script (used at Twitter)
  29. 29. USE THE RIGHT TOOL FOR THE RIGHT JOB. RDBMS: Interactive Reporting (<1sec), Multistep Transactions, Lots of Updates/Deletes. Hadoop: Affordable Storage/Compute, Structured or Not (Agility), Resilient Auto Scalability
  30. 30. Data Warehouse (Steady State) → Expand to 25 instances → Data Warehouse (Batch Processing) → Shrink to 9 instances → Data Warehouse (Steady State)
  31. 31. OPERATIONAL HADOOP EXPERIENCE: Operated over a million Hadoop clusters last year
  32. 32. COLLECT │ STORE │ ANALYZE │ SHARE
  33. 33. PUBLIC DATA SETS: http://aws.amazon.com/publicdatasets
  34. 34. COLLECT │ STORE │ ANALYZE │ SHARE
  35. 35. INNOVATE
  36. 36. « Want to increase innovation? Lower the cost of failure » Joi Ito
  37. 37. AWS LOWERS THE COST OF INNOVATION: Testing a new idea is cheap
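
The cost arithmetic on slides 21 and 22 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the $2.1 hourly rate and the 4-hour single-instance job come from the slides, while the function name `estimate`, the `RenderEstimate` type, and the assumption that the job splits perfectly evenly across instances (with no per-hour billing round-up) are hypothetical simplifications.

```python
from dataclasses import dataclass

HOURLY_RATE_USD = 2.1    # Cluster GPU instance price quoted on slide 19
TOTAL_WORK_HOURS = 4.0   # single-instance rendering time quoted on slide 21


@dataclass
class RenderEstimate:
    instances: int
    wall_clock_hours: float
    cost_usd: float


def estimate(instances: int) -> RenderEstimate:
    """Estimate wall-clock time and cost when the job is split across instances.

    Assumption (not from the slides): the work divides evenly, with no
    coordination overhead and no rounding up to whole billed hours.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    wall_clock = TOTAL_WORK_HOURS / instances
    cost = instances * wall_clock * HOURLY_RATE_USD
    return RenderEstimate(instances, wall_clock, round(cost, 2))


if __name__ == "__main__":
    for n in (1, 2, 4):
        e = estimate(n)
        print(f"{e.instances} instance(s): {e.wall_clock_hours}h, ${e.cost_usd}")
```

Under these assumptions the total cost stays the same as instances are added, while the wall-clock time falls in proportion. A real workload would likely deviate from this once scheduling overhead and billing granularity are taken into account.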
