BIG DATA ANALYTICS WITH AWS │ Carlos Conde, Solutions Architect
BIG DATA: when innovation is required to collect, store, analyze, and share your data
YOU DON’T HAVE A CHOICE…
27 TB per day: Large Hadron Collider, CERN
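At 27 TB per day, the challenge is as much the sustained rate as the total volume. Here is a quick conversion, assuming decimal units (1 TB = 10^12 bytes) and an even flow over 24 hours. Real instrument output is burstier. `average_rate_mb_per_s` is a hypothetical helper written for this illustration.

```python
# Convert a daily data volume into an average sustained ingest rate.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
# and a perfectly even flow over the whole day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def average_rate_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s for a given number of TB per day."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / SECONDS_PER_DAY / 10**6


if __name__ == "__main__":
    rate = average_rate_mb_per_s(27)
    print(f"27 TB/day is about {rate:.1f} MB/s on average")
```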
BIGGER IS BETTER
The more data you collect, the more VALUE you can derive from it
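Big Data Verticals
Media/Advertising: Targeted Advertising; Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommendations; Transaction Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations; Risk Analysis
Security: Anti-virus; Fraud Detection; Image Recognition
Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics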
2.7 zettabytes in 2012; over 90% will be unstructured
VOLUME │ VELOCITY │ VARIETY
COLLECT │ STORE │ ANALYZE │ SHARE
[COLLECT] │ STORE │ ANALYZE │ SHARE
AWS IMPORT / EXPORT
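AWS Import/Export addresses the case where the network is the bottleneck. Instead of uploading over the wire, you ship storage devices to AWS. The back-of-the-envelope sketch below estimates upload time to show when that matters. The link speeds, volumes, and the single `efficiency` factor are hypothetical, and `transfer_days` is an illustrative helper, not an AWS API.

```python
# Rough estimate of how long a bulk upload takes over a network link.
# Assumptions (illustrative only): decimal units, a constant link speed,
# and one "efficiency" factor standing in for protocol overhead and contention.


def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push `terabytes` of data through a `link_mbps` link."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb:>4} TB over 100 Mbps: {transfer_days(tb, 100):6.1f} days")
```

Even at full line rate, tens of terabytes over a 100 Mbps link take days to weeks. That is the regime where shipping disks starts to compete.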
COLLECT │ [STORE] │ ANALYZE │ SHARE
AMAZON S3
AMAZON DYNAMODB
HBase on AMAZON EMR
COLLECT │ STORE │ [ANALYZE] │ SHARE
GPU: GRAPHICS PROCESSING UNIT
CLUSTER GPU QUADRUPLE EXTRA LARGE │ 2x Intel Xeon X5570, quad-core (Nehalem architecture) │ 2x NVIDIA Tesla Fermi M2050 GPUs │ 22 GB of memory │ 1.7 TB of storage │ $2.1 PER HOUR
PARALLELIZATION
ON A SINGLE INSTANCE │ COST: 4h x $2.1 = $8.4 │ RENDERING TIME: 4h
ON MULTIPLE INSTANCES │ COST: 2 x 2h x $2.1 = $8.4 │ RENDERING TIME: 2h
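The arithmetic above generalizes. If a job splits evenly, N instances consume the same instance-hours as one instance but finish N times sooner. Below is a minimal sketch using the $2.1/hour rate quoted for the Cluster GPU instance. It assumes perfect parallelism and whole-hour billing, and `run_job` is a hypothetical helper, not an AWS API.

```python
# Cost vs. wall-clock time when a rendering job is split across instances.
# Assumptions: the work divides perfectly (no coordination overhead), every
# instance is billed at the same hourly rate, and partial hours are rounded
# up to whole hours.
import math
from typing import Tuple


def run_job(total_hours: float, instances: int,
            price_per_hour: float = 2.1) -> Tuple[float, float]:
    """Return (cost in dollars, wall-clock hours) for an even split."""
    if instances < 1:
        raise ValueError("need at least one instance")
    hours_each = total_hours / instances      # pieces run in parallel
    billed_hours = math.ceil(hours_each)      # assumed whole-hour billing
    cost = instances * billed_hours * price_per_hour
    return round(cost, 2), hours_each


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cost, hours = run_job(4, n)
        print(f"{n} instance(s): ${cost:.2f}, {hours:g} h wall-clock")
```

Under these assumptions, 1, 2 and 4 instances all cost $8.40 while wall-clock time drops from 4 h to 1 h. At 8 instances, each piece runs 30 minutes but is billed as a full hour, so the cost doubles. The "same cost, less time" rule only holds while each piece fills its billing unit.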
"Hadoop is a reliable storage and data analysis system": HDFS (storage) + MapReduce (analysis)
Deploying a Hadoop cluster is hard (http://eddie.niese.net/20090313/dont-pity-incompetence/)
AMAZON EMR: HADOOP + AWS
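Amazon EMR provisions the Hadoop cluster, but jobs are still written against Hadoop's model. HDFS stores input in blocks spread across the nodes, and MapReduce runs user-supplied map and reduce functions close to that data. The word count below is only a toy, single-process sketch of that programming model in plain Python. It does not use Hadoop's APIs and leaves out distribution, fault tolerance, and HDFS.

```python
# Toy, single-process illustration of the MapReduce model (not Hadoop's API):
# map emits (key, value) pairs, the framework groups values by key
# ("shuffle"), and reduce folds each group into one result.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data on AWS", "big clusters big results"]
    print(word_count(docs))
```

In real Hadoop, many mappers and reducers run in parallel on different nodes, and the shuffle moves data over the network. The three-phase structure is the same.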
Doing analytics in Eclipse is wrong…
PIG
A real Pig script (used at Twitter)
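Pig lets analysts express a job as a short dataflow of relational operators, which Pig compiles into MapReduce jobs, instead of hand-coding Java. The Twitter script from the slide is not reproduced here. The Pig Latin in the comments below is a hypothetical example with made-up field names. The Python beneath it only mirrors what each operator does, applied to a small in-memory list.

```python
# Hypothetical Pig Latin dataflow (illustrative only, not the Twitter script):
#   tweets  = LOAD 'tweets' AS (author:chararray, text:chararray);
#   grouped = GROUP tweets BY author;
#   counts  = FOREACH grouped GENERATE group AS author, COUNT(tweets) AS n;
#   ranked  = ORDER counts BY n DESC;
#   top     = LIMIT ranked 2;
# The same steps in plain Python on a small in-memory list:
from collections import Counter
from typing import List, Tuple


def top_authors(tweets: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, int]]:
    # GROUP ... BY author + COUNT: tally records per author
    counts = Counter(author for author, _text in tweets)
    # ORDER ... BY n DESC + LIMIT: keep the k most frequent authors
    return counts.most_common(k)


if __name__ == "__main__":
    sample = [("alice", "hi"), ("bob", "yo"), ("alice", "big data"),
              ("carol", "pig"), ("alice", "emr"), ("bob", "s3")]
    print(top_authors(sample))
```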
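USE THE RIGHT TOOL FOR THE RIGHT JOB
RDBMS: Interactive Reporting (<1 sec); Multistep Transactions; Lots of Updates/Deletes
Hadoop: Affordable Storage/Compute; Structured or Not (Agility); Resilient Auto Scalability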
Data Warehouse (Steady State) → expand to 25 instances → Data Warehouse (Batch Processing) → shrink to 9 instances → Data Warehouse (Steady State)
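The point of the resizing diagram is that you pay for peak capacity only while the batch runs. Below is a minimal sketch of the instance-hour arithmetic, assuming cost is proportional to instance-hours. The 9- and 25-instance sizes come from the slide. The 20 h steady / 4 h batch split is hypothetical, and `instance_hours` is an illustrative helper, not part of the Amazon EMR API.

```python
# Instance-hours for a cluster that is resized around a batch window, versus
# keeping it at peak size all day. The 9- and 25-instance sizes come from the
# slide; the 20 h steady / 4 h batch split is a hypothetical example.
from typing import Iterable, Tuple


def instance_hours(schedule: Iterable[Tuple[int, float]]) -> float:
    """Sum instances x hours over a list of (instances, hours) phases."""
    return sum(n * h for n, h in schedule)


if __name__ == "__main__":
    resized = [(9, 20), (25, 4)]  # steady state, then expand for the batch
    fixed_peak = [(25, 24)]       # provisioned for the peak around the clock
    print("resized:   ", instance_hours(resized), "instance-hours")
    print("fixed peak:", instance_hours(fixed_peak), "instance-hours")
```

With these assumed hours, the resized cluster uses 280 instance-hours against 600 for a cluster sized for the peak. The actual saving depends entirely on how long the batch window really is.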
OPERATIONAL HADOOP EXPERIENCE: Operated over a million Hadoop clusters last year
COLLECT │ STORE │ ANALYZE │ [SHARE]
PUBLIC DATA SETS: http://aws.amazon.com/publicdatasets
COLLECT │ STORE │ ANALYZE │ SHARE
INNOVATE
« Want to increase innovation? Lower the cost of failure » (Joi Ito)
AWS LOWERS THE COST OF INNOVATION: Testing a new idea is cheap
Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris