Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris

Big Data presentation by Carlos Conde at AWS Summit Paris

Transcript

  • 1. BIG DATA ANALYTICS WITH AWS: Carlos Conde │ Solutions Architect
  • 2. BIG DATA: When innovation is required to collect, store, analyze, and share your data
  • 3. YOU DON’T HAVE A CHOICE…
  • 4. 27 TB per day: Large Hadron Collider – CERN
  • 5. BIGGER IS BETTER
  • 6. The more data you collect, the more VALUE you can derive from it
  • 7. Big Data Verticals: Media/Advertising (Targeted Advertising, Image and Video Processing) │ Oil & Gas (Seismic Analysis) │ Retail (Recommendations, Transaction Analysis) │ Life Sciences (Genome Analysis) │ Financial Services (Monte Carlo Simulations, Risk Analysis) │ Security (Anti-virus, Fraud Detection, Image Recognition) │ Social Network/Gaming (User Demographics, Usage Analysis, In-game Metrics)
  • 8. 2.7 Zettabytes in 2012: over 90% will be unstructured
  • 9. VOLUME │ VELOCITY │ VARIETY
  • 10. COLLECT │ STORE │ ANALYZE │ SHARE
  • 11. COLLECT │ STORE │ ANALYZE │ SHARE
  • 12. AWS IMPORT / EXPORT
  • 13. COLLECT │ STORE │ ANALYZE │ SHARE
  • 14. AMAZON S3
  • 15. AMAZON DYNAMODB
  • 16. HBase on AMAZON EMR
  • 17. COLLECT │ STORE │ ANALYZE │ SHARE
  • 18. GPU: GRAPHICS PROCESSING UNIT
  • 19. CLUSTER GPU QUADRUPLE EXTRA LARGE │ 2x Intel Xeon X5570, quad-core, Nehalem architecture │ 2x NVIDIA Tesla Fermi M2050 GPUs │ 22 GB of memory – 1.7 TB of storage │ $2.1 PER HOUR
  • 20. PARALLELIZATION
  • 21. ON A SINGLE INSTANCE │ COST: 4h x $2.1 = $8.4 │ RENDERING TIME: 4h
  • 22. ON MULTIPLE INSTANCES │ COST: 2 x 2h x $2.1 = $8.4 │ RENDERING TIME:
  • 23. "Hadoop is a reliable storage and data analysis system" │ HDFS │ MapReduce
  • 24. Deploying a Hadoop cluster is hard (http://eddie.niese.net/20090313/dont-pity-incompetence/)
  • 25. AMAZON EMR: HADOOP + AWS
  • 26. Doing analytics in Eclipse is wrong…
  • 27. PIG
  • 28. A real Pig script (used at Twitter)
  • 29. USE THE RIGHT TOOL FOR THE RIGHT JOB │ RDBMS: Interactive Reporting (<1sec), Multistep Transactions, Lots of Updates/Deletes │ Hadoop: Affordable Storage/Compute, Structured or Not (Agility), Resilient Auto Scalability
  • 30. Data Warehouse (Steady State) │ Expand to 25 instances │ Data Warehouse (Batch Processing) │ Shrink to 9 instances │ Data Warehouse (Steady State)
  • 31. OPERATIONAL HADOOP EXPERIENCE: Operated over a million Hadoop clusters last year
  • 32. COLLECT │ STORE │ ANALYZE │ SHARE
  • 33. PUBLIC DATA SETS: http://aws.amazon.com/publicdatasets
  • 34. COLLECT │ STORE │ ANALYZE │ SHARE
  • 35. INNOVATE
  • 36. « Want to increase innovation? Lower the cost of failure » Joi Ito
  • 37. AWS LOWERS THE COST OF INNOVATION: Testing a new idea is cheap
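
The cost comparison on slides 21 and 22 (one instance for 4h versus two instances for 2h each, both $8.4) can be sketched in a few lines of Python. This is only an illustration: the $2.1 hourly rate and the 4-hour job come from the slides, while the function name `render_plan`, the assumption that the job splits evenly across instances, and the assumption that partial hours are billed proportionally are hypothetical and not stated in the deck.

```python
# Figures taken from slides 19-22 of the deck.
HOURLY_RATE_USD = 2.1     # Cluster GPU instance price per hour
TOTAL_WORK_HOURS = 4.0    # rendering time on a single instance


def render_plan(instances: int,
                hourly_rate: float = HOURLY_RATE_USD,
                total_work_hours: float = TOTAL_WORK_HOURS) -> tuple[float, float]:
    """Return (hours_per_instance, total_cost_usd) for a job split across instances.

    Assumptions (not from the slides): the work divides evenly with no
    coordination overhead, and fractional hours are billed proportionally.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    hours_per_instance = total_work_hours / instances
    total_cost = instances * hours_per_instance * hourly_rate
    return hours_per_instance, round(total_cost, 2)


if __name__ == "__main__":
    for n in (1, 2, 4):
        hours, cost = render_plan(n)
        print(f"{n} instance(s): {hours}h each, ${cost} total")
```

Under these assumptions the total cost stays constant while each instance runs for less time, which is the point the two slides make about parallelization on pay-per-hour infrastructure.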