Big Data Analytics

Published in Technology, Business

Transcript

  • 1. BIG DATA ANALYTICS WITH AWS. Carlos Conde │ Solutions Architect
  • 2. BIG DATA: When innovation is required to collect, store, analyze, and share your data
  • 3. YOU DON’T HAVE A CHOICE…
  • 4. 27 TB per day: Large Hadron Collider – CERN
  • 5. BIGGER IS BETTER
  • 6. The more data you collect, the more VALUE you can derive from it
  • 7. Big Data Verticals: Media/Advertising (Targeted Advertising; Image and Video Processing) │ Social Network/Gaming (User Demographics; Usage Analysis; In-game Metrics) │ Oil & Gas (Seismic Analysis) │ Retail (Recommendations; Transaction Analysis) │ Life Sciences (Genome Analysis) │ Financial Services (Monte Carlo Simulations; Risk Analysis) │ Security (Anti-virus; Fraud Detection; Image Recognition)
  • 8. 2.7 zettabytes in 2012; over 90% will be unstructured
  • 9. VOLUME │ VELOCITY │ VARIETY
  • 10. COLLECT │ STORE │ ANALYZE │ SHARE
  • 11. COLLECT │ STORE │ ANALYZE │ SHARE
  • 12. AWS IMPORT / EXPORT
  • 13. COLLECT │ STORE │ ANALYZE │ SHARE
  • 14. AMAZON S3
  • 15. AMAZON DYNAMODB
  • 16. HBase on AMAZON EMR
  • 17. COLLECT │ STORE │ ANALYZE │ SHARE
  • 18. GPU: GRAPHICS PROCESSING UNIT
  • 19. CLUSTER GPU QUADRUPLE EXTRA LARGE: 2x Intel Xeon X5570, quad-core Nehalem architecture; 2x NVIDIA Tesla Fermi M2050 GPUs; 22 GB of memory; 1.7 TB of storage; $2.1 PER HOUR
  • 20. PARALLELIZATION
  • 21. ON A SINGLE INSTANCE: COST: 4h x $2.1 = $8.4; RENDERING TIME: 4h
  • 22. ON MULTIPLE INSTANCES: COST: 2 x 2h x $2.1 = $8.4; RENDERING TIME:
  • 23. "Hadoop is a reliable storage and data analysis system": HDFS + MapReduce
  • 24. Deploying a Hadoop cluster is hard
  • 25. AMAZON EMR HADOOP + AWS
  • 26. Doing analytics in Eclipse is wrong…
  • 27. PIG
  • 28. A real Pig script (used at Twitter)
  • 29. USE THE RIGHT TOOL FOR THE RIGHT JOB. RDBMS: Interactive Reporting (<1sec); Multistep Transactions; Lots of Updates/Deletes │ Hadoop: Affordable Storage/Compute; Structured or Not (Agility); Resilient Auto Scalability
  • 30. Data Warehouse (Steady State) │ Expand to 25 instances │ Data Warehouse (Batch Processing) │ Shrink to 9 instances │ Data Warehouse (Steady State)
  • 31. OPERATIONAL HADOOP EXPERIENCE: Operated over a million Hadoop clusters last year
  • 32. COLLECT │ STORE │ ANALYZE │ SHARE
  • 33. PUBLIC DATA SETS: http://aws.amazon.com/publicdatasets
  • 34. COLLECT │ STORE │ ANALYZE │ SHARE
  • 35. INNOVATE
  • 36. « Want to increase innovation? Lower the cost of failure » Joi Ito
  • 37. AWS LOWERS THE COST OF INNOVATION: Testing a new idea is cheap
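
The cost comparison on slides 21 and 22 can be sketched as a small calculation. This is only an illustration, not code from the talk. It assumes the rendering job splits perfectly across instances (linear scaling) and that billing is exactly proportional to hours used. The function name `plan_render` and the `RenderPlan` type are hypothetical, and the only figure taken from the slides is the $2.1 hourly price.

```python
from dataclasses import dataclass

# Hourly price of one Cluster GPU Quadruple Extra Large instance, as quoted on slide 19.
HOURLY_RATE_USD = 2.1


@dataclass(frozen=True)
class RenderPlan:
    instances: int
    hours_per_instance: float  # wall-clock time under the linear-scaling assumption
    cost_usd: float


def plan_render(total_instance_hours: float, instances: int,
                hourly_rate: float = HOURLY_RATE_USD) -> RenderPlan:
    """Split a job of `total_instance_hours` evenly across `instances`.

    Assumption (not stated in the slides): the work parallelizes perfectly,
    so wall-clock time is the total divided by the number of instances.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    hours = total_instance_hours / instances
    cost = instances * hours * hourly_rate
    return RenderPlan(instances, hours, round(cost, 2))


if __name__ == "__main__":
    # The 4 instance-hour job from slide 21, spread over 1, 2, and 4 instances.
    for n in (1, 2, 4):
        print(plan_render(4, n))
```

Under these assumptions the total cost stays the same however many instances are used, while the wall-clock time shrinks in proportion. That is the point the two slides make: with pay-per-hour pricing, parallelizing a job costs nothing extra.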