Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris
 


Big Data presentation by Carlos Conde at AWS Summit Paris

    Presentation Transcript

    • BIG DATA ANALYTICS WITH AWS: Carlos Conde │ Solutions Architect
    • BIG DATA: When innovation is required to collect, store, analyze, and share your data
    • YOU DON’T HAVE A CHOICE…
    • 27 TB per day: Large Hadron Collider – CERN
    • BIGGER IS BETTER
    • The more data you collect, the more VALUE you can derive from it
    • Big Data Verticals
        • Media/Advertising: Targeted Advertising; Image and Video Processing
        • Oil & Gas: Seismic Analysis
        • Retail: Recommendations; Transaction Analysis
        • Life Sciences: Genome Analysis
        • Financial Services: Monte Carlo Simulations; Risk Analysis
        • Security: Anti-virus; Fraud Detection; Image Recognition
        • Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics
    • 2.7 Zettabytes in 2012. Over 90% will be unstructured
    • VOLUME │ VELOCITY │ VARIETY
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • AWS IMPORT / EXPORT
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • AMAZON S3
    • AMAZON DYNAMODB
    • HBase on AMAZON EMR
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • GPU: GRAPHICS PROCESSING UNIT
    • CLUSTER GPU QUADRUPLE EXTRA LARGE: 2x Intel Xeon X5570, quad-core Nehalem architecture; 2x NVIDIA Tesla Fermi M2050 GPUs; 22 GB of memory; 1.7 TB of storage; $2.1 PER HOUR
    • PARALLELIZATION
    • ON A SINGLE INSTANCE: COST: 4h x $2.1 = $8.4; RENDERING TIME: 4h
    • ON MULTIPLE INSTANCES: COST: 2 x 2h x $2.1 = $8.4; RENDERING TIME:
    • "Hadoop is a reliable storage and data analysis system" HDFS MapReduce
    • Deploying a Hadoop cluster is hard (http://eddie.niese.net/20090313/dont-pity-incompetence/)
    • AMAZON EMR HADOOP + AWS
    • Doing analytics in Eclipse is wrong…
    • PIG
    • A real Pig script (used at Twitter)
    • USE THE RIGHT TOOL FOR THE RIGHT JOB
        • RDBMS: Interactive Reporting (<1sec); Multistep Transactions; Lots of Updates/Deletes
        • Hadoop: Affordable Storage/Compute; Structured or Not (Agility); Resilient Auto Scalability
    • Data Warehouse (Steady State) │ Expand to 25 instances │ Data Warehouse (Batch Processing) │ Shrink to 9 instances │ Data Warehouse (Steady State)
    • OPERATIONAL HADOOP EXPERIENCE: Operated over a million Hadoop clusters last year
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • PUBLIC DATA SETS: http://aws.amazon.com/publicdatasets
    • COLLECT │ STORE │ ANALYZE │ SHARE
    • INNOVATE
    • « Want to increase innovation? Lower the cost of failure » Joi Ito
    • AWS LOWERS THE COST OF INNOVATION: Testing a new idea is cheap
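
The parallelization slides make a cost argument: a 4-hour rendering job costs the same on one instance as on two, because the bill depends on total instance-hours and not on how they are spread. The sketch below reproduces that arithmetic in Python. It is an illustration only, not code from the talk: the names `RenderPlan` and `split_job` are hypothetical, the job is assumed to scale perfectly linearly across instances, and hourly billing round-up is ignored. The only figure taken from the slides is the $2.1 hourly rate.

```python
from dataclasses import dataclass

# Hourly price of the Cluster GPU Quadruple Extra Large instance, as quoted on the slide.
HOURLY_RATE_USD = 2.1


@dataclass(frozen=True)
class RenderPlan:
    """Hypothetical helper: one way of spreading a rendering job over instances."""

    instances: int
    hours_per_instance: float

    @property
    def cost_usd(self) -> float:
        # Cost depends only on total instance-hours: instances x hours x rate.
        return round(self.instances * self.hours_per_instance * HOURLY_RATE_USD, 2)

    @property
    def wall_clock_hours(self) -> float:
        # All instances run at the same time, so elapsed time is the per-instance time.
        return self.hours_per_instance


def split_job(total_instance_hours: float, instances: int) -> RenderPlan:
    """Split a job evenly across instances.

    Assumption (not from the slides): perfectly linear scaling, with no
    coordination overhead and no rounding up to whole billed hours.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return RenderPlan(instances, total_instance_hours / instances)


if __name__ == "__main__":
    for n in (1, 2, 4):
        plan = split_job(4, n)
        print(f"{n} instance(s): ${plan.cost_usd} total, {plan.wall_clock_hours}h elapsed")
```

Under these assumptions the cost stays at $8.4 for any instance count while the elapsed time shrinks proportionally. Real workloads rarely scale this cleanly, so the numbers are best read as an upper bound on the benefit.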