Big Data and Analytics Innovation Summit

    Presentation Transcript

    • Cloud
    • The big data pipeline; how customers are using the pipeline; the big data eco-system on the cloud
    • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation (annotations: lower cost, increased throughput; constraint)
    • [Chart: generated data vs. data available for analysis; y-axis: data volume. Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"]
    • Very high barrier to turning data into information: infrastructure capacity, technical skills, questions to ask, cheap experimentation
    • Amazon Web Services Cloud
    • Elastic and highly scalable + no upfront capital expense + only pay for what you use + available on-demand = remove constraints
    • Remove constraints = more experimentation; more experimentation = more innovation; more innovation = competitive edge
    • Amazon Web Services removes constraints: focus on your data and leave the undifferentiated heavy lifting to us
    • big data
    • Bankinter uses HPC on AWS for Monte Carlo simulation. "Bankinter uses AWS as an integral part of our credit-risk simulation application; we need to perform at least 5,000,000 simulations to get realistic results." Credit data. Average simulation time went from 23 hours to 20 minutes.
    • Challenge: learn about the customer based on what they do, rather than what they say (i.e., data exhaust); virtually unlimited data. Solution: an always-on cluster continually processes new financial data and stores the results in S3. Collaborative filtering is used to provide recommendations, and ad-hoc queries are performed using Hive.
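The Bankinter slide describes Monte Carlo credit-risk simulation. As a minimal sketch of what one such simulation computes (not Bankinter's model), the Python below draws independent defaults for a small hypothetical loan portfolio and reads the expected loss and a 99% value-at-risk off the simulated loss distribution. The portfolio figures, the function names, and the independence assumption are illustrative only; real credit-risk engines typically correlate defaults between obligors.

```python
import random


def simulate_portfolio_losses(loans, n_sims, seed=42):
    """Return one simulated portfolio loss per Monte Carlo trial.

    Each loan is (exposure, p_default, lgd). Defaults are drawn
    independently: a simplifying assumption, since production
    credit-risk models usually correlate obligors.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    losses = []
    for _ in range(n_sims):
        loss = 0.0
        for exposure, p_default, lgd in loans:
            if rng.random() < p_default:  # this obligor defaults in this trial
                loss += exposure * lgd
        losses.append(loss)
    return losses


def value_at_risk(losses, level=0.99):
    """Empirical `level` quantile of the simulated loss distribution."""
    ordered = sorted(losses)
    index = min(int(level * len(ordered)), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    # Hypothetical portfolio: (exposure in EUR, probability of default, loss given default)
    portfolio = [(1_000_000, 0.02, 0.45), (250_000, 0.05, 0.60), (500_000, 0.01, 0.40)]
    losses = simulate_portfolio_losses(portfolio, n_sims=100_000)
    simulated_el = sum(losses) / len(losses)
    analytic_el = sum(e * p * l for e, p, l in portfolio)  # closed form under independence
    print(f"simulated expected loss: {simulated_el:,.0f}")
    print(f"analytic expected loss:  {analytic_el:,.0f}")
    print(f"99% VaR:                 {value_at_risk(losses):,.0f}")
```

Each trial is independent of every other, so the trials can be split across many machines and the resulting loss lists concatenated. That embarrassingly parallel structure is what lets this kind of workload shrink from hours to minutes when more compute is added.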
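The slide mentions collaborative filtering for recommendations but not which variant was used. As a hedged, single-machine sketch of one common variant, item-based collaborative filtering with cosine similarity, the Python below scores items a customer has not used by how similar they are to items the customer already engaged with. The customer names, product names, and function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_vectors(interactions):
    """Invert {user: {item: score}} into {item: {user: score}}."""
    items = {}
    for user, scores in interactions.items():
        for item, score in scores.items():
            items.setdefault(item, {})[user] = score
    return items


def recommend(interactions, user, top_n=3):
    """Rank unseen items by similarity to the items `user` already engaged with."""
    items = item_vectors(interactions)
    seen = interactions[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(vec, items[item]) * weight for item, weight in seen.items())
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [item for item, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    # Hypothetical behavioral data ("what they do"): 1 means the customer used the product.
    interactions = {
        "alice": {"savings": 1, "mortgage": 1},
        "bob": {"savings": 1, "mortgage": 1, "credit_card": 1},
        "carol": {"savings": 1, "brokerage": 1},
        "dave": {"brokerage": 1, "credit_card": 1},
    }
    print(recommend(interactions, "alice"))
```

On a cluster the same idea is usually expressed as a batch job that precomputes item-item similarities, with the per-user scoring step kept cheap; this sketch keeps everything in memory for readability.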
    • For illustrative purposes only.
    • S&P Capital IQ: Microsoft SQL Server; Amazon S3: Companies You May Be Interested In; Amazon S3: Clicks, Key Developments, Company Profiles; Amazon Elastic MapReduce: Compute User Selectivity, Compute Key Developments, Join & Score
    • Challenge: volatile weather is deadly to crops like grapes and tomatoes. Solution: built a predictive model based on freely available data: 60 years of crop data, 14 TB of soil data, and one million government Doppler radar points. 50 Hadoop clusters process new data as it comes into S3 each day, continuously updating the model. 150B soil observations; 3M daily weather measurements; 850K precision rainfall grids tracked.
    • Simulations each month. Per simulation: 10K unique scenarios generated, 5 trillion datapoints, 5-6K-node Hadoop cluster
    • AWS Import/Export, corporate data center, Amazon Elastic MapReduce, Amazon Simple Storage Service (S3), BI users, clickstream data from 500+ websites and VoD platform
    • More than 25 million streaming members; 50 billion events per day; 30 million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; device location, time, day, week, etc.; social data
    • 10 TB of streaming data per day
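Daily totals like "10 TB per day" and "50 billion events per day" are easier to reason about as sustained rates. The short calculation below converts them into average per-second figures, assuming decimal terabytes (10^12 bytes) and a uniform spread over 24 hours; real traffic peaks well above these averages, so ingestion has to be provisioned for more than this.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total):
    """Average rate implied by a daily total (peaks will be higher)."""
    return daily_total / SECONDS_PER_DAY


streaming_bytes_per_day = 10 * 10**12  # 10 TB/day, assuming decimal terabytes
events_per_day = 50 * 10**9  # 50 billion events/day

print(f"{per_second(streaming_bytes_per_day) / 10**6:.1f} MB/s on average")  # 115.7
print(f"{per_second(events_per_day):,.0f} events/s on average")  # 578,704
```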
    • Data consumed in multiple ways: S3, EMR, Prod cluster (EMR), recommendation engine, ad-hoc analysis, personalization
    • Amazon DynamoDB
    • “Who buys video games?”
    • Per day: 3.5 billion records, 13 TB of clickstream logs, 71 million unique cookies
    • Results: 500% return on ad spend; 17,000% reduction in procurement time
    • "Who is using our service?"
    • Identified early mobile usage; invested heavily in mobile development. Finding signal in the noise of logs: 9,432,061 unique mobile devices used the Yelp mobile app.
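Counting unique mobile devices means filtering noisy log records down to mobile clients and de-duplicating device identifiers. The sketch below does this for a hypothetical whitespace-separated log format (`<timestamp> <device_id> <platform> <path>`); the format, platform names, and function name are assumptions for illustration, not Yelp's actual log schema.

```python
def count_unique_mobile_devices(lines, mobile_platforms=("ios", "android")):
    """Count distinct device IDs seen in log lines from mobile clients.

    Assumes a hypothetical whitespace-separated format:
        <timestamp> <device_id> <platform> <path>
    Lines that do not match are treated as noise and skipped.
    """
    wanted = {p.lower() for p in mobile_platforms}
    devices = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # truncated or malformed record
        _timestamp, device_id, platform, _path = fields
        if platform.lower() in wanted:
            devices.add(device_id)
    return len(devices)


sample_log = [
    "2013-04-01T10:00:00 dev-a iOS /search",
    "2013-04-01T10:00:05 dev-a iOS /biz/123",
    "2013-04-01T10:01:00 dev-b Android /search",
    "2013-04-01T10:02:00 dev-c web /search",
    "garbled line",
]
print(count_unique_mobile_devices(sample_log))  # 2
```

An exact set grows with the number of distinct devices. At the scale of millions of devices, approximate distinct-count structures such as HyperLogLog trade a small, bounded error for a fixed memory footprint.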
    • Every day is crucial and costly
    • Challenge: to run a virtual screen with a higher-accuracy algorithm and 21 million compounds
    • Using Cycle Computing and Amazon Web Services:
      Metric                   Count
      Compute hours of work    109,927 hours
      Compute days of work     4,580 days
      Compute years of work    12.55 years
      Ligand count             ~21 million ligands
    • 3 hours for $4,828.85/hr
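The virtual-screening figures can be cross-checked with a little arithmetic: the compute hours convert to the quoted days and years, and the hourly price gives the total cost of the run. The last two quantities below rest on assumptions not stated on the slides: that a "compute hour" means one core busy for one hour, and that all of the work fell inside the 3-hour window.

```python
compute_hours = 109_927  # total compute hours from the slide
wall_clock_hours = 3  # elapsed time from the slide
cost_per_hour = 4_828.85  # USD per wall-clock hour, from the slide

compute_days = compute_hours / 24
compute_years = compute_days / 365
total_cost = wall_clock_hours * cost_per_hour
cost_per_compute_hour = total_cost / compute_hours
# Assumes a "compute hour" is one core busy for one hour and that all of
# the work fell inside the 3-hour window.
average_cores = compute_hours / wall_clock_hours

print(f"{compute_days:,.0f} compute days, {compute_years:.2f} compute years")
print(f"total cost ${total_cost:,.2f}, about ${cost_per_compute_hour:.3f} per compute hour")
print(f"about {average_cores:,.0f} cores busy on average")
```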
    • Relational Database Service: fully managed database (MySQL, Oracle, MSSQL); DynamoDB: NoSQL, schemaless, provisioned-throughput database; S3: object datastore, up to 5 TB per object, 99.999999999% durability
    • Amazon Elastic MapReduce: map-reduce engine, Hadoop-as-a-service, massively parallel, cost-effective AWS wrapper
    • Amazon Redshift: data warehouse service; petabyte-scale, fast and fully managed
    • RDBMS: OLTP, ERP. Redshift: reporting and BI.
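The eleven-nines durability figure is easier to interpret as an expected number of lost objects. The sketch below reads 99.999999999% as an annual, per-object durability with independent losses (an interpretation, not a guarantee), and uses a hypothetical bucket of 10 million objects. `Decimal` is used because `1 - 0.99999999999` in binary floating point does not come out to exactly 1e-11.

```python
from decimal import Decimal

durability = Decimal("0.99999999999")  # eleven nines, read as per-object, per-year
annual_loss_probability = 1 - durability  # 1E-11; Decimal avoids float rounding here
objects_stored = 10_000_000  # hypothetical bucket size

expected_losses_per_year = objects_stored * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(f"expected objects lost per year: {expected_losses_per_year}")
print(f"on average one loss every {int(years_per_expected_loss):,} years")
```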
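The map-reduce model behind Elastic MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The single-process sketch below walks through that data flow for counting page views per URL in a hypothetical `user url` click format; Hadoop runs the same phases in parallel across many nodes and performs the shuffle itself, which this toy version only imitates.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (url, 1) for one click record in a hypothetical 'user url' format."""
    _user, url = record.split()
    yield url, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, a simple sum."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    clicks = ["u1 /home", "u2 /home", "u1 /pricing", "u3 /home"]
    print(map_reduce(clicks))
```

Because each `map_phase` call depends only on its own record and each `reduce_phase` call only on one key's values, both steps can be distributed without coordination; only the shuffle moves data between machines.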
    • Source: http://nerds.airbnb.com/redshift-performance-cost
      Table size       Query type            Hive                      Redshift
      3 billion rows   Simple range query    1,680 seconds (28 min)    360 seconds (6 min)
      1 million rows   2 complex joins       182 seconds               8 seconds
      $13.60/hour on Redshift versus $57/hour on Hive
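The Hive and Redshift benchmark figures combine into a speedup and a rough per-query cost. The sketch below assumes the hourly rates are for the whole cluster and that each query has the cluster to itself while it runs; under shared or idle-heavy usage the real per-query cost differs. The `compare` helper is hypothetical.

```python
def compare(hive_seconds, redshift_seconds, hive_rate=57.00, redshift_rate=13.60):
    """Return (speedup, hive_cost, redshift_cost) for one query.

    Rates are USD per cluster-hour from the slide; per-query cost assumes
    the query is the only thing running on the cluster for its duration.
    """
    speedup = hive_seconds / redshift_seconds
    hive_cost = hive_seconds / 3600 * hive_rate
    redshift_cost = redshift_seconds / 3600 * redshift_rate
    return speedup, hive_cost, redshift_cost


benchmarks = {
    "3B rows, simple range query": (1680, 360),
    "1M rows, 2 complex joins": (182, 8),
}
for name, (hive_s, redshift_s) in benchmarks.items():
    speedup, hive_cost, redshift_cost = compare(hive_s, redshift_s)
    print(f"{name}: {speedup:.1f}x faster, ${hive_cost:.2f} (Hive) vs ${redshift_cost:.2f} (Redshift)")
```

The speedup and the lower hourly rate compound: the range query is about 4.7 times faster on a cluster that costs about 4.2 times less per hour.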
    • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
    • Thank you! aws.amazon.com/big-data. May 14, Kowloonbay International Trade & Exhibition Centre (KITEC), Hong Kong: one-day free training, walk-through of services. http://aws.amazon.com/apac/awsday/hk/