Big Data Analysis: Powered by the Cloud

Opening Keynote at ZDNet Advanced Computing Conference by Abhishek Sinha (Business Development Manager APAC)

Presentation Transcript

  • Cloud
  • What is big data; Data analysis pipeline; How customers are using the pipeline
  • When your data sets become so large that you have to start innovating how to collect, store, organize, analyze, and share them
  • What does big data look like?
  • The 3 Vs: Volume, Velocity, Variety
  • Where is this data coming from?
  • Human generated: tweet, surf the internet, buy and sell products, upload images and videos, play games, check in at restaurants, search for cafes, find deals, watch content online, look for directions, use social media
  • Machine generated: networks and security devices, mobile phones, cell phone towers, smart grids, smart meters, telematics from cars, sensors on machines, videos from traffic and security cameras
  • What is it used for?
  • Data for competitive advantage: customer segmentation, financial modeling, system analysis, line-of-sight, replacing human decisions, business intelligence, etc.; innovating new business and revenue models
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation (annotated: lower cost, increased throughput; constraint)
  • Very high barrier to turning data into information: infrastructure capacity, technical skills, questions to ask, cheap experimentation
  • Amazon Web Services Cloud
  • Elastic and highly scalable + no upfront capital expense + only pay for what you use + available on-demand = remove constraints
  • Remove constraints = more experimentation; more experimentation = more innovation; more innovation = competitive edge
  • Amazon Web Services: removes constraints; focus on your data; leave the undifferentiated heavy lifting to us
  • HOW
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
  • Architecture diagram: clickstream data from 500+ websites and VoD platform, corporate data center, AWS Import/Export, Amazon Simple Storage Service (S3), Amazon Elastic MapReduce, BI users
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
  • More than 25 million streaming members; 50 billion events per day; 30 million plays every day; 2 billion hours of video in 3 months; 4 million ratings per day; 3 million searches; device location, time, day, week, etc.; social data
  • 10 TB of streaming data per day
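The per-day volumes quoted for the streaming service (50 billion events, 10 TB of data) are easier to compare with system capacity as average per-second rates. The sketch below does that arithmetic; it assumes decimal units (1 TB = 10^12 bytes) and load spread evenly over 24 hours, so real peak rates will be higher. The helper name `per_second` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(daily_total: float) -> float:
    """Average rate per second, assuming the daily total is spread evenly over 24 hours."""
    return daily_total / SECONDS_PER_DAY

# Figures from the slides; decimal units assumed (1 TB = 10**12 bytes).
events_per_day = 50e9        # "50 Billion Events Per Day"
bytes_per_day = 10 * 10**12  # "10 TB of streaming data per day"

events_per_sec = per_second(events_per_day)
megabytes_per_sec = per_second(bytes_per_day) / 10**6

print(f"{events_per_sec:,.0f} events/s on average")
print(f"{megabytes_per_sec:.1f} MB/s on average")
```

That works out to roughly 579,000 events and about 116 MB of data every second, before accounting for daily peaks.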
  • What is S3? Highly scalable data storage; access via APIs; fast (850K requests per sec); highly available and durable (99.999999999% durability); economical ($0.095 per GB)*; web store
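To make the durability and price figures on the S3 slide concrete, the sketch below computes the expected number of objects lost per year and a flat-rate storage bill. It assumes the 99.999999999% figure is an annual, per-object durability and that $0.095 per GB is a flat monthly storage price; real S3 pricing is tiered and also charges for requests and data transfer, which are ignored here. The workload (10 million objects, 10 TB) is a hypothetical input.

```python
# Assumption: 99.999999999% ("eleven nines") is annual, per-object durability,
# so each object has a 1e-11 chance of being lost in a given year.
ANNUAL_LOSS_PROBABILITY = 1e-11

# Assumption: flat storage price per GB-month, ignoring tiers, requests and transfer.
PRICE_PER_GB_MONTH = 0.095

def expected_objects_lost_per_year(num_objects: int) -> float:
    """Expected losses per year if every object fails independently."""
    return num_objects * ANNUAL_LOSS_PROBABILITY

def monthly_storage_cost(terabytes: float) -> float:
    """Flat-rate monthly storage bill in dollars (decimal units: 1 TB = 1000 GB)."""
    gigabytes = terabytes * 1000
    return gigabytes * PRICE_PER_GB_MONTH

# Hypothetical workload: 10 million objects totalling 10 TB.
lost = expected_objects_lost_per_year(10_000_000)
print(f"Expected objects lost per year: {lost:.4f}")
print(f"About one object every {1 / lost:,.0f} years")
print(f"Monthly storage cost: ${monthly_storage_cost(10):,.2f}")
```

Under these assumptions, a store of ten million objects would expect to lose a single object roughly once every 10,000 years.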
  • Data consumed in multiple ways: S3, EMR, Prod Cluster (EMR); recommendation engine, ad-hoc analysis, personalization
  • Velocity of data: Amazon DynamoDB
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
  • “Who buys video games?”
  • Per day: 3.5 billion records; 13 TB of clickstream logs; 71 million unique cookies
  • Results: 500% return on ad spend; 17,000% reduction in procurement time
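The per-day figures for the "Who buys video games?" question can be turned into averages that hint at the shape of the data: bytes per log record and records per cookie. The sketch assumes decimal units (1 TB = 10^12 bytes) and that every record is attributed to a cookie, which is only an approximation of visitor behaviour.

```python
# Per-day figures from the "Who buys video games?" slides.
records_per_day = 3.5e9         # 3.5 billion records
log_bytes_per_day = 13e12       # 13 TB of clickstream logs (decimal units assumed)
unique_cookies_per_day = 71e6   # 71 million unique cookies

# Average size of one log record, in bytes.
avg_record_bytes = log_bytes_per_day / records_per_day
# Average number of records per cookie, assuming every record carries a cookie.
records_per_cookie = records_per_day / unique_cookies_per_day

print(f"Average record size: {avg_record_bytes:,.0f} bytes")
print(f"Average records per cookie: {records_per_cookie:.1f}")
```

On average each record is a few kilobytes and each cookie accounts for about 49 records per day, a quick sanity check on log volumes before designing an analysis job.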
  • “Who is using our service?”
  • Finding signal in the noise of logs: identified early mobile usage; invested heavily in mobile development
  • In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app: 4 million+ calls, 5 million+ directions.
  • What is EMR? A MapReduce engine integrated with tools; Hadoop-as-a-service; massively parallel; cost-effective AWS wrapper; integrated with AWS services
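EMR runs Hadoop MapReduce jobs across a cluster. The single-process sketch below illustrates only the programming model (map, shuffle by key, reduce) on a hypothetical clickstream format of `<cookie_id> <url>` per line; it is not EMR's API, and on a real cluster each phase would be spread across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (url, 1) for each log line. Hypothetical format: '<cookie_id> <url>'."""
    _cookie_id, url = line.split()
    yield url, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key; here, sum the hit counts."""
    return key, sum(values)

def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence and return hits per URL."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

logs = [
    "c1 /games/racing",
    "c2 /games/racing",
    "c1 /movies/new",
]
print(map_reduce(logs))
```

Because the reduce step here (a sum) is associative, a framework can also apply it to each mapper's output before the shuffle to cut network traffic; Hadoop calls this a combiner.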
  • Hive vs. Redshift
    Size | Query type | Hive | Redshift
    3 billion rows | Simple range query | 1,680 seconds (28 min) | 360 seconds (6 min)
    1 million rows | 2 complex joins | 182 seconds | 8 seconds
    $13.60/hour on Redshift versus $57/hour on Hive
    Source:
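The hourly prices in the Hive vs. Redshift comparison can be combined with the query times to estimate what each query costs. The sketch assumes the quoted price covers the whole cluster and is prorated over the query's run time only (actual billing granularity differs), and it uses only the figures from the slide.

```python
def cost_per_query(price_per_hour: float, seconds: float) -> float:
    """Cluster price prorated over the query's run time (simplifying assumption)."""
    return price_per_hour * seconds / 3600

HIVE_PRICE = 57.00      # $/hour, from the slide
REDSHIFT_PRICE = 13.60  # $/hour, from the slide

# (description, Hive seconds, Redshift seconds), from the slide
benchmarks = [
    ("3 billion rows, simple range query", 1680, 360),
    ("1 million rows, 2 complex joins", 182, 8),
]

for name, hive_s, redshift_s in benchmarks:
    hive_cost = cost_per_query(HIVE_PRICE, hive_s)
    redshift_cost = cost_per_query(REDSHIFT_PRICE, redshift_s)
    speedup = hive_s / redshift_s
    print(f"{name}: Hive ${hive_cost:.2f}, Redshift ${redshift_cost:.2f}, "
          f"{speedup:.1f}x faster")
```

Under this simplification the range query costs about $26.60 on Hive versus $1.36 on Redshift, and the join query runs roughly 23 times faster on Redshift.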
  • Every day is crucial and costly
  • Challenge: to run a virtual screen of 21 million compounds with a higher-accuracy algorithm
  • Using Cycle Computing and Amazon Web Services:
    Metric | Count
    Compute hours of work | 109,927 hours
    Compute days of work | 4,580 days
    Compute years of work | 12.55 years
    Ligand count | ~21 million ligands
  • 3 hours for $4,828.85/hr
  • Instead of $20+ million in infrastructure
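The compute figures in the virtual-screening example are internally consistent, and they imply how much parallelism the run needed. The sketch below checks the unit conversions (24-hour days, 365-day years), totals the cost of the 3-hour run at the quoted hourly rate, and divides compute hours by wall-clock hours to get the average number of compute units running at once. It uses only the slide's figures; the actual instance and core counts are not given in the transcript.

```python
COMPUTE_HOURS = 109_927   # "Compute Hours of Work"
WALL_CLOCK_HOURS = 3      # "3 Hours"
PRICE_PER_HOUR = 4828.85  # "$4828.85/hr"

compute_days = COMPUTE_HOURS / 24
compute_years = COMPUTE_HOURS / (24 * 365)  # ignores leap years
total_cost = WALL_CLOCK_HOURS * PRICE_PER_HOUR
# Average concurrency: compute hours delivered per hour of elapsed time.
avg_parallelism = COMPUTE_HOURS / WALL_CLOCK_HOURS

print(f"{compute_days:,.0f} compute days")
print(f"{compute_years:.2f} compute years")
print(f"Total cost: ${total_cost:,.2f}")
print(f"Average parallelism: {avg_parallelism:,.0f} units")
```

So the run cost roughly $14,500 in total, and fitting 12.55 years of compute into 3 hours means keeping about 36,600 units of whatever the slide's "compute hour" measures busy at the same time.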
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
  • Open web index: 3.4 billion records, available to all; 1000 Genomes project
  • Generation, Collect, Store, Collaboration & sharing, Analysis and Computation
  • Thank you! 21st, COEX Auditorium, Seoul: one-day free training; walkthrough of services