Transcript of "Big Data and Analytics Innovation Summit"
• The big data pipeline
• How customers are using the pipeline
• The big data ecosystem on the cloud
The big data pipeline: Generation → Collect → Store → Collaboration & sharing → Analysis and Computation
Slide annotations: lower cost, increased throughput; constraint
[Chart: generated data vs. data available for analysis (y-axis: data volume). Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares"]
Very high barrier to turning data into information:
• Infrastructure capacity
• Technical skills
• Questions to ask
• Cheap experimentation
Bankinter uses HPC on AWS for Monte Carlo simulation
"Bankinter uses AWS as an integral part of our credit-risk simulation application; we need to perform at least 5,000,000 simulations to get realistic results."
Input: credit data
Average simulation time went from 23 hours to 20 minutes.
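The slide gives the scale (at least 5,000,000 simulations) and the speedup (23 hours to 20 minutes, roughly 69x), but it does not describe the model. The sketch below is a minimal Monte Carlo estimate of portfolio credit loss under stated assumptions. Each loan has a probability of default (PD), an exposure at default (EAD), and a loss given default (LGD), and defaults are drawn independently. Real credit-risk models usually correlate defaults, so treat this as an illustration only. The function names and the sample portfolio are hypothetical.

```python
import random

def simulate_portfolio_losses(loans, n_sims, seed=0):
    """Monte Carlo draws of total portfolio loss.

    Each loan is a (pd, ead, lgd) tuple. Defaults are drawn independently,
    which is a simplifying assumption, not a claim about Bankinter's model.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        loss = 0.0
        for pd, ead, lgd in loans:
            if rng.random() < pd:  # this loan defaults in this trial
                loss += ead * lgd
        losses.append(loss)
    return losses

def summarize(losses, quantile=0.99):
    """Return (expected loss, empirical loss quantile) from the simulated draws."""
    ordered = sorted(losses)
    expected = sum(ordered) / len(ordered)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return expected, ordered[idx]

if __name__ == "__main__":
    # Hypothetical portfolio: 100 loans, 2% PD, 10,000 exposure, 45% LGD.
    portfolio = [(0.02, 10_000.0, 0.45)] * 100
    losses = simulate_portfolio_losses(portfolio, n_sims=10_000)
    el, q99 = summarize(losses)
    print(f"expected loss ~ {el:,.0f}, 99% loss quantile ~ {q99:,.0f}")
```

Each trial is independent. Trials can therefore be split across many machines and the resulting loss lists concatenated before summarizing, which makes this kind of workload a natural fit for a large on-demand cluster.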
Challenge: Learn about customers based on what they do rather than what they say (i.e., their data exhaust), when the volume of data is virtually unlimited.
Solution: An always-on cluster continually processes new financial data and stores the results in S3. Collaborative filtering is used to provide recommendations, and ad-hoc queries are performed using Hive.
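The slide does not say which collaborative-filtering algorithm was used. The sketch below shows one common variant, chosen here for illustration: item-based filtering with cosine similarity over implicit (clicked / not clicked) data. It recommends items that are similar to ones a user already viewed. The user names, company names, and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(interactions):
    """Cosine similarity between items, from binary user-item interactions."""
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    items = list(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(item_users[a] & item_users[b])
            if common:
                score = common / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims

def recommend(user, interactions, sims, k=3):
    """Score unseen items by summed similarity to the items the user has seen."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]

interactions = {
    "alice": {"ACME", "Globex"},
    "bob": {"ACME", "Globex", "Initech"},
    "carol": {"Globex", "Initech"},
    "dave": {"ACME", "Umbrella"},
}
sims = item_similarities(interactions)
print(recommend("alice", interactions, sims))  # ['Initech', 'Umbrella']
```

Item-item similarities change slowly compared with individual users' activity. One plausible design is therefore to recompute them in batch on the cluster and look them up at serving time, but the slide does not confirm this.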
S&P Capital IQ architecture:
• Microsoft SQL Server
• Amazon S3: Companies You May Be Interested In
• Amazon S3: Clicks, Key Developments, Company Profiles
• Amazon Elastic MapReduce: Compute User Selectivity, Compute Key Developments, Join & Score
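The diagram names three Elastic MapReduce steps (compute user selectivity, compute key developments, join & score) but not how they work. The sketch below imitates, in a single process, the map, shuffle, and reduce phases of a reduce-side (repartition) join keyed on company. It makes several assumptions that are not from the slide. "User selectivity" is taken as the share of a user's clicks on each company. Key developments are reduced to a count per company. The score is their product. All names and data are hypothetical.

```python
from collections import defaultdict

def compute_selectivity(clicks):
    """Hypothetical job 1: share of each user's clicks that went to each company."""
    per_user = defaultdict(lambda: defaultdict(int))
    for user, company in clicks:
        per_user[user][company] += 1
    rows = []
    for user, counts in per_user.items():
        total = sum(counts.values())
        for company, n in counts.items():
            rows.append((user, company, n / total))
    return rows

def join_and_score(selectivity, developments):
    """Hypothetical job 2: reduce-side join keyed on company.

    Score (an assumption) = user selectivity * number of key developments.
    """
    # Map phase: key every record by company and tag it with its source.
    tagged = [(company, ("sel", user, sel)) for user, company, sel in selectivity]
    tagged += [(company, ("dev",)) for company, _event in developments]
    # Shuffle phase: group the tagged records by key.
    groups = defaultdict(list)
    for company, record in tagged:
        groups[company].append(record)
    # Reduce phase: combine both sides of the join for each company.
    scores = defaultdict(dict)
    for company, records in groups.items():
        n_devs = sum(1 for r in records if r[0] == "dev")
        for r in records:
            if r[0] == "sel":
                _, user, sel = r
                scores[user][company] = sel * n_devs
    return scores

clicks = [("u1", "ACME")] * 3 + [("u1", "Globex"), ("u2", "Globex")]
developments = [("ACME", "earnings"), ("Globex", "merger"), ("Globex", "ceo_change")]
scores = join_and_score(compute_selectivity(clicks), developments)
for user, ranked in scores.items():
    # Highest-scoring companies first, as a "Companies You May Be Interested In" list.
    print(user, sorted(ranked, key=ranked.get, reverse=True))
```

On a real cluster, the shuffle is performed by the framework. The tag on each record is what lets the reducer distinguish the two input sources that arrive under the same key.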
Challenge: Volatile weather is deadly to crops such as grapes and tomatoes.
Solution: Built a predictive model from freely available data: 60 years of crop data, 14 TB of soil data, and one million government Doppler radar points. Fifty Hadoop clusters process new data as it arrives in S3 each day, continuously updating the model.
• 150B soil observations
• 3M daily weather measurements
• 850K precision rainfall grids tracked
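The slide says the model is updated continuously as each day's data lands in S3, but it does not describe the model itself. As an illustration of incremental updating only, the sketch below keeps per-grid-cell rainfall statistics with Welford's online algorithm. Each daily batch is folded in without reprocessing earlier years of history. The grid-cell IDs, the rainfall field, and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Welford's online algorithm for mean and sample variance."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def ingest_daily_batch(model, batch):
    """Fold one day's (grid_cell, rainfall_mm) observations into the model."""
    for cell, rainfall in batch:
        model[cell].update(rainfall)

model = defaultdict(RunningStats)
ingest_daily_batch(model, [("cell-17", 4.0), ("cell-17", 6.0)])  # day 1
ingest_daily_batch(model, [("cell-17", 8.0)])                    # day 2
print(model["cell-17"].mean, model["cell-17"].variance)  # 6.0 4.0
```

Welford's update is used instead of accumulating a sum of squares because it stays numerically stable over very long streams of observations.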
Simulations Each Month
• Per simulation:
  • 10K unique scenarios generated
  • 5 trillion datapoints
  • 5–6K-node Hadoop cluster
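A quick back-of-the-envelope reading of the per-simulation figures. It assumes, although the slide does not say so, that datapoints are spread evenly across scenarios and across nodes:

```latex
\frac{5\times10^{12}\ \text{datapoints}}{10^{4}\ \text{scenarios}} = 5\times10^{8}\ \text{datapoints per scenario}

\frac{5\times10^{12}\ \text{datapoints}}{5000\ \text{to}\ 6000\ \text{nodes}} \approx 8.3\times10^{8}\ \text{to}\ 1.0\times10^{9}\ \text{datapoints per node}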
Architecture components:
• Clickstream data from 500+ websites and a VoD platform
• Corporate data center
• AWS Import/Export
• Amazon Simple Storage Service (S3)
• Amazon Elastic MapReduce
• BI users
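The diagram shows clickstream data from 500+ websites and a VoD platform passing through S3 and Elastic MapReduce to BI users, but it does not show the processing logic. Sessionization, which groups a visitor's events into visits separated by inactivity, is a common early step in clickstream analysis. It is sketched here in-process as an illustration. The 30-minute inactivity gap, the (visitor_id, timestamp, url) event layout, and the function name are all assumptions.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout

def sessionize(events):
    """Split (visitor_id, timestamp, url) events into per-visitor sessions."""
    sessions = []
    # Sort by visitor, then time, so each visitor's events are contiguous and ordered.
    ordered = sorted(events, key=itemgetter(0, 1))
    for visitor, group in groupby(ordered, key=itemgetter(0)):
        current, last_ts = [], None
        for _, ts, url in group:
            if last_ts is not None and ts - last_ts > SESSION_GAP:
                sessions.append((visitor, current))  # close the previous session
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((visitor, current))
    return sessions

t0 = datetime(2013, 1, 1, 9, 0)
events = [
    ("v1", t0, "/home"),
    ("v1", t0 + timedelta(minutes=5), "/video/42"),
    ("v1", t0 + timedelta(hours=2), "/home"),
    ("v2", t0, "/search"),
]
for visitor, pages in sessionize(events):
    print(visitor, pages)
```

In a MapReduce job, the same logic would typically run in the reducer after the framework has grouped events by visitor ID.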
• More than 25 million streaming members
• 50 billion events per day
• 30 million plays every day
• 2 billion hours of video in 3 months
• 4 million ratings per day
• 3 million searches
• Device location, time, day, week, etc.
• Social data
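To put the daily volumes in ingestion terms, the snippet below converts them to average per-second rates. These are averages over a full day. Traffic is uneven across the day, so peak rates will typically be higher than these figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day):
    """Average rate per second for a count observed over one day."""
    return per_day / SECONDS_PER_DAY

events_per_day = 50_000_000_000   # "50 billion events per day"
plays_per_day = 30_000_000        # "30 million plays every day"
ratings_per_day = 4_000_000       # "4 million ratings per day"

for label, n in [("events", events_per_day), ("plays", plays_per_day), ("ratings", ratings_per_day)]:
    print(f"{label}: ~{per_second(n):,.0f} per second")
# events: ~578,704 per second
# plays: ~347 per second
# ratings: ~46 per second
```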