[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics


Watch this recorded webcast and listen to Infochimps CSO and Co-Founder, Dhruv Bansal, and Think Big Analytics Principal Architect, Douglas Moore, share successful use cases and recommendations for building real-time predictive analytics in your enterprise.

Published in: Technology, Business

  1. Measure Twice, Build Once. #RTanalytics. Douglas Moore, Principal Consultant & Architect. Dhruv Bansal, CSO. May 2013.
  2. About Us: Next generation Big Data stack to power your applications. Data science and engineering services that accelerate time to value. Douglas Moore, Principal Consultant & Architect. Dhruv Bansal, CSO & Co-Founder.
  3. Agenda
     - Think Big Use Case
     - Infochimps Cloud: Streams, Queries, Batch
     - Think Big & Infochimps Together
     Measure Twice, Build Once: Understand the problem; Model the solution; Test locally; Grow the infrastructure
  4. Poll
  5. Poll results: How advanced is your organization's approach to Big Data? Very Advanced 18%; Advanced 19%; Not Advanced 45%; Not Started 18%.
  6. Accelerating Your Time to Value. Leading Provider of Data Science and Engineering Services.
     - IMAGINE: Strategy and Roadmap
     - ILLUMINATE: Training and Education
     - IMPLEMENT: Hands-On Data Science and Data Engineering
  7. The Beauty of Predictive Analytics
     Use Cases
     - Scale batch analysis pipeline
     - Generate lively stats
     - Recommendations
     - Better Predictions
       • # page views in next 30 days?
     Environment
     - AWS
     - Version 1 already in production
     Project Plan
     - 8-9 weeks
     - Combined Data Engineering + Data Science Engagement
     - Staff
       • 1 Arch + 1 PM
       • 1 Data Engineer
       • 2 Data Scientists
       • 3 Client Engineers
  8. Predictive Analytics Process
     Predictive Model Design & Build Process
     - Listening & Learning
     - Discovery (Digging through the data)
     - Creating a Research Agenda
     - Testing & Learning
     Production Quality Predictive Model Development
     - Data Cleansing, Aggregations, Conditioning
     - Predictive Model Training Process
     - Predictive Model Execution Process
     Challenges:
     - What functional forms predict future impression counts given counts up to time T?
     - Robust estimators, like medians rather than means, to cope with outliers
     - How do we distinguish between new articles and old articles we're seeing for the first time?
     - How well do impression counts correspond to real humans?
  9. Why Real-time?
     Better end-user experience
     - View an ad, see the counter move.
     Need to catch fast moving events
     - Content half life measured at 3 hours (H Mason:
     Path to additional real-time capabilities
     - Example: Trend analysis to recommend ‘hot’ articles.
  10. Overall Architecture (diagram; labels as extracted): NoSQL; Memcache (Tuple fail tracking); Queue; Hadoop; Ad Serving; LB; Edge; Impression; S3; DFS; Archive Logs; Management Server; Relational Store; Ad Management; Ad Selling; Storm (Queue Management, Simple Bot Filtering, Real-time Bucketization, Performance Counters, Event Logging); View Ad; Cleansing; Model Training; Recommendations; Events; Monitoring & Alerting (Metrics, Alarms, Notifications); Model Parameters; getPrediction; Performance Counters; Impression Buckets.
  11. Analytics Architecture (diagram; labels as extracted): Storm; Web Server; Time Series Bucket Bolt; Simple Bot Annotator; DFS Adapter; Impression Spout; Time Series Buckets (Batch); Time Series Buckets (Realtime); Impression Prediction; Predictive Model Parameters; Impressions; Hadoop; Impression Bucketization; Predictive Model Training; NoSQL; Bolt; Time.
  12. Solution Approach: Analyze Massive Historical Data Set; Analyze Recent Past; Realtime Prediction
      - Historical Data Set = S3
      - Analyze = Hadoop + Pig + R
      - Recent Past = Storm + NoSQL
      - Analyze = R + Web Service
  13. Poll
  14. Poll results: Say you are building a Big Data project. In which timeframe would you want to build a production solution? Less than 30 days 8%; Less than 90 days 54%; More than 6 months 38%.
  15. Any Data, Any Analytics, Any Cloud
  16. Data Flow Architecture (5/10/2013)
  17. Inside Cloud::Streams
  18. Full Example: Traditional & Social Media Listening Platform (5/10/2013)
      - New Media Data Sources: Twitter (Gnip Powertrack); Facebook (Gnip EDC); Blogs (Moreover Metabase)
      - Traditional Media Data Sources: TV Transcription; Radio Transcription; Print Transcription
  19. Poll
  20. Poll results: Which element of the Big Data stack is most important to you? Hadoop 36%; Queries 35%; Real-time 29%.
  21. Don’t Build it Yourself. 55% of enterprise Big Data projects fail* (*According to a December 2012 survey of 300 IT organizations by SSWUG). Project Costs by Function (chart): Compute, Software, Operations Staff, Engineering Staff; 5%, 9%, 9%, 77%.
  22. How Do We Compare to the Competition? (Competition vs. Think Big & Infochimps)
      - Speed: 6+ months to value vs. 30 days to value
      - Experience: New college grads, few successful implementations vs. Advanced Degrees & Published Authors
      - Quality: Offshore vs. Onshore, Managed Service
      - Proven: Learn on your dime vs. Blue Chip Customers
      - Methodology: Waterfall vs. Agile, test & learn
  23. Questions? Thank you for participating! #RTanalytics
  24. Let’s continue the conversation!
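
The slides mention two techniques without showing them: real-time bucketization of impressions (slides 10 and 11) and robust estimators such as medians rather than means (slide 8). The following is a minimal sketch of both ideas in Python, not the presenters' implementation. The bucket width, the function names `bucketize` and `typical_rate`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median

BUCKET_SECONDS = 300  # hypothetical bucket width (5 minutes); not from the slides


def bucketize(timestamps, width=BUCKET_SECONDS):
    """Count impressions per fixed-width time bucket.

    Each timestamp (in seconds) is mapped to the start of its bucket.
    """
    counts = Counter()
    for ts in timestamps:
        bucket_start = int(ts) - int(ts) % width
        counts[bucket_start] += 1
    return dict(counts)


def typical_rate(bucket_counts):
    """Return (mean, median) of the per-bucket impression counts."""
    values = list(bucket_counts.values())
    return mean(values), median(values)


# Made-up data: four quiet buckets with 2 impressions each, plus one burst
# of 100 impressions in a fifth bucket (for example, unfiltered bot traffic).
timestamps = []
for bucket_index in range(4):
    timestamps += [bucket_index * BUCKET_SECONDS + offset for offset in (10, 20)]
timestamps += [4 * BUCKET_SECONDS + i for i in range(100)]

buckets = bucketize(timestamps)
avg, med = typical_rate(buckets)
print(buckets)
print("mean:", avg, "median:", med)
```

In this made-up data the single burst pulls the mean far above the count seen in most buckets, while the median stays at the typical value, which is the motivation slide 8 gives for preferring robust estimators when outliers are present.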