Pivotal Digital Transformation Forum: Data Science Bridging the Gap

6,782 views

Dr Carsten Riggelsen, Principal Data Scientist, Pivotal EMEA

Published in: Data & Analytics

  1. Data Science: Bridging the Gap Between Data Generation and Data Comprehension. Dr Carsten Riggelsen, Principal Data Scientist, Pivotal
  2. © Copyright 2015 Pivotal. All rights reserved. Analyzing data is nothing new
  3. “Their Data” · “Our Data” · “My Data” · “Data” · “The Data” · “Data (Big)”
  4. “Data” vs. “Data-Driven”
      • Deploy analytic apps and automation at scale
      • Store any type and size of data
      • Discover insights
      • Create analytics algorithms
  5. [No text on slide]
  6. Data Science · Product Management · Product Design · Engineering · Continuous Improvement · Data Science
  7. Isolated Data Science
      • I don’t think (Big) Data is valuable, it’s a hype – prove me wrong.
      • We do BI and stuff already.
      • Data Science is a hype – prove me wrong.
  8. Data Science · Product Management · Product Design · Engineering · Continuous Improvement · Data Science
  9. Data Science · Product Management · Product Design · Engineering · Continuous Improvement
  10. “Mere” convenience through Apps
      • Automate mundane or tedious tasks
      • Present information at a glance in an app
      • User interaction with the app
      • Consistency and unbiasedness
      • 24-7 availability
      • Scalability
      • Platform independence
      • Easy provisioning
  11. Smart Apps – Data Science Powered
      • Combining/linking data sources/streams across areas and domains
      • There is an element of prediction involved, based on accumulated data/info
      • Inferring (ab)normal patterns, e.g., profiling users, usage patterns
      • There is an element of root-cause identification involved
  12. DS-Cheat-Sheet - Is it a SMART App?
      ☐ Can past knowledge potentially improve on how to inform or act in the future?
      ☐ Is past knowledge based on data/info from different domains?
      ☐ Do you need to affect outcomes in real-time?
      ☐ Are (ab)normal patterns to be inferred?
      ☐ Is the reason or cause for an action or a pattern unclear, yet an important thing to know?
      ☐ Is the solution highly personalised?
      ☐ Is “crowdsourcing” knowledge (data/information) beneficial?
  13. The Car Unlock Button – Press it!
  14. “Siri or OK Google – unlock my car… UnnnLoooock my Caaaar…” “OK – I will unlock your house”
  15. SMART Unlock
      • Access to your Calendar/Agenda
      • Infer where/when you usually go by car
      • Awareness of Bank Holidays etc.
      • Knows where you parked your car
      • Knows where you are (GPS)
  16. The Car-Unlock Experience: Works · Efficient · Convenient · Smart. “I unlocked your car!”
  17. Examples
  18. Obstruction Duration Prediction
      • Predict duration of road incidents in London
      • Android app developed on top of the model
      • http://ds-demo-transport.cfapps.io
  19. REALTIME DASHBOARD · Driving Prediction · https://youtu.be/5gySgGWJMHA
  20. Time to Delivery
      Logistics Comp.
      • Three sub problems
        – Time to delivery estimate
        – Time slot availability
        – Courier scheduling
      • Courier scheduling and time to delivery estimate may have mutual feedback
  21. Telco: Protecting Minors - Age Prediction
      Estimate the age of the customer based on their calling habits. Can distinguish minors with an accuracy of >80%.
      Diagram labels: CDR, CRM Data
      • Call records from March-Aug 2014
      • Corresponds to ~3TB data
      • Attributes are
        • Calling party ID
        • Called party ID
        • Date
        • Time
        • Duration at start/end
        • Location
        • Type of call and bearer
        • TAC
        • Data

      | Feature | Importance | Observation |
      |---|---|---|
      | Calls (holidays-schooltime) | 0.08-0.06 | Minors call less in school holidays |
      | Average call length | 0.07 | Minors make shorter calls |
      | Call timing (night-day) | 0.07-0.03 | Minors call more at nighttime |
      | Number of phone uses | 0.05 | Minors use the phones less |
      | Percentage of text use | 0.05 | Minors text less |
      | Number of contacts | 0.05 | Minors less likely to have 1 contact |
      | Percentage of calls to minors | 0.04 | Minors call other minors more |
      | Percentage of voice use | 0.04 | |
      | Typical Caller-Callee ratio | 0.04 | Minors receive more calls than they make |
      | Fri/Sat/Thurs ratio | 0.04-0.03 | Minors call more at weekends |
      | Number of locations | 0.04 | Minors more likely to have 2 locations |
  22. Internal Transaction Fraud Detection
      • Beyond signatures
      • Beyond simple metrics for thresholding
      • Beyond manual engineering of rules
      • Monitor each and every entity in its environmental context
  23. Internal Transaction Fraud Detection
      • Beyond signatures
      • Beyond simple metrics for thresholding
      • Beyond manual engineering of rules
      • Monitor each and every entity in its environmental context
  24. Mixture of Experts Metaphor
      UserID and Data · Experts analyze · Overall vote is determined
      Figure values: votes 2, 5, 3, 3; overall 3.25
      $$S(\mathrm{id}) = w_1 \cdot M_1(\mathrm{id}) + \dots + w_j \cdot M_j(\mathrm{id}) \quad \text{s.t.} \quad \sum_i w_i = 1$$
      Weights are a measure of “importance” for model expert j. Initially uniform across all experts.
  25. Anomalous User Behavior Comparison: Mean Anomaly Scores

      | Group | Level | Transaction Anomaly | SoD Risk | Terminated Employees | CDHDR Access Anomaly | VPN Access Anomaly | Cluster Outlier | Total Score | Users (#) | Users (%) |
      |---|---|---|---|---|---|---|---|---|---|---|
      | Reg B | Red | 0.6 | 0.6 | 0.1 | 0.2 | 0.1 | 0.6 | 2.3 | 26 | 0.3% |
      | Reg B | Amber | 0.4 | 0.5 | 0.1 | 0.1 | 0.1 | 0.6 | 1.7 | 73 | 0.8% |
      | Reg B | Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 8,765 | 98.9% |
      | Reg A | Red | 0.1 | - | - | 1.0 | 0.4 | 0.9 | 2.4 | 1 | 0.01% |
      | Reg A | Amber | 0.4 | 0.2 | 0.0 | 0.1 | 0.2 | 0.7 | 1.7 | 25 | 0.4% |
      | Reg A | Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.2 | 6,853 | 99.6% |
  26. Conclusions
      • Add SMARTness to your app by leveraging data
      • Don’t think of Data Science in an isolated fashion
      • Move beyond POCs on Big Data
      • Start with a minimal viable product/solution
      • Get the right platform and resources in place
      • Collaborate and interact
  27. Digital Transformation Forum: Disrupt or Be Disrupted. 19 OCTOBER · BMW WELT EVENT CENTRE · MUNICH
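
Slide 21 lists per-customer features derived from call records (average call length, night versus day calling, number of contacts, caller-callee ratio). The sketch below shows how a few such features might be computed. The `CallRecord` layout, the 22:00-06:00 night window and the exact feature definitions are assumptions made for illustration; the slides do not give the real schema or definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class CallRecord:
    # Hypothetical, simplified call record; not the schema used in the talk.
    caller_id: str
    callee_id: str
    start: datetime
    duration_s: float


def caller_features(records: List[CallRecord], caller_id: str) -> Dict[str, float]:
    """Compute a few illustrative per-customer features from call records."""
    made = [r for r in records if r.caller_id == caller_id]
    received = [r for r in records if r.callee_id == caller_id]
    if not made:
        # No outgoing calls: return neutral values instead of dividing by zero.
        return {"avg_call_length_s": 0.0, "night_share": 0.0,
                "n_contacts": 0.0, "made_to_received": 0.0}
    # Assumed night window: 22:00 up to (not including) 06:00.
    night = [r for r in made if r.start.hour >= 22 or r.start.hour < 6]
    return {
        "avg_call_length_s": sum(r.duration_s for r in made) / len(made),
        "night_share": len(night) / len(made),
        "n_contacts": float(len({r.callee_id for r in made})),
        # Guard against customers who never received a call.
        "made_to_received": len(made) / max(len(received), 1),
    }


# Made-up records, dated within the March-Aug 2014 window named on the slide.
records = [
    CallRecord("A", "B", datetime(2014, 3, 3, 23, 15), 60.0),
    CallRecord("A", "C", datetime(2014, 3, 4, 14, 0), 120.0),
    CallRecord("B", "A", datetime(2014, 3, 5, 9, 30), 30.0),
]
features = caller_features(records, "A")
print(features)
```

A feature vector like this would then be fed to a classifier; which model produced the importances on the slide is not stated.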
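
The weighted vote on slide 24, S(id) = w1 · M1(id) + ... + wj · Mj(id) with weights summing to 1, can be sketched in a few lines. The function name `mixture_score` and the non-uniform weights in the second call are hypothetical; only the formula, the uniform initial weights and the votes 2, 5, 3, 3 come from the slide.

```python
from typing import Optional, Sequence


def mixture_score(votes: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-expert votes M_i(id) into a single score S(id)."""
    if not votes:
        raise ValueError("need at least one expert vote")
    if weights is None:
        # The slide says weights are initially uniform across all experts.
        weights = [1.0 / len(votes)] * len(votes)
    if len(weights) != len(votes):
        raise ValueError("one weight per expert is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, votes))


# Votes shown on the slide for one user ID.
votes = [2, 5, 3, 3]
uniform = mixture_score(votes)
# Hypothetical non-uniform weights that trust the second expert most.
skewed = mixture_score(votes, [0.1, 0.6, 0.2, 0.1])
print(uniform, skewed)
```

With uniform weights the result is the plain average of the votes, which matches the overall value of 3,25 shown on the slide.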
