Big Data Agile Analytics by Ken Collier - Director Agile Analytics, Thoughtworks

We are in the midst of an exciting time. There is an explosion of very interesting data, along with the emergence of powerful new technologies for harnessing that data and devices that enable humans to receive tremendous benefits from it. What is required are innovative processes that enable the creation and delivery of value from all of that data. More often than not, it is predictive (what will happen?) and prescriptive (how to make it happen!) analytics that produce this value, not the raw data itself.

Agile software teams routinely work on projects that involve rich, complex, and messy data. Often this data presents opportunities for innovative analytics. Being analytics-aware lets these teams collaborate with stakeholders to create additional value from the data. This session aims to make Agile software teams more analytics-aware so that they recognize these innovation opportunities.

The trouble with conventional analytics (like conventional software development) is that it proceeds through phased, sequential steps that take too long and fail to deliver actionable results. This talk will examine the convergence of the following elements of an exciting emerging field called Agile Analytics:
• sophisticated analytics techniques, plus
• lean learning principles, plus
• agile delivery methods, plus
• so-called "big data" technologies

Learn:
• The analytical modeling process and techniques
• How analytical models are deployed using modern technologies
• The complexities of data discovery, harvesting, and preparation
• How to apply agile techniques to shorten the analytics development cycle
• How to apply lean learning principles to develop actionable and valuable analytics
• How to apply continuous delivery techniques to operationalize analytical models
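
As a rough illustration of the analytical modeling process, the sketch below walks through the stages the slide deck names in its pipeline: data partitioning, training data, test data, candidate model, model validation, and accepted model. Everything concrete here is an assumption made for illustration and does not come from the talk: the synthetic data, the single-threshold "model", the function names, and the 0.8 acceptance bar.

```python
import random


def partition(rows, test_fraction=0.3, seed=0):
    """Data partitioning: shuffle, then split into training and test data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(training):
    """Analytical modeling: pick the threshold on x that best fits the training labels."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in training}):
        acc = sum((x >= t) == label for x, label in training) / len(training)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def validate(threshold, test):
    """Model validation: accuracy of the candidate model on held-out test data."""
    return sum((x >= threshold) == label for x, label in test) / len(test)


def run_pipeline(rows, min_accuracy=0.8):
    """Return the candidate model and whether it clears the (assumed) acceptance bar."""
    training, test = partition(rows)
    threshold = fit_threshold(training)
    accuracy = validate(threshold, test)
    return {
        "threshold": threshold,
        "accuracy": accuracy,
        "accepted": accuracy >= min_accuracy,
    }


# Synthetic, hypothetical data: one numeric feature, label is True when x >= 50.
rows = [(x, x >= 50) for x in range(100)]
result = run_pipeline(rows)
print(result)
```

In an agile setting each of these stages would be kept small enough to rerun every few days, so a candidate model that fails validation costs little and feeds the next iteration.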

Published in: Technology

  1. BIG DATA AGILE ANALYTICS Ken Collier, PhD Director, Agile Analytics @theagilist #thoughtworks
  2. Value Complexity What happened? Descriptive Analytics Why did it happen? Diagnostic Analytics What will happen? Predictive Analytics How can we make it happen? Prescriptive Analytics
  3. Value Complexity What happened? Descriptive Analytics Why did it happen? Diagnostic Analytics What will happen? Predictive Analytics How can we make it happen? Prescriptive Analytics Traditional Business Intelligence Advanced Analytics
  4. Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Advanced Analytics
  5. Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Advanced Analytics Volume Velocity Variety NoSQL Complexity Polyglot Persistence
  6. Big Data Analytics Pipeline Modeling Data Operational Data External Data Data Integration Reporting Engine Dimension Mapping Clean Data Report Report Report Dimensional Data Data Sampling Feature Selection Data Partitioning Test Data Training Data Analytical Modeling Candidate Model Model Validation Accepted Model
  7. Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Advanced Analytics Volume Velocity Variety NoSQL Complexity Polyglot Persistence
  8. Advanced Analytics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence
  9. Discover & Explore Analyze & Act Data Convergence Analytical Divergence Discover Harvest Filter Integrate Augment Analyze Act Analytical Opportunities How Advanced Analytics Works If we knew X, we could do Y
  10. Typical Timeline 3-6 months 2 months 2-4 months Data Convergence Analytical Divergence Discover Harvest Filter Integrate Augment Analyze Act Analytical Opportunities Traditional Analytics If we knew X, we could do Y
  11. Advanced Analytics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery
  12. Advanced Analytics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure
  13. Analytical Divergence Analytical Opportunities If we knew X, we could do Y Data Convergence Discover Harvest Filter Integrate Augment Analyze Act Repeat this cycle solving small problems every few days LEARN MEASURE BUILD Agility in Analytics
  14. Retain high value customers High value business goal Like this example…
  15. What’s the smallest, simplest thing we can do? Retain high value customers Like this example… Common features of defectors?
  16. Is it useful & actionable? Retain high value customers Like this example… Common features of defectors?
  17. Repeat! Retain high value customers Like this example… Common features of defectors? Shopping behaviors of defectors?
  18. Retain high value customers Like this example… Common features of defectors? What leads to customers leaving? Shopping behaviors of defectors? What do defectors say about us? Customers’ sentiment before defecting? What encourages customers to stay? Do incentives reduce defection rates?
  19. Problem solved or continue? What leads to customers leaving? Like this example… Common features of defectors? Shopping behaviors of defectors? What do defectors say about us? Customers’ sentiment before defecting? What encourages customers to stay? Do incentives reduce defection rates?
  20. Advanced Analytics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Science Machine Learning Statistics
  21. THE “DATA SCIENTIST” Machine Learning Statistical Modeling Artificial Neural Networks Decision Tree Learning Support Vector Machines Clustering …and many more… Bayesian Classification Monte Carlo Simulation Logistic Regression K-Nearest Neighbor …and many more… Domain Knowledge Data Semantics Business Understanding Business Communication Programming Skills Functional Programming Data “Wrangling” Map/Reduce, SQL, & NoSQL
  22. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure
  23. drones.pitchinteractive.com Data Visualization
  24. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Reduction
  25. Objective Truth Discoverable Truth Uninterpretable Irrelevant Noise Not Actionable Impactful New Insights “Little Data”
  26. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Reduction Insight Knowledge Action Disruption
  27. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Reduction Insight Knowledge Action Disruption Business vs. IT Focus vs. Platform Monitor & Measure
  28. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Reduction Insight Knowledge Action Disruption Business vs. IT Focus vs. Platform Monitor & Measure Privacy Controls Radical Transparency Data Democracy Open Data
  29. Advanced Analytics Data Science Visual Storytelling Machine Learning Statistics Agile Analytics Big Data Solutions Thinking Ethics Agile Delivery Lean Learning Impact Volume Velocity Variety NoSQL Complexity Polyglot Persistence Continuous Integration Collaboration Evolve Continuous Delivery Hypothesis Build Learn Measure Data Reduction Insight Knowledge Action Disruption Business vs. IT Focus vs. Platform Monitor & Measure Privacy Controls Radical Transparency Data Democracy Open Data
  30. Ken Collier, Director, Agile Analytics kcollier@thoughtworks.com Value Creation Cool New Technologies + Sophisticated Analytics + Lean Learning Principles + Fast Agile Delivery =
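
The customer-retention example in the slides starts from the smallest, simplest question, "Common features of defectors?", and then asks whether the answer is useful and actionable. One minimal way to take that first step might look like the sketch below. The customer records, the feature names (`plan`, `region`), and the use of lift (a group's defection rate divided by the overall rate) are all assumptions for illustration and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical customer records; every field and value here is invented.
customers = [
    {"plan": "basic", "region": "west", "defected": True},
    {"plan": "basic", "region": "west", "defected": True},
    {"plan": "basic", "region": "east", "defected": True},
    {"plan": "basic", "region": "east", "defected": False},
    {"plan": "premium", "region": "west", "defected": False},
    {"plan": "premium", "region": "west", "defected": False},
    {"plan": "premium", "region": "east", "defected": True},
    {"plan": "premium", "region": "east", "defected": False},
]


def defection_rates(records, feature):
    """Defection rate for each observed value of one feature."""
    counts = defaultdict(lambda: [0, 0])  # value -> [defectors, total]
    for record in records:
        counts[record[feature]][1] += 1
        if record["defected"]:
            counts[record[feature]][0] += 1
    return {value: d / n for value, (d, n) in counts.items()}


def lift_by_feature(records, features):
    """Rank (feature, value, lift) triples, highest lift first."""
    overall = sum(r["defected"] for r in records) / len(records)
    if overall == 0:
        return []  # nobody defected, so there is nothing to explain
    findings = [
        (feature, value, rate / overall)
        for feature in features
        for value, rate in defection_rates(records, feature).items()
    ]
    return sorted(findings, key=lambda item: item[2], reverse=True)


findings = lift_by_feature(customers, ["plan", "region"])
for feature, value, lift in findings:
    print(f"{feature}={value}: lift {lift:.2f}")
```

A lift well above 1 marks a group that defects more often than customers overall, which gives the team something concrete to judge as useful and actionable before moving on to the next question, such as the shopping behaviors of defectors.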
