When Good Intentions Fail: Tips on avoiding common advanced analytics traps. Evan Stubbs, Solution Manager, ANZ – SAS. 16th February, 2010
Today’s Agenda: Four (hopefully) thought-provoking statements. Some answers.
My provocative statements for the day … Seeing only part of the picture is worse than seeing nothing at all. Rule-based detection systems will seduce, distract, and eventually trap you. Focusing on tools is the fastest road to failure. Insight generated in isolation is less than useless and will actually hurt you.
Seeing only part of the picture is worse than seeing nothing at all.
Anyone know what this is? Formula courtesy of Wired: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
Consider this … A process identifies non-state individuals conspiring against the government based on: the contents of their communications; their communication methods of choice; the frequency of their interactions. If the individuals are conspiring, 99% of the time the test will be positive. If the individuals are not conspiring, 99% of the time the test will be negative.
So we execute! The test is put into production. A collection of individuals is identified as conspiring against the government. The test is known to be 99% accurate, so enforcement is mobilised and set into action. Pretty conclusive, right? A positive result may be wrong more than 99.9% of the time, despite the test being 99% accurate. (Huh?!?)
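Here’s why … Few people actually conspire against the government. Assume 1 in 500,000 people actually conspires. Assume Australia’s population is 22 million. Expected true positives: Population * (Incidence rate / Sample Population) * Test Efficiency = 22,000,000 * (1 / 500,000) * 0.99, or about 44. Expected false positives: the remaining population * (1 - Test Efficiency), or about 220,000. A positive result will therefore be wrong in roughly 99.98% of cases, despite the test being 99% accurate.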
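The arithmetic on this slide can be checked with a short sketch. It assumes "99% accurate" means both a 99% true-positive rate and a 99% true-negative rate, as stated on the earlier slide; the function and parameter names are illustrative and not from the original deck.

```python
def expected_outcomes(population, incidence, sensitivity, specificity):
    """Return expected true positives, false positives, and the share of
    positive results that are correct (positive predictive value)."""
    conspirators = population * incidence
    innocents = population - conspirators
    true_positives = conspirators * sensitivity
    false_positives = innocents * (1.0 - specificity)
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


# Figures taken from the slide: 22 million people, 1 in 500,000 conspiring,
# and a test that is right 99% of the time in both directions.
tp, fp, ppv = expected_outcomes(
    population=22_000_000,
    incidence=1 / 500_000,
    sensitivity=0.99,
    specificity=0.99,
)

print(f"Expected true positives:  {tp:,.0f}")
print(f"Expected false positives: {fp:,.0f}")
print(f"Share of positives that are wrong: {1 - ppv:.2%}")
```

Under these assumptions roughly 44 genuine conspirators are flagged alongside about 220,000 innocent people, so nearly every positive result is a false alarm.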
The Lessons If you look through a keyhole, you’ll only ever see a tiny part of the room. If you rely too heavily on a single detection method, you will be wrong, catastrophically so at times. It’s only a matter of time.
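One way to see why a single detection method is not enough is to ask how belief changes as further positive results arrive. The sketch below applies Bayes' rule repeatedly, starting from the 1 in 500,000 base rate used earlier. It assumes each additional test is 99% accurate and statistically independent of the others, which is an idealisation for illustration only; the function name is hypothetical.

```python
def update(prior, sensitivity=0.99, specificity=0.99):
    """Posterior probability of conspiring after one more positive result.

    Assumes the new test is independent of any earlier ones."""
    true_pos = prior * sensitivity
    false_pos = (1.0 - prior) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


belief = 1 / 500_000  # base rate from the slides
posteriors = []
for _ in range(3):
    belief = update(belief)
    posteriors.append(belief)

for count, p in enumerate(posteriors, start=1):
    print(f"After {count} independent positive result(s): {p:.4%}")
```

With these assumed figures, one positive leaves the probability far below 1%, and it takes three independent confirmations before a conspirator is more likely than not.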
Anyone know what this is? David X. Li’s Gaussian Copula function, the formula that almost brought down the financial world
Rule-based detection systems will seduce, distract, and eventually trap you.
Another one … Identification of the communication point of a seditious cell could involve: their relationships; the directionality and frequency of ‘interesting’ communication. Analysis of the information shows that two individuals are equally possible information dissemination points. There is one standout who, over three months, leads the number of ‘interesting’ messages sent. Pretty conclusive, right?
Nope, yet again … Bad rules lead to bad results. Even worse, you may not know until well after the fact!
The Lessons Rules don’t work well with ‘context’, but they do provide a false sense of security. Maintaining a rules list can be a fun job in its own right! Rule-based detection works great when your subjects maintain their behaviour and are happy to be observed. How often does that happen?
Focusing on tools is the fastest road to failure.
There are many methodologies … [Figure: Methodology Tree for Forecasting, forecastingprinciples.com, JSA-KCG, September 2005. The tree splits methods by knowledge source into judgmental and statistical branches. Methods shown: unaided judgment, role playing (simulated interaction), intentions/expectations, conjoint analysis, prediction markets, Delphi, structured analogies, game theory, decomposition, judgmental bootstrapping, expert systems, extrapolation models, quantitative analogies, rule-based forecasting, neural nets, data mining, causal models, and segmentation.]
And picking an approach can be complicated … [Figure: Selection Tree for Forecasting Methods, forecastingprinciples.com, JSA-KCG, January 2006. A yes/no decision tree that starts from whether sufficient objective data exist (no: judgmental methods; yes: quantitative methods), works through questions such as expected size of changes, knowledge of relationships, type of data, and domain knowledge, and ends by asking whether several methods provide useful forecasts (combine forecasts or use a single method) and whether information was omitted (use adjusted or unadjusted forecast).]
Here’s a simpler approach … Which one gives me the answers? Which one lets me automate the manual stuff? Which one plays with everything else I have?
The Lessons The tools aren’t as important as answering the question quickly, accurately, and in a way that can be executed. Focus on solving the intelligence problem, not on the colour of widget X.
Insight generated in isolation is less than useless and will actually hurt you.
Evan’s Generalised Formula for Analysis Paralysis: Every isolated information source, s, will create p new ‘possibilities’. Comparing and validating each pair of these possibilities will take t time. The total time to compare and validate these possibilities: (((s*p) * ((s*p) - 1)) / 2) * t
Evan’s Generalised Formula for Analysis Paralysis: Let’s say you have five people, each coming up with their own set of ten calculations, on their standalone desktops with their own extract of data, and it takes two hours to validate and compare who has the ‘best’ answer. That is 50 possibilities and 1,225 pairwise comparisons. Total time elapsed: 306 work days, or two months of wasted team effort. And this is just for one small case!
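The worked example can be reproduced with a few lines of code. This is a minimal sketch of the slide's formula; the function name and the eight-hour work day used to convert hours into days are assumptions made here for illustration.

```python
def validation_hours(sources, possibilities_per_source, hours_per_comparison):
    """Total hours to compare every pair of possibilities once."""
    n = sources * possibilities_per_source   # s * p possibilities in total
    pairs = n * (n - 1) // 2                 # number of distinct pairs
    return pairs * hours_per_comparison


# Five people, ten calculations each, two hours per comparison.
hours = validation_hours(sources=5, possibilities_per_source=10,
                         hours_per_comparison=2)
work_days = hours / 8  # assumes an eight-hour work day

print(f"Total hours: {hours:,}")
print(f"Work days:   {work_days:.0f}")
```

Doubling the number of people to ten roughly quadruples the total, because the number of pairs grows with the square of the number of possibilities.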
The Lessons Every time you create a new standalone datasource, you increase your pointless workload quadratically. Every time you use another non-integrated tool, you waste time and money. Make sure your tools operationalise on a common platform, even if you find you must use multiple tools.
The Core Answers: Focus on solving the problem. Build a process that uses a wide range of validating / confirming techniques. Integrate, re-use, automate, and operationalise everything. Measure success by business outcomes, not models developed. Keep things as simple as possible, but no simpler.
Integrated Business Analytics [Figure: process diagram. Labelled components: Operational Data Sources; Individuals; Accounts; Transactions; Data Staging; Intelligent Data Repository; Exploratory Data Analysis & Transformation; Alert Generation Process; Business Rules; Analytics; Predictive Modeling; Text Analytics; Social Network Analysis; Network Rules; Network Analytics; Alert Administration; Alert Management & Reporting; Interaction Management; Learn and Improve Cycle.]