Guerrilla Analytics at BIC, Orlando, Dec 2012


These are the slides presented at the Business Intelligence Congress, Orlando, December 2012

Published in: Technology, Business

  1. Emerging Principles for Guerrilla Analytics Development. Enda Ridge, PhD; Edward Curry, PhD
  2. Overview
     • What is Guerrilla Analytics
     • Why do we need Guerrilla Analytics?
        • Analytics project landscape
        • Example projects and characteristics
        • Project differentiators requiring Guerrilla Analytics
     • Principles for Guerrilla Analytics Development
     • Conclusions and Future Directions
     Business Intelligence Congress 2012, Orlando, Florida
  3. What is Guerrilla Analytics
     Guerrilla Analytics is data analytics for dynamic and high-velocity environments subject to restrictions on capability, up-skilling and toolsets.
  4. Analytics Project Landscape
     [Figure: project landscape with axes "Velocity / Dynamic" and "Restrictions (time, up-skilling, toolsets)"; regions labelled "Guerrilla Analytics" and "Business Intelligence (BI) & Traditional Analytics".]
  5. Guerrilla Analytics: some example projects
     Forensic Investigations and Financial Remediations, e.g. ENRON and MF Global
        • Aim: quickly find how fraud was committed, how regulation was broken
        • Outputs: legal case, employee dismissal, regulatory fines
     Data Journalism, e.g. coverage of London 2011 riots and prosecutions
        • Aim: quickly respond to breaking news events
        • Outputs: news articles, publications for public consumption
     Internal Ad-hoc Management Information
        • Aim: quickly produce internal MI not available directly from the BI suite
        • Outputs: KPI and insights supporting decision making by senior internal stakeholders
     [Figure axes: Velocity / Dynamic, Restrictions]
  6. Characteristics of projects requiring Guerrilla Analytics
     Velocity/Dynamics
        Requirements
           • Tight timelines (hours, days)
           • Changing requirements by the day
           • Little time for specifications
           • Reproducibility
        Data
           • Lack of documentation
           • Changing & incomplete data
           • Human keyed data
           • Poor quality data sources (Excel, PDF, web scrape)
     Restrictions
        Toolset
           • Working on client systems
           • Limited or new toolset
           • No time to source, license & install
        Team Capability
           • Newly assembled team
           • Short term duration
           • Wide mix of technical / business skills required
           • Little time to onboard & train
  7. Going further than Agile Development – Guerrilla Analytics
     • Agile has been successfully applied to some data analytics projects but…
     • the restrictions and high velocity of some projects require changes and additions to Agile

     Differentiator                   Agile     Guerrilla Analytics Principles
     Velocity:     Data               Partial   Yes
     Velocity:     Requirements       Yes       Yes
     Restrictions: Team Capability    No        Yes
     Restrictions: Toolset            No        Yes
  8. Methodology and Guerrilla Analytics Principles
     Follows Design Science Principles (Hevner et al. 2004)
        • With a Relevance Cycle, Rigor Cycle and Design Cycle (Hevner 2007)
     Initial principles defined using induction by simple enumeration
     Examined key factors in successful guerrilla analytics projects
        • Interviews using the key-informant method (Bagozzi et al. 1991; Campbell 1955)
     Case studies were chosen for a number of reasons:
        • Defined as Guerrilla Analytics projects
        • Projects had encountered successes but also significant issues and challenges
        • Willingness to participate
        • Project staff well placed to provide valuable insights
        • Authors were active participants within some of the projects
  9. Methodology and Guerrilla Analytics Principles
     Financial remediation of government client
        • Description: Multiple undocumented system extracts and refreshes, ad-hoc spreadsheet data sources, daily delivery deadlines to distributed teams, reporting to high level government stakeholders
        • Analytics team: 10 data analysts (statisticians, mathematicians, database experts and software engineers)
        • Project team: 100 forensic accountants and technology consultants
        • Duration: Multiple guerrilla analytics phases over 1.5 years with client reporting every 3 months and daily delivery of analytics
     Fraud investigation of manufacturing client
        • Description: Single system extract, distributed team. Data mining and user profiling to detect fraud event.
        • Analytics team: 2 experienced data analysts
        • Project team: 1 business consultant
        • Duration: 3 weeks
     Forensic investigation of a bankruptcy
        • Description: Multiple system extracts and poor quality bank statement data. Reporting to government committee.
        • Analytics team: 6 data analysts, recent graduates to managers
        • Project team: 3 forensic accountants
        • Duration: 2 months
     Risk remediation of bank misselling
        • Description: Multiple system extracts, multiple ad hoc human keyed sources, daily delivery deadlines. Restricted to client systems.
        • Analytics team: 5 experienced data analysts
        • Project team: 40 risk analysts
        • Duration: 7 months
  10. Guerrilla Analytics Principles
     Simple lightweight principles to quickly build minimal analytics capability despite the velocity and restrictions of the Guerrilla Analytics environment
        • Data Manipulation: Tools and services for querying, profiling & changing data
        • Data Provenance: Tracing where data comes from and where it’s going to
        • Coding: Techniques for uniform, reviewable, reproducible code and analyses
        • Testing: Techniques for increasing confidence in results
  11. Guerrilla Analytics Principles: Data Manipulation
     Data Manipulation: Tools and services for querying, profiling & changing data
     Principle #1: Establish a Data Manipulation Environment
        • Basic ability to manipulate data and files with program code
     Principle #2: Establish Common Data Manipulation Services such as Search and Cleaning
        • Query data, test joins, search all fields, find common field values, fuzzy search, fuzzy deduplicate, pattern match, tear down data, move data
        • Conventions on cleaning, naming and renaming
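To illustrate two of the services named under Principle #2 ("search all fields" and "fuzzy deduplicate"), here is a minimal sketch using only the Python standard library. It is not taken from the slides: the function names, the sample rows and the 0.85 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def search_all_fields(rows, term):
    """Return rows where any field contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


def fuzzy_duplicates(values, threshold=0.85):
    """Return pairs of values whose similarity ratio meets the threshold."""
    pairs = []
    for i, left in enumerate(values):
        for right in values[i + 1:]:
            ratio = SequenceMatcher(None, left.lower(), right.lower()).ratio()
            if ratio >= threshold:
                pairs.append((left, right))
    return pairs


# Hypothetical sample data, not from the presentation.
rows = [
    {"id": 1, "name": "Acme Ltd", "city": "Dublin"},
    {"id": 2, "name": "Acme Ltd.", "city": "Galway"},
    {"id": 3, "name": "Bolt plc", "city": "Dublin"},
]
dublin_rows = search_all_fields(rows, "dublin")
similar_names = fuzzy_duplicates([row["name"] for row in rows])
```

The pairwise comparison is quadratic in the number of values, which is usually acceptable for a quick check on a small extract but would need blocking or indexing on larger data.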
  12. Guerrilla Analytics Principles: Data Provenance
     Data Provenance: Tracing where data comes from and where it’s going to
     Principle #3: Separate key types of data
        • System data dumps, ad-hoc data dumps, data produced by the team
     Principle #4: Log and version control all data received and produced
        • Simple conventions for logging all data and versions of data received and loaded
        • Simple conventions for storing work products
        • Easily visible from the data/files themselves to reduce documentation overhead
     Principle #5: Version control all analytics data builds
        • Begin early to capture knowledge in analytics builds
        • Version controlled with simple file and data naming conventions
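One possible reading of Principle #4 is a plain CSV data log that records every file received together with a sequential version id and a checksum. The sketch below assumes that reading; the `D001`-style ids, the log columns, the file names and the dates are hypothetical conventions, not ones stated in the slides.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def log_receipt(data_file, log_file, source, received):
    """Append one entry to the data log and return the assigned version id."""
    data_file, log_file = Path(data_file), Path(log_file)
    is_new_log = not log_file.exists()
    entries = 0
    if not is_new_log:
        with log_file.open(newline="") as handle:
            entries = len(list(csv.reader(handle))) - 1  # exclude header row
    version = f"D{entries + 1:03d}"
    # The checksum lets anyone confirm later that a file is the logged one.
    checksum = hashlib.sha256(data_file.read_bytes()).hexdigest()
    with log_file.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new_log:
            writer.writerow(["version", "received", "source", "file", "sha256"])
        writer.writerow([version, received, source, data_file.name, checksum])
    return version


# Hypothetical usage in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    extract = Path(folder) / "customers.csv"
    extract.write_text("id,name\n1,Acme\n")
    log = Path(folder) / "data_log.csv"
    first = log_receipt(extract, log, "system dump", "2012-12-01")
    second = log_receipt(extract, log, "system refresh", "2012-12-08")
    log_lines = log.read_text().splitlines()
```

Keeping the log as a flat file next to the data keeps the convention visible from the files themselves, in the spirit of the third bullet under Principle #4.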
  13. Guerrilla Analytics Principles: Coding
     Coding: Techniques for uniform, reproducible, reviewable code and analyses
     Principle #6: Write end-to-end analytics code
        • Avoid ad-hoc exploratory code snippets in final output
        • Tear down and rebuild ensures all data are up to date
        • End-to-end analytics code is faster to rerun and review
     Principle #7: Design code and analytics for audit and reproducibility
        • Horizontal coding highlights changes to columns (derived fields, cleaning etc.)
        • Vertical coding creates filter fields rather than immediately removing data
        • Intermediate tables allow review of key intermediate steps
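The idea behind Principle #7 can be sketched as follows: cleaned values go into new columns beside the raw ones (horizontal), and exclusion reasons are recorded as filter fields instead of deleting rows (vertical), leaving an intermediate table that a reviewer can inspect. The column names and exclusion rules below are hypothetical and only illustrate the pattern.

```python
def add_filter_fields(records):
    """Flag records instead of deleting them, so exclusions stay auditable."""
    flagged = []
    for record in records:
        row = dict(record)  # copy so the raw input is left untouched
        # Horizontal step: derive a cleaned column next to the raw one.
        row["amount_clean"] = float(row["amount"].replace(",", ""))
        # Vertical step: filter fields record why a row would be excluded.
        row["is_test_account"] = row["account"].startswith("TEST")
        row["is_zero_amount"] = row["amount_clean"] == 0.0
        row["keep"] = not (row["is_test_account"] or row["is_zero_amount"])
        flagged.append(row)
    return flagged


# Hypothetical sample data, not from the presentation.
records = [
    {"account": "A100", "amount": "1,250.00"},
    {"account": "TEST01", "amount": "10.00"},
    {"account": "A200", "amount": "0"},
]
intermediate = add_filter_fields(records)  # every row kept, with flags
final = [row for row in intermediate if row["keep"]]
```

Because `intermediate` still holds all three rows, a reviewer can count how many rows each rule removed before looking at `final`.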
  14. Guerrilla Analytics Principles: Testing
     Testing: Techniques for increasing confidence in results
     Principle #8: Identify key project data indicators and test against them
        • Identify recognised data profiles (counts, domains of values etc.)
        • Design work products so they can be reconciled against these numbers
     Principle #9: Check for consistency by testing against previous outputs
        • Lock down data understanding and complexity in versioned and tested data builds that are easily identified by naming convention or storage location
        • Draw analyses from these tested builds
        • Compare outputs to previous related work products
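Principles #8 and #9 could be exercised with a check along the following lines: profile a work product, reconcile its row count against a known project indicator, and compare its keys with a previous output. This is a sketch under assumptions; the `profile` and `reconcile` helpers, the `id` key and the expected count are hypothetical.

```python
def profile(rows, key):
    """Summarise a work product as a row count and its set of key values."""
    return {"row_count": len(rows), "keys": {row[key] for row in rows}}


def reconcile(current, expected_count, previous=None):
    """Return a list of discrepancies; an empty list means the checks pass."""
    issues = []
    if current["row_count"] != expected_count:
        issues.append(
            f"row count {current['row_count']} != expected {expected_count}"
        )
    if previous is not None:
        dropped = previous["keys"] - current["keys"]
        if dropped:
            issues.append(
                f"keys missing since previous output: {sorted(dropped)}"
            )
    return issues


# Hypothetical work products, not from the presentation.
previous_output = profile([{"id": 1}, {"id": 2}, {"id": 3}], key="id")
current_output = profile([{"id": 1}, {"id": 2}], key="id")
issues = reconcile(current_output, expected_count=3, previous=previous_output)
```

Returning messages instead of raising on the first failure lets a single run report every discrepancy, which suits the tight turnaround the slides describe.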
  15. Conclusions and Future Directions for Guerrilla Analytics
     Conclusions
     Simple and lightweight principles and techniques quickly build minimal analytics capability despite:
        • Restrictions on toolsets and team capability
        • Very dynamic project requirements and data
     Future Directions
        • Continue to identify new principles
        • Team capability framework for measuring guerrilla analytics maturity
        • Enumerate team skill sets that enable guerrilla analytics
