
Guerrilla Analytics: 7 Principles for Agile Analytics (Predictive Analytics World 2015)


The 7 Guerrilla Analytics Principles and their application in a case study. First presented at Predictive Analytics World London 2015 (http://predictiveanalyticsworld.co.uk/london2015/agenda/#p2126)

Published in: Data & Analytics


  1. Guerrilla Analytics: 7 Principles for Agile Analytics
     Enda Ridge, PhD
  2. What You Will Learn
     • Why you must identify and mitigate disruptions in projects
     • How the Guerrilla Analytics Principles help
     • Case study on the Guerrilla Analytics Principles in action
     How this will help you
     • Data Scientists: you need a defensive Guerrilla Analytics mindset. Without it, you will be overwhelmed by the highly iterative nature of predictive analytics
     • Managers and Directors: you need a Guerrilla Analytics capability for a high-performing team
     Copyright Enda Ridge 2015 #GuerrillaAnalytics http://guerrilla-analytics.net
  3. What I’ve Learned
     Timeline: 2004, 2008, 2010, 2012, 2015
     • PhD ‘Design of Experiments for Tuning Algorithms’
     • Boutique Consultancy
     • Forensic Data Analytics
     • Senior Manager Professional Services
     • Head of Algorithms
     No matter the industry, teams are always plagued by the same problem … time is wasted in the confusion and chaos of highly iterative Data Science.
  4. Teams Need ‘Guerrilla Analytics’
     Data: • Extraction • Receipt • Loading
     Analytics: • Transform • Algorithms • Consolidate
     Insight: • Reporting • Work Products
  5. 7 Guerrilla Analytics Principles
     Principle 1: Space is cheap, confusion is expensive
     Principle 2: Prefer simple, visual project structures and conventions
     Principle 3: Prefer automation
     Principle 4: Maintain Data Provenance
     Principle 5: Version control changes
     Principle 6: Consolidate team knowledge
     Principle 7: Prefer code that runs from start to finish
  6. Case Study
  7. Case Study: Business Problem
     • Situation: A pharma organization’s programme to improve its Identity Access Management (IAM). IAM ensures that IT access privileges are granted according to one interpretation of policy
     • Objective: identify ‘permission roles’ that group common IT permissions
     • Benefits:
       • IT efficiency. Assign roles instead of individual permissions
       • Staff and systems are properly authenticated and audited
       • Ensure company data is not at risk of being misused
       • Avoid regulatory non-compliance
  8. Case Study: Data Science Problem
     System    User  Permission
     System01  Chaz  Email
     System01  Chaz  Network
     System01  Dave  Email
     System02  Chaz  Emailing
     System02  Chaz  Sharepoint
     System02  Dave  Sharepoint
     System02  Meg   Email
     System02  Meg   Sharepoint
     System02  Meg   Network
     ….        …     …
     • Find common subsets of permissions
     • These are ‘permission roles’ for Identity Access Management
     • 70 systems
     • Thousands of permissions
     • Users can access several systems
     • All systems are different
     • Team is mobilized and ready to review permissions
  9. Case Study: Approach
     System    User  Permission
     System01  Chaz  Email
     System01  Chaz  Network
     System01  Dave  Email
     System02  Chaz  Emailing
     System02  Chaz  Sharepoint
     System02  Dave  Sharepoint
     System02  Meg   Email
     System02  Meg   Sharepoint
     System02  Meg   Network
     ….        ….    ….

     User  Permission
     Chaz  Email
     Chaz  Sharepoint
     Chaz  Network
     Dave  Email
     Dave  Sharepoint
     Meg   Email
     Meg   Sharepoint
     Meg   Network
     ….    ….
     Seems like a popular group
  10. Case Study: Approach
      System    User  Permission
      System01  Chaz  Email
      System01  Chaz  Network
      System01  Dave  Email
      System02  Chaz  Emailing
      System02  Chaz  Sharepoint
      System02  Dave  Sharepoint
      System02  Meg   Email
      System02  Meg   Sharepoint
      System02  Meg   Network
      ….        ….    ….

      User   Permission
      Chaz   Email
      Chaz   Sharepoint
      Chaz   Network
      Dave   Email
      Dave   Sharepoint
      Meg    Email
      Meg    Sharepoint
      Meg    Network
      Sarah  Email
      Sarah  Sharepoint
      Sarah  Network
      ….     ….
      Or is it this bigger group?
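
The "popular group" question on slides 9 and 10 can be illustrated by counting how many users hold every permission in a candidate set. The following is a minimal sketch in Python, not the itemset mining algorithm used in the project (the slides do not name it). It enumerates subsets by brute force, which would not scale to thousands of permissions. The `ALIASES` rule that folds "Emailing" into "Email" is an assumption, made to match the per-user table on the slide.

```python
from collections import defaultdict
from itertools import combinations

# Rows copied from the slide's System / User / Permission table.
rows = [
    ("System01", "Chaz", "Email"),
    ("System01", "Chaz", "Network"),
    ("System01", "Dave", "Email"),
    ("System02", "Chaz", "Emailing"),
    ("System02", "Chaz", "Sharepoint"),
    ("System02", "Dave", "Sharepoint"),
    ("System02", "Meg", "Email"),
    ("System02", "Meg", "Sharepoint"),
    ("System02", "Meg", "Network"),
]

# Hypothetical cleaning rule: treat "Emailing" as the same permission as "Email".
ALIASES = {"Emailing": "Email"}


def permissions_by_user(rows):
    """Collapse the per-system rows into one permission set per user."""
    perms = defaultdict(set)
    for _system, user, permission in rows:
        perms[user].add(ALIASES.get(permission, permission))
    return dict(perms)


def support_counts(perms, min_size=2):
    """Count, for every permission subset, how many users hold all of it."""
    counts = defaultdict(int)
    for user_perms in perms.values():
        for size in range(min_size, len(user_perms) + 1):
            for subset in combinations(sorted(user_perms), size):
                counts[frozenset(subset)] += 1
    return dict(counts)


perms = permissions_by_user(rows)
counts = support_counts(perms)

# Print candidate groups, most widely held first.
for group, n_users in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(group), n_users)
```

On this small sample the pair Email and Sharepoint is held by more users than the three-permission group, which is the trade-off the two slides point at.
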
  11. Data
      Data: • Extraction • Receipt • Loading
      Analytics: • Transform • Algorithms • Consolidate
      Insight: • Reporting • Work Products
  12. Data Receipt: Situation
      Files received:
        2015-10-01.log
        EMAIL_Server.csv
        EMAIL_Server.csv2
        IAM from Joe.log
        2015-10-05.log
        Security logs.log
        2015-10-07.log
        …
      • Multiple files from 70 different systems
      • No consistency
      • Delivered at different points in time
      • Refreshed at irregular intervals
  13. Data Receipt: Guerrilla Analytics
      Data
        D001 • 2015-10-01.log
        D002 • EMAIL_Server.csv
        D003 • EMAIL_Server.csv2
        D004 • IAM from Joe.log
        D005 • 2015-10-05.log
        …
      • Principle 1: Space is cheap, confusion is expensive
      • Principle 2: Prefer simple, visual project structures and conventions
      • Principle 4: Maintain Data Provenance
      • Robust to multiple data deliveries
      • Robust to random file names and customer inconsistencies
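
The data receipt convention above (each delivery gets the next D-numbered folder and keeps its original file name) could be automated along these lines. This is a minimal sketch, assuming one folder per delivered file and a three-digit ID; the function names `next_data_id` and `receive` are hypothetical and do not come from the slides.

```python
import shutil
from pathlib import Path


def next_data_id(data_root: Path) -> str:
    """Return the next unused data ID (D001, D002, ...) under data_root."""
    existing = [
        int(p.name[1:])
        for p in data_root.glob("D[0-9][0-9][0-9]")
        if p.is_dir()
    ]
    return f"D{max(existing, default=0) + 1:03d}"


def receive(delivered_file: Path, data_root: Path) -> Path:
    """Copy a delivered file, unrenamed and unmodified, into its own data ID folder."""
    data_root.mkdir(parents=True, exist_ok=True)
    target_dir = data_root / next_data_id(data_root)
    target_dir.mkdir()
    # copy2 preserves file metadata such as modification time where possible.
    return Path(shutil.copy2(delivered_file, target_dir / delivered_file.name))
```

Because the ID comes from the folders already present rather than from the file name, odd names such as `IAM from Joe.log` need no special handling.
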
  14. Data Loading: Situation
      RawSchema
        2015-10-01.log
        EMAIL_Server.csv
        EMAIL_Server.csv2
        IAM from Joe.log
        2015-10-05.log
        Security logs.log
        2015-10-07.log
        …
      • Files loaded all over the analytics environment
      • Files renamed
      • Files moved
      • Files ‘archived’
      • Raw files edited
  15. Data Loading: Guerrilla Analytics
      RawSchema
        D001  2015-10-01.log
        D002  EMAIL_Server.csv
        D003  EMAIL_Server.csv2
        D004  IAM from Joe.log
        D005  2015-10-05.log
        D006  Security logs.log
        D007  2015-10-07.log
        …
      • Principle 1: Space is cheap, confusion is expensive
        • Keep everything
      • Principle 2: Prefer simple, visual project structures and conventions
        • One place for raw data
      • Principle 4: Maintain Data Provenance
        • Don’t rename, move, modify in any way
      • Robust to crazy inconsistent files
      • Force code to explicitly use data IDs
  16. Analytics
      Data: • Extraction • Receipt • Loading
      Analytics: • Transform • Algorithms • Consolidate
      Insight: • Reporting • Work Products
  17. Transformation: Situation
      • 70 different systems
      • Unhelpful field names
      • Evolving understanding of correct fields

      IDent  Usr      sys       PTY
      3477   Charlie  Email4.5  Read
      4598   Snoopy   Email4.5  Read; send
      …      …        …         …

      Lots of renaming

      id    user     system    permission
      3477  Charlie  Email4.5  Read
      4598  Snoopy   Email4.5  Read; send
      …     …        …         …
  18. Transformation: Guerrilla Analytics
      Principles in Action
      • Principle 3: Prefer automation
      • Principle 4: Maintain Data Provenance
      • Principle 5: Version control changes
      • Principle 6: Consolidate team knowledge
      • Robust to evolving names and inconsistencies
      • Data provenance of field names

      IDent  Usr      sys       PTY
      3477   Charlie  Email4.5  Read
      4598   Snoopy   Email4.5  Read; send
      …      …        …         …

      id    user     system    permission
      3477  Charlie  Email4.5  Read
      4598  Snoopy   Email4.5  Read; send
      …     …        …         …

      dataset  from   to
      Sys1     IDent  id
      Sys1     Usr    user
      …        …      …
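
The dataset / from / to mapping table lends itself to a data-driven rename step, so that a change in field names becomes an edit to version-controlled data rather than to code. The following is a minimal sketch in plain Python. The first two mapping rows are from the slide; the `sys` and `PTY` rows are not shown in the slide's mapping table and are inferred from its before and after tables. The function name `rename_fields` is hypothetical.

```python
# Mapping rows in the slide's layout: (dataset, from, to).
FIELD_MAP = [
    ("Sys1", "IDent", "id"),
    ("Sys1", "Usr", "user"),
    ("Sys1", "sys", "system"),      # inferred from the before/after tables
    ("Sys1", "PTY", "permission"),  # inferred from the before/after tables
]


def rename_fields(dataset, records, field_map=FIELD_MAP):
    """Return copies of records with field names renamed per the mapping table.

    Fields with no mapping row for this dataset keep their original name.
    The input records are left untouched.
    """
    renames = {src: dst for ds, src, dst in field_map if ds == dataset}
    return [{renames.get(key, key): value for key, value in rec.items()} for rec in records]


raw = [
    {"IDent": 3477, "Usr": "Charlie", "sys": "Email4.5", "PTY": "Read"},
    {"IDent": 4598, "Usr": "Snoopy", "sys": "Email4.5", "PTY": "Read; send"},
]
clean = rename_fields("Sys1", raw)
print(clean[0])
```

Keeping the mapping as rows keyed by dataset also records where each clean field name came from, which is the field-name provenance the slide mentions.
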
  19. Algorithm: Situation
      1 • Choose data • Apply mapping
      2 • Cast • Index
      3 • Reshape & Join • Apply Rules • Tidy
      4 • Apply Algorithm • Check Output
      • Where do my outputs go?
      • How to iteratively develop code/rules etc?
      • Different algorithm parameters
      • Different algorithms
      • How do I iterate with the broader team and customer?
  20. Work Products: Guerrilla Analytics
      Principles in action
      WorkProducts
        WP001
          010_Reshape.sql
          020_Apply_Rules.sql
          030_Algorithm.py
          050_Reports.py
          050_Report.ppt
        WP002
        WP003
        …
      • Principle 1: Space is cheap, confusion is expensive
        • Keep everything
      • Principle 2: Prefer simple, visual project structures and conventions
        • One place for each output
      • Principle 4: Maintain Data Provenance
        • Code, plots, reports etc in one place
      • Robust to multiple iterative work products
      • Scalable to team of any size
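
The numbered file names in WP001 suggest an execution order, which fits Principle 7 (prefer code that runs from start to finish). The slides do not state that the prefixes define run order, so the following is a minimal sketch under that assumption. It only lists the code steps of a work product folder in order and leaves running them to the caller; `ordered_steps` is a hypothetical helper.

```python
import re
from pathlib import Path

# A step is a file named NNN_<name>.sql or NNN_<name>.py (assumed convention).
STEP = re.compile(r"^(\d{3})_.+\.(sql|py)$")


def ordered_steps(work_product: Path):
    """Return the code files of a work product folder sorted by numeric prefix.

    Non-code outputs (for example a .ppt report) are ignored.
    """
    steps = [p for p in work_product.iterdir() if STEP.match(p.name)]
    return sorted(steps, key=lambda p: p.name)
```
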
  21. Early result
      [Chart: axis ticks 50 to 100; permission groups ABC, AB, ABD, EF, E, A, BD, G, ZY, WZY]
      • Taking too long to cover users
      • Still too many permission groups
      • suspect data quality
      • Could tweak the itemset mining algorithms
      • Need to iterate and improve
  22. Iteration
  23. Analysis: Situation
      Data: • Latest data • Latest mapping
      Analysis: • Tidy data format • Apply itemset mining
      Insight: • ?
      • Wasted effort in repetition
      • Risk of inconsistency in repetitions
      • Need clear view of how understanding has evolved
  24. Guerrilla Analytics: Consolidate
      1 • Choose data • Apply mapping
      2 • Cast • Index
      3 • Reshape & Join • Apply Rules • Tidy
      4 • Published Interface Datasets
      Analysis 1, Analysis 2, …
  25. Guerrilla Analytics: Consolidate
      1 • Choose data • Apply mapping
      2 • Cast • Index
      3 • Reshape & Join • Apply Rules • Tidy
      4 • Published Interface Datasets
      Analysis 1, Analysis 2, …
      Build tool automation
      Version controlled code
  26. Guerrilla Analytics: Consolidate
      1 • Choose data • Apply mapping
      2 • Cast • Index
      3 • Reshape & Join • Apply Rules • Tidy
      4 • Published Interface Datasets
      Analysis 1, Analysis 2, …
      Build tool automation
      Version controlled code
      • Principle 3: prefer automation
      • Principle 4: maintain data provenance
      • Principle 5: version control changes
      • Principle 6: consolidate team knowledge
  27. Reporting
      Data: • Extraction • Receipt • Loading
      Analytics: • Transform • Algorithms • Consolidate
      Insight: • Reporting • Work Products
  28. Iterative Analysis
      [Chart: axis ticks 50 to 100; permission groups ABC, AB, ABD, EF, E, etc]
      • Data cleaning and algorithm tuning give better results
      • Clear version of ‘consolidated knowledge’
      • Clear work products for each iteration
  29. Reporting
  30. Reporting: Situation
      We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….
      [Chart: axis ticks 50 to 100; permission groups ABC, AB, ABD, EF, E, etc]
  31. Reporting: Situation
      We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….
      [Chart: axis ticks 50 to 100; permission groups ABC, AB, ABD, EF, E, etc]
      • Which latest data?
      • Which systems?
      • Which algorithm?
        • parameters?
      • Which business rules?
      • What recommendations?
      • How is it different from last iteration?
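
A statement such as "3 permission roles cover 80%" comes down to a cumulative coverage calculation over roles taken in a chosen order. The following is a minimal sketch of that arithmetic. The counts are invented for illustration and are not the project's results, and it assumes each user is counted under exactly one role.

```python
def cumulative_coverage(role_user_counts, total_users):
    """Return (role, cumulative percent of users covered) in the given role order."""
    covered = 0
    coverage = []
    for role, n_users in role_user_counts:
        covered += n_users
        coverage.append((role, 100.0 * covered / total_users))
    return coverage


def roles_needed(coverage, target_pct):
    """Return how many roles are needed to reach target_pct, or None if never reached."""
    for n_roles, (_role, pct) in enumerate(coverage, start=1):
        if pct >= target_pct:
            return n_roles
    return None


# Hypothetical counts of users whose permissions match each role exactly.
example_counts = [("ABC", 50), ("AB", 20), ("ABD", 10), ("EF", 5)]
coverage = cumulative_coverage(example_counts, total_users=100)
print(coverage)
print(roles_needed(coverage, target_pct=80))
```
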
  32. Guerrilla Analytics: Project Structure
      Files: Project; data: D001, D002, D010, …, …; work; prod; WP_001, WP_002
  33. Guerrilla Analytics: Project Structure
      Files: Project; data: D001, D002, D010, …, …; work; prod; WP_001, WP_002
      Data Science environment: Project; data: D001, D002, …; build: clean_data, algo_input; work; prod; WP_001, WP_002
  34. Reporting: Guerrilla Analytics
      We analysed the latest data, applying an itemset mining algorithm to recommend permission roles. Results suggest an optimal cut-off of 3 permission roles to cover 80% of user activities. The remaining users should be reviewed in light of….
      [Chart: axis ticks 50 to 100; permission groups ABC, AB, ABD, EF, E, etc]
      • Which latest data? Which rules? Which systems?
        • Build version 2.2
      • Which algorithm parameters? What recommendations?
        • Work product 042
      • How is it different from last iteration?
        • Work product 031 versus 042
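
The answers on this slide (a build version, a work product ID, and the earlier work product to compare against) are the kind of facts that could be stored next to every report. The following is a minimal sketch, assuming a small JSON record saved in the work product folder. The class `ReportProvenance` and its field names are hypothetical; the values are the ones shown on the slide.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportProvenance:
    """Identifiers needed to answer 'which data, rules and code produced this report?'"""
    work_product: str                    # e.g. "042"
    build_version: str                   # version of the consolidated data build
    compared_with: Optional[str] = None  # earlier work product, if any

    def to_json(self) -> str:
        """Serialise with stable key order so the record diffs cleanly in version control."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = ReportProvenance(work_product="042", build_version="2.2", compared_with="031")
print(record.to_json())
```
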
  35. Guerrilla Analytics Success
      • Coped with multiple inconsistent data deliveries
      • Robust to evolving business rules and moving target of live systems
      • Quick turnaround of different algorithms while closing out permission roles in a live system
      • Project delivered in weeks rather than months
  36. Guerrilla Analytics Capability
      Agility
      3. Guerrilla Analytics Mindset
      2. Supporting Tools
      1. Simple Conventions
  37. Guerrilla Analytics Capability
      Agility
      3. Guerrilla Analytics Mindset
         • 7 Guerrilla Analytics Principles • 100+ practice tips • Data Science patterns
      2. Supporting Tools
         • Build Tools • Tracking • Version control
      1. Simple Conventions
         • Data receipt • Data load • Tidy Data format • …
  38. Summing up
      • Agility means delivering despite disruptions
      • High performing agile teams have capability to mitigate disruptions
      • 7 Guerrilla Analytics Principles for defensive Data Science
      • Guerrilla Analytics Principles in action across
        • Data receipt
        • Data load
        • Iterative work products
        • Consolidation
        • Reporting
      @Enda_Ridge http://guerrilla-analytics.net
