Building Guerrilla Analytics Teams


A short introduction to building a Guerrilla Analytics capability: an overview of the People, Technology and Process for doing agile Data Science.

Published in: Data & Analytics

  1. Building Guerrilla Analytics Teams
     People, Process and Technology for Doing Data Science
     Presented by: Enda Ridge, PhD
     Copyright Enda Ridge 2014
  2. What this talk is about
     • Data Science: expectations and reality
     • 3 Drivers for doing Data Science
     • Why Data Science projects are so challenging
     • Introduction to Guerrilla Analytics
     • Building Guerrilla Analytics Capability
     (Diagram: Guerrilla Analytics, with People, Process and Tech)
  3. What we hear about Data Science
     • “Data is the new science. Big data holds the answers.”
     • “the sexy job in the next 10 years will be statisticians”
     • “Data Scientist: The Sexiest Job of the 21st Century”
     • “Information is the oil of the 21st century, and analytics is the combustion engine.”
     Sources:
     http://www.gapminder.org/
     http://www.statistics.com/data-science-quotes/
     https://github.com/mbostock/d3/wiki/Gallery
  4. What we really want from Data Science
     • Leverage: “I have made data available, now how do I use it?”
     • Justify: “I want to make data available or buy a data product. How do I know it will be worth it?”
     • Ad-hoc: “I think I have a fraud problem / security breach / etc”, “Help me better understand my customers”
  5. My background
     PhD Computer Science
     • Design of Experiments for Tuning Algorithms
     Boutique Consultancy
     • Social Network Analysis for Fraud
     Forensic Data Analytics
     • Professional Services Senior Manager
     • Data Science Consulting & Data Product Development
  6. Misconception about how we do Data Science
     Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22
  7. Reality – Guerrilla Analytics
     • Disruptions
     • Data
     • Requirements
     • Resources
     • Business Rules
     • Constraints
     • Time
     • Toolsets
     • People
     • Repeatable
     • Explainable
     • Tested
  8. Guerrilla Analytics Workflow
     Data
     • Extract
     • Receive
     • Load
     Analytics
     • Transform
     • Algorithm
     • Consolidate
     Insight
     • Reports
     • Work Products
     Disruptions
  9. Some Guerrilla Analytics Principles
     • Prefer simple project structures over heavily documented and complex ones.
     • Prefer automation with program code over manual graphical approaches.
     • Link data on the file system, to data in the analytics environment, to data in work products.
     • Version control changes to program code AND data.
  10. Building Guerrilla Analytics Capability
     • Leverage
     • Justify
     • Ad-hoc
     (Diagram: Guerrilla Analytics, with People, Process and Tech)
  11. People Capability
     Hard Skills
     • Programming
     • Software Engineering
     • Visualization
     • Maths / Stats
     Soft Skills
     • Communication
     • Domain Knowledge
     • Mindset
  12. Capability: Data Programming
     “Using a programming language to describe and execute data manipulations, data analyses, data visualizations”
     Guerrilla Environment
     • Wide variety of data
     • Poor quality data
     • Evolving understanding
     • Reproduce and repeat
     Benefit
     • Flexibility
     • Consolidation
     • Knowledge transfer
     • Self describing
  13. Capability: Software Engineering
     “the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software”
     Guerrilla Environment
     • Changing data
     • Iterations of work products
     • Reproduce despite pace
     • Correctness despite complexity
     Benefit
     • Version control
     • Testing
     • Automation
     • Issue/bug tracking
  14. Capability: Domain Knowledge & Communication
     Prefer analytics skills with great communication
     (Diagram labels: Analytics, Forensic Accounting, Forensic Accountant, Data Scientist)
  15. Capability: Mind-set
     The attitude and approach to work that best matches Guerrilla Analytics
     Guerrilla Environment
     • Changing requirements
     • Poorly understood data
     • Constraints
     • Time pressure
     • Iterations
     • Dead Ends
     Required Capability
     • Tenacity
     • Curiosity
     • Problem solving
     • Communication
  16. TECHNOLOGY
     (Diagram: Guerrilla Analytics, with People, Process and Tech)
  17. Common Misconceptions about Technology
     • “If we use this tech, my team don’t need to code”
     • “We can productionise all possible data science scenarios”
     • “We need to invest in a platform to get value from our data”
     • “We need Big Data technology X”
  18. Technology Capability
     Agility
     • Data Manipulation Environment
     • Scripting & Command Line
     • Shared Space
     • Visualization
     Consolidate
     • Code Libraries
     • Machine Images
     • Project Wiki
     Process Support
     • Source Code Control
     • Issue Tracking
     • Security
  19. PROCESS
     (Diagram: Guerrilla Analytics, with People, Process and Tech)
  20. Guerrilla Analytics Workflow
     Data
     • Extract
     • Receive
     • Load
     Analytics
     • Transform
     • Algorithm
     • Consolidate
     Insight
     • Reports
     • Work Products
     Disruptions
  21. Common Misconceptions about Process
     • “We must document everything”
     • “We can completely plan a data science job”
     • “We should track everything in a traditional top-down way”
     • “Work products must be right first time”
  22. Process Capability
     Data
     • Extract
     • Receive
     • Load
     Analytics
     • Transform
     • Algorithm
     • Consolidate
     Insight
     • Reports
     • Work Products
     Process support
     • Log Data Receipt
     • Track Work Product Versions
     • Track Work Product Release
  23. Summary
     Data Science Aims
     • Leverage
     • Justify
     • Ad-hoc
     Guerrilla Analytics
     • Disruptions
     • Constraints
     • Reproducible, Testable, Explainable
     People Capability
     • Hard Skills
     • Soft Skills
     Technology Capability
     • Analytics Agility
     • Consolidation
     • Process Support
     Process Capability
     • Tracking Data (Inputs)
     • Tracking Work Products Creation
     • Tracking Outputs
  24. Keep in Touch!
     @Enda_Ridge
     GuerrillaAnalytics@gmail.com
     www.guerrilla-analytics.net
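
The slides name "Log Data Receipt" as a process capability and recommend linking data on the file system to data in the analytics environment, but they do not show what a data log looks like. The following is a minimal sketch of one possible approach, not the method from the talk: it appends each received file to a CSV log with an identifier and a content hash. The function names, the log columns and the `D001`-style identifier scheme are all assumptions made for illustration.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log layout; the slides do not prescribe any columns.
LOG_FIELDS = ["data_id", "received_utc", "source", "file_name", "sha256"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_data_receipt(log_path: Path, data_file: Path, source: str) -> dict:
    """Append one row to a CSV data log and return that row.

    The data_id (D001, D002, ...) is an assumed naming scheme. Reusing the
    same id for the folder on disk and for the table or schema name in the
    analytics environment is one way to keep the two linked.
    """
    existing = []
    if log_path.exists():
        with log_path.open(newline="") as handle:
            existing = list(csv.DictReader(handle))
    row = {
        "data_id": f"D{len(existing) + 1:03d}",
        "received_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source": source,
        "file_name": data_file.name,
        "sha256": sha256_of(data_file),
    }
    write_header = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row


def has_changed(log_path: Path, data_id: str, data_file: Path) -> bool:
    """Return True if the file no longer matches the hash logged at receipt."""
    with log_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            if row["data_id"] == data_id:
                return row["sha256"] != sha256_of(data_file)
    raise KeyError(data_id)
```

A team might call `log_data_receipt(Path("data_log.csv"), Path("customers.csv"), "client SFTP")` each time a file arrives, then use `has_changed` before rerunning an analysis to confirm the input is the same file that was originally received. Keeping the log as a plain CSV file means it can sit under the same version control as the program code.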
