An exciting talk with Florian Douetteau on the main difficulties data leaders face when building and scaling data teams, and how to overcome them:
- Technological issues: What stack should they choose for the company’s architecture? And what about big data technologies: should they embrace being polyglot or rather act as a ruthless dictator?
- HR issues: Who should they hire? Should they build their data team as an extension of the BI team? Or should they build it from scratch?
- Data issues: How are they supposed to get data into their data lake? Which strategy should they adopt: the cicada, the spider or the fox?
- Product issues: What is big data really about? And ultimately, what do they want to do with all this data?
The talk aims to show how tough it can be to build and scale a data department, and to give some insight into the strategy Florian thinks they should adopt.
2. Meet HAL
Hal Alowne
BI Manager
Dim’s Private Showroom
Outreachdigital.org @outreachdigit
3. Meet Hal’s Boss, DIM
“Hey Hal! We need a big data platform like the big guys. Just do what they’re doing!”
Big Data Copy Cat Project
6. TOY PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your Brand New Hadoop Cluster is perceived as slow, little used and unreliable
7. TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
8. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
10. Classic Business Intelligence Team Organization
Business Leader, Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
Specs
Dim, Big Boss
11. Data Science Team Organization
Business Leader, Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Data Engineer
Data Analyst
System Engineer / Data Architect
Business Needs
Data Scientist
IT Constraints
I.T.
13. Managing Extreme Personalities
Data Scientist
Highly Creative
Passionate
Hard to hire?
Hard to manage?
Want to take Hal’s job?
Ambitious
Hard to retain?
14. Paired for Data
Data Analyst
Discover Patterns
Data Engineer
Make things work
Fight data entropy
Fight tech entropy
15. What do you prefer?
One Analyst
One Engineer
One Data Scientist
OR
Four data scientists
16. Two Mindsets Can Coexist
CLICKERS CODERS
18. What is the main reason data projects fail?
> DATA NOT AVAILABLE
19. BUT FOR ONLY INCREMENTAL GAIN
Chart: Contribution to the overall project performance (axis from 0% to 100%; segments of 20%, 30% and 50%)
Legend: Business Goal Definition and Data, Feature Engineering, Algorithm
20. How to Get Data if you don’t have it
THE CICADA THE SPIDER THE FOX
22. The Cicada: Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
23. The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
24. The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
26. What is Big Data about?
27. The Age Of Distributed Intelligence
Global, Personalised and Real Time Data Driven Services
28. Data to Visualize or Data to Automate?
Moving to a world of automated decision making
Timeline (2013 to 2018): from DATA FOR MORE INSIGHTS to DATA FOR AUTOMATED DECISIONS
29. Involve Product Team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only when Needed
Product Feature: Historical Data For Path Optimisation
Have Product Management Deeply Involved In the Data Team
30. Focus on your added value
Decision flow (each question branches Yes / No):
Is the problem at the core of my business process?
Is it a common problem / with shared data?
Can I solve it on my own?
Really?
Possible outcomes:
Built by the Data Team
Hire Consultant and Learn
Go for Best of Breed SaaS Solution
Built by the Data Team?
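The questions on this slide can be sketched as a small decision function. This is only one plausible reading: the slide text does not spell out how the Yes / No branches connect, so the wiring below is an assumption, and the names `Decision` and `choose_approach` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    """Outcomes named on the slide (labels paraphrased)."""
    BUILD = "build with the data team"
    MAYBE_BUILD = "build with the data team, with doubts"
    HIRE_CONSULTANT = "hire a consultant and learn"
    BUY_SAAS = "go for a best-of-breed SaaS solution"


def choose_approach(core_to_business: bool,
                    common_problem: bool,
                    can_solve_alone: bool,
                    really_sure: bool) -> Decision:
    """Walk the slide's four questions in an ASSUMED order."""
    # Assumption: core business problems stay in-house.
    if core_to_business:
        return Decision.BUILD
    # Assumption: common problems on shared data are bought, not built.
    if common_problem:
        return Decision.BUY_SAAS
    # Assumption: without the skills, bring in outside help and learn.
    if not can_solve_alone:
        return Decision.HIRE_CONSULTANT
    # The slide's "Really?" double-checks the team's confidence.
    return Decision.BUILD if really_sure else Decision.MAYBE_BUILD
```

For instance, a non-core problem that many companies share would map to the SaaS outcome under this wiring, whatever the team's own skills.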
31. Create an API culture
Do not share
o Random Piece of Code
o Flat File
o Email
Do share
✓ Reproducible documented workflows
✓ Clean, documented APIs
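As a minimal illustration of the "do share" column, the sketch below shows what a small, clean, documented API could look like in place of an emailed flat file. The `Order` type and the `revenue_by_customer` function are hypothetical examples and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Order:
    """One order line (hypothetical schema for illustration)."""
    customer_id: str
    amount: float


def revenue_by_customer(orders: Iterable[Order]) -> Dict[str, float]:
    """Return the total order amount per customer.

    Args:
        orders: orders to aggregate; amounts are assumed to share one currency.

    Returns:
        A mapping from customer_id to the summed amount.
    """
    totals: Dict[str, float] = {}
    for order in orders:
        # Accumulate each order into its customer's running total.
        totals[order.customer_id] = totals.get(order.customer_id, 0.0) + order.amount
    return totals
```

Consumers call the function and rely on its documented contract, so the logic can change without anyone re-sending files or code snippets.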
32. Did Hal find his solutions?
Technology: Polyglot on top of open source
Data: Hunt for Big Problems and Convince the CEO
People: Find a way to make clickers and coders work together
Product: Create an API culture and involve the product teams
“Is this the end?”
Hal Alowne
BI Manager
Dim’s Private Showroom
35. Objective Alignment
Autonomous Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars?
http://arxiv.org/abs/1510.03346
36. Data-Driven Artificial Sales & Marketing?
ARTIFICIAL SUPERVISOR
Please call the customers
Please call again
Could you add a JOKE at the end of this email
I need you to ATTEND a physical meeting
Here is the BRIEF
Continuously analyzing prospect behavior on social networks, applications and websites
37. I don’t know the answer, but here is free software for the data addicts in your company
data scientists and engineers
25
by the numbers
for clickers and coders
3000 lovely users
80 customers
by the customers
40. My nerdy background
Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs
Graph Analytics, Chess AI, Natural Language Processing, 80% Emacs / 20% Vim
41. So to sum it up …
I (USED TO) HATE GUIs
42. …and our software is
A supercharged Visual IDE for Data Teams
Deployment