Data Science
Software Platform
The Solitude of the Data Team Manager
Data Driven NYC
04/11/16
My nerdy background
Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs
Graph Analytics, Chess AI, Natural Language Processing, 80% Emacs / 20% Vim
So to sum it up …
I (USED TO)
HATE
GUIs
…and our software is
A supercharged Visual IDE for Data Teams
Deployment
Meet HAL
Hal Alowne
BI Manager
Dim’s Private Showroom
Meet Hal’s Boss, DIM
“Hey Hal! We need a big data platform like the big guys. Just do what they’re doing!”
Big Data
Copy Cat
Project
TECHNOLOGY
Disconnect
“What technologies should I use?”
Welcome to Technoslavia!
TOY PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your brand new Hadoop cluster is perceived as slow, underused, and unreliable
TECHNO MISMATCH ANTI-PATTERN
Embrace Being Polyglot
or
Be a Dictator
VS
VS
The Python Clan
The R Tribe
The Old Elephant Fraternity
The New Elephant Club
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website winners of the 2000s: companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things" winners of the 2010s: companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
PEOPLE
Disconnect
“Who should I hire?”
Classic Business Intelligence Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
Specs
Dim
Big Boss
Data Science Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project Sponsor
Data Engineer
Data Analyst
System Engineer / Data Architect
Business
Needs
Data Scientist
IT
Constraints
I.T.
Manage Expectations
Data Plumber
Data Engineer
Data Scientist
Data Waiter
Data Cleaner
Data Analyst
REAL JOB
DREAM JOB
Managing Extreme Personalities
Data Scientist
Highly Creative
Passionate
Hard to hire?
Hard to manage?
Ambitious
Want to take Hal’s job?
Hard to retain?
Paired for Data
Data Analyst
Discover Patterns
Data Engineer
Make things work
Fight data entropy
Fight tech entropy
What do you prefer?
One Analyst, One Engineer, One Data Scientist
OR
Four Data Scientists
Two Mindsets Can Coexist
CLICKERS CODERS
DATA
Disconnect
“What about data?”
What is the main reason data projects fail?
> DATA NOT AVAILABLE
BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance:
Business Goal Definition and Data: 50%
Feature Engineering: 30%
Algorithm: 20%
How to Get Data if you don’t have it
THE CICADA
Be Optimistic!
Wait for Open Data initiatives and for data to become available in the Enterprise Hub / Data Lake!
THE SPIDER
Create a Network!
Create a set of trackers or addictive data collection internally to get data on your side!
THE FOX
Hunt for a Big Problem!
Convince the CEO that you can solve a business-critical problem, and use it as an excuse to get all the data you want!
PRODUCT
Disconnect
What is Big Data about?
The Age Of Distributed Intelligence
Global, Personalised and Real-Time Data-Driven Services
Data to Visualize or Data to Automate?
2013 2014 2015 2016 2017 2018
Moving to a world of automated decision making
DATA FOR MORE INSIGHTS
DATA FOR AUTOMATED DECISIONS
Involve Product Team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only When Needed
Product Feature: Historical Data for Path Optimisation
Have Product Management Deeply Involved in the Data Team
Focus on your added value
Decision flowchart (Yes/No branches), questions asked in turn:
Is the problem at the core of my business process?
Is it a common problem, or one with shared data?
Can I solve it on my own? Really?
Possible outcomes:
Built by the Data Team
Hire a Consultant and Learn
Go for a Best-of-Breed SaaS Solution
Create an API culture
Do not share
o Random Piece of Code
o Flat File
o Email
Do share
✓ Reproducible, documented workflows
✓ Clean, documented APIs
Did Hal find his solutions?
Technology: Polyglot on top of open source
People: Find a way to make clickers and coders work together
Data: Hunt for Big Problems and Convince the CEO
Product: Create an API culture and involve the product teams
“Is this the end?”
Hal Alowne
BI Manager
Dim’s Private Showroom
That was the (fictionalized) story…
By the numbers
25 data scientists and engineers
2 locations
1 software
For a simple software for clickers and coders
Car Sharing Worldwide Leader
Flash Sales Worldwide Leader
One Mission: Never leave Hal ALONE
3,700 Hotels Worldwide
By the numbers
2,500 lovely users
70 customers
Food for thought
www.dataiku.com/blog
THANK YOU!
FREE (as in Beer) Software
www.dataiku.com/dss

Dataiku - data driven nyc - april 2016 - the solitude of the data team manager