
The Rise of the DataOps - Dataiku - J On the Beach 2016


Many organisations are creating groups dedicated to data. These groups have many names: Data Team, Data Labs, Analytics Teams…
But whatever the name, the success of those teams depends a lot on the quality of the data infrastructure and on their ability to actually deploy data science applications in production.
In that regard, a new role of “DataOps” is emerging. Similar to DevOps for (web) development, the Data Ops is a merge between a data engineer and a platform administrator. Well versed in cluster administration and optimisation, a Data Ops would also have a perspective on data quality and on the relevance of predictive models.

Do you want to be a Data Ops? We’ll discuss the role and its challenges during this talk.

Published in: Technology


  1. The Role of the DevOps in the Data Analytics Teams. J ON THE BEACH, 05/21/16. MORPHED WITH DEEP LEARNING™: TYPICAL OPS GUY (source: Reddit), TYPICAL YOUNG DATA SCIENTIST (source: Common Sense)
  2. My initial interests: Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs, Graph Analytics, Chess AI, Natural Language Processing, 80% Emacs / 20% Vim
  3. So to sum it up … I (USED TO?) BE A BIG NERD
  4. Collaboration: CLICKERS, CODERS. Software is a Human Problem. I ended up building a collaborative software for data science...
  5. DEV OPS && DATA
  6. Let’s get back to the (brief) history of DevOps. Patrick Debois, 2007: Dev, Ops, QA. Agile Conference, 2008: Scrum and Agile in an operational context. O’Reilly Velocity, June 2009: “10+ Deploys per Day: Dev and Ops Cooperation at Flickr”. “Hey! We should have our own Velocity in Belgium”: DevOpsDays Ghent, October 2009
  7. DevOps: DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support. DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work. Invite Ops to the Dev meeting. Oh, and let them SPEAK. Ops should know how to code
  8. Let’s take an example: John, a DevOps from 2009. Learnt Python the Hard Way. Started with Puppet 1.0. Used EC2 before ELB and EBS!
  9. Hegelian perspective: Conflict and Frustration, Concept, Combination, Catharsis, Create Culture, Share, Create Tools. Dev + Ops
  10. There has been an “Ops” associated with data for a while? It’s called Business Intelligence!
  11. History of Data Analytics (Oversimplified): 2013, 2014, 2015, 2016, 2017, 2018. Moving to a world of automated decision making: from DATA FOR MORE INSIGHTS to DATA FOR AUTOMATED DECISIONS
  12. The Age of Distributed Intelligence: Global, Personalised and Real-Time Data-Driven Services
  13. Data, Analytics and Data Science: Conflict and Frustration, Concept, Combination, Catharsis, Create Culture, Share, Create Tools. Data + Science
  14. Welcome to Technoslavia!
  15. Classic Business Intelligence Team Organization: Business Leader, Data Consumer, Line-of-business Data Consumer, Business Project Sponsor, BI Solution Architect, Model Designer, ETL Developer, Dashboard / Report Designer, Specs, Dim, Big Boss
  16. Data Science Team Organization: Business Leader, Data Consumer, Line-of-business Data Consumer, Business Project Sponsor, Data Engineer, Data Analyst, System Engineer / Data Architect, Business Needs, Data Scientist, IT Constraints, I.T.
  17. Is there room for a new role? Data Plumber, Data Engineer, Data Scientist, Data Waiter, Data Cleaner, Data Analyst. REAL JOB, DREAM JOB. DevOps for Data?
  18. Imagine a company building a new “smart car” app: AutoFine™. “Revolutionary collaborative network that checks the quality of your driving and punishes you with virtual fines if you’re a bad driver”
  19. Imagine a company building a new “smart car” service, AutoFine™: 10 TB of Data Every Month, Hive / Spark / Python, 10 Different Predictive Models, Real-Time API / Workflow
  20. OPERATIONS: Who is responsible for … ???? ?? ?? Checking that the newly trained model performs as expected. Checking that the product catalog and the website tags remain consistent. Checking that the Hadoop cluster scales as expected and has enough bandwidth to handle the workload. Testing the performance of the real-time API. Monitoring the performance of the model and deciding to roll back / maintain / roll out
  21. DATA OPS as a Philosophy
  22. X OPS PHILOSOPHY: Highly consensual, Highly controversial
  23. Create an API culture. Do not share: random pieces of code, flat files, emails. Do share: reproducible, documented workflows; clean, documented APIs
  24. Defensive Data Programming: Software has errors. You are not your software, yet you are responsible for the errors. You can never remove the errors, only reduce their probability.
  25. Defensive Data Programming: Handle the case when one of the input files is empty. Handle the case when a new value appears. Handle the case when two columns become completely correlated. Handle the case when a column is 16k long. Etc., etc., etc.
  26. Monitoring: the alerts for people who love it. Performance … Time Spent … Number of Errors …
  27. Monitoring: Business Informal Monitoring. % Opening, Market Spent, Exception User Events …
  28. Resource Allocation: “I’ve got this strange error, ”OutOfMemory”. Do you know what it is?” “Why is the Hadoop cluster going slower than my laptop?”
  29. The Philosophy of pre-allocating more resources than necessary
  30. Get to the latest package culture … Data Scientist: “I need the latest version of scikit. And networkX … And could you repackage that to enable TensorFlow optimizations?” System Administrator: “…..”
  31. The culture of containers: Developers’ Sandbox
  32. DATA OPS as a Job Title
  33. Job Title: a matter of name, $$ and social ladder. Data Scientist, Data Ops, Developer, Statistician, Full Stack Developer, Sys Admin, DevOps
  34. Job Role: a matter of Do or Don’t. DO: things you really want to do. DON’T: things you really don’t want to get into
  35. FIGHT THE TOY PLATFORM ANTI-PATTERN. Your brand new Hadoop cluster is perceived as slow, not so used and not reliable. Test and invest in infrastructure == skilled people, or go for cloud / packaged infrastructure
  36. FIGHT THE TECHNO MISMATCH ANTI-PATTERN. Assume being polyglot, or be a dictator. VS: The Python Clan, The R Tribe, The Old Elephant Fraternity, The New Elephant Club
  37. GETTING DATA POLITICS > DATA NOT AVAILABLE
  38. GETTING DATA POLITICS. THE FOX: Hunt for a big problem! Convince the CEO that you can solve a business-critical problem and use it as an excuse to get all the data you want! THE SPIDER: Create a network! Create a set of trackers or addictive data collection internally to get data on your side!
  39. PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY. Website: 2000s winners, companies that were able to release fast. “Artificial Intelligence with Data for Internet of Things”: 2010s winners, companies able to put intelligence in production? Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
  40. OWN ANONYMISATION / PRIVACY / DATA SECURITY WITH PARTNERS ISSUES. Technical feasibility? What can or cannot be done?
  41. Let’s Wrap IT Up! A company building a GPS-powered automated car fine system: 10 TB of Data Every Month, Hive / Spark / Python, 10 Different Predictive Models, Real-Time API / Workflow, Robust Workflow with Data Quality Checks, Functional Monitoring by Business People through Slack and Dashboards, Monitoring for the API, Feature Engineering Pipeline in Python
  42. But where do you stand? ???? ???? ???? ????? What’s your roll-back strategy like? What kind of multivariate testing or strategies do you have in place for predictive models? How do you manage the robustness of your data flow production scripts? How can business people monitor the performance of the application?
  43. Food for thought: http://bit.ly/production-survey www.dataiku.com/blog THANK YOU!
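
The four "handle the case when" items on the Defensive Data Programming slide (slide 25) could be sketched roughly as below. This is only an illustration, not code from the talk: the function names `pearson` and `check_batch`, the row-of-dicts input format, the 0.9999 correlation threshold, and the reading of "16k long" as a 16,000-character value are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists; None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def check_batch(rows, known_values, numeric_pairs=(), max_len=16_000):
    """Return a list of human-readable problems found in a batch of rows.

    rows:          list of dicts, for example produced by csv.DictReader
    known_values:  {column: set of values already seen in earlier batches}
    numeric_pairs: pairs of numeric column names to test for full correlation
    max_len:       length from which a string value is reported (assumed 16k)
    """
    if not rows:
        # Case 1: the input file is empty.
        return ["input is empty"]

    problems = []

    # Case 2: a value appears that was never seen before.
    for col, allowed in known_values.items():
        unseen = {row[col] for row in rows} - allowed
        if unseen:
            names = sorted(str(v) for v in unseen)
            problems.append(f"new values in {col}: {names}")

    # Case 3: two columns have become completely correlated.
    for a, b in numeric_pairs:
        r = pearson([float(row[a]) for row in rows],
                    [float(row[b]) for row in rows])
        if r is not None and abs(r) > 0.9999:
            problems.append(f"columns {a} and {b} are completely correlated")

    # Case 4: a value is unexpectedly long.
    for i, row in enumerate(rows):
        for col, value in row.items():
            if isinstance(value, str) and len(value) >= max_len:
                problems.append(
                    f"row {i}: column {col} is {len(value)} characters long")

    return problems
```

A workflow would typically call `check_batch` before training or scoring and stop, or alert someone, when the returned list is not empty. Returning a list instead of raising on the first problem is a design choice of this sketch: it lets one run report every issue at once.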

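
The operations slide (slide 20) asks who monitors the performance of the model and decides to roll back, maintain or roll out. One very small way to make that decision explicit is sketched here. The function name `decide`, the higher-is-better score, and the `tolerance` band are hypothetical choices for this example; the talk does not prescribe a rule.

```python
def decide(previous_score, candidate_score, tolerance=0.01):
    """Compare a newly deployed model with the one it replaces.

    Both scores are assumed to be the same higher-is-better metric
    (for example accuracy) measured on comparable traffic.
    Returns "rollback", "rollout" or "maintain".
    """
    delta = candidate_score - previous_score
    if delta < -tolerance:
        return "rollback"   # clearly worse: go back to the previous model
    if delta > tolerance:
        return "rollout"    # clearly better: extend it to everyone
    return "maintain"       # inside the noise band: keep watching


if __name__ == "__main__":
    print(decide(previous_score=0.80, candidate_score=0.75))
```

The point of writing the rule down, even one this simple, is that the answer to "what's your roll-back strategy like?" becomes something a script can apply and a business user can read.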