Why the end of IT departments will help data-scientists, and how IPython notebooks empower data-science
In the first part of this talk, we explain why IT departments are a major blocker to data-science and how to bypass them in the short term. To conclude, we show, with concrete examples, why the IPython notebook is the future for data-science practitioners.
4. Plan
1. Topics
○ Why the end of IT departments will help data-scientists
○ Data-Science empowered by ipython notebooks
2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian
5. QA
Just another barrier to entry
Reminder: Data Maturity
[Figure: barriers to entry across data-maturity Levels 1 to 5, including sampling, big data, and ML]
6. The end of IT departments
● Car > 30K
● Gas + parking = 5K
● max speed = 180 km/h
● avg speed = 10 km/h
● ROI = 29%
● bike < 1K
● max speed = 45 km/h
● avg speed = 30 km/h
● ROI = 3000%
IT department
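The ROI figures on this slide are consistent with dividing average speed by total cost in thousands. That formula is inferred from the numbers, not stated in the talk, and the costs are taken at their bounds (30K + 5K for the car, 1K for the bike). The function name `speed_per_cost` is hypothetical. A minimal sketch under those assumptions:

```python
def speed_per_cost(avg_speed_kmh, cost_k):
    """Average speed (km/h) obtained per thousand spent.

    Assumed reading of "ROI" on the slide; the talk does not define it.
    """
    return avg_speed_kmh / cost_k


# Car: purchase (30K) plus gas and parking (5K), 10 km/h on average.
car_roi = speed_per_cost(avg_speed_kmh=10, cost_k=30 + 5)

# Bike: 1K, 30 km/h on average.
bike_roi = speed_per_cost(avg_speed_kmh=30, cost_k=1)

print(f"car:  {car_roi:.0%}")   # car:  29%
print(f"bike: {bike_roi:.0%}")  # bike: 3000%
```

The max speed never enters the calculation: what counts is the speed you actually get, which is the point of the analogy.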
8. Strategies to get rid of the IT department*
*If they don't cooperate, are too slow, or always have an excuse
-> union approach
1. Bypass them/ignore -> workarounds
http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html
2. Play their game -> Help them hang themselves
10. Strategy: Play the game
don't fight
1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea, if it isn't
4. Do as they say (don't fight too much) -> Try
5. Evaluate: failure + cost + 3 months lost
6. Who will be fired?
11. The NLU pipeline
virtual assistant
Why?
● Measure, understand, and improve the virtual assistant user experience
What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency…
12. IT layer: R&D hadoop cluster
SQL layer of abstraction
Hook -> hadoop streaming
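Hadoop Streaming lets any program that reads lines on stdin and writes tab-separated key/value lines on stdout act as a mapper or reducer, which is how Python can hook into the cluster. The sketch below is an illustration, not the pipeline from the talk: the log layout (session id in the first tab-separated column) and the names `mapper` and `reducer` are assumptions.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'session_id<TAB>1' for every log line (hypothetical log layout)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:  # skip lines with an empty session id
            yield f"{fields[0]}\t1"


def reducer(lines):
    """Sum the counts per key; the reducer input arrives sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local dry run: sorted() stands in for the shuffle/sort done by Hadoop.
# On a cluster, each function would instead consume sys.stdin and print().
log_lines = ["s1\tquery\n", "s2\tquery\n", "s1\tclick\n"]
counts = list(reducer(sorted(mapper(log_lines))))
print(counts)  # ['s1\t2', 's2\t1']
```

Keeping the map and reduce steps as plain generators means the same logic can be tested in a notebook before it is submitted to the cluster.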
14. IPython Notebook
The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich
text, mathematics, plots and rich media, as shown in this example session:
ipython notebook
http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html
Extends to all languages
19. Simple way to inspect outliers?
mlboost/clustering/visu.py (matplotlib+scipy)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
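For readers without mlboost at hand, here is a minimal standard-library sketch of outlier inspection. It is not the approach of `mlboost/clustering/visu.py`, which relies on matplotlib and scipy; it swaps in a modified z-score based on the median absolute deviation (MAD). The function name `mad_outliers`, the 3.5 threshold, and the sample latencies are all illustrative assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median: this method cannot rank points
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data is roughly normal.
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Made-up latencies in milliseconds, with one obvious outlier.
latencies_ms = [120, 130, 125, 118, 122, 900, 127]
flagged = mad_outliers(latencies_ms)
print(flagged)  # [(5, 900)]
```

Returning the index alongside the value makes it easy to jump back to the original record in a notebook and inspect it.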
20. How to see session clusters?
mlboost/utils/idstats.py (mlboost)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
29. Conclusion -> disrupt or be disrupted
● IT department = constraint to efficient data-science
○ IT -> business solution but also biggest problem
○ IT departments will die: it's not a question of if, but when
○ Last argument = Security
○ Strategy = outsource (Amazon) or be inefficient
○ Why they hire old CIO …
○ IPython notebook = efficient exploration
● Follow the lead of Quantopian
○ Community + Python (Research -> Experiment -> Deploy)
● To be data-driven, we need data efficiency at any cost