francis@qmining.com
Plan
1. Topics
○ Why the end of IT departments will help data-scientists
○ Data-Science empowered by ipython notebooks
2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian
QA
Just another barrier to entry
Reminder: Data Maturity
Barriers to entry Levels
ML ● Sampling
● Big-Data
Level 5 | Level 1 | Level 2 | Level 3 | Level 4
The end of IT departments
● Car > 30K
● Gas + parking = 5K
● Max speed = 180 km/h
● Avg speed = 10 km/h
● ROI = 29%
● Bike < 1K
● Max speed = 45 km/h
● Avg speed = 30 km/h
● ROI = 3000%
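The ROI figures on this slide appear to be average speed divided by total cost in thousands, expressed as a percentage. That reading is an assumption, since the slide does not state the formula. A minimal sketch, treating the bounds (> 30K, < 1K) as exact values; the function name `speed_roi` is hypothetical:

```python
def speed_roi(avg_speed_kmh, cost_in_thousands):
    """Return average speed per thousand spent, as a percentage (assumed formula)."""
    return 100.0 * avg_speed_kmh / cost_in_thousands


# Car: 30K purchase + 5K gas and parking, 10 km/h average speed in traffic.
car_roi = speed_roi(avg_speed_kmh=10, cost_in_thousands=30 + 5)
# Bike: 1K purchase, 30 km/h average speed.
bike_roi = speed_roi(avg_speed_kmh=30, cost_in_thousands=1)

print(f"car: {car_roi:.0f}%  bike: {bike_roi:.0f}%")
```

Under this assumption the numbers match the slide: roughly 29% for the car and 3000% for the bike.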
IT department
The IT department's only argument
Strategies to get rid of the IT department*
*If they don't cooperate, are too slow, or always have an excuse
-> union approach
1. Bypass them/ignore -> workarounds
http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html
2. Play their game -> Help them hang themselves
Strategy: Play the game
don't fight
1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea, if it's not
4. Do as they say (don't fight too much) -> Try
5. Evaluate: Failure + cost + lost 3 months
6. Who will be fired?
The NLU pipeline
virtual assistant
Why?
● Measure, understand, and improve the virtual assistant user experience
What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency, ...
IT layer: R&D Hadoop cluster
SQL layer of abstraction
Hook -> Hadoop Streaming
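Hadoop Streaming lets any program that reads lines on stdin and writes tab-separated key/value lines on stdout act as a mapper or reducer, which is what makes it a convenient hook for Python. A minimal mapper sketch follows; the input layout (user id, session id, latency, tab-separated) is a hypothetical example, not the schema used in the talk:

```python
def map_lines(lines):
    """Yield 'key<TAB>value' records, the text format Hadoop Streaming expects."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows instead of failing the job
        user_id, _session_id, latency_ms = fields
        yield f"{user_id}\t{latency_ms}"


# Made-up sample rows; in a real streaming job, pass sys.stdin instead.
sample = ["u1\ts1\t120\n", "bad row\n", "u2\ts9\t340\n"]
for record in map_lines(sample):
    print(record)
```

Keeping the logic in a generator that accepts any iterable of lines means the same function can be tested in a notebook and then fed `sys.stdin` on the cluster.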
Data-Science empowered by IPython Notebook
WSGI
Proto to Prod
Exploration to Proto
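One way to read "Proto to Prod" via WSGI is that a function prototyped in a notebook gets wrapped in a WSGI callable and served by any WSGI server. The sketch below uses only the standard library; the response body and the helper `call_app` are illustrative assumptions, not code from the talk:

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app: return a fixed plain-text body."""
    body = b"prototype is live\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Invoke a WSGI app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)
```

For a local trial, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the same callable over HTTP.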
IPython Notebook
The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich
text, mathematics, plots and rich media, as shown in this example session:
ipython notebook
http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html
Extends to all languages
Notebook: train an algo trading strategy
trading/example/run.py (pytrade->pandas + sklearn + theanets + matplotlib)
notebook integration in git
Default queries
PCA, LLE, Isomap, t-SNE, etc. -> mlboost/clustering/visu.py (scikit-learn + matplotlib)
http://fraka6.blogspot.ca/2013/04/simplifying-clustering-visualization.html
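The projections listed on this slide all reduce high-dimensional points to two coordinates that can be scatter-plotted. As a rough illustration of the simplest one, PCA, here is a NumPy-only sketch. It is not the code in mlboost/clustering/visu.py, and the two synthetic clusters are made-up data:

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


rng = np.random.default_rng(0)
# Two synthetic clusters in 5 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)),
               rng.normal(6.0, 1.0, (50, 5))])
coords = pca_2d(X)
print(coords.shape)
```

The resulting `coords` can be passed to `matplotlib.pyplot.scatter(coords[:, 0], coords[:, 1])`. scikit-learn offers the same idea behind a common `fit_transform` interface in `sklearn.decomposition.PCA` and, for the nonlinear methods, `sklearn.manifold`.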
Confusion matrices visu/mlboost/util/sklearn_confusion_matrix.py (sklearn+matplotlib)
http://fraka6.blogspot.ca/2013/05/generating-confusion-matrix-great.html
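A confusion matrix counts, for each true label, how often each label was predicted. The standard-library sketch below only illustrates that counting step; it is not the mlboost helper named on the slide, and the labels and predictions are made-up data:

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return nested lists: rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

`sklearn.metrics.confusion_matrix` follows the same convention (rows are true labels, columns are predictions), so its output can be plotted the same way, for example with `matplotlib.pyplot.imshow`.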
Simple way to inspect outliers?
mlboost/clustering/visu.py (matplotlib+scipy)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
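The slide does not say which outlier criterion the tool applies. As a stand-in, the sketch below flags values by z-score, a common first pass; the technique, the threshold, and the sample latencies are all assumptions for illustration:

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / std > threshold]


# Made-up latencies with one obviously extreme value.
latencies = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(latencies)
print(outliers)
```

Returning indices along with values makes it easy to jump back to the offending rows for manual inspection.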
How to see session clusters?
mlboost/utils/idstats.py (mlboost)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
The Quantopian use case
Community + Research -> Experiment -> Deploy
Quantopian = leader in data-science
platform & fintech revolution
Self-disrupt or
be disrupted
Python as leverage
Conclusion -> disrupt or be disrupted
● IT department = constraint to efficient data-science
○ IT -> business solution but also biggest problem
○ IT departments will die: it's not a question of if but when
○ Last argument = Security
○ Strategy = outsource (amazon) or be inefficient
○ Why they hire old CIO …
○ IPython notebook = efficient exploration
● Follow the lead of Quantopian
○ Community + Python (Research -> Experiment -> Deploy)
● To be data-driven, we need data efficiency at any cost
francis@qmining.com
hum...

Data science-summit MTL 2015 - The end of IT departments and data-science empowered by ipython notebook
