francis@qmining.com
Plan
1. Topics
○ Why the end of IT departments will help data-scientists
○ Data-Science empowered by ipython notebooks
2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian
QA
Just another barrier of entry
Reminder: Data Maturity
Barriers of entry Levels
ML ● Sampling
● Big-Data
Level 5 | Level 1 | Level 2 | Level 3 | Level 4
The end of IT departments
● Car > 30K
● Gas + parking = 5K
● max speed = 180 km/h
● avg speed = 10 km/h
● ROI = 29%
● Bike < 1K
● max speed = 45 km/h
● avg speed = 30 km/h
● ROI = 3000%
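The slide does not define ROI. The two figures are consistent with average speed divided by total cost in thousands, and that reading is an assumption here, not something the deck states. A minimal sketch of the arithmetic under that assumption, taking the car at its 30K lower bound and the bike at its 1K upper bound (`roi_percent` is a hypothetical helper name):

```python
def roi_percent(avg_speed_kmh, cost_k):
    """Average speed per 1K spent, as a percentage (assumed definition of ROI)."""
    return 100.0 * avg_speed_kmh / cost_k

# Car: 30K purchase plus 5K gas and parking, 10 km/h average speed.
car_roi = roi_percent(avg_speed_kmh=10, cost_k=30 + 5)

# Bike: at most 1K, 30 km/h average speed.
bike_roi = roi_percent(avg_speed_kmh=30, cost_k=1)

print(round(car_roi), round(bike_roi))  # 29 3000
```

Under this reading the maximum speed plays no role: only the speed actually achieved and the money actually spent enter the ratio, which seems to be the point of the analogy.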
IT department
The IT department's only argument
Strategies to get rid of the IT department*
*If they don't cooperate, are too slow, or always have an excuse
-> union approach
1. Bypass them/ignore -> workarounds
http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html
2. Play their game -> Help them hang themselves
Strategy: play the game, don't fight
1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea, if it isn't
4. Do as they say (don't fight too much) -> Try
5. Evaluate: failure + cost + 3 months lost
6. Who will be fired?
The NLU pipeline
virtual assistant
Why?
● Measure, understand, and improve the virtual assistant user experience
What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency, ...
IT layer: R&D hadoop cluster
SQL layer of abstraction
Hook -> hadoop streaming
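As an illustration of the Hadoop Streaming hook, here is a minimal sketch of a mapper and reducer in Python. Hadoop Streaming runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout; everything else below is an assumption made for the example: the log format (`user_id,task,completed`), the function names, and the per-user completion-rate metric are hypothetical and not taken from the talk.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'user_id<TAB>completed' for each log line.

    Assumed (hypothetical) input format: user_id,task,completed with completed in {0, 1}.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed records
        user_id, _task, completed = parts
        yield f"{user_id}\t{completed}"

def reducer(lines):
    """Compute the task completion rate per user from key-sorted mapper output."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for user_id, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [int(v) for _, v in group]
        yield f"{user_id}\t{sum(values) / len(values):.2f}"

def simulate(lines):
    """Local stand-in for the sort step Hadoop performs between map and reduce."""
    return list(reducer(sorted(mapper(lines))))

sample = ["u1,weather,1", "u2,alarm,0", "u1,call,0", "u2,music,0"]
result = simulate(sample)
print(result)  # ['u1\t0.50', 'u2\t0.00']
```

On a cluster, each function would live in its own script that iterates over `sys.stdin` and prints its output, and the scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options. The local `simulate` helper only exists so the logic can be explored in a notebook before touching the cluster.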
Data-Science empowered by ipython notebook
wsgi
Proto to Prod
Exploration to Proto
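The "Proto to Prod" step via WSGI can be sketched as follows. This is a minimal example under stated assumptions, not the talk's actual service: `predict` is a hypothetical placeholder for a model prototyped in a notebook, and the query parameter name `x` is invented for illustration. The `application(environ, start_response)` signature is the standard WSGI interface.

```python
import json
from urllib.parse import parse_qs

def predict(x):
    """Placeholder for a model prototyped in a notebook (hypothetical)."""
    return 2.0 * x + 1.0

def application(environ, start_response):
    """WSGI entry point: GET /?x=<number> returns the prediction as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        x = float(query["x"][0])
    except (KeyError, ValueError):
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": "expected ?x=<number>"}).encode("utf-8")]
    body = json.dumps({"x": x, "prediction": predict(x)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(query_string):
    """Invoke the app in-process, without a server; return (status, parsed body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"QUERY_STRING": query_string, "REQUEST_METHOD": "GET"}
    chunks = application(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))

print(call_app("x=3"))  # ('200 OK', {'x': 3.0, 'prediction': 7.0})
```

For a quick local check, the same `application` object can be served with the standard library: `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`. Any WSGI-compliant server can host it unchanged, which is what makes the prototype-to-production step cheap.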
IPython Notebook
The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich
text, mathematics, plots and rich media, as shown in this example session:
ipython notebook
http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html
Extends to all languages
Notebook: train an algo trading strategy
trading/example/run.py (pytrade->pandas + sklearn + theanets + matplotlib)
notebook integration in git
Default queries: PCA, LLE, Isomap, t-SNE, etc. -> mlboost/clustering/visu.py (scikit-learn+matplotlib)
http://fraka6.blogspot.ca/2013/04/simplifying-clustering-visualization.html
Confusion matrices visu/mlboost/util/sklearn_confusion_matrix.py (sklearn+matplotlib)
http://fraka6.blogspot.ca/2013/05/generating-confusion-matrix-great.html
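To make concrete what a confusion matrix contains, here is a minimal standard-library sketch. It is not the mlboost script referenced above, which relies on sklearn and matplotlib; the function names and the toy labels are hypothetical, and the text rendering is only a stand-in for a plotted heat map.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def format_matrix(matrix, labels):
    """Render the matrix as aligned text (stand-in for a matplotlib heat map)."""
    width = max(len(str(lab)) for lab in labels) + 2
    header = " " * width + "".join(f"{lab:>{width}}" for lab in labels)
    rows = [f"{lab:>{width}}" + "".join(f"{c:>{width}}" for c in row)
            for lab, row in zip(labels, matrix)]
    return "\n".join([header] + rows)

# Toy data, invented for illustration.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

cm = confusion_matrix(y_true, y_pred, labels)
print(format_matrix(cm, labels))
```

The diagonal holds the correct predictions; off-diagonal cells show which classes get confused with which. `sklearn.metrics.confusion_matrix` uses the same rows-are-true, columns-are-predicted convention.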
Simple way to inspect outliers?
mlboost/clustering/visu.py (matplotlib+scipy)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
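The slide does not say which outlier criterion the script uses, so the sketch below swaps in a common one: the modified z-score based on the median absolute deviation (MAD). The function name, the 3.5 threshold, and the latency values are assumptions made for the example, not taken from mlboost.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, nothing to flag
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [(i, v) for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical latencies in milliseconds, with two suspicious points.
latencies_ms = [120, 135, 128, 140, 131, 125, 990, 133, 129, 5]
outliers = mad_outliers(latencies_ms)
print(outliers)  # [(6, 990), (9, 5)]
```

Using the median instead of the mean keeps the extreme values from distorting the very baseline they are measured against, which is why this variant works on small samples where a plain z-score can miss obvious outliers.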
How to see session clusters?
mlboost/utils/idstats.py (mlboost)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
The quantopian use case
Community+Research->Experiment->deploy
Quantopian = leader in data-science platforms & the fintech revolution
Self-disrupt or be disrupted
Python as leverage
Conclusion -> disrupt or be disrupted
● IT department = constraint to efficient data-science
○ IT -> a business solution but also the biggest problem
○ IT departments will die: it's not a question of if but when
○ Last argument = security
○ Strategy = outsource (Amazon) or be inefficient
○ Why do they hire old CIOs …
○ IPython notebook = efficient exploration
● Follow the lead of quantopian
○ Community + Python (Research -> Experiment -> deploy)
● To be data-driven, we need data efficiency at any cost
francis@qmining.com
hum...

Data science-summit MTL 2015 - The end of IT departments and data-science empowered by ipython notebook
