Dan Mallinger, Data Science Practice Manager, Think Big Analytics at MLconf NYC

Think Big, Start Smart, Scale Fast
Analytics Communication:
Re-Introducing Complex Models
2
• Director of Data Science at Think Big
• I work in the intersection of statistics and technology
• But also business and analytics
• Too often see data scientists limit themselves and their
businesses
Dan Mallinger
3
1. Importance of Communication
2. Lost Tools of Analytics Communication
3. Tricks for those in Regulated Environments
4. More Communication
Today
4
Not Today
5
• Familiar = Clear
• Clear = Explainable
• Explainable = Understood
• Understood = Trustworthy
“Explainable” Model Fallacy
6
Better Communication Yields…
7
Bad Communication and Black Boxes…
8
Why We Should Care:
We Won’t Waste Money
Alas, not even a 250Gb
server was sufficient:
even after waiting
three days, the data
couldn't even be
loaded. […]
Steve said it would be
difficult for managers to
accept a process that
involved sampling.
9
hlm.html('Test1', test1_score__eoy~test1_score__boy + ...
is_special_ed * perc_free_lunch ...
other_score * support_rec ...
(is_focal | inst_sid), data=kinder)
Technically this is a regression…
So simple anyone can understand it!
Why We Should Care:
You Can’t Explain Your Models Anyway
10
• If your model needs to be re-fit every month, it probably has an
eating disorder
• Be a better communicator to yourself
Why We Should Care:
Some of Us Don’t Understand Our Models
11
Meet Bob
12
• Predicting “Membership” (not really; this is a dummy outcome)
• Pick a “black box” model
• Build understanding
Airline Data
13
Danger! Does Your Manager Know What Strata Are?!
Manager Doesn’t Trust Samples?
14
• Easy:
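library(nnet)  # provides nnet()
# Refit the network on 5000 random samples of 10,000 rows each
models <- lapply(1:5000, function(i) {
  rand.rows <- sample.int(nrow(raw), size=10000)
  df <- raw[rand.rows, c(dep.cols, ind.cols)]
  nnet(Member~., data=df, size=10)  # return the fitted model
})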
• Easier:
library(bootstrap)
• Bootstrap!
– Simple, but underused
– Resample data, rebuild models
– Parametric and non-parametric
bootstrapping (bias/variance)
Gist of non-parametric: Do it a
bunch of times, treat results as
distribution for CI
Manager Doesn’t Trust Samples?
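The gist of the non-parametric bootstrap can be sketched in base R. This is a minimal illustration under stated assumptions, not code from the talk: the toy data frame, the column names `miles` and `trips`, and the helper `fit_stat` are all hypothetical, and a logistic regression stands in for the neural net only to keep the example fast.

```r
set.seed(42)

# Hypothetical toy data standing in for the airline data (not from the talk).
n <- 500
toy <- data.frame(miles = rnorm(n), trips = rnorm(n))
toy$Member <- rbinom(n, 1, plogis(0.8 * toy$miles - 0.5 * toy$trips))

# Statistic of interest: the fitted coefficient on `miles`.
# A logistic regression stands in for the neural net to keep the sketch fast.
fit_stat <- function(df) {
  coef(glm(Member ~ miles + trips, data = df, family = binomial))[["miles"]]
}

# Non-parametric bootstrap: resample rows WITH replacement, refit, repeat.
boot_stats <- replicate(200, {
  rows <- sample.int(nrow(toy), replace = TRUE)
  fit_stat(toy[rows, ])
})

# Percentile confidence interval from the bootstrap distribution.
ci <- quantile(boot_stats, c(0.025, 0.975))
print(ci)
```

The same loop works for any statistic a manager cares about (accuracy, lift, a predicted rate): swap the body of `fit_stat` and keep the resampling as is.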
15
Stability of Model
16
• Bob has convinced his
manager that his
sampling strategy is
acceptable (Good Job,
Bob!)
• But he hasn’t built trust in
the model
Now What?
17
Bob Doesn’t Explain Variables Like This…
18
• If X matters, then shuffling it should hurt our model
• Then bootstrap for confidence intervals
• Most R models have a method for this (see caret)
Shining a light into the parameters of our black box
Variable Importance
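The shuffling idea can be written by hand in a few lines. The sketch below is an assumption-laden illustration rather than the talk's code: the toy data, the variables `x1` and `x2`, and the helpers `accuracy` and `perm_importance` are hypothetical names. Only `nnet()` and its `predict()` method come from the nnet package the slides already use.

```r
library(nnet)  # same package the slides use
set.seed(1)

# Hypothetical toy data (not the airline data): x1 drives the outcome, x2 is noise.
n <- 1000
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Member <- as.numeric(toy$x1 + rnorm(n, sd = 0.3) > 0)

m <- nnet(Member ~ x1 + x2, data = toy, size = 3, trace = FALSE, maxit = 200)

# Accuracy of the fitted model on a data frame (0.5 probability cutoff).
accuracy <- function(model, df) {
  mean((predict(model, df) > 0.5) == df$Member)
}

# Permutation importance: drop in accuracy after shuffling one column.
perm_importance <- function(model, df, var) {
  shuffled <- df
  shuffled[[var]] <- sample(shuffled[[var]])
  accuracy(model, df) - accuracy(model, shuffled)
}

imp <- sapply(c("x1", "x2"), function(v) perm_importance(m, toy, v))
print(imp)
```

To get confidence intervals, one option is to repeat the shuffle on bootstrap resamples of the rows and take quantiles of the resulting importance values.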
19
Shining a light into the parameters of our black box
Variable Importance: Bob’s Data
20
• Similar to variable importance
• How do relationships in our model play out in different settings?
• How much does our model depend on accurate measurement?
Sensitivity and Robustness
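Both questions can be probed with a hand-rolled sketch. Everything here is illustrative and assumed rather than taken from the talk: the toy data with an interaction between `x1` and `x2`, the three fixed "settings" for `x2`, and the noise level of 0.2 are hypothetical choices.

```r
library(nnet)
set.seed(2)

# Hypothetical toy data with an interaction: the effect of x1 depends on x2.
n <- 1000
toy <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
toy$y <- plogis(toy$x1 * toy$x2)

m <- nnet(y ~ x1 + x2, data = toy, size = 4, trace = FALSE, maxit = 500)

# Sensitivity profile: sweep x1 across its range while holding x2 fixed
# at a low, middle, and high value ("different settings").
x1_grid <- seq(-2, 2, length.out = 25)
settings <- c(low = -1.5, mid = 0, high = 1.5)
profile <- sapply(settings, function(v) {
  grid <- data.frame(x1 = x1_grid, x2 = v)
  as.numeric(predict(m, grid))
})

# Robustness: how far do predictions move when x1 is measured with noise?
noisy <- transform(toy, x1 = x1 + rnorm(n, sd = 0.2))
shift <- mean(abs(predict(m, noisy) - predict(m, toy)))

print(round(profile[c(1, 25), ], 2))  # predictions at the two ends of the sweep
print(shift)                          # mean absolute change under noise
```

Plotting each column of `profile` against `x1_grid` gives one response curve per setting, and rerunning the robustness step with larger `sd` values shows how quickly the model degrades as measurement gets worse.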
21
Sensitivity and Robustness Example
My code wasn’t working, so thanks to:
https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
22
More Sensitivity and Robustness
Manual variable permutation in R
library(sensitivity)
23
• Bob’s manager has told him that
black box models are not
allowed
• But Bob’s neural net performed
better than anything else. Oh
dear!
Dang!
24
• Bob’s work in neural nets can be
leveraged!
• Generically: Prototype selection
• Identify points on the decision
boundary to improve model
• Specifically: Extracting decision
trees from neural nets
Blackbox to Whitebox
25
Blackbox to Whitebox: Methodology
“Extracting Decision Trees from Trained Neural Networks” - Krishnan & Bhattacharya
Also: https://github.com/dvro/scikit-protopy
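One simple way to picture the black-box-to-white-box move is a surrogate tree: train a readable decision tree on the network's own predictions. This is a generic sketch under stated assumptions and is not the procedure from the cited paper. The toy data and variable names are hypothetical; `nnet()` and `rpart()` are the only library calls.

```r
library(nnet)
library(rpart)
set.seed(3)

# Hypothetical toy data (not from the talk).
n <- 1000
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Member <- as.numeric(toy$x1 > 0 & toy$x2 > 0)

# 1. Fit the "black box".
net <- nnet(Member ~ x1 + x2, data = toy, size = 4, trace = FALSE, maxit = 300)

# 2. Label the data with the network's own predictions.
toy$net_label <- factor(as.numeric(predict(net, toy) > 0.5), levels = c(0, 1))

# 3. Fit a readable tree that mimics the network, not the raw outcome.
tree <- rpart(net_label ~ x1 + x2, data = toy, method = "class")

# Fidelity: how often the tree agrees with the network it imitates.
fidelity <- mean(predict(tree, toy, type = "class") == toy$net_label)
print(tree)
print(fidelity)
```

Fidelity is the number to report alongside the tree: a surrogate that disagrees with the network often is explaining a different model than the one that performed well.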
26
• Bob has shown how
variables impact his
black box
• He’s shown how they
behave in different
contexts
• He’s shown how robust
they are to errors
• But he hasn’t told us why
we should care
Now What?
27
Accuracy, false positive rates, and confusion matrices are CONSTRUCTS
Metrics and Assessment
28
• Enterprises are slow: Predict KPI not KRI
• Give confidence bands, sensitivities, and impact of context changes
• Build a story about the model internals and assumptions; tie to domain
knowledge of audience
• Explainability is up to the modeler, not the model *
• Unless, of course, your regulator says otherwise!
Conclusions
29
We’re hiring!
http://thinkbig.teradata.com
Thanks!