Think Big, Start Smart, Scale Fast
Analytics Communication:
Re-Introducing Complex Models
2
• Director of Data Science at Think Big
• I work at the intersection of statistics and technology
• But also business and analytics
• Too often I see data scientists limit themselves and their businesses
Dan Mallinger
3
1. Importance of Communication
2. Lost Tools of Analytics Communication
3. Tricks for those in Regulated Environments
4. More Communication
Today
4
Not Today
5
• Familiar = Clear
• Clear = Explainable
• Explainable = Understood
• Understood = Trustworthy
“Explainable” Model Fallacy
6
Better Communication Yields…
7
Bad Communication and Black Boxes…
8
Why We Should Care:
We Won’t Waste Money
Alas, not even a 250Gb server was sufficient: even after waiting three days, the data couldn't even be loaded. […]
Steve said it would be difficult for managers to accept a process that involved sampling.
9
hlm.html('Test1', test1_score__eoy~test1_score__boy + ...
is_special_ed * perc_free_lunch ...
other_score * support_rec ...
(is_focal | inst_sid), data=kinder)
Technically this is a regression…
So simple anyone can understand it!
Why We Should Care:
You Can’t Explain Your Models Anyway
10
• If your model needs to be re-fit every month, it probably has an eating disorder
• Be a better communicator to yourself
Why We Should Care:
Some of Us Don’t Understand Our Models
11
Meet Bob
12
• Predicting “Membership” (Not really, this is a dummy outcome)
• Pick a “black box” model
• Build understanding
Airline Data
13
Danger! Does Your Manager Know What Strata Are?!
Manager Doesn’t Trust Samples?
14
• Easy:
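library(nnet)
# raw, dep.cols and ind.cols are defined elsewhere in the analysis
models <- lapply(1:5000, function(i) {
  # draw a random sample of 10,000 rows, then refit the network on it
  rand.rows <- sample.int(nrow(raw), size=10000)
  df <- raw[rand.rows, c(dep.cols, ind.cols)]
  nnet(Member~., data=df, size=10)  # return the fitted model
})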
• Easier:
library(bootstrap)
• Bootstrap!
– Simple, but underused
– Resample data, rebuild models
– Parametric and non-parametric bootstrapping (bias/variance)
Gist of non-parametric: resample with replacement many times, refit each time, and treat the results as a distribution from which to read a confidence interval (CI)
Manager Doesn’t Trust Samples?
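The non-parametric gist can be sketched in base R. This is a minimal illustration rather than the code behind the slides: the data are synthetic, a logistic regression (glm) stands in for the neural net so the example runs quickly, and the names boot_ci, n_boot, x1, x2 and member are made up for the example.

```r
# Non-parametric bootstrap: resample rows with replacement, refit, and
# read a percentile confidence interval off the refitted statistics.
set.seed(42)

# Synthetic stand-in data (hypothetical columns, not the airline data)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$member <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.3 * dat$x2))

boot_ci <- function(data, n_boot = 200, level = 0.95) {
  # one fixed holdout set, so every refit is scored on the same rows
  test_idx <- sample.int(nrow(data), size = floor(nrow(data) / 5))
  test <- data[test_idx, ]
  train <- data[-test_idx, ]

  acc <- vapply(seq_len(n_boot), function(i) {
    rows <- sample.int(nrow(train), replace = TRUE)  # bootstrap resample
    fit <- glm(member ~ x1 + x2, data = train[rows, ], family = binomial)
    pred <- predict(fit, newdata = test, type = "response") > 0.5
    mean(pred == (test$member == 1))                 # holdout accuracy
  }, numeric(1))

  alpha <- 1 - level
  list(estimate = mean(acc),
       ci = quantile(acc, c(alpha / 2, 1 - alpha / 2), names = FALSE))
}

res <- boot_ci(dat)
print(res)
```

The interval describes how much holdout accuracy moves when the training sample changes, which is the question a manager who distrusts samples is really asking.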
15
Stability of Model
16
• Bob has convinced his manager that his sampling strategy is acceptable (Good Job, Bob!)
• But he hasn’t built trust in the model
Now What?
17
Bob Doesn’t Explain Variables Like This…
18
• If X matters, then shuffling it should hurt our model
• Then bootstrap for confidence intervals
• Most R models have a method for this (see caret)
Shining a light into the parameters of our black box
Variable Importance
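The shuffling idea can be sketched in a few lines of base R. This is an illustrative sketch under stated assumptions, not the talk's code: the data are synthetic, glm stands in for the black box, and perm_importance, accuracy and the column names are hypothetical.

```r
# Permutation importance: shuffle one predictor at a time and measure
# how much holdout accuracy drops relative to the unshuffled baseline.
set.seed(1)

n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
dat$member <- rbinom(n, 1, plogis(2 * dat$x1 + 0.5 * dat$x2))

train <- dat[1:700, ]
test  <- dat[701:1000, ]
fit <- glm(member ~ x1 + x2 + noise, data = train, family = binomial)

accuracy <- function(model, data) {
  pred <- predict(model, newdata = data, type = "response") > 0.5
  mean(pred == (data$member == 1))
}

perm_importance <- function(model, data, vars, n_rep = 50) {
  base <- accuracy(model, data)
  sapply(vars, function(v) {
    drops <- replicate(n_rep, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])  # break the link to the outcome
      base - accuracy(model, shuffled)
    })
    mean(drops)  # average accuracy lost when v is scrambled
  })
}

imp <- perm_importance(fit, test, c("x1", "x2", "noise"))
print(round(imp, 3))
```

A predictor the model relies on shows a clear drop, while an irrelevant one stays near zero. Wrapping the whole procedure in a bootstrap loop would give the confidence intervals the slide mentions.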
19
Shining a light into the parameters of our black box
Variable Importance: Bob’s Data
20
• Similar to variable importance
• How do relationships in our model play out in different settings?
• How much does our model depend on accurate measurement?
Sensitivity and Robustness
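Both questions can be probed directly against a fitted model. The sketch below is an assumption-laden illustration in base R: the data are synthetic, glm with an interaction term stands in for the black box, and profile_x1 and noisy_accuracy are hypothetical helper names.

```r
set.seed(7)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$member <- rbinom(n, 1, plogis(dat$x1 * (1 + dat$x2)))
fit <- glm(member ~ x1 * x2, data = dat, family = binomial)

# Sensitivity: sweep x1 across its observed range while holding x2 at
# a low, middle and high quantile, and record the predicted probability.
profile_x1 <- function(model, data, probs = c(0.1, 0.5, 0.9), n_grid = 25) {
  grid <- seq(min(data$x1), max(data$x1), length.out = n_grid)
  sapply(probs, function(p) {
    newdata <- data.frame(x1 = grid,
                          x2 = quantile(data$x2, p, names = FALSE))
    predict(model, newdata = newdata, type = "response")
  })
}

# Robustness: add measurement noise to x1 and track accuracy.
noisy_accuracy <- function(model, data, sd_noise) {
  noisy <- data
  noisy$x1 <- noisy$x1 + rnorm(nrow(noisy), sd = sd_noise)
  pred <- predict(model, newdata = noisy, type = "response") > 0.5
  mean(pred == (data$member == 1))
}

prof <- profile_x1(fit, dat)   # one column per x2 setting
acc <- sapply(c(0, 0.5, 1, 2), function(s) noisy_accuracy(fit, dat, s))
print(round(acc, 3))
```

Each column of prof is the response curve for x1 in one context, and acc shows how quickly performance degrades as x1 is measured less accurately.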
21
Sensitivity and Robustness Example
My code wasn’t working, so thanks to:
https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
22
More Sensitivity and Robustness
Manual variable permutation in R
library(sensitivity)
23
• Bob’s manager has told him that black box models are not allowed
• But Bob’s neural net performed better than anything else. Oh dear!
Dang!
24
• Bob’s work in neural nets can be leveraged!
• Generically: Prototype selection
• Identify points on the decision boundary to improve model
• Specifically: Extracting decision trees from neural nets
Blackbox to Whitebox
25
Blackbox to Whitebox: Methodology
“Extracting Decision Trees from Trained Neural Networks” - Krishnan & Bhattacharya
Also: https://github.com/dvro/scikit-protopy
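A much simpler cousin of this idea is a surrogate tree: label the data with the network's own predictions and fit a decision tree to those labels. The sketch below is that simplification, not the Krishnan & Bhattacharya algorithm. It assumes the nnet and rpart packages are installed (both usually ship with R), and the data and column names are synthetic and hypothetical.

```r
library(nnet)
library(rpart)

set.seed(3)
n <- 1000
dat <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
dat$member <- factor(ifelse(dat$x1^2 + dat$x2 > 1, "yes", "no"))

# 1. The "black box": a small neural net
net <- nnet(member ~ x1 + x2, data = dat, size = 8, maxit = 500,
            trace = FALSE)

# 2. Relabel the inputs with the network's own predictions
oracle <- dat[, c("x1", "x2")]
oracle$net_label <- factor(predict(net, newdata = oracle, type = "class"))

# 3. Fit a readable tree that imitates the network
tree <- rpart(net_label ~ x1 + x2, data = oracle, method = "class")

# Fidelity: how often does the tree agree with the network?
tree_label <- predict(tree, newdata = oracle, type = "class")
fidelity <- mean(as.character(tree_label) == as.character(oracle$net_label))

print(tree)
cat("fidelity to the network:", round(fidelity, 3), "\n")
```

Fidelity is the number to report alongside the tree: it says how much of the network's behavior the white-box stand-in actually captures.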
26
• Bob has shown how variables impact his black box
• He’s shown how they behave in different contexts
• He’s shown how robust they are to errors
• But he hasn’t told us why we should care
Now What?
27
Accuracy, false positive rates, and confusion matrices are CONSTRUCTS
Metrics and Assessment
28
• Enterprises are slow: Predict KPI not KRI
• Give confidence bands, sensitivities, and impact of context changes
• Build a story about the model internals and assumptions; tie to domain knowledge of audience
• Explainability is up to the modeler, not the model *
• Unless, of course, your regulator says otherwise!
Conclusions
29
We’re hiring!
http://thinkbig.teradata.com
Thanks!

"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 

Dan Mallinger, Data Science Practice Manager, Think Big Analytics at MLconf NYC

  • 1. Think Big, Start Smart, Scale Fast Analytics Communication: Re-Introducing Complex Models
  • 2. Dan Mallinger • Director of Data Science at Think Big • I work at the intersection of statistics and technology • But also business and analytics • Too often I see data scientists limit themselves and their businesses
  • 3. Today • 1. Importance of Communication • 2. Lost Tools of Analytics Communication • 3. Tricks for those in Regulated Environments • 4. More Communication
  • 5. “Explainable” Model Fallacy • Familiar = Clear • Clear = Explainable • Explainable = Understood • Understood = Trustworthy
  • 7. Bad Communication and Black Boxes…
  • 8. Why We Should Care: We Won’t Waste Money • “Alas, not even a 250Gb server was sufficient: even after waiting three days, the data couldn't even be loaded. […] Steve said it would be difficult for managers to accept a process that involved sampling.”
  • 9. Why We Should Care: You Can’t Explain Your Models Anyway • hlm.html('Test1', test1_score__eoy~test1_score__boy + ... is_special_ed * perc_free_lunch ... other_score * support_rec ... (is_focal | inst_sid), data=kinder) • Technically this is a regression… So simple anyone can understand it!
  • 10. Why We Should Care: Some of Us Don’t Understand Our Models • If your model needs to be re-fit every month, it probably has an eating disorder • Be a better communicator to yourself
  • 12. Airline Data • Predicting “Membership” (not really; this is a dummy outcome) • Pick a “black box” model • Build understanding
  • 13. Manager Doesn’t Trust Samples? • Danger! Does Your Manager Know What Strata Are?!
  • 14. Manager Doesn’t Trust Samples? • Easy: sapply(1:5000, function(i) { rand.rows <- sample.int(nrow(raw), size=10000); df <- raw[rand.rows, c(dep.cols, ind.cols)]; m <- nnet(Member~., data=df, size=10) }) • Easier: library(bootstrap) • Bootstrap! – Simple, but underused – Resample data, rebuild models – Parametric and non-parametric bootstrapping (bias/variance) • Gist of non-parametric: do it a bunch of times, then treat the results as a distribution for the CI
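The "do it a bunch of times, treat the results as a distribution" idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the talk's code: the data frame `raw` and its columns are simulated, `glm()` stands in for the `nnet()` model, and the tracked statistic is out-of-bag accuracy. Note that `sample.int()` draws without replacement by default, so the sketch sets `replace = TRUE`, which is what the usual non-parametric bootstrap calls for.

```r
# Non-parametric bootstrap sketch (base R only).
# Assumptions: `raw` is simulated stand-in data with hypothetical columns,
# and glm() stands in for the nnet() model on the slide; any model with a
# predict() method could be swapped in.
set.seed(42)
n <- 2000
raw <- data.frame(miles = rnorm(n), flights = rnorm(n))
raw$Member <- rbinom(n, 1, plogis(0.8 * raw$miles - 0.5 * raw$flights))

n_boot <- 200  # the slide uses 5000; kept small so the sketch runs quickly

boot_acc <- sapply(seq_len(n_boot), function(i) {
  # Resample rows with replacement, then rebuild the model on the resample
  idx <- sample.int(nrow(raw), size = nrow(raw), replace = TRUE)
  m <- glm(Member ~ ., data = raw[idx, ], family = binomial)
  # Score on the rows left out of this resample (the out-of-bag rows)
  oob <- raw[-unique(idx), ]
  pred <- as.integer(predict(m, newdata = oob, type = "response") > 0.5)
  mean(pred == oob$Member)
})

# Treat the replicates as a distribution: the percentile interval is the CI
ci <- quantile(boot_acc, probs = c(0.025, 0.975))
print(ci)
```

The interval, rather than a single accuracy figure, is what a manager who distrusts samples can be shown: it says how much the answer moves when the sample changes.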
  • 16. Now What? • Bob has convinced his manager that his sampling strategy is acceptable (Good job, Bob!) • But he hasn’t built trust in the model
  • 17. Bob Doesn’t Explain Variables Like This…
  • 18. Variable Importance • Shining a light into the parameters of our black box • If X matters, then shuffling it should hurt our model • Then bootstrap for confidence intervals • Most R models have a method for this (see caret)
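The shuffle-then-bootstrap recipe can be sketched as follows. This is an illustrative base R version under assumptions, not the method any particular package implements: the data are simulated, the variable names are hypothetical, and `glm()` stands in for the black-box model.

```r
# Permutation variable importance sketch (base R only).
# Assumptions: simulated data with hypothetical variable names; glm() is a
# stand-in for the black-box model.
set.seed(1)
n <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
df$y <- rbinom(n, 1, plogis(1.5 * df$x1 + 0.5 * df$x2))

train <- df[1:1500, ]
test  <- df[1501:n, ]
model <- glm(y ~ ., data = train, family = binomial)

accuracy <- function(m, d) {
  mean(as.integer(predict(m, newdata = d, type = "response") > 0.5) == d$y)
}

# Importance of a variable = accuracy lost when that column is shuffled
perm_importance <- function(m, d, var) {
  shuffled <- d
  shuffled[[var]] <- sample(shuffled[[var]])
  accuracy(m, d) - accuracy(m, shuffled)
}

vars <- c("x1", "x2", "noise")
n_rep <- 200
# Repeat on bootstrap resamples of the test set to get an interval per variable
reps <- sapply(vars, function(v) {
  replicate(n_rep, {
    d <- test[sample.int(nrow(test), replace = TRUE), ]
    perm_importance(model, d, v)
  })
})

# One row per variable: lower bound, median, and upper bound of the accuracy drop
importance <- t(apply(reps, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(round(importance, 3))
```

A variable whose interval sits well above zero matters to the model; one whose interval straddles zero (here, `noise`) does not.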
  • 19. Variable Importance: Bob’s Data • Shining a light into the parameters of our black box
  • 20. Sensitivity and Robustness • Similar to variable importance • How do relationships in our model play out in different settings? • How much does our model depend on accurate measurement?
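Both questions can be probed with a few lines of base R. The sketch below is one possible approach under assumptions, not the code behind the talk's plots: the data are simulated, the variable names are hypothetical, and `glm()` with an interaction term stands in for the black-box model.

```r
# Sensitivity and robustness sketch (base R only).
# Assumptions: simulated data with hypothetical variable names; glm() with an
# interaction term stands in for the black-box model.
set.seed(7)
n <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(df$x1 + df$x1 * df$x2))
model <- glm(y ~ x1 * x2, data = df, family = binomial)

# 1. Sensitivity: sweep x1 across its range while holding x2 fixed at a low,
#    middle, and high quantile (the "different settings").
x1_grid <- seq(min(df$x1), max(df$x1), length.out = 50)
settings <- quantile(df$x2, probs = c(0.1, 0.5, 0.9))
profile <- sapply(settings, function(x2_fixed) {
  predict(model, newdata = data.frame(x1 = x1_grid, x2 = x2_fixed),
          type = "response")
})
# Each column is the predicted response curve for one setting of x2
print(round(profile[c(1, 25, 50), ], 3))

# 2. Robustness: add measurement noise to x1 and track accuracy.
accuracy_with_noise <- function(noise_sd) {
  noisy <- df
  noisy$x1 <- noisy$x1 + rnorm(n, sd = noise_sd)
  pred <- as.integer(predict(model, newdata = noisy, type = "response") > 0.5)
  mean(pred == df$y)
}
noise_levels <- c(0, 0.25, 0.5, 1, 2)
robustness <- sapply(noise_levels, accuracy_with_noise)
names(robustness) <- noise_levels
print(round(robustness, 3))
```

If the curves in `profile` differ sharply across columns, the effect of `x1` depends on context; if `robustness` falls quickly as noise grows, the model leans heavily on accurate measurement of that variable.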
  • 21. Sensitivity and Robustness Example • My code wasn’t working, so thanks to: https://beckmw.wordpress.com/2013/10/07/sensitivity-analysis-for-neural-networks/
  • 22. More Sensitivity and Robustness • Manual variable permutation in R • library(sensitivity)
  • 23. Dang! • Bob’s manager has told him that black box models are not allowed • But Bob’s neural net performed better than anything else. Oh dear!
  • 24. Blackbox to Whitebox • Bob’s work in neural nets can be leveraged! • Generically: Prototype selection • Identify points on the decision boundary to improve model • Specifically: Extracting decision trees from neural nets
  • 25. Blackbox to Whitebox: Methodology • “Extracting Decision Trees from Trained Neural Networks” - Krishnan & Bhattacharya • Also: https://github.com/dvro/scikit-protopy
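The general idea of turning a black box into a tree can be illustrated with a surrogate model: query the black box, then fit a tree to its answers. The sketch below is a simplified stand-in and not the algorithm from the cited paper. It assumes simulated data with hypothetical variable names, uses `glm()` in place of the trained neural net, and relies on the `rpart` package (one of R's recommended packages) to grow the tree.

```r
# Surrogate-tree sketch: approximate a black-box model with a decision tree.
# Assumptions: simulated data; glm() stands in for the trained neural net;
# rpart grows the tree. This is a simplified stand-in, not the algorithm
# from the cited paper.
library(rpart)

set.seed(11)
n <- 2000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(2 * df$x1 - df$x2))
black_box <- glm(y ~ x1 + x2, data = df, family = binomial)

# 1. Generate query points by resampling each input column independently
queries <- data.frame(x1 = sample(df$x1, 5000, replace = TRUE),
                      x2 = sample(df$x2, 5000, replace = TRUE))

# 2. Label the queries with the black box's own predictions
queries$bb_label <- factor(
  as.integer(predict(black_box, newdata = queries, type = "response") > 0.5))

# 3. Fit an interpretable tree that mimics those labels
tree <- rpart(bb_label ~ x1 + x2, data = queries, method = "class")

# 4. Fidelity: how often the tree agrees with the black box on the original data
bb_orig <- as.integer(predict(black_box, newdata = df, type = "response") > 0.5)
tree_orig <- as.integer(as.character(predict(tree, newdata = df, type = "class")))
fidelity <- mean(bb_orig == tree_orig)

print(tree)
print(fidelity)
```

Fidelity measures agreement with the black box rather than with the true outcome, so it shows how much of the black box's behavior the readable tree has captured.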
  • 26. Now What? • Bob has shown how variables impact his black box • He’s shown how they behave in different contexts • He’s shown how robust they are to errors • But he hasn’t told us why we should care
  • 27. Metrics and Assessment • Accuracy, false positive rates, and confusion matrices are CONSTRUCTS
  • 28. Conclusions • Enterprises are slow: Predict KPI not KRI • Give confidence bands, sensitivities, and impact of context changes • Build a story about the model internals and assumptions; tie to domain knowledge of audience • Explainability is up to the modeler, not the model* • *Unless, of course, your regulator says otherwise!