AI Next Conference: 7/24/2019
Automated Data Visualization and Machine Learning
Ram Seshadri
July 2019
Slide 1
“Machine learning teams are still struggling to take advantage of ML
due to challenges with inflexible frameworks, lack of reproducibility,
collaboration issues, and immature software tools”
Cecelia Shao
Comet.ml
“Why is my Data Science team taking
sooo long to complete a simple project?”
-- A Frustrated CIO
Slide 2
How can we make DATA SCIENTISTS more productive?
Slide 3
The Answer?
● Faster Visualization: AutoViz
● Automatic Feature Selection: Auto_ViML
● Automatic Model Selection and Tuning: Auto_ViML
● One Click Model Serving and Production
HOWEVER CURRENT TOOLS ARE LIMITED BECAUSE...
● They are proprietary and expensive (lock-in)
● They are Black Boxes which are too complex to interpret
● They offer very little reproducibility outside of the tool
INTRODUCING A SIMPLER APPROACH TO AUTO-ML
Auto_ViML was designed along with AutoViz to Build Variant Interpretable Machine Learning Models Fast!
What are AutoViz and Auto_ViML?
Slide 4
● Open Source Tools for Faster Time to Insights, with these Design Goals:
○ Simple: Invoke each of them with a single Line of Code
○ Flexible: Suited to any kind of structured data set, with no Prep required
○ Incremental: Can be used by anyone, from beginners to experts
○ Experimental: Compare multiple visualization methods and models step by step
○ Interpretable: Get a clear explanation of the steps taken, with validation graphs
○ Reproducible: No Black Box. Reproducible model pipelines and outputs
○ Extensible: Open Source, with contributions from the Python and DS community
I built AutoViz and Auto_ViML to make my own life easier.
I hope they will do the same for you.
What is AutoViz?
Slide 5
AutoViz enables you to automatically visualize any data set with a Single Line of Code. It automatically:
1. Selects a Random Sample from the Data Set (if the Data Set is very large)
2. Selects the most important features using ML (if the Number of Variables is very large)
3. Selects the Best Methods to Visualize Data for a given problem
4. Provides Charts to be saved in PNG, JPG, and SVG Formats
OVERVIEW
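The first step in the list above, sampling a very large data set before plotting, can be sketched in plain Python. This is only an illustration and not AutoViz's own code: the function name `sample_rows`, the `max_rows` threshold, and the fixed seed are all assumptions made for the example.

```python
import random


def sample_rows(rows, max_rows=150_000, seed=42):
    """Return every row for a small data set, else a reproducible random sample.

    max_rows and seed are illustrative defaults, not AutoViz settings.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows)


rows = list(range(1_000))                  # stand-in for the rows of a table
subset = sample_rows(rows, max_rows=100)   # large relative to max_rows: sampled
```

Using a seeded generator means two runs draw the same rows, which fits the deck's stated goal of reproducible outputs.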
Why AutoViz?
Slide 6
BENEFITS
Systematic: Look for insights systematically rather than through “gut instinct” or domain knowledge
Simple: Reduce features to the most important ones to deliver simple yet powerful insights
Explainable: Help explain your hypotheses and variable selection better to others
How AutoViz Works
Slide 7
AutoViz PROCESS
Design Goals
● Variable Classification: AutoViz classifies features into highly granular data types to determine how best to represent them in Charts
● Problem Identification: AutoViz can visualize any dataset for a given target: Regression, Classification, Time Series, Clustering and more
● Complex Interactions: Most charts involve more than one variable, helping to deliver powerful insights with minimal effort
Implementation
● Select the Most Important Features: AutoViz uses the powerful ML algorithm, XGBoost, to select important features given the target variable
● Select the Best Charts: AutoViz selects the best ways to visualize your data to extract insights from it
● Deliver them Fast!: AutoViz selects a statistically valid sample of the data to visualize (in case the data set is very large)
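The variable classification idea on this slide can be illustrated with a small sketch that assigns each column a coarse type before a chart is chosen. The labels, the `max_categories` threshold, and the column names below are hypothetical and chosen for the example; AutoViz's real categories and rules may differ.

```python
def classify_column(values, max_categories=10):
    """Assign a coarse, illustrative type label to one column of values."""
    distinct = set(values)
    if len(distinct) == 2:
        return "binary"
    if all(isinstance(v, str) for v in values):
        # Few distinct strings behave like categories; many look like free text or IDs.
        return "categorical" if len(distinct) <= max_categories else "high_cardinality_text"
    if all(isinstance(v, int) for v in values):
        return "discrete_integer" if len(distinct) <= max_categories else "continuous"
    if all(isinstance(v, (int, float)) for v in values):
        return "continuous"
    return "mixed"


# Made-up columns, only to exercise each branch.
columns = {
    "flag": [0, 1, 0, 1, 1],
    "level": [1, 2, 3, 4, 5],
    "size": [6.5, 6.4, 7.1, 6.9, 5.8],
    "city": ["a", "b", "c", "a", "b"],
}
column_types = {name: classify_column(vals) for name, vals in columns.items()}
```

A chart selector could then map each label to a plot family, for example bar charts for categorical columns and histograms for continuous ones.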
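Github: https://github.com/AutoViML/AutoViz
AutoViz: Boston Housing*
Just Import...
import AutoViz_Class as AV
AVC = AV.AutoViz_Class()
And Run AutoViz.
# sep, target and df are assumed to be defined already:
# the delimiter, the name of the target column, and a pandas DataFrame.
# The empty first argument means no file name is given, so df is used.
dft = AVC.AutoViz('', sep, target, df, lowess=True)
Results...
Slide 8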
Thanks to UCI Machine Learning Repository for all data sets in this presentation:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA:
University of California, School of Information and Computer Science.
AutoViz Example: Housing*
INSIGHTS
● Number of Rooms and Median Value of Homes seem to be highly correlated
● As Age of Building increases, Median Value decreases, albeit slowly
● NOX and DIS seem to be highly correlated, though they seem to have a polynomial or non-linear relationship
Slide 9
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
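The insights on this slide rest on reading correlation from scatter plots. As a rough sketch of why a non-linear relationship deserves a second look, the Pearson coefficient below is written out by hand and applied to made-up numbers (not the Boston Housing data): a perfectly linear pair scores 1, while a perfectly quadratic pair over a symmetric range scores 0 even though the variables are fully dependent.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]            # illustrative values only
linear = [2 * v + 1 for v in x]  # straight-line relationship
curved = [v ** 2 for v in x]     # quadratic relationship

r_linear = pearson(x, linear)
r_curved = pearson(x, curved)
```

This is one reason a plot with a fitted curve can reveal structure that a single correlation number hides.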
AutoViz Example: Housing
INSIGHTS
● Both CRIM and ZN are highly skewed
● Both may require a transformation
● PTRATIO and DIS seem to be somewhat skewed as well but don’t require transformations
Slide 10
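To make the skewness remarks concrete, here is a small sketch that measures skewness with the moment-based formula and shows how a log transform reduces it. The numbers are invented to look right-skewed and are not taken from CRIM or ZN; `log1p` is just one common choice of transformation, assumed here for illustration.

```python
import math


def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw = [0.01, 0.02, 0.05, 0.1, 0.3, 1.0, 9.0, 70.0]  # made-up, long right tail
logged = [math.log1p(v) for v in raw]                # log(1 + v) handles values near zero

skew_raw = skewness(raw)
skew_logged = skewness(logged)
```

A positive value indicates a long right tail; the transformed column is still skewed but noticeably less so than the raw one.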
AutoViz Example: Boston
INSIGHTS
● RM, LSTAT, TAX, INDUS, AGE, and CRIM seem to be decently correlated with the Target. They may be worth exploring if they come up as Important Features.
● Average Median Value of homes varies widely by CHAS and RAD. Hence they would be important features in any model.
Slide 11
How to Build a better model?
Slide 13
PROCESS
● Remove Low Information and Redundant Features
● Add Polynomial, Interaction, and Other Features
● Select Models from Simple to Complex and Perform Tuning
● Add Entropy Binning, Stacking, and K-Means Featurizers to the model
● Add Imbalanced sampling and training
● Perform Ensembling of Multiple Types of models
BUILD A ViML Model!
(VARIANT INTERPRETABLE MACHINE LEARNING MODEL, Step by Step)
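One of the steps above, adding polynomial and interaction features, can be sketched as follows. This is a generic degree-2 expansion written for illustration; the function name `add_poly_features` and the feature-naming scheme are assumptions, and Auto_ViML's own implementation may work differently.

```python
from itertools import combinations


def add_poly_features(row):
    """Return a copy of row with squared terms and pairwise products added."""
    expanded = dict(row)
    for name, value in row.items():
        expanded[f"{name}^2"] = value ** 2            # polynomial (squared) term
    for a, b in combinations(sorted(row), 2):
        expanded[f"{a}*{b}"] = row[a] * row[b]        # pairwise interaction term
    return expanded


row = {"x1": 2.0, "x2": 3.0, "x3": 4.0}   # hypothetical feature values
expanded = add_poly_features(row)
```

With three inputs this yields nine features, which shows why such expansions are usually paired with the feature-removal step at the top of the list.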
Why Auto_ViML?
Slide 15
BENEFITS
SYSTEMATIC: Auto_ViML was designed from the ground up to mimic how a Data Scientist would approach a Modeling Problem.
FEATURE ENGINEERING: Enables selective model complexity by adding features and complexity step by step
TRANSPARENCY: Provides Deep Insights into the Data Set with Full Transparency
AUTOMATIC FEATURE SELECTION: Models with Fewer Features result in Simpler Models. Auto_ViML Produces models with 10-90% Fewer Features than Regular Models without Significant Loss of Predictive Power*
MULTIPLE MODELS: Build and test multiple models through Hyper Tuning and Cross Validation
* Based on my experience. Your results may vary.
Auto_ViML LETS YOU TRY MULTIPLE APPROACHES
Slide 16
You can access all the powerful features of Auto_ViML with one line of Python Code after you import it.
You can turn features and flags on and off to see how they impact the Model.
TRY MULTIPLE APPROACHES TO GET THE BEST MODEL
● INTERACTIONS vs. NO INTERACTIONS
● BOOSTING vs. BAGGING
● ENSEMBLING vs. STACKING
● IMBALANCED vs. BALANCED
● GRIDSEARCH vs. RANDOM
● SHAP vs. FEATURE IMPORTANCES
Just like a Data Scientist would...
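The GRIDSEARCH vs. RANDOM choice above comes down to how hyperparameter candidates are generated. The sketch below contrasts the two in plain Python; the parameter names, values, and helper functions are hypothetical and are not Auto_ViML settings.

```python
import random
from itertools import product

# Hypothetical search space, chosen only for illustration.
param_grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}


def grid_candidates(grid):
    """Every combination of values: exhaustive, and grows multiplicatively."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def random_candidates(grid, n_iter, seed=0):
    """A fixed budget of n_iter combinations drawn at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in sorted(grid.items())} for _ in range(n_iter)]


grid_trials = grid_candidates(param_grid)         # 3 x 3 combinations
random_trials = random_candidates(param_grid, 4)  # only 4 model fits
```

Grid search guarantees coverage of the listed values, while random search caps the number of model fits, which matters as more parameters are added.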
Github: https://github.com/AutoViML/Auto_ViML
Auto_ViML: Boston Housing
Just Import...
from Auto_ViML import Auto_ViML
And Run Auto_ViML.
# train and test are assumed to be pandas DataFrames that are already loaded,
# and target the name of the target column.
model, features, trainm, testm = Auto_ViML(train, target, test,
    sample_submission='',
    hyper_param='GS',
    scoring_parameter='f1',
    Boosting_Flag=None,
    KMeans_Featurizer=False,
    Add_Poly=0,
    Stacking_Flag=False,
    Binning_Flag=False,
    Imbalanced_Flag=True,
    verbose=0)
Get Model, Features and transformed Train and Test data...
Slide 17
Auto_ViML: Boston Housing*
Slide 18
Here is an example of a Regression data set: Boston Housing*. There are 13 predictors in the dataset. But Auto_ViML finds that only 10 variables are needed to get the job done. Also watch the Feature Importances.
DATA SET SIZE: 506 x 14
TIME TAKEN: 6 secs
VARIABLES SELECTED: 10
FEATURE REDUCTION: 24%
Results: Start with Linear Model
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Auto_ViML: Boston Housing
Slide 19
Results:
Move to Random Forests
Time Taken = 30 seconds
Auto_ViML: Boston Housing
Slide 20
Results:
Close with XGBoost
Slide 21
Auto_ViML: Boston Housing
Slide 22
Linear Model with Interaction Variables
Ensemble Model with Binning
Forests Model with Binning Numerics
XGBoost Model with Stacking
Multiple Models
Slide 23
Auto_ViML: Wisconsin Breast Cancer
Slide 24
The Wisconsin Breast Cancer* data set is a classic Data Set: Auto_ViML took 12 Seconds to find the best features and best model, with a Weighted F1 score of 100% on the validation set using a Linear model.
Wisconsin Breast Cancer Data Set
DATA SET SIZE: 512 x 32
TIME TAKEN: 12 Secs
FEATURE REDUCTION: 52%
Macro Average ROC AUC: 100%
Results: Compare the results to another model using Deep Learning and Keras
Link: “Hyperparameter Optimization with Keras” by Mikko
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
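Since this slide reports a Weighted F1 score, a short sketch of how that metric is computed may help. The per-class F1 scores are averaged with weights equal to each class's share of the true labels. The labels below are invented (B and M stand in for two classes) and are not results from the deck.

```python
def f1_per_class(y_true, y_pred):
    """F1 score of each class, treating that class as the positive label."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's count in y_true."""
    scores = f1_per_class(y_true, y_pred)
    n = len(y_true)
    return sum(scores[label] * y_true.count(label) / n for label in scores)


y_true = ["B", "B", "B", "M"]   # made-up labels
y_pred = ["B", "B", "M", "M"]   # one B misclassified as M
score = weighted_f1(y_true, y_pred)
perfect = weighted_f1(y_true, y_true)
```

Because the weights follow class frequency, a weighted score can look strong even when a rare class is predicted poorly, so it is worth reading alongside the macro average.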
Next Steps for AutoViz and Auto_ViML...
Slide 25
Auto_ViML
● What’s Missing / Could be Improved:
○ No Feature Engineering: You can create your own features or use kits like featuretools, etc.
○ No Image/Video/NLP Support: At the moment, it removes these features from model consideration
○ No Time Series modeling: Auto_TimeSeries is in the works. Stay Tuned.
○ No Neural Networks or Deep Learning: You can add your own modules or use tools like Ludwig
○ Model serving: A module for test data transformation still needs to be added
AutoViz
● What’s Missing / Could be Improved:
○ Build it into Existing Tools so that structured data can be Visualized Fast!
○ Build it into Educational tools to make it easy for Students and Colleges (where small, structured datasets are the Norm) to Visualize data (as writing code is still very hard for Students)
○ Add additional Visualizations such as Pie Charts, Mosaic Charts, etc.
○ Build it into Industrial Instruments such as IoT tools so that large data sets can be visualized
THANK YOU
Slide 27

Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Auto visualization and viml

  • 1. AI Next Conference: 7/24/2019. Automated Data Visualization and Machine Learning. Ram Seshadri, July 2019.
  • 2. “Why is my Data Science team taking sooo long to complete a simple project?” (a frustrated CIO). “Machine learning teams are still struggling to take advantage of ML due to challenges with inflexible frameworks, lack of reproducibility, collaboration issues, and immature software tools.” (Cecelia Shao, Comet.ml). The answer?
  • 3. How can we make data scientists more productive?
      ○ Faster visualization: AutoViz
      ○ Automatic feature selection: Auto_ViML
      ○ Automatic model selection and tuning: Auto_ViML
      ○ One-click model serving and production
      However, current tools are limited because:
      ○ They are proprietary and expensive (lock-in)
      ○ They are black boxes that are too complex to interpret
      ○ They offer very little reproducibility outside of the tool
      Introducing a simpler approach to Auto-ML: Auto_ViML was designed along with AutoViz to build Variant Interpretable Machine Learning models fast!
  • 4. What are AutoViz and Auto_ViML? Open source tools for faster time to insights, with these design goals:
      ○ Simple: invoke each of them with a single line of code
      ○ Flexible: suited to any kind of structured data set, with no prep required
      ○ Incremental: can be used by anyone, from beginners to experts
      ○ Experimental: compare multiple visualization methods and models step by step
      ○ Interpretable: get a clear explanation of the steps taken, with validation graphs
      ○ Reproducible: no black box; reproducible model pipelines and outputs
      ○ Extensible: open source, with contributions from the Python and DS community
      I built AutoViz and Auto_ViML to make my own life easier. Hope they will do the same for you.
  • 5. What is AutoViz? Overview: AutoViz enables you to automatically visualize any data set with a single line of code. It automatically:
      1. Selects a random sample from the data set (if the data set is very large)
      2. Selects the most important features using ML (if the number of variables is very large)
      3. Selects the best methods to visualize the data for a given problem
      4. Provides charts that can be saved in PNG, JPG, and SVG formats
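The sampling step in the overview above can be sketched with the standard library alone. This is a minimal illustration, not AutoViz's actual code: the function name `sample_rows`, the `max_rows` threshold, and the fixed seed are all assumptions chosen for the example.

```python
import random


def sample_rows(rows, max_rows=1000, seed=42):
    """Return all rows when the data set is small, else a random sample.

    Hypothetical helper: max_rows and seed are illustrative defaults,
    not values taken from AutoViz.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, max_rows)


# Made-up data set with 5,000 rows.
rows = [{"id": i, "value": i * 0.5} for i in range(5000)]
sampled = sample_rows(rows, max_rows=1000)
print(len(sampled))
```

Fixing the seed is a design choice that serves the reproducibility goal: two runs over the same data produce the same charts.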
  • 6. Why AutoViz? Benefits:
      ○ Systematic: look for insights systematically rather than through “gut instinct” or domain knowledge
      ○ Simple: reduce features to the most important ones to deliver simple yet powerful insights
      ○ Explainable: help explain your hypotheses and variable selection better to others
  • 7. How AutoViz works. Process (design goals and implementation):
      ○ Variable classification: AutoViz classifies features into highly granular data types to determine how best to represent them in charts
      ○ Problem identification: AutoViz can visualize any data set for a given target: regression, classification, time series, clustering, and more
      ○ Complex interactions: most charts involve more than one variable, helping to deliver powerful insights with minimal effort
      ○ Select the most important features: AutoViz uses the powerful ML algorithm XGBoost to select important features given the target variable
      ○ Select the best charts: AutoViz selects the best ways to visualize your data to extract insights from it
      ○ Deliver them fast: AutoViz selects statistically valid sample data to visualize (in case the data set is very large)
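To make the variable classification idea concrete, here is a rough sketch of sorting columns into coarse types before picking a chart. The type labels, the `max_categories` threshold, and the sample table are assumptions for illustration; AutoViz's own categories are more granular and are not reproduced here.

```python
def classify_column(values, max_categories=10):
    """Assign a coarse type label to a column given as a list of raw values.

    Hypothetical rules for illustration only.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    # bool is a subclass of int in Python, so test for it first.
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    distinct = set(present)
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        whole = all(float(v).is_integer() for v in present)
        if whole and len(distinct) <= max_categories:
            return "discrete"
        return "continuous"
    if len(distinct) <= max_categories:
        return "categorical"
    return "text_or_id"


# Made-up table; column names and values are not from any real data set.
table = {
    "rooms": [6.5, 5.9, 7.1, 6.2],
    "river_flag": [0, 1, 0, 0],
    "city": ["A", "B", "A", "C"],
    "owner_occupied": [True, False, True, True],
}
types = {name: classify_column(col) for name, col in table.items()}
print(types)
```

A label like "continuous" would then point to scatter plots or histograms, while "discrete" or "categorical" would point to bar charts of group averages.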
  • 8. AutoViz: Boston Housing*. GitHub: https://github.com/AutoViML/AutoViz
      Just import...
          import AutoViz_Class as AV
          AVC = AV.AutoViz_Class()
      And run AutoViz.
          dft = AVC.AutoViz('', sep, target, df, lowess=True)
      Results...
      * Thanks to UCI Machine Learning Repository for all data sets in this presentation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
  • 9. AutoViz Example: Housing*. Insights:
      ○ Number of Rooms and Median Value of Homes seem to be highly correlated
      ○ As Age of Building increases, Median Value decreases, albeit slowly
      ○ NOX and DIS seem to be highly correlated, though they seem to have a polynomial or non-linear relationship
      * Thanks to UCI Machine Learning Repository https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
  • 10. AutoViz Example: Housing. Insights:
      ○ Both CRIM and ZN are highly skewed
      ○ Both may require a transformation
      ○ PTRATIO and DIS seem to be somewhat skewed as well but don’t require transformations
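The skewness insight on this slide can be checked numerically. The sketch below computes a moment-based skewness coefficient and shows how a log transform can reduce it. The data values are made up for the example (they are not CRIM or ZN values), and the choice of `log1p` as the transformation is an assumption; the slide does not say which transformation to use.

```python
import math


def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # a constant column has no skew
    return m3 / m2 ** 1.5


# Made-up right-skewed values: 0, 1, 3, 7, ..., 511.
raw = [2 ** k - 1 for k in range(10)]
# log1p(x) = log(1 + x) handles zeros safely and compresses the long tail.
logged = [math.log1p(v) for v in raw]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

A strongly positive value for the raw column and a value near zero after the transform is the pattern that would justify transforming a feature before modeling.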
  • 11. AutoViz Example: Boston. Insights:
      ○ RM, LSTAT, TAX, INDUS, AGE, and CRIM seem to be decently correlated with the target. It may be worth exploring whether they come up as important features.
      ○ Average Median Value of homes varies widely by CHAS and RAD. Hence they would be important features in any model.
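The second insight (the target's average varies by category) corresponds to a simple group-by computation. The following sketch uses made-up numbers and a hypothetical helper name, `mean_target_by_group`; it only illustrates the kind of summary behind such a bar chart.

```python
from collections import defaultdict


def mean_target_by_group(groups, target):
    """Average the target within each category value."""
    buckets = defaultdict(list)
    for g, t in zip(groups, target):
        buckets[g].append(t)
    return {g: sum(vals) / len(vals) for g, vals in buckets.items()}


# Made-up binary flag and target values, not taken from the housing data.
flag = [0, 0, 0, 1, 1, 1]
target = [20.0, 22.0, 24.0, 30.0, 32.0, 34.0]

means = mean_target_by_group(flag, target)
spread = max(means.values()) - min(means.values())
print(means, spread)
```

A large spread between group means relative to the target's overall range suggests the categorical variable carries signal.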
  • 12. How to build a better model? Build a ViML model (Variant Interpretable Machine Learning model), step by step. Process:
      ○ Remove low-information and redundant features
      ○ Add polynomial, interaction, and other features
      ○ Select models from simple to complex and perform tuning
      ○ Add entropy binning, stacking, and K-Means featurizers to the model
      ○ Add imbalanced sampling and training
      ○ Perform ensembling of multiple types of models
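The first step of the process, removing low-information and redundant features, can be sketched as a variance filter followed by a pairwise correlation filter. This is one common way to do it and is an assumption here; the slides do not describe the exact method Auto_ViML uses. The thresholds and the helper names `correlation` and `prune_features` are hypothetical.

```python
import math
import statistics


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def prune_features(columns, min_variance=1e-8, max_corr=0.95):
    """Keep columns that vary and are not near-duplicates of a kept column."""
    kept = []
    for name, values in columns.items():
        if statistics.pvariance(values) <= min_variance:
            continue  # low information: (almost) constant
        if any(abs(correlation(values, columns[k])) >= max_corr for k in kept):
            continue  # redundant: nearly collinear with a kept feature
        kept.append(name)
    return kept


# Made-up columns for illustration.
columns = {
    "a": [1, 2, 3, 4, 5],
    "a_doubled": [2, 4, 6, 8, 10],
    "constant": [7, 7, 7, 7, 7],
    "b": [5, 3, 4, 1, 2],
}
print(prune_features(columns))
```

Because the filter keeps the first column of each correlated group, the result depends on column order, which is worth keeping in mind when comparing runs.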
  • 13. Why Auto_ViML? Benefits:
      ○ Systematic: Auto_ViML was designed from the ground up to mimic how a data scientist would approach a modeling problem
      ○ Feature engineering: enables selective model complexity by adding features and complexity step by step
      ○ Transparency: provides deep insights into the data set with full transparency
      ○ Automatic feature selection: models with fewer features result in simpler models. Auto_ViML produces models with 10-90% fewer features than regular models without significant loss of predictive power*
      ○ Multiple models: build and test multiple models through hyperparameter tuning and cross-validation
      * Based on my experience. Your results may vary.
  • 14. Auto_ViML lets you try multiple approaches to get the best model, just like a data scientist would. You can access all the powerful features of Auto_ViML with one line of Python code after you import it. You can turn features and flags on and off to see how they impact the model.
      ○ Interactions vs. no interactions
      ○ Boosting vs. bagging
      ○ Ensembling vs. stacking
      ○ Imbalanced vs. balanced
      ○ Grid search vs. random search
      ○ SHAP vs. feature importances
  • 15. Auto_ViML: Boston Housing. GitHub: https://github.com/AutoViML/Auto_ViML
      Just import...
          from Auto_ViML import Auto_ViML
      And run Auto_ViML.
          model, features, trainm, testm = Auto_ViML(
              train, target, test, sample_submission='', hyper_param='GS',
              scoring_parameter='f1', Boosting_Flag=None, KMeans_Featurizer=False,
              Add_Poly=0, Stacking_Flag=False, Binning_Flag=False,
              Imbalanced_Flag=True, verbose=0)
      Get the model, the selected features, and the transformed train and test data.
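Trying multiple approaches amounts to running the call above under different flag settings. The sketch below only enumerates the combinations and does not call Auto_ViML. The flag names come from the call on the slide, but the candidate values listed for each flag are assumptions for illustration, as is the helper name `flag_combinations`.

```python
from itertools import product

# Flag names are from the slide; the value lists are assumed for this example.
FLAG_OPTIONS = {
    "Boosting_Flag": [None, False, True],
    "Stacking_Flag": [False, True],
    "Binning_Flag": [False, True],
    "Imbalanced_Flag": [False, True],
}


def flag_combinations(options):
    """Yield one keyword-argument dict per combination of flag values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


configs = list(flag_combinations(FLAG_OPTIONS))
print(len(configs))
print(configs[0])
```

Each dict could be unpacked into a model-building call (for example with `**config`) and the validation scores compared, though the full grid grows quickly, so in practice one would toggle one or two flags at a time.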
  • 16. Auto_ViML: Boston Housing. Results: start with a linear model. Boston Housing is an example of a regression data set. There are 13 predictors in the data set, but Auto_ViML finds that only 10 variables are needed to get the job done. Also watch the feature importances.
      ○ Data set size: 506 x 14
      ○ Time taken: 6 secs
      ○ Variables selected: 10
      ○ Feature reduction: 24%
  • 17. Auto_ViML: Boston Housing. Results: move to Random Forests. Time taken: 30 seconds.
  • 18. Auto_ViML: Boston Housing. Results: close with XGBoost.
  • 20. Auto_ViML: Boston Housing. Chart panel titles, as extracted: Linear Model with Interaction Variables; Ensemble Model with Binning; Forests Model with Binning Numerics; XGBoost Model with Stacking Multiple Models.
  • 22. Auto_ViML: Wisconsin Breast Cancer*. The Wisconsin Breast Cancer data set is a classic data set. Auto_ViML took 12 seconds to find the best features and the best model, with a weighted F1 score of 100% on the validation set using a linear model.
      ○ Data set size: 512 x 32
      ○ Time taken: 12 secs
      ○ Feature reduction: 52%
      ○ Macro average ROC AUC: 100%
      Compare the results to another model using Deep Learning and Keras: “Hyperparameter Optimization with Keras” by Mikko.
      * Thanks to UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
  • 23. Next steps for AutoViz and Auto_ViML...
      Auto_ViML: what’s missing / could be improved:
      ○ No feature engineering: you can create your own features or use kits like featuretools, etc.
      ○ No image/video/NLP support: at the moment, it removes these features from model consideration
      ○ No time series modeling: Auto_TimeSeries is in the works. Stay tuned.
      ○ No neural networks or deep learning: you can add your own modules or use tools like Ludwig
      ○ Model serving: a module for test data transformation still needs to be added
      AutoViz: what’s missing / could be improved:
      ○ Build it into existing tools so that structured data can be visualized fast
      ○ Build it into educational tools to make it easy for students and colleges (where small, structured data sets are the norm) to visualize data, as writing code is still very hard for students
      ○ Add additional visualizations such as pie charts, mosaic charts, etc.
      ○ Build it into industrial instruments such as IoT tools so that large data sets can be visualized
  • 24. Thank you.