AutoML: Helping to Bridge Skills Gap Between Data Enthusiasts & Data Scientists

AutoML can automate data preparation, feature extraction, model selection, and model tuning. This can save a Data Scientist loads of time. So instead of hiring four Data Scientists, you may only need two, right?

It’s no secret that there is a shortage of data science talent to help companies produce advanced analytics from their stockpiles of data. There is also a plethora of vendor tools promising to turn an analyst into the next great data scientist (which, BTW, is possible).

The hardcore mathematicians, statisticians and computer scientists (who created this stuff in the first place) have built more advanced tools that automate the model creation process to help Data Scientists become more efficient and (hopefully) better at our jobs.

I will demo AutoML, discuss some pros and cons, and show what it can do for you.

Published in: Data & Analytics

  1. AUTOML: HELPING TO BRIDGE SKILLS GAP BETWEEN DATA ENTHUSIASTS & DATA SCIENTISTS. By Josh Janzen – Data Scientist
  2. AUTOML: AGENDA • What is ML • What is AutoML • Animated Visualizations of ML vs AutoML • AutoML code demos on Titanic Dataset • Comparison of AutoML tools available
  3. WHERE I LIVE IN TERMS OF DATA SCIENCE TOOLS
  4. THE EMERGING FIELD OF DATA SCIENCE: “The more I learn, the more I realize how much I don’t know.”
  5. EXAMPLE OF ML (MACHINE LEARNING): should I wear a coat today? ML finds relationships in large datasets to help us understand patterns and make predictions. (1) Gather historical data (years of weather, were you cold, hot, comfortable, activity levels, did you wear a coat). (2) Apply an algorithm to learn relationships (learn the impact of weather, activity, and coat to determine whether you were comfortable). (3) Predict on new/future data (is it a good idea to wear a coat today?). Answer: NO. 95.2% chance of being comfortable without a coat. [Video: what is ML]
  6. WHAT IS ML (MACHINE LEARNING): ML != Analysis. ML is another tool to help further drive value from data. 1. There is a place for business analysis, and a place for ML; they are very different. 2. ML is another tool to help drive value from data, especially with large, complex datasets. 3. Both ML & Analysis need an understanding of the business to be successful. 4. ML can’t solve every data problem. 5. ML is a vast and growing field.
  7. AUTOML. WHAT IT IS: • “Tools to make Data Scientists more efficient” • “..Data Science democratization” • AutoML makes ML available to Data Enthusiasts • Simplifies the Machine Learning model building process by applying Computer Science and Statistical techniques to find an optimal model in an efficient amount of time. WHAT IT ISN’T: • The silver bullet to make all ML better • Proven to produce better results than a very experienced ML Data Scientist • Simpler and easier to use (at least not yet) • Another data buzzword like “big data”
  8. WHAT IS A DATA SCIENCE ENTHUSIAST • The software developer who wants to try ML • The college student with aspirations to be a DS • The mid-level MGR looking to up their game • The analyst looking to differentiate their skills • Does not need advanced MATH & STATS skills. Estimated from multiple sources including https://www.kdnuggets.com/2018/09/how-many-data-scientists-are-there.html
  9. ML VS. AUTOML. ML without AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Prediction Score: 0.752. Credit: Josh Janzen, Data Scientist
  10. ML VS. AUTOML. ML without AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Preprocessing (missing values, outlier handling, checking variable types) > Feature Engineering (feature selection, feature transformation) > Partition Data & Model Selection (split data into train, valid, test; import ML libraries; try various algorithm(s); score models) > Model Tuning (evaluate model, tune hyperparameters) > Predict on New Data (save best model, run to make predictions) > Prediction Score: 0.752. ML with AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Prediction Score: 0.752. Credit: Josh Janzen, Data Scientist. SME
  11. ML VS. AUTOML. ML without AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Preprocessing (missing values, outlier handling, checking variable types) > Feature Engineering (feature selection, feature transformation) > Partition Data & Model Selection (split data into train, valid, test; import ML libraries; try various algorithm(s); score models) > Model Tuning (evaluate model, tune hyperparameters) > Predict on New Data (save best model, run to make predictions) > Prediction Score: 0.752. ML with AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Prediction Score: 0.752. Credit: Josh Janzen, Data Scientist
  12. ML VS. AUTOML. ML without AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Preprocessing (missing values, outlier handling, checking variable types) > Feature Engineering (feature selection, feature transformation) > Partition Data & Model Selection (split data into train, valid, test; import ML libraries; try various algorithm(s); score models) > Model Tuning (evaluate model, tune hyperparameters) > Predict on New Data (save best model, run to make predictions) > Prediction Score: 0.752. ML with AutoML: Work w/Business to Identify Opportunity > Import Data > Visualize and Structure the Dataset > Automatically build and evaluate 100s of models; review performance and variable importance > Prediction Score: 0.752. Credit: Josh Janzen, Data Scientist
  13. DEMO WITH TITANIC DATASET • IPython notebook: https://github.com/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb • Run MLBox from command line • Demo AutoML App • Show Azure AutoML tool
  14. FEATURE REDUCTION ALGORITHM. Source: https://www.analyticsvidhya.com/blog/2017/07/mlbox-library-automated-machine-learning/
  15. AUTOML TOOL COMPARISON
  16. AUTOML TOOL COMPARISON
  17. AZURE ML RESULTS. PROS: • No code to write • Lots of investment from Microsoft in this space. CONS: • Slower than expected, took about 20 min • No easy way to create new predictions • Process not as polished or easy to use as expected
  18. NEXT FRONTIER: AUTO FEATURE ENGINEERING. Source: https://towardsdatascience.com/feature-engineering-what-powers-machine-learning-93ab191bcc2d
  19. THE END. Questions? Deck will be posted on my blog www.JoshJanzen.com
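
The "automatically build and evaluate 100s of models" step from the ML vs. AutoML slides can be illustrated with a very small sketch. This is not the code from the talk and not how MLBox or Azure AutoML work internally. It is a toy, standard-library-only example on the coat question from slide 5, and everything in it is an assumption made for illustration: the synthetic data, the rule that generates the labels, the function names, and the choice of k-nearest-neighbours as the only model family. The idea it shows is the core loop: split the data, fit several candidate models, score each on held-out data, and keep the best.

```python
import random

def make_data(n=200, seed=0):
    """Hypothetical data: (temperature in C, activity level 0..1) -> 1 if a coat was needed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temp = rng.uniform(-10, 30)
        activity = rng.uniform(0, 1)
        # Assumed labelling rule plus noise: cold weather and low activity call for a coat.
        needs_coat = int(temp + 10 * activity + rng.gauss(0, 3) < 10)
        rows.append(((temp, activity), needs_coat))
    return rows

def split(rows, train_frac=0.7, seed=0):
    """Shuffle and partition the rows into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x.

    Features are not rescaled here; a real pipeline would handle that
    in its preprocessing step.
    """
    def sq_dist(row):
        (t, a), _ = row
        return (t - x[0]) ** 2 + (a - x[1]) ** 2
    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train, valid, k):
    """Fraction of validation rows that the k-NN model labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in valid)
    return correct / len(valid)

def auto_select(train, valid, candidate_ks=(1, 3, 5, 7, 9, 15)):
    """Score every candidate model on the validation set and return the best one."""
    scores = {k: accuracy(train, valid, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

rows = make_data()
train, valid = split(rows)
best_k, scores = auto_select(train, valid)
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

Real AutoML tools search over many model families, preprocessing choices, and hyperparameters rather than a single parameter, but the score-and-select loop has the same shape. Identifying the business opportunity and structuring the dataset stay manual in both columns of the slides.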
