https://learn.xnextcon.com/event/eventdetails/W20040310
I will describe what is available in terms of open source and proprietary tools for automating data science tasks and introduce two new tools: one to visualize a data set of any size with one click, and another to try multiple ML models and techniques with a single call. I will provide the GitHub repos for both for free in the talk.
These days, training Machine Learning models at the device Edge is still seen as a risky endeavor. It is frequently considered a purely academic subject with little value for real-life product development.
In her talk, Vera will challenge this misconception, talk about the advantages of learning at the Edge and guide you through the Edge learning decision-making framework and design principles.
https://www.aicamp.ai/event/eventdetails/W2021102210
https://learn.xnextcon.com/event/eventdetails/W20040610
This talk explains how to bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint and are therefore easier to store on a smartphone.
The talk also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices. Along the way, you will learn practical strategies to preprocess your data in a manner that makes the models more efficient in the real world.
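The abstract does not say which compression techniques the talk covers. As one illustration only, here is a minimal sketch of magnitude pruning, a common approach that zeroes the smallest weights of a layer. The function name `magnitude_prune` and the flat weight list are assumptions made for this example, not the speaker's code.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Hypothetical weights of one small layer, flattened into a list.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(layer, sparsity=0.5))
```

Zeroed weights only save memory once the model is stored in a sparse or otherwise compressed format, so pruning is usually combined with such a format in practice.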
Deep AutoViML For Tensorflow Models and MLOps Workflows - Bill Liu
deep_autoviml is a powerful new deep learning library with a very simple design goal: make it as easy as possible for novices and experts alike to experiment with and build tensorflow.keras preprocessing pipelines and models in as few lines of code as possible.
deep_autoviml will enable data scientists, ML engineers and data engineers to quickly prototype TensorFlow models and data pipelines for MLOps workflows using the latest TF 2.4+ and Keras preprocessing layers. You can now upload your saved model to any cloud provider and make predictions out of the box, since all the data preprocessing layers are attached to the model itself!
In this webinar, we will discuss the problems that deep_autoviml can solve, cover its architecture design, and demo how to build powerful tf.keras models on structured data, NLP and image data domains.
https://www.aicamp.ai/event/eventdetails/W2021080918
Highly-scalable Reinforcement Learning RLlib for Real-world Applications - Bill Liu
website: https://learn.xnextcon.com/event/eventdetails/W20051110
video: https://www.youtube.com/watch?v=8tG8PJC6oaU
In reinforcement learning (RL), an agent learns how to optimize performance solely by collecting experience in the real world or via a simulator. RL is being applied to problems such as decision making, process optimization (e.g., manufacturing and supply chains), ad serving, recommendations, self-driving cars, and algorithmic trading.
In this talk, I will discuss RLlib, a reinforcement learning library built on Ray with a strong focus on large-scale execution and scalability, ease-of-use for general users, as well as customizability for developers and researchers.
RLlib offers autonomous task-learning via many common RL algorithms and it scales from a laptop to a cluster with hundreds of machines. It is used by dozens of organizations, from startups to research labs to large organizations. You will see RLlib in action with a live demo.
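To make the idea of an agent learning purely from experience in a simulator concrete, here is a minimal sketch of tabular Q-learning on a toy five-state chain. It does not use RLlib. The environment, the `step` and `train` functions, and all hyperparameters are assumptions invented for this illustration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy simulator: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)  # explore, or break a tie randomly
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

A library such as RLlib replaces this hand-written loop with distributed rollout collection and a choice of algorithms, but the experience-driven update follows the same principle.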
Improving How We Deliver Machine Learning Models (XCONF 2019) - David Tan
In this talk, we share better ways of working that help us with common challenges faced in an ML project.
Repos:
1. https://github.com/ThoughtWorksInc/ml-app-template
2. https://github.com/ThoughtWorksInc/ml-cd-starter-kit
Demo videos:
1. Dockerised setup: https://www.youtube.com/watch?v=S6kWaXQ530k
2. Installing cross-cutting services (e.g. GoCD, MLFlow, EFK): https://www.youtube.com/watch?v=p8jKTlcpnks
3. Rolling back harmful models: https://www.youtube.com/watch?v=rNfrgaRTz7c
Leverage the power of machine learning on Windows - Mia Chang
Note:
The content was modified from material by the Microsoft Content team.
Deck Owner: Nitah Onsongo
Tech/Msg Review: Cesar De La Torre, Simon Tao, Clarke Rahrig
---
Event: Insider Dev Tour Berlin
Event Description: Microsoft is going on a world tour with the announcements of Build 2019. The Insider Dev Tour focuses on innovations related to Microsoft 365 from a developer's perspective.
Date: June 7th, 2019
Event link: https://www.microsoft.com/de-de/techwiese/news/best-of-build-insider-dev-tour-am-7-juni-in-berlin.aspx
LinkedIn: http://linkedin.com/in/mia-chang/
Amazon SageMaker is a fully-managed platform that lets developers and data scientists build and scale machine learning solutions. First, we'll show you how SageMaker Ground Truth helps you label large training datasets. Then, using Jupyter notebooks, we'll show you how to build, train and deploy models using built-in algorithms and frameworks (TensorFlow, Apache MXNet, etc). Finally, we'll show you how to use 3rd-party models from the AWS marketplace.
"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.
Managing and Versioning Machine Learning Models in Python - Simon Frid
Practical machine learning is becoming messy, and while there are lots of algorithms, there is still a lot of infrastructure needed to manage and organize the models and datasets. Estimators and Django-Estimators are two Python packages that can help version data sets and models for deployment and an effective workflow.
On-device machine learning: TensorFlow on Android - Yufeng Guo
Machine learning has traditionally been performed solely on servers and high-performance machines. But there is great value in having on-device machine learning for mobile devices. Doing ML inference on mobile devices has huge potential and is still in its early stages. However, it's already more powerful than most realize.
In this demo-oriented talk, you will see some examples of deep learning models used for local prediction on mobile devices. Learn how to use TensorFlow to implement a machine learning model that is tailored to a custom dataset, and start making delightful experiences today!
Kyrylo Perevozchykov "Continuous delivery for Machine Learning, the future of... - Fwdays
MLOps itself is a derivative of DevOps, the thought being that there is an entire industry that exists for “Ops” for normal software, and that such an industry will need to emerge for ML as well. But it hasn’t yet. Various technologies have made it easy for people to build predictive models, so people have lots of predictive models now. But to get value out of models you have to deploy, monitor, and maintain them. Very few people know how to do this, even fewer than know how to build a good model in the first place.
This talk covers what MLOps is, what its use cases are, and how it will develop and evolve into a new industry.
Deploying and managing machine learning models at scale introduces new complexities. Fortunately, there are tools that simplify this process. In this talk we walk you through an end-to-end, hands-on example showing how you can go from research to production without much complexity by leveraging the Seldon Core and MLflow frameworks. We will train a set of ML models and showcase a simple way to deploy them to a Kubernetes cluster using sophisticated deployment methods, including canary and shadow deployments, and we’ll touch upon richer ML graphs such as explainer deployments.
DataSciencePT #27 - Fifty Shades of Automated Machine Learning - Rui Quintino
Is "the sexiest job of the 21st century", the Data Scientist, about to be automated? How & when can AutoML tools help in a typical machine learning lifecycle? What AutoML challenges are still open & what ML work will remain in the foreseeable future? Most importantly… will robots get all the fun & sex appeal? :) Some questions we'll try to tackle in this session.
*-Robots are not allowed in this session
As data science workloads grow, so does their need for infrastructure. But, is it fair to ask data scientists to also become infrastructure experts? If not the data scientists, then, who is responsible for spinning up and managing data science infrastructure? This talk will address the context in which ML infrastructure is emerging, walk through two examples of ML infrastructure tools for launching hyperparameter optimization jobs, and end with some thoughts for building better tools in the future.
Originally given as a talk at the PyData Ann Arbor meetup (https://www.meetup.com/PyData-Ann-Arbor/events/260380989/)
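The abstract does not name the two tools it walks through. As a generic illustration of what a hyperparameter optimization job does, here is a minimal random-search sketch. The `objective` function is a hypothetical stand-in for training a model and returning its validation loss, and the search space is invented for this example.

```python
import random


def objective(learning_rate, num_layers):
    """Hypothetical stand-in for 'train a model and return validation loss'."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (num_layers - 3) ** 2


def random_search(n_trials=50, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform sample
            "num_layers": rng.randint(1, 6),            # inclusive integer range
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search()
print(best_params, best_loss)
```

Each trial is independent of the others, which is why this kind of job parallelizes well across machines and why infrastructure tooling for it matters.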
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016 - MLconf
Building a Machine Learning Platform at Quora: Each month, over 100 million people use Quora to share and grow their knowledge. Machine learning has played a critical role in enabling us to grow to this scale, with applications ranging from understanding content quality to identifying users’ interests and expertise. By investing in a reusable, extensible machine learning platform, our small team of ML engineers has been able to productionize dozens of different models and algorithms that power many features across Quora.
In this talk, I’ll discuss the core ideas behind our ML platform, as well as some of the specific systems, tools, and abstractions that have enabled us to scale our approach to machine learning.
http://ainyc19.xnextcon.com
I will describe what is available in terms of open source and proprietary tools for automating data science tasks and introduce two new tools: one to visualize a data set of any size with one click, and another to try multiple ML models and techniques with a single call. I will provide the GitHub repos for both for free in the talk.
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks - Senturus
Senturus shares insights and tips on IBM Cognos 10 Framework Manager Metadata Modeling. View the video recording and download this deck: http://www.senturus.com/resources/cognos-framework-manager-metadata-modeling-tips-tricks/.
Topics Include:
• Use determinants, parameter maps and query macros to implement row level security
• Understand the use of determinants and their importance
• Enhance your metadata by leveraging parameter maps and query macros
See a live demonstration of implementing row-level security based on user attributes, dimensional modeling of relational query subjects and use of Model Design Accelerator.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
A case study in using IBM Watson Studio machine learning services ibm devel... - Einar Karlsen
This IBM Developer article shows various ways of predicting customer churn using IBM Watson Studio ranging from a semi-automated approach using the Model Builder, a diagrammatic approach using SPSS Modeler Flows to a fully programmed style using Jupyter notebooks.
Microsoft has released Automated ML technologies for developers through ML.NET, Azure ML Service, and Azure Databricks. The presenter, a data scientist and Microsoft architect, will give a comprehensive overview of the utility and use cases of this automated technology for production solutions. The presentation includes code you can try now.
Strata CA 2019: From Jupyter to Production Manu Mukerji - Manu Mukerji
Proposed title
From Jupyter to production
Description of the presentation
Jupyter is very popular for data science, data exploration and visualization. This talk is about how to use it for AI/ML in a production environment.
General flow of the talk:
How things can go wrong with QA and production releases when using a notebook
Common Jupyter ML examples
Standard ML flow
Training in production
Model creation
Testing in production
Papermill and Jupyter
Production workflows with Sagemaker
Speaker
Manu Mukerji is senior director of data, machine learning, and analytics at 8×8. Manu’s background lies in cloud computing and big data, working on systems handling billions of transactions per day in real time. He enjoys building and architecting scalable, highly available data solutions and has extensive experience working in online advertising and social media.
1) Learn about Myplanet's Headless CMS solution using Gatsby Preview and Contentful’s UI Extensions (https://www.contentful.com/resources/serverless/)
2) their Serverless project with IBM - using Apache OpenWhisk (https://www.ibm.com/cloud/functions)
3) how Myplanet got involved with AWS DeepRacer - a fun way to get started with Reinforcement Learning (RL), and their racing experience at re:Invent DeepRacer League (https://reinvent.awsevents.com/learn/deepracer/)
4) their Machine Learning (ML) research related to finding DeepRacer’s ideal line (https://medium.com/myplanet-musings/the-best-path-a-deepracer-can-learn-2a468a3f6d64).
BONUS: Two TED Talks referenced in the intro
5) When ideas have sex | Matt Ridley | Jul 14, 2010 https://www.ted.com/talks/matt_ridley_when_ideas_have_sex
6) Why The Best Leaders Make Love The Top Priority | Matt Tenney | Dec 5, 2019 https://www.youtube.com/watch?v=qCVoohdyI6I
VIDEO: https://youtu.be/ZH1xxmBNx5k
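DataRobot: Changes in the Analysis Process and Benefits of Applying Automated Analytics - 홍운표, Data Scientist, DataRobot :: AWS Sum... - Amazon Web Services Korea
Sponsor session | DataRobot: Changes in the Analysis Process and Benefits of Applying Automated Analytics
홍운표, Data Scientist, DataRobot
Unlike conventional analytics software, DataRobot is an automated analytics platform. Once the data definition is complete, business users can apply AI to their own work and gain efficiency, and data scientists can achieve efficiency dozens of times greater than with conventional analysis work. In this way, DataRobot helps companies easily apply AI to their operations and realize business value. In this session, we look at the detailed features of the automated analytics that DataRobot provides, and a product demo shows how automated analytics raises the quality of analysis results and helps you work far more efficiently than with conventional analysis tasks.
ChatGPT and Beyond - Elevating DevOps Productivity - VictorSzoltysek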
In the dynamic field of DevOps, the quest for efficiency and productivity is endless. This talk introduces a revolutionary toolkit: Large Language Models (LLMs), including ChatGPT, Gemini, and Claude, extending far beyond traditional coding assistance. We'll explore how LLMs can automate not just code generation, but also transform day-to-day operations such as crafting compelling cover letters for TPS reports, streamlining client communications, and architecting innovative DevOps solutions. Attendees will learn effective prompting strategies and examine real-life use cases, demonstrating LLMs' potential to redefine productivity in the DevOps landscape. Join us to discover how to harness the power of LLMs for a comprehensive productivity boost across your DevOps activities.
Reviewing progress in the machine learning certification journey
Special Addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
Building machine learning muscle in your team and transitioning it to doing machine learning at scale. We also discuss Spark and other relevant technologies.
Delivered at Pittsburgh Tech Fest - 6/10/2017
Knowledge is power, but is it if you're not using it? What if the application you delivered to your customers was extremely intelligent? It could retrieve, analyze and use the massive amounts of data that businesses are generating at an astronomical rate.
It could analyze business deals, predict potential issues, proactively recommend business decisions and estimate profit, loss and risks.
Those things provide direct benefits to your company. Churning through that data by hand doesn't. Enter Azure Machine Learning.
In this session you will learn how to integrate Azure Machine Learning into your existing applications and workflows with REST services. You will learn how to deliver a modular, maintainable solution to your customers that allows them to analyze their data.
You will learn:
* Numerous ways to abstract business rules, workflows, AI (Machine Learning) and more into your applications
* How to integrate Azure Machine Learning into your existing applications and processes
* How to create Azure Machine Learning experiments
* How to retrieve the score from an Azure Machine Learning experiment and integrate it into your applications and processes
* How to integrate numerous Machine Learning experiments from the Azure Machine Learning Marketplace into your existing applications and processes
* Various concepts for abstracting and managing services and APIs
SigOpt's Fay Kallel, Head of Product, and Jim Blomo, Head of Engineering, describe the latest updates to SigOpt, a suite of features that help you manage your modeling process.
Pitfalls of machine learning in production - Antoine Sauray
Going from POC to production with Machine Learning can lead to many unexpected problems. We explore some of them in this presentation at the Nantes Machine Learning Meetup.
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel... - Benjamin Bengfort
This is an overview of the goals and roadmap for the Yellowbrick model visualization library (www.scikit-yb.org). If you're interested in contributing to Yellowbrick or writing visualizers, this is a good place to get started.
In the presentation we discuss the expected workflow of data scientists interacting with the model selection triple and Scikit-Learn. We describe the Yellowbrick API and its relationship to the Scikit-Learn API. We introduce our primary object: the Visualizer, an estimator that learns from data and displays it visually. Finally, we describe the requirements for developing for Yellowbrick, the tools and utilities in place, and how to get started.
Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for your models!
This presentation was given during the opening session of the 2017 Spring DDL Research Labs.
Walk Through a Real World ML Production Project - Bill Liu
Success in productionizing ML models is difficult to achieve due to tools, processes and operational procedures. In this session, we demonstrate how data scientists and ML engineers collaborate and efficiently deploy models to production with the Wallaroo platform.
Using a real-world scenario, we will drill down into the ML production journey that data scientists and ML engineers go through to take ML models into production. In this session you will learn:
The current pain points and blockers to production
The two personas in the ML production process: Data Scientist (DS) and ML Engineer
How the ML Engineer creates a workspace in Wallaroo and invites the DS to collaborate
How the DS uploads and deploys models to Wallaroo, performing simple validation checks on the output
How the ML Engineer can check model health (inference speed, etc.)
How the DS checks logs and looks for anomalies
How the DS switches models in the pipeline
Speakers: Nina Zumel, Martin Bald
Redefining MLOps with Model Deployment, Management and Observability in Produ...Bill Liu
Tech talk: https://www.aicamp.ai/event/eventdetails/W2022052410
What happens after your machine learning models are deployed in production? How do you make sure that your model performance does not degrade as data and the world change?
The constantly changing data creates challenges for data scientists and engineering teams on how to detect which models have been affected and how to get their ML applications up and running seamlessly.
In this session we will take a deep dive into the new ML model monitoring and drift detection technology. We will discuss:
- How to track the ongoing accuracy of your models in production
- How to detect drift immediately, before it causes significant damage to the business
- How to locate the cause of model drift in live environments.
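One simple way to make the drift-detection idea above concrete is the Population Stability Index (PSI), which compares the distribution of a feature (or model score) seen in production against a training-time baseline. The sketch below is an illustration only, not the monitoring method of any product mentioned in the talk; the function names, the bin count, and the 0.2 alert threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def has_drifted(expected, actual, threshold=0.2):
    """Flag drift when PSI exceeds the (assumed) rule-of-thumb threshold."""
    return psi(expected, actual) > threshold
```

A check like this would typically be run per feature on a schedule, with the features showing the highest PSI being the first candidates when looking for the cause of a drop in accuracy.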
We will also discuss how data scientists and ML engineers can collaborate effectively using their respective tools to identify issues and take the necessary actions with a live demo and a real world use case.
Speaker: Younes Amar, Head of Product Wallaroo AI.
Resources: https://docs.wallaroo.ai/
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
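For readers who want a concrete picture of the self-attention mechanism listed above, here is a minimal single-head, scaled dot-product sketch in plain Python. It is an illustration under simplifying assumptions (no masking, no positional encoding, no multi-head projection), and the helper names are hypothetical rather than taken from the talk.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Return (outputs, attention weights) for a sequence of token vectors x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Each token's query is compared with every token's key.
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of the value vectors.
    return matmul(weights, v), weights
```

In a Vision Transformer, the rows of `x` would be embeddings of image patches rather than of words; the computation itself is unchanged.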
Speakers: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
Metaflow: The ML Infrastructure at NetflixBill Liu
Metaflow was started at Netflix to answer a pressing business need: how to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently. We wanted to provide the best possible user experience for data scientists, allowing them to focus on the parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.
Today, the open-source Metaflow powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics to real estate.
In this talk, you will learn about:
- What to expect from a modern ML infrastructure stack.
- Using Metaflow to boost the productivity of your data science organization, based on lessons learned from Netflix.
- Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.
https://www.aicamp.ai/event/eventdetails/W2021080510
AI stands on three pillars: algorithms, hardware and training data. While the first two have already become commodities on the market, the latter - reliable labelled data - is still a bottleneck in the industry.
Need to add twice as much data to the training set to improve your model? Want to validate the accuracy of a new classifier in an hour? Or maybe you are building a human-in-the-loop process with 90% of cases processed automatically and the trickiest 10% of cases fine-tuned by people in real time. You can do it all with crowdsourcing, but only with crowdsourcing done right.
In this talk, we will discuss how the new generation of methods and tools makes it possible to collect high-quality human-labelled data at large scale, and why every ML specialist should know how to use crowdsourcing.
In this talk, you will learn how to:
* Understand the applicability, benefits and limits of the crowdsourcing approach.
* Integrate an on-demand workforce into your processes and build human-in-the-loop processes.
* Control the quality and accuracy of data labeling to develop high-performing ML models.
* Understand the full cycle of a crowdsourcing project.
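As a small illustration of the quality-control point above, overlapping labels from several workers can be aggregated by majority vote, with low-agreement items sent back for extra review. This is a generic sketch, not the aggregation method of any particular crowdsourcing platform; the function name, the data layout, and the `min_agreement` threshold are assumptions.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Majority-vote aggregation of crowd labels.

    votes maps an item id to the list of labels given by different workers.
    Returns (accepted, needs_review): accepted maps item id to the winning
    label; needs_review lists items whose agreement is below the threshold.
    """
    accepted, needs_review = {}, []
    for item_id, labels in votes.items():
        if not labels:
            needs_review.append(item_id)
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            needs_review.append(item_id)
    return accepted, needs_review
```

The same routing idea applies to human-in-the-loop pipelines: confident cases are handled automatically, and only the uncertain remainder goes to people.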
Speaker: Daria Baidakova (Toloka)
Building large scale transactional data lake using apache hudiBill Liu
Data is a critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business-critical data pipelines at low latency and high efficiency and to help distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, and then dive deep into how it improves data operations through features such as data versioning and time travel.
We will also go over how Hudi brings kappa architecture to big data systems and enables efficient incremental processing for near real time use cases.
Speaker: Satish Kotha (Uber)
Apache Hudi committer and Engineer at Uber. Previously, he worked on building real time distributed storage systems like Twitter MetricsDB and BlobStore.
website: https://www.aicamp.ai/event/eventdetails/W2021043010
Deep Reinforcement Learning and Its ApplicationsBill Liu
What is the most exciting AI news in recent years? AlphaGo!
What are key techniques for AlphaGo? Deep learning and reinforcement learning (RL)!
What are application areas for deep RL? A lot! In fact, besides games, deep RL has been making tremendous achievements in diverse areas like recommender systems and robotics.
In this talk, we will introduce deep reinforcement learning, present several applications, and discuss issues and potential solutions for successfully applying deep RL in real life scenarios.
https://www.aicamp.ai/event/eventdetails/W2021042818
Big Data and AI in Fighting Against COVID-19Bill Liu
Website: https://learn.xnextcon.com/event/eventdetails/W20070810
As the COVID-19 pandemic sweeps the globe, big data and AI have emerged as crucial tools for everything from diagnosis and epidemiology to therapeutic and vaccine development.
In this talk, we collect and review how big data is fighting back against COVID-19. We also take a deep dive into two interesting use cases: 1) using NLP and BERT to answer scientific questions; 2) the COVID-19 data lakes from Databricks, Google and Amazon.
Agenda:
Introduction
Supercomputers for Scientific Research
Covid-19 Tracking and Prediction
Covid-19 Research and Diagnosis
Use Case 1 NLP and BERT to answer scientific questions
Use Case 2 Covid-19 Data Lake and Platform
Build computer vision models to perform object detection and classification w...Bill Liu
event: https://learn.xnextcon.com/event/eventdetails/W20042918
video:
description: Computer Vision has received significant attention in recent years, both within academia and industry. As the state of the art rapidly improves, the art of the possible follows, offering innovative forms of computer vision applications for different scenarios.
In this talk, Ramine will cover the background and development of computer vision, and demonstrate how to use AWS to build robust computer vision models to perform object detection and classification.
Key Takeaways:
Understand the history of Computer Vision
Learn how to use Amazon SageMaker to build and deploy computer vision models
Learn how to orchestrate multiple models to implement a real-world use case
Causal Inference in Data Science and Machine LearningBill Liu
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limitations to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new, but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
Monthly AI Tech Talks in Toronto 2019-08-28
https://www.meetup.com/aittg-toronto
The talk will cover the end-to-end details including contextual and linguistic feature extraction, vectorization, n-grams, topic modeling, named entity resolution which are based on concepts from mathematics, information retrieval and natural language processing. We will also be diving into more advanced feature engineering strategies such as word2vec, GloVe and fastText that leverage deep learning models.
In addition, attendees will learn how to combine NLP features with numeric and categorical features and analyze the feature importance from the resulting models.
The following libraries will be used to demonstrate the aforementioned feature engineering techniques: spaCy, Gensim, fastText and Keras in Python.
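To make the n-gram and vectorization steps mentioned in this abstract concrete, here is a dependency-free sketch that turns short texts into count vectors over unigrams and bigrams. It deliberately does not use spaCy, Gensim, or fastText; the function names and the whitespace tokenizer are simplifying assumptions, not the talk's actual code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(docs, n_values=(1, 2)):
    """Build a sorted vocabulary and a document-term count matrix."""
    tokenized = [doc.lower().split() for doc in docs]
    counts = [
        Counter(gram for n in n_values for gram in ngrams(tokens, n))
        for tokens in tokenized
    ]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix
```

Each row of the resulting matrix can be concatenated with a record's numeric and categorical features before fitting a model, which is one simple way to combine the two kinds of features.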
https://www.meetup.com/aittg-toronto/events/261940480/
weekly AI tech talk #85 ml-agents Enabling Learned Behaviors with Reinforceme...Bill Liu
https://learn.xnextcon.com/event/eventdetails/W19061910
Behaviors in games, and in the real world, are often difficult to program explicitly. Reinforcement learning (RL) has shown success in learning behaviors based on a simply defined reward function that incentivises correct behavior.
Unity ML-Agents toolkit enables Unity developers to train reinforcement learning models to control behaviors within their games. Once these models are trained, they can be integrated across platforms into a game build via the Unity Inference Engine.
Furthermore, by enabling communication between a Unity build and Python code, ML-Agents enables RL researchers to use Unity games as training environments.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How it can help today’s businesses, and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of these features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
1. Machine Learning
Ram Seshadri
March 2020 Slide 1
AutoViz and Auto_ViML: Faster Time to Insights using Automated Visualization and Machine Learning
2. “Machine learning teams are still struggling to take advantage of ML due to challenges with inflexible frameworks, lack of reproducibility, collaboration issues, and immature software tools”
-- Cecelia Shao, Comet.ml
“Why is my Data Science team taking sooo long to complete a simple project?”
-- A Frustrated CIO
Slide 2
The Answer?
3. How can we make DATA SCIENTISTS more productive?
7/24/2019
HOWEVER CURRENT TOOLS ARE LIMITED BECAUSE...
● They are proprietary and expensive (lock-in)
● Black Boxes which are too complex to interpret
● Very little reproducibility outside of tool
INTRODUCING A SIMPLER APPROACH TO AUTO-ML
• Faster Visualization: AutoViz
• Automatic Feature Selection: Auto_ViML
• Automatic Model Selection and Tuning: Auto_ViML
• One Click Model Serving and Production
Auto_ViML was designed along with AutoViz to Build Variant Interpretable Machine Learning Models Fast!
Slide 3
4. What are AutoViz and Auto_ViML?
● Open Source Tools for Faster Time to Insights, with these Design Goals:
○ Simple: Invoke them with a single Line of Code (each)
○ Flexible: Suited to any kind of structured data set with no Prep required
○ Incremental: Can be used by anyone, from beginners to experts alike
○ Experimental: Compare multiple visualization methods and models step by step
○ Interpretable: Get a clear explanation of the steps taken, with validation graphs
○ Reproducible: No Black Box. Reproducible model pipelines and outputs
○ Extensible: Open Source with contributions from Python and DS community
I Built AutoViz and Auto_ViML to make my own life easier.
Hope it will do the same for you.
Slide 4
5. Where do they fit in ML Workflow?
AutoViz and Auto_ViML do not completely eliminate the need for data scientists. But they speed up some steps in the ML workflow, which makes data scientists more productive!
Diagram labels: AutoViz, Auto_ViML
Slide 5
6. What is AutoViz?
OVERVIEW
AutoViz enables you to automatically visualize any data set with a Single Line of Code. It automatically:
1. Selects a Random Sample from the Data Set (if the Data Set is very large)
2. Selects the most important features using ML (if the Number of Variables is very large)
3. Selects the Best Methods to Visualize Data for a given problem
4. Provides Charts to be saved in PNG, JPG, and SVG Formats
Slide 6
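The first step in this overview, drawing a random sample when the data set is very large, can be illustrated with reservoir sampling, which picks a uniform sample in a single pass without loading everything into memory. The slides do not say how AutoViz samples internally, so this is a generic technique shown under that caveat; the function name and parameters are hypothetical.

```python
import random


def sample_rows(rows, max_rows, seed=0):
    """Uniformly sample up to max_rows items from an iterable in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < max_rows:
            # Fill the reservoir with the first max_rows items.
            sample.append(row)
        else:
            # Replace an existing item with probability max_rows / (i + 1).
            j = rng.randint(0, i)
            if j < max_rows:
                sample[j] = row
    return sample
```

When the data set has fewer rows than `max_rows`, the function simply returns all of them, so small data sets are visualized in full.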
7. Why AutoViz?
BENEFITS
Systematic: Look for insights systematically rather than through “gut instinct” or domain knowledge
Simple: Reduce features to the most important ones to deliver simple yet powerful insights
Explainable: Help explain your hypotheses and variable selection better to others
Slide 7
8. How AutoViz Works
AutoViz PROCESS
Design Goals
Variable Classification: AutoViz classifies features into highly granular data types to determine how best to represent them in Charts
Problem Identification: AutoViz can visualize any dataset for a given target: Regression, Classification, Time Series, Clustering and more
Complex Interactions: Most charts involve more than one variable, helping to deliver powerful insights with minimal effort
Implementation
Select the Most Important Features: AutoViz uses the powerful ML algorithm, XGBoost, to select important features given the target variable
Select the Best Charts: AutoViz selects the best ways to visualize your data to extract insights from it
Deliver them Fast!: AutoViz selects statistically valid sample data to visualize (in case the data set is very large)
Slide 8
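The variable-classification idea on this slide, typing each column and then choosing a chart accordingly, can be sketched in a few lines. This is not AutoViz's actual logic; the type names, the `max_categories` cutoff, and the chart mapping below are all illustrative assumptions.

```python
def classify_column(values, max_categories=10):
    """Assign a coarse type to a column given as a list of Python values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "empty"
    distinct = set(non_null)
    if len(distinct) == 1:
        return "constant"
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        # Integer columns with few distinct values behave like categories.
        if all(isinstance(v, int) for v in non_null) and len(distinct) <= max_categories:
            return "discrete"
        return "continuous"
    if len(distinct) <= max_categories:
        return "categorical"
    return "text_or_id"


# Hypothetical mapping from column type to a default single-variable chart.
CHART_FOR_TYPE = {
    "continuous": "histogram",
    "discrete": "bar chart",
    "categorical": "bar chart",
    "boolean": "bar chart",
    "constant": None,    # nothing useful to plot
    "empty": None,
    "text_or_id": None,  # too many distinct values to chart directly
}


def suggest_chart(values):
    """Return the suggested chart name for a column, or None to skip it."""
    return CHART_FOR_TYPE[classify_column(values)]
```

A fuller version would also look at pairs of columns, since the slide stresses that most charts involve more than one variable.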
9. Installing and Running AutoViz is as easy as 1, 2, 3!
Source Code: https://github.com/AutoViML/AutoViz
1. First, Install...
pip install autoviz
2. Next, Import...
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()
3. Run AutoViz.
# Arguments: filename ('' when a DataFrame is passed instead), separator, target variable name, DataFrame
dft = AV.AutoViz('', sep, target, df)
See Results...
Thanks to UCI Machine Learning Repository for all data sets in this presentation:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Slide 9
10. AutoViz Downloads
pip install autoviz
AutoViz has now been downloaded more than 17K times+
* As of August 23, 2019
Chart Source Courtesy: https://packaging.python.org/guides/analyzing-pypi-package-downloads/
+ Stats Source Courtesy: PePy org
Slide 10
11. AutoViz: Housing*
INSIGHTS
● Number of Rooms and Median Value of Homes seem to be highly correlated
● As Age of Building increases, Median Value decreases, albeit slowly
● NOX and DIS seem to be highly correlated, though they seem to have a polynomial or non-linear relationship
* Thanks to UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Slide 11
12. AutoViz: Housing*
INSIGHTS
● Both CRIM and ZN are highly skewed
● Both may require a transformation
● PTRATIO and DIS seem to be somewhat skewed as well but don’t require transformations
* Thanks to UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Slide 12
13. AutoViz: Housing*
INSIGHTS
● RM, LSTAT, TAX, INDUS, AGE, and CRIM seem to be decently correlated with Target. May be worth exploring if they come up as Important Features.
● Average Median Value of homes varies widely by CHAS and RAD. Hence they would be important features in any model.
* Thanks to UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Slide 13
14. How AutoViz enables better Modeling
Step 1: Call AutoViz with your Filename and target
Call AutoViz from Python Jupyter Notebooks:
1. Give directory path+filename, separator in data, target variable name (can be empty string)
2. If a DataFrame is available instead of a filename, give the name of the DataFrame in the above
3. Run AutoViz
Step 2: Look at Charts to derive Insights into Data
Look at Charts and Graphs generated by AutoViz:
1. Sometimes, AV can generate over 1000 charts!
2. Save Charts you like best by right click and save
3. You also have the option of getting charts in SVG or PNG or JPEG formats
Step 3: Add/Remove Variables and re-run AutoViz
Select Variables:
1. That appear most promising to Modeling goal
2. Rerun AutoViz by removing variables or adding new variables that provide more insights
Step 4: Build your baseline model with AutoViz features
Build your first baseline model using:
1. Features selected by AutoViz
2. Iterate through models and visualization until satisfied with result
Slide 14
15. How to Build a better model?
BUILD A ViML Model! (VARIANT INTERPRETABLE MACHINE LEARNING MODEL, Step by Step)
PROCESS
● Remove Low Information and Redundant Features
● Add Polynomial and Interaction, Other Features
● Select Models from Simple to Complex and Perform Tuning
● Add Entropy Binning, Stacking to K-Means Featurizers to model
● Add Imbalanced sampling and training
● Perform Ensembling of Multiple Types of models
Slide 15
16. What is Auto_ViML?
VARIANT INTERPRETABLE MACHINE LEARNING MODELS with one API
INSIGHTFUL, INTERPRETABLE, VARIANT, REPRODUCIBLE, ITERATIVE
● ViML helps you try as many as 15 different models with one API
● ViML reduces features to the bare minimum (as much as 10-90% reduction in features)
● ViML is fully reproducible by explaining its steps (full transparency)
● Delivers insights into which models and techniques will work best with your data
● ViML helps you build more complex models after trying out simpler options
Slide 16
17. Why Auto_ViML?
BENEFITS
SYSTEMATIC: Auto_ViML was designed from the ground up to mimic how a Data Scientist would approach a Modeling Problem.
FEATURE ENGINEERING: Enables selective model complexity by adding features and complexity step by step
TRANSPARENCY: Provides Deep Insights into the Data Set with Full Transparency
AUTOMATIC FEATURE SELECTION: Models with Fewer Features result in Simpler Models. Auto_ViML Produces models with 10-99% Fewer Features than Regular Models without Significant Loss of Predictive Power*
MULTIPLE MODELS: Build and test multiple models through Hyper Tuning and Cross Validation
* Based on my experience. Your results may vary.
Slide 17
18. Auto_ViML LETS YOU TRY MULTIPLE APPROACHES
You can access all the powerful features of Auto_ViML with one line of Python Code after you import.
You can turn features and flags on and off to see how they impact the Model.
TRY MULTIPLE APPROACHES TO GET THE BEST MODEL
● INTERACTIONS vs. NO INTERACTIONS (0 = No Intxns, 1 = Pairwise Intxns, 2 = Squared Vars)
● BOOSTING vs. BAGGING
● ENSEMBLING vs. STACKING
● IMBALANCED vs. BALANCED
● GRIDSEARCH vs. RANDOM
● FEATURE IMPORTANCES
Notes on the slide:
○ Now with CatBoost!!
○ Predictions from 12 models
○ Downsampling supported
○ HyperOpt coming
○ SHAP included
Keep Upgrading Auto_ViML version since it is updated monthly!
Slide 18
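The three interaction settings noted on this slide (0 = no interactions, 1 = pairwise interactions, 2 = squared variables) can be illustrated with a small feature-generation sketch. This is not Auto_ViML's implementation; the function name, the row-of-dicts layout, and the generated column names are assumptions that simply follow the slide's labels.

```python
from itertools import combinations


def add_poly(rows, names, level=0):
    """Return copies of rows with extra features, following the slide's levels.

    rows is a list of dicts of numeric features; names lists the columns to expand.
    level 0 adds nothing, 1 adds pairwise products, 2 adds squared terms.
    """
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    expanded = []
    for row in rows:
        new_row = dict(row)
        if level == 1:
            for a, b in combinations(names, 2):
                new_row[f"{a}*{b}"] = row[a] * row[b]
        elif level == 2:
            for a in names:
                new_row[f"{a}^2"] = row[a] ** 2
        expanded.append(new_row)
    return expanded
```

Comparing a model trained on `level=0` output with one trained on `level=1` or `level=2` output is the kind of "interactions vs. no interactions" experiment the slide describes.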
19. Installing and Running Auto_ViML is as easy as 1, 2, 3!
Github: https://github.com/AutoViML/Auto_ViML
1. First, Install...
pip install autoviml
2. Next, Import...
from autoviml.Auto_ViML import Auto_ViML
3. Run Auto_ViML.
model, features, trainm, testm = Auto_ViML(train, target, test, sample_submission='',
    hyper_param='GS', scoring_parameter='rmse',
    feature_reduction=True,
    Boosting_Flag=None, KMeans_Featurizer=False, Add_Poly=0,
    Stacking_Flag=False, Binning_Flag=False, Imbalanced_Flag=True,
    verbose=0)
Get a fully trained Model, best Features and transformed Train and Test data...
Slide 19
20. Auto_ViML Downloads
pip install autoviml
Auto_ViML has now nearly 50K downloads+
* As of March 20, 2020
Chart Source Courtesy: https://packaging.python.org/guides/analyzing-pypi-package-downloads/
+ Stats Source Courtesy: PePy org
Slide 20
21. Auto_ViML: Boston Housing*
Start with Linear Model
Here is an example of a Regression data set: Boston Housing*. There are 13 predictors in the dataset. But Auto_ViML finds that only 10 variables are needed to get the job done. For example:
['RM', 'LSTAT', 'NOX', 'PTRATIO', 'CRIM', 'TAX', 'CHAS', 'B', 'RAD', 'ZN']
Results:
DATA SET SIZE: 506 x 14
TIME TAKEN: 6 secs
Variables Selected: 10
FEATURE REDUCTION: 24%
* Thanks to UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/
Slide 21
22. Auto_ViML: Boston Housing
Slide 22
Results:
Move to Random Forests
Time Taken = 30 seconds
23. Auto_ViML: Boston Housing
Results: Close with XGBoost
25. Auto_ViML: Boston Housing
● Multiple Models:
○ Linear Model with Interaction Variables
○ Ensemble Model with Binning
○ Forests Model with Binning Numerics
○ XGBoost Model with Stacking
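The Boston Housing slides step through several model families on the same data. The sketch below shows one way such a sweep might be configured. It assumes, based on the Auto_ViML README rather than on anything stated in the slides, that `Boosting_Flag=None` builds a linear model, `False` a Random Forest, and `True` XGBoost, and that `Add_Poly=1` adds interaction variables. `BASE`, `VARIANTS`, `settings_for`, and `run_all` are hypothetical names. The "Ensemble Model with Binning" variant is left out because the slides do not say which flags produce it.

```python
# Settings shared by every run, copied from the call shown earlier in the deck.
BASE = {
    "sample_submission": "",
    "hyper_param": "GS",
    "scoring_parameter": "rmse",
    "feature_reduction": True,
    "KMeans_Featurizer": False,
    "Add_Poly": 0,
    "Stacking_Flag": False,
    "Binning_Flag": False,
    "Imbalanced_Flag": True,
    "verbose": 0,
}

# Flag meanings are assumptions taken from the README, not from the slides.
VARIANTS = {
    "linear": {"Boosting_Flag": None},
    "random_forest": {"Boosting_Flag": False},
    "xgboost": {"Boosting_Flag": True},
    "linear_interactions": {"Boosting_Flag": None, "Add_Poly": 1},
    "forest_binning": {"Boosting_Flag": False, "Binning_Flag": True},
    "xgboost_stacking": {"Boosting_Flag": True, "Stacking_Flag": True},
}


def settings_for(name):
    """Merge the shared settings with one variant's overrides."""
    return {**BASE, **VARIANTS[name]}


def run_all(train, target, test):
    """Run every variant and collect the results by variant name."""
    # Imported lazily so this file still loads where autoviml is not installed.
    from autoviml.Auto_ViML import Auto_ViML

    results = {}
    for name in VARIANTS:
        results[name] = Auto_ViML(train, target, test, **settings_for(name))
    return results
```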
26. Auto_ViML: Wisconsin Breast Cancer
The Wisconsin Breast Cancer* data set is a classic data set. Auto_ViML took 12 seconds to find the best features and the best model, with a weighted F1 score of 100% on the validation set using a linear model.
Results: Wisconsin Breast Cancer Data Set
DATA SET SIZE: 512 x 32
TIME TAKEN: 12 secs
FEATURE REDUCTION: 52%
MACRO AVERAGE ROC AUC: 100%
Compare the results to another model using Deep Learning and Keras:
Link: “Hyperparameter Optimization with Keras” by Mikko
* Thanks to UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
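For readers unfamiliar with the weighted F1 score quoted on this slide: it is the per-class F1 score averaged with each class weighted by its share of the true labels. A minimal pure-Python sketch of that definition follows. `weighted_f1` is a hypothetical helper and the labels are made-up toy data, not Auto_ViML output.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels: M = malignant, B = benign (illustrative only).
y_true = ["M", "M", "B", "B", "B"]
y_pred = ["M", "B", "B", "B", "B"]
print(round(weighted_f1(y_true, y_pred), 4))
```

A score of 100%, as on the slide, requires every validation example to be classified correctly.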
27. Next Steps for AutoViz and Auto_ViML...
Auto_ViML
● What’s Missing / Could be Improved:
○ No Feature Engineering: You can create your own features or use kits like featuretools, etc.
○ No Image/Video/NLP Support: At the moment, it removes these features from model consideration.
○ No Time Series modeling: Auto_TimeSeries is in the works. Stay tuned.
○ No Neural Networks or Deep Learning: You can add your own modules or use tools like Ludwig.
○ Model serving: A module for transforming test data still needs to be added.
AutoViz
● What’s Missing / Could be Improved:
○ Build it into existing tools so that structured data can be visualized fast!
○ Build it into educational tools to make it easy for students and colleges (where small, structured datasets are the norm) to visualize data, as writing code is still very hard for students.
○ Add additional visualizations such as Pie Charts, Mosaic Charts, etc.
○ Build it into industrial instruments such as IoT tools so that large data sets can be visualized.
28. AutoViz + Auto_ViML = POWERFUL INSIGHTS
LOAD DATA SET
RUN AUTOVIZ TO VISUALIZE
ENGINEER FEATURES
RUN AUTO_ViML
ADD / REMOVE FEATURES
RUN AUTO_ViML AGAIN
BEST MODEL and FEATURES SELECTED
BUILD PIPELINE
SERVE MODEL
Not in Scope