General Introduction to
AI/ML/DL/DS
Roopesh Kohad
Artificial Intelligence
● Ability to perform tasks normally requiring human intelligence, such as visual
perception, speech recognition, decision-making, and translation between
languages.
● Ability of a computer program or a machine to think and learn
● Ability to correctly interpret external data, to learn from such data, and to use
those learnings to achieve specific goals and tasks through flexible adaptation
● Ability to mimic human cognition
● A program that can sense, reason, act and adapt
Source: Various sources on the internet, including Wikipedia
Evolution of Industry
Source: https://blogs.worldbank.org/digital-development/what-korea-s-strategy-manage-implications-artificial-intelligence
Impact of Artificial Intelligence
3 Stages of Artificial Intelligence
Source: AI & Data Preparation – Avoiding ‘Garbage-In, Garbage-Out’
Strong vs Weak AI
Source: Gödel, Consciousness and the Weak vs. Strong AI Debate
Which Artificial Intelligence systems do we encounter?
● Weak or Narrow AI, i.e. AI applied to a narrow field of application
● Examples
○ Recommendation Systems
○ Chatbots
○ Virtual Assistants
○ Robots
● Weak or Narrow AI is what drives most of today's automation!!
What is an intelligent System?
● Intelligence is an experience that one gets by interacting with a system
● Intelligence is an intangible attribute, like being fast, secure, usable or intuitive
● An intelligent system is non-deterministic
● Intelligence adds value to businesses
● A system appears to be intelligent
● Are Personal Computers or typical programs (Browser etc.) intelligent?
Artificial Intelligence Scope
Source: What's required for a machine to be intelligent
AI vs ML vs DL vs DS
Source: Link
Data Science
● Extract knowledge or insights from data
● "Understand and analyze actual phenomena" with data
● Determine whether the data contains enough information to make predictions
The Scientific Method
Source: Getting Insights Using Data Science Skills and the Scientific Method
Source: Data Science is Multidisciplinary
Data Science Venn Diagram
Source: The Data Science Venn Diagram
Activities in Data Science
● Data Exploration & Preparation
○ Collection & loading
● Data Representation & Transformation
○ Tabular, DataFrame etc.
● Computing with Data
○ Programming
● Data Modelling
○ Predictive Modeling
● Data Visualization and Presentation
○ Charting, graphs etc.
● Science about Data Science
○ What works, what doesn’t work
What are the different types of data?
● Structured Data
○ Relational Databases
● Semi-structured Data
○ NoSQL Databases
○ XML, JSON
● Unstructured Data
○ Image, audio, text
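To make the structured vs semi-structured distinction concrete, here is a minimal sketch using only the Python standard library and a hypothetical two-person sample. Structured data has a fixed schema, so every row has the same columns. Semi-structured JSON is self-describing and may nest or vary its fields from record to record. Unstructured data such as images or audio is not shown.

```python
import csv
import io
import json

# Structured: fixed columns, like a table in a relational database (hypothetical sample)
csv_text = "name,age,city\nAsha,31,Pune\nRavi,27,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, with optional and nested fields
json_text = '''[
  {"name": "Asha", "age": 31, "address": {"city": "Pune"}},
  {"name": "Ravi", "age": 27, "skills": ["SQL", "Python"]}
]'''
records = json.loads(json_text)

# Every CSV row shares the same set of columns...
csv_columns = {tuple(row.keys()) for row in rows}
# ...while each JSON record may carry different keys
json_keys = [sorted(rec.keys()) for rec in records]

print(csv_columns)  # {('name', 'age', 'city')}
print(json_keys)    # [['address', 'age', 'name'], ['age', 'name', 'skills']]
```

CSV values arrive as strings (for example, `rows[0]["age"]` is `"31"`), so converting types is part of data preparation.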
What is Dataset?
● A collection of related sets of information that is composed of separate
elements but can be manipulated as a unit by a computer.
● Popular datasets
○ Iris Flower Data Set
○ MNIST handwritten digits database
○ Kaggle Datasets
○ Data.World
● Other datasets
○ Open Government Data (OGD) platform of India
● How do we obtain data if it is not available as a dataset?
○ Access to Database
○ APIs
○ Web Scraping
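The "Access to Database" route usually means running a query and keeping only the rows and columns you need. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `houses` table, its columns and its rows are hypothetical stand-ins for a real data source.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical table and rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, area_sqft REAL, price REAL)")
conn.executemany(
    "INSERT INTO houses (area_sqft, price) VALUES (?, ?)",
    [(850, 42.0), (1200, 60.5), (1500, 74.0)],
)
conn.commit()

# Pull only the rows and columns needed for the analysis (parameterized query)
cursor = conn.execute(
    "SELECT area_sqft, price FROM houses WHERE area_sqft >= ? ORDER BY area_sqft",
    (1000,),
)
dataset = cursor.fetchall()
conn.close()

print(dataset)  # [(1200.0, 60.5), (1500.0, 74.0)]
```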
Web Scraping
● Crawling the web to extract information
● Useful in the absence of database or API access
● Frameworks and libraries
○ Apache Nutch (Java-based crawler)
○ Scrapy
○ BeautifulSoup
○ Selenium!!!
● Scriptless
○ import.io
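Scrapy and BeautifulSoup do much more than this, but the core of scraping is parsing HTML and pulling out the pieces you need. That step can be sketched with the standard library's `html.parser`. The HTML snippet is hypothetical and inlined so the example runs without network access. A real scraper would first fetch the page (for example with `urllib.request`) and should respect the site's terms and robots.txt.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content; in practice this would come from an HTTP request
html = """
<html><body>
  <h1>Open datasets</h1>
  <a href="/iris">Iris Flower Data Set</a>
  <p>See also <a href="/mnist">MNIST digits</a>.</p>
</body></html>
"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # [('/iris', 'Iris Flower Data Set'), ('/mnist', 'MNIST digits')]
```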
Jupyter Notebook
● Open-source web application
● REPL (read-eval-print loop) programming environment
● Create and share documents that contain
○ Live Code
○ Equations
○ Visualizations
○ Narrative text
● Uses include
○ Data cleaning and transformation
○ Numerical simulation
○ Statistical modeling
○ Data visualization
○ Machine learning
Python ML Ecosystem
NumPy
Pandas
DataFrame in Spreadsheet
Matplotlib
NumFOCUS & PyData
● NumFOCUS is a nonprofit supporting open source scientific computing.
● PyData is NumFOCUS's flagship educational program
● Projects include Jupyter, pandas, NumPy, Matplotlib
● PyData
○ A community for developers and users of open source data tools
○ There is a PyData Meetup in Pune
SciPy Lectures
● One document to learn numerics, science, and data with Python
● Gives an end-to-end introduction to the core SciPy-stack libraries
Analytics
Where does Data come into picture?
Source: Machine Learning for Dummies: Part 1
Machine Learning
● Ability to learn without being explicitly programmed
● A computer program is said to learn from experience 'E' with respect to some
class of tasks 'T' and performance measure 'P' if its performance at tasks in
'T', as measured by 'P', improves with experience 'E'
● Machine Learning is an approach to achieving Artificial Intelligence
● Machine Learning uses algorithms that learn from data instead of relying on
explicitly programmed rules
● Machine Learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed
● Machine learning is closely related to data mining and statistics
Machine Learning Types
Source: What is machine learning?
Source: Machine Learning Types #2
Source: Regression or Classification? Linear or Logistics?
Workflow of Machine Learning Project
Source: A Tool To Build Future For Non Experienced Candidates: Machine Learning
Steps to build Machine Learning System
Source: Building a Machine Learning Model from A-Z
Data Preparation
Steps:
● Query Data
● Clean Data
○ Deal with missing values
○ Remove outliers
● Format Data
Much like an ETL (Extract, Transform, Load) step!!
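Below is a minimal stdlib sketch of the clean and format steps on one hypothetical column. Blanks and "NA" become missing values. Missing values are filled with the median of the observed values, and outliers are dropped with the common 1.5 × IQR rule. Median imputation and the IQR rule are assumptions chosen for illustration; other strategies (dropping rows, z-scores, domain limits) are equally valid.

```python
import statistics

# Hypothetical raw column as it might arrive from a query: strings, with gaps
raw = ["42", "38", "", "45", "40", "NA", "41", "400", "39"]


def to_number(text):
    """Format step: convert text to float, treating blanks and 'NA' as missing."""
    text = text.strip()
    return None if text in ("", "NA") else float(text)


values = [to_number(v) for v in raw]

# Deal with missing values: impute with the median of the observed values
observed = [v for v in values if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in values]

# Remove outliers: keep values within 1.5 * IQR of the quartiles
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = [v for v in filled if low <= v <= high]

print(clean)  # [42.0, 38.0, 41.0, 45.0, 40.0, 41.0, 41.0, 39.0]
```

The median is used instead of the mean because the outlier (400) would pull the mean far away from typical values.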
Feature Engineering
“Process of transforming raw data into features that better represent the
underlying problem to the predictive models, resulting in improved model accuracy
on unseen data.”
Steps:
● Brainstorm features
● Create features
● Check how the features work with the model
● Repeat from the first step until the features work well
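The "create features" step might look like the following sketch on hypothetical housing records. It derives the age of the house at sale and the area per bedroom, and it one-hot encodes a categorical locality column. These particular features are illustrative assumptions. Whether they help is exactly what the "check how the features work with the model" step has to decide.

```python
# Hypothetical raw housing records
raw_houses = [
    {"year_built": 1995, "year_sold": 2020, "area_sqft": 1500, "bedrooms": 3, "locality": "north"},
    {"year_built": 2015, "year_sold": 2021, "area_sqft": 900, "bedrooms": 2, "locality": "south"},
]


def make_features(house, localities):
    """Turn one raw record into numeric features (illustrative choices)."""
    features = {
        "age_at_sale": house["year_sold"] - house["year_built"],
        "area_per_bedroom": house["area_sqft"] / house["bedrooms"],
    }
    # One-hot encode the categorical locality column
    for name in localities:
        features[f"locality_{name}"] = 1 if house["locality"] == name else 0
    return features


localities = sorted({h["locality"] for h in raw_houses})
feature_rows = [make_features(h, localities) for h in raw_houses]
print(feature_rows[0])
# {'age_at_sale': 25, 'area_per_bedroom': 500.0, 'locality_north': 1, 'locality_south': 0}
```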
Data Modeling
Performance Measure - Metrics
Mathematical / statistical ways of measuring the performance of an ML model
● Classification Accuracy
● Logarithmic Loss
● Confusion Matrix
● Area Under the ROC Curve (AUC)
● F1 Score
● Mean Absolute Error
● Mean Squared Error
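Most of these metrics are short formulas. The stdlib sketch below implements them from scratch on hypothetical labels and predictions so that each formula is visible. AUC is omitted for brevity. In practice, library implementations such as those in scikit-learn's `sklearn.metrics` module would be used.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (assumes at least one true positive)."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Classification: hypothetical labels and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

print(confusion_matrix(y_true, y_pred))    # (3, 1, 1, 3)
print(accuracy(y_true, y_pred))            # 0.75
print(f1_score(y_true, y_pred))            # 0.75
print(round(log_loss(y_true, y_prob), 4))  # 0.4004

# Regression: hypothetical actual vs predicted values
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]
print(mean_absolute_error(actual, predicted))  # 0.5
print(mean_squared_error(actual, predicted))   # 0.375
```

Accuracy, the confusion matrix and F1 apply to classification. MAE and MSE apply to regression. Log loss scores the predicted probabilities rather than the thresholded labels.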
Performance Measure - Other Approaches
● Testing by End User or Crowd testing
○ Test with real users
● Equivalence classes or ranges of output or tolerance
○ Assert expected ≈ actual, within a tolerance
● Ranking of output
○ Instead of Pass/Fail, rank outputs
● Comparison Test
○ Compare with a competing system
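Two of these approaches fit in a few lines of stdlib Python. A tolerance assertion replaces exact pass/fail. Ranking competing systems by their error on the same cases covers the ranking and comparison tests. The 5% tolerance, the use of mean absolute error for ranking, and all numbers are hypothetical choices.

```python
import math


def within_tolerance(expected, actual, rel_tol=0.05):
    """Tolerance check: pass if actual is within rel_tol (5% by default) of expected."""
    return math.isclose(expected, actual, rel_tol=rel_tol)


# Hypothetical outputs from a model that should predict roughly 100
assert within_tolerance(100.0, 103.0)      # 3% off: accepted
assert not within_tolerance(100.0, 112.0)  # 12% off: flagged


def mean_abs_error(expected, predicted):
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


# Ranking / comparison test: score competing systems on the same hypothetical cases
expected = [10.0, 20.0, 30.0]
systems = {
    "system_a": [11.0, 19.0, 33.0],
    "system_b": [10.5, 20.5, 29.0],
}
ranking = sorted(systems, key=lambda name: mean_abs_error(expected, systems[name]))
print(ranking)  # ['system_b', 'system_a']
```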
Machine Learning Algorithms
Housing Price prediction
● Predict Sale Price of a House based on attributes
● Test Data
Linear Regression
Linear Regression in one variable
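Linear regression in one variable fits a hypothesis h(x) = θ0 + θ1·x that minimizes the squared error. The sketch below uses the closed-form ordinary least squares solution on hypothetical house-area vs price data. Course slides often minimize the same cost with gradient descent instead. Because this cost is convex, gradient descent should converge to the same line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ theta0 + theta1 * x (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx                 # slope
    theta0 = y_mean - theta1 * x_mean  # intercept
    return theta0, theta1


# Hypothetical data: area in 1000 sq ft vs sale price (arbitrary units)
area = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [2.0, 4.1, 5.9, 8.2, 9.8]

theta0, theta1 = fit_line(area, price)
print(round(theta0, 2), round(theta1, 2))  # 0.09 1.97


def predict(x):
    return theta0 + theta1 * x


print(round(predict(6.0), 2))  # 11.91
```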
Polynomial Regression
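Polynomial regression is still linear regression, applied to expanded features [1, x, x², ...]. The sketch below builds those features and solves the normal equations XᵀXθ = Xᵀy with a small Gaussian elimination. The data is hypothetical and follows y = 1 + x² exactly. The normal equations are fine for a toy problem like this, while libraries generally use numerically more stable solvers.

```python
def solve(A, b):
    """Solve the square system A·theta = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        theta[r] = (M[r][n] - sum(M[r][k] * theta[k] for k in range(r + 1, n))) / M[r][r]
    return theta


def fit_polynomial(xs, ys, degree):
    """Least squares on features [1, x, x^2, ...] via the normal equations."""
    d = degree + 1
    X = [[x ** p for p in range(d)] for x in xs]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(d)]
    return solve(XtX, Xty)


# Hypothetical data that follows y = 1 + x^2 exactly
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 5.0, 10.0, 17.0]
theta = fit_polynomial(xs, ys, degree=2)  # approximately [1.0, 0.0, 1.0]


def predict_poly(x):
    return sum(t * x ** p for p, t in enumerate(theta))


print(round(predict_poly(5.0), 6))  # 26.0
```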
Logistic Regression
Scikit-Learn
● https://scikit-learn.org
● Free machine learning library for the Python programming language
● Features various classification, regression and clustering algorithms
● Examples:
○ Linear Regression
○ Support Vector Machines (SVM)
○ Random forests
○ Gradient boosting
○ K-means
● Interoperates with the Python numerical and scientific libraries NumPy and
SciPy.
Sample Code
Logistic Regression
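The following is a from-scratch sketch of logistic regression, not the slide's original code. It models p(y=1|x) = sigmoid(w·x + b) and trains w and b by batch gradient descent on the log loss. The "hours studied vs passed" data, the learning rate and the epoch count are hypothetical. In practice scikit-learn's `LogisticRegression` class would be the usual choice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the log loss for p(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied -> passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

# Center the feature around its mean, which helps gradient descent converge
mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]

w, b = train_logistic(centered, passed)


def predict_proba(h):
    return sigmoid(w * (h - mean_hours) + b)


def predict(h, threshold=0.5):
    return 1 if predict_proba(h) >= threshold else 0


print([predict(h) for h in hours])  # [0, 0, 0, 0, 1, 1, 1, 1]
```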
K-Nearest Neighbours
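K-Nearest Neighbours classifies a new point by majority vote among the k closest training points. It has no real training phase: the "model" is the stored data plus k. Below is a stdlib sketch, not the slide's original code. The two-feature data points are made up, loosely in the spirit of Iris petal measurements. `math.dist` needs Python 3.8+. scikit-learn offers `KNeighborsClassifier` for real use.

```python
import math
from collections import Counter

# Hypothetical labelled points: (petal length, petal width) -> species
train = [
    ((1.0, 0.2), "setosa"),
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.1, 1.3), "versicolor"),
    ((4.7, 1.4), "versicolor"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_predict(train, (1.5, 0.3)))  # setosa
print(knn_predict(train, (4.4, 1.4)))  # versicolor
```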
Model vs Algorithm
● Model is what you get when you run the Algorithm over your training data
and what you use to make predictions on new data.
● A Model is a Function which takes inputs and gives an output (prediction)
● You can generate a new Model with the same Algorithm but with different
data, OR
● You can get a new Model from the same data but with a different Algorithm
or different hyperparameters of the same Algorithm
● The Model is specific to your project and is what gets deployed to make predictions.
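One way to see the distinction in code is to treat the algorithm as a function that takes data and returns the model, which is itself a plain function from input to prediction. The sketch below runs the same least-squares algorithm on two hypothetical datasets and gets two different models. The "city" names are illustrative only.

```python
def train_linear(xs, ys):
    """The *algorithm*: least-squares fit of y = a + b*x. Returns a *model*."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = y_mean - b * x_mean

    def model(x):
        """The *model*: a function from input to prediction."""
        return a + b * x

    return model


# Same algorithm, different (hypothetical) data -> two different models
model_city_a = train_linear([1, 2, 3], [2, 4, 6])  # learns y = 2x
model_city_b = train_linear([1, 2, 3], [3, 4, 5])  # learns y = x + 2

print(model_city_a(10), model_city_b(10))  # 20.0 12.0
```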
Model Deployment
● A model or “predictor” or “classifier” is a piece of code/function which runs and
gives an output. It could be packaged as a:
○ Python module
○ Containerized Docker image
○ Serverless function
● How do you deploy a simple ML model on your own?
○ As a RESTful API
○ Serialize it with the pickle library and host it on a Flask web server
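The pickle route can be sketched without a web server. The trained model is serialized to bytes (or a `.pkl` file) and loaded once in the serving process. A request handler then turns JSON input into a JSON prediction. `LinearModel` and `handle_request` are hypothetical names. In a Flask app, the body of `handle_request` would sit inside a route function; that part is left out so the example needs only the standard library. Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.

```python
import json
import pickle


class LinearModel:
    """Hypothetical trained model: y = intercept + slope * x."""

    def __init__(self, intercept, slope):
        self.intercept = intercept
        self.slope = slope

    def predict(self, x):
        return self.intercept + self.slope * x


# After training: serialize the model (pickle.dump would write it to a file instead)
model = LinearModel(intercept=0.5, slope=2.0)
blob = pickle.dumps(model)

# In the serving process: load once at startup (the class must be importable there)
loaded = pickle.loads(blob)


def handle_request(body: str) -> str:
    """Body of a hypothetical POST /predict endpoint: JSON in, JSON out."""
    payload = json.loads(body)
    return json.dumps({"prediction": loaded.predict(payload["x"])})


print(handle_request('{"x": 3.0}'))  # {"prediction": 6.5}
```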
Choosing right Machine Learning Model
What kinds of problems can ML solve?
● Problems which a person could solve in under 1 second
○ E.g. identifying what is in a picture
● Problems which require experience
○ A doctor looks at an X-ray and makes a diagnosis
○ Shortlisting candidates for hiring
● Problems which ML cannot solve
○ Solving mathematical equations
○ Writing prose
Data Science vs Machine Learning
● Machine Learning
○ Collect Data → Train Model → Deploy to start getting predictions or classifications
○ Output is software
○ Could be OUTSOURCED (s/w development)
○ Engineering discipline
○ Make a model which makes good predictions, because we have labeled train/test sets
● Data Science
○ Collect Data → Analyze → Hypotheses/Actions/Suggestions
○ Output is a slide deck of recommendations
○ Better INHOUSE (tied to business)
○ Multidisciplinary
○ Ask questions and design experiments: Why? What? What can we do to change the outcome?
Universal Approximation Theorem
A feedforward network with a single layer is sufficient to represent any function,
but the layer may be infeasibly large and may fail to learn and generalize
correctly.
— Ian Goodfellow
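A small illustration of what a single hidden layer adds: XOR cannot be represented by any single linear unit, but one hidden layer of two ReLU units can compute it exactly. The weights below are hand-picked rather than learned, following a classic textbook construction. The sketch shows representation only, not learning, which is the other half of the quote's caveat.

```python
def relu(z):
    return z if z > 0 else 0.0


# Hand-picked weights (not learned): 2 inputs -> 2 ReLU hidden units -> 1 linear output
W = [[1.0, 1.0], [1.0, 1.0]]  # W[i][j]: weight from input i to hidden unit j
c = [0.0, -1.0]               # hidden-layer biases
w = [1.0, -2.0]               # hidden-to-output weights
b = 0.0                       # output bias


def forward(x):
    """Forward pass: h = relu(W^T x + c), then y = w^T h + b."""
    h = [relu(sum(x[i] * W[i][j] for i in range(2)) + c[j]) for j in range(2)]
    return sum(w[j] * h[j] for j in range(2)) + b


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outputs = [forward(x) for x in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```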
Deep Learning Neural Network
Types of Neural Networks
● CNN (Convolutional Neural Network)
● RNN (Recurrent Neural Network)
● LSTM (Long Short-Term Memory)
● GAN (Generative Adversarial Network)
Convolutional Neural Network - ConvNet
● A ConvNet takes an input image and learns to differentiate one image from another
● Its connectivity pattern is analogous to that of neurons in the human brain
● Inspired by the organization of the visual cortex
● Captures spatial and temporal dependencies
● The convolution layer extracts high-level features
○ An N×N kernel (filter) matrix slides over the entire M×M image
● The pooling layer reduces the dimensions of the convolved features
● Together, the convolution and pooling phases form the “Feature Extraction” phase
● Flatten the final output and feed it to a regular neural network for
classification purposes.
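The feature extraction pipeline above can be traced by hand with a small stdlib sketch. The steps are convolution, ReLU, 2×2 max pooling and flattening. The 6×6 image and the vertical-edge kernel are hypothetical. A real ConvNet learns its kernel values during training and uses a DL framework rather than nested lists. Like most DL libraries, the convolution here does not flip the kernel, so strictly speaking it is cross-correlation.

```python
def convolve2d(image, kernel):
    """'Valid' convolution, stride 1: N×N kernel over an M×M image -> (M-N+1)×(M-N+1) map."""
    m, n = len(image), len(kernel)
    size = m - n + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(n) for b in range(n))
            for j in range(size)
        ]
        for i in range(size)
    ]


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size×size window."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b] for a in range(size) for b in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def flatten(feature_map):
    return [v for row in feature_map for v in row]


# Hypothetical 6x6 grayscale image: a vertical edge in the lower half
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hand-crafted vertical-edge kernel (a ConvNet would learn these weights)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = relu(convolve2d(image, kernel))  # 4x4 feature map
pooled = max_pool(features)                 # 2x2 after pooling
vector = flatten(pooled)                    # input to a fully connected classifier
print(features)  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 2, 2, 0], [0, 3, 3, 0]]
print(vector)    # [1, 1, 3, 3]
```

The feature map is strongest where the edge spans the whole kernel window. Pooling keeps that signal while shrinking 16 values to 4.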
ConvNet - Convolution Phase
ConvNet - Pooling Phase
Digit Recognizer ConvNet
ML vs DL
Source: ML vs DL
Deep Learning Frameworks
What is TensorFlow?
● TensorFlow is an open source library to help you develop and train ML models
● TensorFlow Playground
● TensorFlow Tutorial
What is Keras?
● Keras is a high-level API to develop Neural Networks
● Capable of running on top of TensorFlow, CNTK, or Theano.
● Keras Getting Started
Cloud Platforms
● AWS
● GCP
● Azure
AWS AI Services
AWS ML/DL Services
● SageMaker
○ Build → Train → Deploy Machine Learning Models
● Deep Learning AMIs
AWS AI Learning & Certification
● Training: ML Training
● Certification: Machine Learning Specialty
Microsoft Azure AI Platform
Microsoft Azure - Learning
● Microsoft Professional Program - AI
○ Now being retired
● Microsoft Learn
○ Search via Role/Product
● AI School
○ Dedicated AI academy
● Microsoft has tied up with edX
Google AI Platform
Google AI Training & Certification
● Google has tied up with Coursera for their training
● Training - Data & Machine Learning path
● Machine Learning with TensorFlow on Google Cloud Platform Specialization
● Certification - Data & Machine Learning
Kaggle
● Online community of data scientists and machine learners, owned by Google
● Datasets
● Notebooks
● Competitions
Resources
1. CGP Grey: How Machines Learn
2. 3Blue1Brown: Neural Networks
3. NVIDIA: What’s the Difference Between Artificial Intelligence, Machine
Learning, and Deep Learning?
4. State of AI
5. DataMeet is a community of Data Science and Open Data enthusiasts from
India.
6. A visual introduction to Probability & Statistics
Roles
● Data Scientist
○ Examine Data and provide Insights
○ Make presentations to the team / executives
○ Storytelling
● Machine Learning Engineer
○ Build, Train, Test & Improve ML/DL models
● Data Engineer
○ Organize Data
○ Make sure data is stored in easily accessible, secure and cost-effective way
Where to start?
● Get hands-on with Jupyter Notebook and the SciPy stack
● Take part in some Kaggle competitions
● Look into your projects and see whether they are candidates for ML/DL