2. Artificial Intelligence
● Ability to perform tasks normally requiring human intelligence, such as visual
perception, speech recognition, decision-making, and translation between
languages.
● Ability of a computer program or a machine to think and learn
● Ability to correctly interpret external data, to learn from such data, and to use
those learnings to achieve specific goals and tasks through flexible adaptation
● Ability to mimic human cognition
● A program that can sense, reason, act and adapt
Source: Various sources on the internet, including Wikipedia
3. Evolution of Industry
Source: https://blogs.worldbank.org/digital-development/what-korea-s-strategy-manage-implications-artificial-intelligence
8. Artificial Intelligence Systems that we encounter
● Weak or Narrow AI, i.e. AI limited to a narrow field of application
● Examples
○ Recommendation Systems
○ Chatbots
○ Virtual Assistants
○ Robots
● Weak or Narrow AI is what is leading to most of the automation
9. What is an intelligent System?
● Intelligence is an experience that one gets by interacting with a system
● Intelligence is intangible like other attributes - fast, secure, usable, intuitive
● Intelligent system is non-deterministic
● Intelligence is adding value to businesses
● A System appears to be intelligent
● Are Personal Computers or typical programs (Browser etc.) intelligent?
12. Data Science
● Data Science
● Extract Knowledge or Insights from Data
● Understand and analyze actual phenomena with data
● Whether the data contains enough information to make predictions
16. Activities in Data Science
● Data Exploration & Preparation
○ Collection & loading
● Data Representation & Transformation
○ Tabular, DataFrame etc.
● Computing with Data
○ Programming
● Data Modelling
○ Predictive Modeling
● Data Visualization and Presentation
○ Charting, graphs etc
● Science about Data Science
○ What works, what doesn’t work
17. What are different types of Data?
● Structured Data
○ Relational Databases
● Semi-structured Data
○ NoSQL Databases
○ XML, JSON
● Unstructured Data
○ Image, audio, text
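A minimal sketch of how the three categories above differ when read in Python, using only the standard library. The sample values (names, ages, skills, the review text) are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a relational table).
csv_text = "name,age\nAsha,31\nRavi,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys; fields may be nested or missing.
json_text = '[{"name": "Asha", "skills": ["python", "sql"]}, {"name": "Ravi"}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure has to be derived.
review = "Great phone, but the battery drains fast."
tokens = review.lower().replace(",", "").replace(".", "").split()

print(rows[0]["age"])                # column access by name (CSV values are strings)
print(records[1].get("skills", []))  # a missing field needs a default
print(len(tokens))                   # a derived quantity: word count
```

The further down the list you go, the more work the program has to do before the data looks like a table.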
18. What is a Dataset?
● A collection of related sets of information that is composed of separate
elements but can be manipulated as a unit by a computer.
● Popular datasets
○ Iris Flower Data Set
○ MNIST handwritten digits database
○ Kaggle Datasets
○ Data.World
● Other datasets
○ Open Government Data (OGD) platform of India
● How do we obtain Data if not made available as Datasets?
○ Access to Database
○ APIs
○ Web Scraping
19. Web Scraping
● Crawling the web to extract information
● Used in the absence of database or API access
● Frameworks and libraries
○ Apache Nutch (Java-based crawler)
○ Scrapy (Python)
○ BeautifulSoup (Python)
○ Selenium (browser automation, with Python bindings)
● Scriptless
○ import.io
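A minimal sketch of the extraction step, assuming the page has already been downloaded. It uses the standard library's `html.parser` rather than the frameworks listed above, and `SAMPLE_HTML`, `LinkExtractor` and `extract_links` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/datasets">Datasets</a>
  <p>No link here</p>
  <a href="/about">About</a>
</body></html>
"""


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


print(extract_links(SAMPLE_HTML))
```

A crawler repeats this step, following the extracted links to further pages. Check a site's terms of use and robots.txt before scraping it.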
20. Jupyter Notebook
● Jupyter
● Open-source web application
● REPL programming environment
● Create and share documents that contain
○ Live Code
○ Equations
○ Visualizations
○ Narrative text
● Uses include
○ Data cleaning and transformation
○ Numerical simulation
○ Statistical modeling
○ Data visualization
○ Machine learning
27. NumFOCUS & PyData
● NumFOCUS is a nonprofit supporting open source scientific computing.
● PyData is its flagship educational program
● Projects include Jupyter, pandas, NumPy, Matplotlib
● PyData
○ A community for developers and users of open source data tools
○ They have a Meetup in Pune
28. SciPy Lectures
● One document to learn numerics, science, and data with Python
● The SciPy lectures give an end-to-end introduction to all SciPy libs.
● SciPy Lectures
32. Machine Learning
● Ability to learn without being explicitly programmed
● A computer program is said to learn from experience 'E', with respect to some
class of tasks 'T' and performance measure 'P' if its performance at tasks in
'T' as measured by 'P' improves with experience ‘E’
● Machine Learning is an approach to achieve Artificial Intelligence
● Machine Learning is an algorithm that can learn from data without relying on
conventional programming
● Machine Learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed
● Machine Learning is closely related to Data Mining and statistics
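The E/T/P definition above can be made concrete with a toy learner. This is a minimal sketch, assuming a 1-nearest-neighbour classifier and a made-up task; the data, the hidden rule `x >= 7` and the function names are all illustrative.

```python
def predict_1nn(training, x):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    _, nearest_label = min(training, key=lambda example: abs(example[0] - x))
    return nearest_label


def accuracy(training, test_set):
    """Performance measure P: fraction of test points labelled correctly."""
    correct = sum(1 for x, label in test_set if predict_1nn(training, x) == label)
    return correct / len(test_set)


# Task T: decide whether a number is "large" (label 1) or not (label 0).
# The rule x >= 7 is never given to the learner; it only sees labelled examples.
test_set = [(x, 1 if x >= 7 else 0) for x in range(10)]

# Experience E: labelled examples, revealed a few at a time.
examples = [(0, 0), (9, 1), (6, 0), (7, 1)]

scores = [accuracy(examples[:n], test_set) for n in (2, 3, 4)]
print(scores)  # [0.8, 0.9, 1.0]
```

Performance on T, measured by P, improves as E grows, and no line of the program states the rule. With real data the improvement is rarely this clean.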
36. Workflow of Machine Learning Project
Source: A Tool To Build Future For Non Experienced Candidates: Machine Learning
37. Steps to build Machine Learning System
Source: Building a Machine Learning Model from A-Z
38. Data Preparation
Steps:
● Query Data
● Clean Data
○ Deal with missing values
○ Remove outliers
● Format Data
This is much like an ETL (Extract, Transform, Load) step.
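A minimal sketch of the clean and format steps with the standard library. The raw values are hypothetical, and median imputation and the 1.5 × IQR rule are only two common choices, not the only ones. Formatting is done first here so that the values can be compared numerically.

```python
import statistics

# Hypothetical raw values as they might come back from a query: strings, one blank.
raw_ages = ["25", "30", "", "28", "27", "400"]

# Format: convert text to numbers, marking blanks as missing (None).
parsed = [float(value) if value else None for value in raw_ages]

# Deal with missing values: fill each gap with the median of the known values.
known = [value for value in parsed if value is not None]
median_age = statistics.median(known)
filled = [median_age if value is None else value for value in parsed]

# Remove outliers: keep values within 1.5 * IQR of the first and third quartiles.
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [value for value in filled if low <= value <= high]

print(cleaned)  # [25.0, 30.0, 28.0, 28.0, 27.0]
```

Whether 400 is an error or a genuine extreme value depends on the domain, so outlier removal should be a deliberate decision rather than an automatic one.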
39. Feature Engineering
“Process of transforming raw data into features that better represent the
underlying problem to the predictive models, resulting in improved model accuracy
on unseen data.”
Steps:
● Brainstorm features
● Create features
● Check how the features work with the model
● Repeat from the first step until the features work well
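A minimal sketch of the "create features" step: turning one raw record into numeric features a model can use. The record and its field names (`order_date`, `total`, `quantity`) are assumptions made up for this example.

```python
from datetime import date


def make_features(raw):
    """Turn one raw record into model-ready numeric features."""
    day = date.fromisoformat(raw["order_date"])
    return {
        "day_of_week": day.weekday(),           # 0 = Monday ... 6 = Sunday
        "is_weekend": int(day.weekday() >= 5),  # 1 for Saturday or Sunday
        "price_per_item": raw["total"] / raw["quantity"],
    }


# Hypothetical raw record.
order = {"order_date": "2024-01-06", "total": 120.0, "quantity": 4}
print(make_features(order))
```

A date string means little to most models, while "is it a weekend?" may carry real signal. Whether it does is what the "check how the features work with the model" step is for.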
41. Performance Measure - Metrics
Mathematical / Statistical way of measuring performance of ML Model
● Classification Accuracy
● Logarithmic Loss
● Confusion Matrix
● Area under Curve
● F1 Score
● Mean Absolute Error
● Mean Squared Error
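Several of the metrics above are short enough to write by hand. This is a minimal sketch for binary labels (0/1) and numeric predictions, with made-up sample data; in practice a library such as scikit-learn provides tested versions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    # Assumes at least one true positive; otherwise a division by zero occurs.
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up classification labels and regression values.
labels_true = [1, 0, 1, 1, 0, 1]
labels_pred = [1, 0, 0, 1, 1, 1]
values_true = [3.0, 5.0, 2.0]
values_pred = [2.5, 5.0, 4.0]

print(confusion_matrix(labels_true, labels_pred))  # (3, 1, 1, 1)
print(f1_score(labels_true, labels_pred))          # 0.75
print(mean_absolute_error(values_true, values_pred))
print(mean_squared_error(values_true, values_pred))
```

The first group (accuracy, confusion matrix, F1) applies to classification, and the error measures apply to regression.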
42. Performance Measure - Other Approaches
● Testing by End User or Crowd testing
○ Test with real users
● Equivalence classes or ranges of output or tolerance
○ Assert (expected ≈ actual, within a tolerance)
● Ranking of output
○ Instead of Pass/Fail, rank outputs
● Comparison Test
○ Compare with a competing system
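Two of these approaches translate directly into test code. This is a minimal sketch, assuming a 5% relative tolerance and made-up scores; `within_tolerance` and `rank_outputs` are hypothetical helper names.

```python
import math


def within_tolerance(actual, expected, rel_tol=0.05):
    """Pass when actual is within rel_tol (5% by default) of expected."""
    return math.isclose(actual, expected, rel_tol=rel_tol)


def rank_outputs(scored_outputs):
    """Order candidate outputs from best to worst by score."""
    return sorted(scored_outputs, key=lambda item: item[1], reverse=True)


# Hypothetical numbers: a model predicted 102.0 where 100.0 was expected.
print(within_tolerance(102.0, 100.0))  # True
print(within_tolerance(120.0, 100.0))  # False
print(rank_outputs([("model_a", 0.71), ("model_b", 0.88), ("model_c", 0.64)]))
```

An exact `==` assertion would fail on almost every run of a non-deterministic system, which is why a tolerance or a ranking replaces Pass/Fail.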
51. Scikit-Learn
● https://scikit-learn.org
● Free machine learning library for the Python programming language
● Features various classification, regression and clustering algorithms
● Examples:
○ Linear Regression
○ Support Vector Machines (SVM)
○ Random forests
○ Gradient boosting
○ K-means
● Interoperate with the Python numerical and scientific libraries NumPy and
SciPy.
53. Model vs Algorithm
● Model is what you get when you run the Algorithm over your training data
and what you use to make predictions on new data.
● A Model is a Function which takes inputs and gives an output (prediction)
● You can generate a new Model with the same Algorithm but with different
data, OR
● You can get a new Model from the same data but with a different Algorithm
or different hyperparameter of same Algorithm
● Model is unique to your project and deployed to make predictions.
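The distinction above can be shown in a few lines. This is a minimal sketch, assuming ordinary least squares for a straight line as the algorithm; the two small datasets are made up.

```python
def fit_line(xs, ys):
    """The ALGORITHM: ordinary least squares for a straight line.

    Returns the MODEL: a function mapping an input x to a prediction.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def model(x):
        return slope * x + intercept

    return model


# Same algorithm, different training data -> two different models.
model_a = fit_line([0, 1, 2], [1, 3, 5])  # learns y = 2x + 1
model_b = fit_line([0, 1, 2], [0, 3, 6])  # learns y = 3x

print(model_a(3), model_b(3))  # 7.0 9.0
```

`fit_line` is reusable across projects; `model_a` and `model_b` are specific to the data they were trained on.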
54. Model Deployment
● A model or “predictor” or “classifier” is a piece of code/function which runs and
gives output. It could be:
○ Python module
○ Containerized Docker image
○ A Serverless Function
● How to deploy a simple ML model on your own?
○ As a RESTful API
○ Serialize the model with the pickle library, then host it on a Flask web server.
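A minimal sketch of the pickle half of that approach, using only the standard library. The "model" here is a hypothetical dictionary of line parameters, and `handle_predict` is an assumed name for the function a web framework route (for example a Flask view) would call; the web server itself is left out.

```python
import json
import pickle

# Hypothetical trained model: parameters of a line, y = slope * x + intercept.
trained_model = {"slope": 2.0, "intercept": 1.0}

# Training side: serialize the model to bytes (normally written to a file).
model_bytes = pickle.dumps(trained_model)

# Serving side: load the model once at startup.
# Only unpickle data you trust; pickle can execute code while loading.
loaded_model = pickle.loads(model_bytes)


def handle_predict(request_body):
    """Take a JSON request body and return a JSON response body."""
    payload = json.loads(request_body)
    x = float(payload["x"])
    prediction = loaded_model["slope"] * x + loaded_model["intercept"]
    return json.dumps({"prediction": prediction})


print(handle_predict('{"x": 3}'))  # {"prediction": 7.0}
```

Keeping the prediction logic in a plain function makes it testable without starting a server, and the same function can later sit behind a container or a serverless entry point.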
57. What kind of problems can ML solve?
● Problems which a person could solve in <1 sec
○ E.g. identify what is in a picture
● Problems which require experience
○ A doctor can look at an X-ray and give a diagnosis
○ Hiring shortlisting
● Problems which ML cannot solve?
○ Solving mathematical equations
○ Writing prose
58. Data Science vs Machine Learning
● Machine Learning
○ Collect Data → Train Model → Deploy to start getting predictions or classifications
○ Output is software
○ Could be OUTSOURCED (s/w development)
○ Engineering discipline
○ Make a model which makes good predictions because we have labeled train/test sets.
● Data Science
○ Collect Data → Analyze → Hypotheses/Actions/Suggestions
○ Output is a slide deck of recommendations
○ Better INHOUSE (tied to business)
○ Multidisciplinary
○ Ask questions and design experiments: Why? What? What can we do to change the outcome?
60. Universal Approximation Theorem
A feedforward network with a single layer is sufficient to represent any function,
but the layer may be unfeasibly large and may fail to learn and generalize
correctly.
— Ian Goodfellow
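As a small illustration only, not a demonstration of the theorem: a network with one hidden layer of two ReLU units and hand-picked weights represents the function |x| exactly. The weights below are chosen by hand for this example, not learned.

```python
def relu(z):
    return max(0.0, z)


def single_hidden_layer(x):
    """One hidden layer with two ReLU units and hand-picked weights."""
    h1 = relu(1.0 * x)   # active for positive inputs
    h2 = relu(-1.0 * x)  # active for negative inputs
    return 1.0 * h1 + 1.0 * h2  # output layer sums the hidden units


print([single_hidden_layer(x) for x in (-2.0, 0.0, 3.0)])  # [2.0, 0.0, 3.0]
```

More complicated functions need more hidden units, which is the "may be infeasibly large" caveat in the quote. Being able to represent a function also says nothing about whether training will find the right weights.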
62. Types of Neural Networks
● CNN (Convolutional Neural Network)
● RNN (Recurrent Neural Network)
● LSTM (Long Short Term Memory)
● GAN (Generative Adversarial Network)
63. Convolutional Neural Network - ConvNet
● ConvNet takes an image and differentiates one from another
● Analogous to connectivity pattern of Neurons in the Human Brain
● Inspired by the organization of the Visual Cortex
● Captures Spatial and Temporal dependencies
● Convolution Layer to extract high level features
○ A kernel filter NxN matrix scans entire MxM image
● Pooling layer to reduce the dimensions of the convolved features
● Convolution and Pooling phase are “Feature Extraction” phase
● Flatten the final output and feed it to a regular Neural Network for
classification purposes.
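The feature-extraction phase above can be sketched in plain Python. This is a minimal example, assuming stride 1, no padding, a single channel and a hand-written 2x2 kernel; in a real ConvNet the kernel values are learned and the flattened output feeds a classifier.

```python
def convolve2d(image, kernel):
    """Slide an NxN kernel over an MxM image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    n = len(kernel)
    out_size = len(image) - n + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(n) for b in range(n))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    return [value for row in feature_map for value in row]


# A 5x5 image containing a diagonal line, and a 2x2 kernel that responds to diagonals.
image = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

features = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool(features)           # 2x2 after pooling
print(flatten(pooled))                # [2, 0, 0, 2]
```

The feature map is large only where the image matches the kernel's pattern, and pooling keeps that signal while reducing 16 values to 4.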
69. What is TensorFlow?
● TensorFlow is an open source library to help you develop and train ML models
● TensorFlow playground
● TensorFlow Tutorial
70. What is Keras?
● Keras is a high-level API to develop Neural Networks
● Capable of running on top of TensorFlow, CNTK, or Theano.
● Keras Getting Started
76. Microsoft Azure - Learning
● Microsoft Professional Program - AI
○ Now being retired
● Microsoft Learn
○ Search via Role/Product
● AI School
○ Dedicated AI academy
● Microsoft has tied up with edX
78. Google AI Training & Certification
● Google has tied up with Coursera for their training
● Training - Data & Machine Learning path
● Machine Learning with TensorFlow on Google Cloud Platform Specialization
● Certification - Data & Machine Learning
79. Kaggle
● Online community of data scientists and machine learners, owned by Google
● Datasets
● Notebooks
● Competitions
80. Resources
1. CGP Grey: How Machines Learn
2. 3Blue1Brown: Neural Networks
3. NVIDIA: What’s the Difference Between Artificial Intelligence, Machine
Learning, and Deep Learning?
4. State of AI
5. DataMeet is a community of Data Science and Open Data enthusiasts from
India.
6. A visual introduction to Probability & Statistics
81. Roles
● Data Scientist
○ Examine Data and provide Insights
○ Make presentation to Team / Executive
○ Storytelling
● Machine Learning Engineer
○ Build, Train, Test & Improve ML/DL models
● Data Engineer
○ Organize Data
○ Make sure data is stored in easily accessible, secure and cost-effective way
82. Where to start?
● Try your hand at Jupyter Notebook and the SciPy stack
● Take part in some Kaggle contests
● Look into your projects and see if they are candidates for ML/DL