Machine Learning
–
A Shallow dive
Gopi Krishna Nuti
Lead Data Scientist,Autodesk
Vice President, MUST Research
ngopikrishna.public@gmail.com
gopi.nuti@autodesk.com
The Digital Industry Forces
Social Media Mobility Analytics Cloud Robotics
Automation IoT
How did Data Science Start?
• Statistics describes past and present
• A necessity to predict future based on the knowledge of the past and present
• Mathematically verifiable decision making as opposed to “hunch” and “gut feel”
• Result is – Applied Statistics.
• Combination of factors
• Advent of high performance computers
• Exponential rise of digital data
• Artificial Intelligence and Data mining techniques
• Combined with marketing savvy, this became Machine Learning and Data Science.
What is Artificial Intelligence?
• Unfortunately, there is no universally accepted definition.
• A general description : A study of how to make computers do things which,
at the moment, people do better.
MundaneTasks
• Perception
• Vision, Speech
• Natural Language
• Understanding
• Generation
• Translation
FormalTasks
• Games
• Chess
• Checkers
• Backgammon
• Mathematics
• Geometry
• Logic
• Calculus
ExpertTasks
• Engineering
• ScientificAnalysis
• Financial Analysis
Easy for Humans Easy for Computers
So,What’s the difference?
• Are you concerned with Decision Making? – Artificial Intelligence
• Are you only predicting future or describing the present? – Machine
Learning
• Are you doing machine learning in a way that emulates human mind?
– Deep Learning
What is Analytics
• Artificial Intelligence
• Machine Learning
• Data Science
• Image/VideoAnalytics
• Speech Analytics
• Natural Language Processing
• Statistics
• Big Data
• Big DataAnalytics
Artificial Intelligence
Techniques to enable a computer to mimic human intelligence
Machine Learning
Using Algorithms to learn from and make predictions about data without
having to explicitly code for it
Deep Learning
Emulate the learning approach of human beings to gain certain types of
knowledge
Machine
Learning
Machine Learning
Data Analytics
Descriptive Predictive Prescriptive
Image/Video
Analytics
Speech
Analytics
Natural
Language
Processing
Data AnalyticsVs Statistics (Data Science)
Image courtesy of Datascientistinsights.com
Data Science Data Analytics
Mathematics of
explaining population
relationships based on
samples.
• Extracting valuable
information out of
data
• Predict values for
new data
Scarcity of Data Abundance of Data
Hypothesis comes first Data comes first
Macro Decisioning Micro Decisioning
Classical ProgrammingVs Machine Learning
Rules Data Answers
Answers Data Rules
Classical
Programming
Machine
Learning
Classical Programming
StartTeaching
Wait for a
reasonable
time
Is lesson
over
Continue Class
Are students
interested in
listening
Tell a joke
Run away
Machine Learning
• Start with Historical Data
• Formulate the problem as a
mathematical equation.
• Feed data and equation to the
machine and let it come up
with values.
S.
No
Time
elapsed
Interest
Level
ActionTaken Resultant Interest
Level
1 0 High Start the class High
2 0 High Tell a joke High
3 0 Medium Shout on students Low
4 0 Medium Tell a joke High
5
..
.. 15 Low Tell a joke Medium
.. 15 Medium Continue the class Low
.. 15 High Scold the students Low
.. 15 Low Tell a joke Low
..
..
..
.. 60 Low Continue the class Low
.. 60 Low Run away High
Machine Learning – Mathematical Formulation
• y = f(x)
• Action to take = f(Time Elapsed, Historical Interest levels, actions taken,
resultant interest level)
• y – DependentVariable
• x - IndependentVariables
Big DataVs Analytics
Roles &
Activities
Data Scientist
Data Engineer
DBA
Performance Engineer
Hardware Developer
Machine Learning Pipeline
Image courtesyWestern Digital
Data Analytics – Basic
concepts
Gopi Krishna Nuti
Data Analytics
• InformationTypes
• Structured Data
• Semi structured Data
• Unstructured Data
• Examples
What to see when you see data
S. No Character Id Name Creator of the
character
Year of First
publication
Number of Films
made (until 2019
Dec)
1 1123 Superman Jerry Siegel 1938 11
2 7856 Ironman Stan Lee 1963 8
3 3614 Captain America Stan Lee 1941 7
4 1578 Albus Dumbledore JK Rowling 1997 9
5 15725 Chacha Chowdary Pran 1971 0
6 007 James Bond Ian Fleming 1953 27
Levels of
Data Nominal
•Algebraic operations are not
possible
Ordinal
•Logical operations are possible
but not mathematical
operations . Ex: Account
Number
Interval
•Addition/Subtraction is
possible but not
multiplication/division
•Interval between two
continuous elements is always
same and meaningful
•Zero is arbitrary
•Ex: Temperature
Ratio:
•Zero makes sense and negative
values are not possible
•Mean, Median, Mode etc can
be calculated
•Account Balance
Data Analytics – Machine LearningTypes
Supervised
Learning
Unsupervised
Learning
Predictive
Analytics
Descriptive
Analytics
Learning modes
• Regression
• Linear Regression, Polynomial Regression, SupportVector Regression, DecisionTree,
Random Forest,
• Classification
• Logistic Regression, k-Nearest Neighbours, SupportVector Machines, Naïve Bayes,
DecisionTree, Random Forest
• Predict outcome for new data
Supervised Learning - Predictive
• Clustering
• k-Means, Hierarchical Clustering
• Affinity Analysis
• Market Basket Analysis
• Association Rule Mining
Unsupervised Learning - Descriptive
RegressionVs Classification
Regression Classification
Dependent
variable
Continuous Categorical
Purpose Predict output value using
training data
Group the output to a class
Output level Ratio or Interval Ordinal or Nominal
ClusteringVs Affinity Analysis
Clustering Affinity Analysis
Purpose Identify similarities across
rows of a table
Identify similarities across
columns of a table
Error in Machine Learning
• Error – an unavoidable mathematical fact in ML
• Is this true?
1
3
∗ 3 = 1
• Error is the difference between predicted value and actual value.
ErrorVs Bug
• Error is not same as bugs.
• Both can’t be completely eliminated.
• A particular bug might be fixed. A particular error might be minimized. But
eliminating as a whole is very difficult.
• But they are still not same
Why does error occur
• Imagine a world where Sir Isaac Newton was never born.
• Today we are building the relationship between Force, Mass and
Acceleration
• Machine Learning formulation F = f(m, a)
• Linear Regression
• 𝑓 = 𝛽0 + 𝛽𝑚𝑚 + 𝛽𝑎𝑎
• Polynomial Regression
• 𝑓 = 𝛽0 + 𝛽𝑚𝑚𝑥 + 𝛽𝑎𝑎𝑦
• Error is ∈ = 𝑓 − 𝑓.This is rarely Zero.
S. No Mass Acceleration Force
1 1 4 4
2 3243 2 6486
3 2 1 2
4 5231 6 31386
5 446 3 1338
Important
Concepts
• Feature Engineering
• Dimensionality Reduction
• Principal Component Analysis
• Training Data,Validation Data,Testing Data
• Outliers and Missing value treatment
• Overfit & Underfit
• Precision & Recall
• Feature Scaling
• Manhattan Distance, Mahalanobis Distance,
Euclidean Distance
Model
performance
comparison
Regression Models
• R-Square and Adjusted R-Square
Classification Models
• True Positives, False Positives,True Negatives, False Negatives –
Confusion Matrix
• Precision, Recall, F1 Score
• Specificity, Sensitivity, ROC, AUC, Gini Index
Deep Learning
Gopi Krishna Nuti
Neural Networks
• Artificial Neural Network
• Perceptron
output= 0 if ∑wjxj≤ threshold
1 if ∑wjxj> threshold
Deep Network
vs
Shallow Network
Image courtesyQuora
Neural
Networks
• Activation Functions
• Gradient Descent & Loss
• Advantages of Neural Networks
• With enough training data, can represent any
function. NAND Gate representation.
• In words of Elon Musk, “It’s quite simple, really”.
• UniversalApproximationTheory
• But why do we need a deep network?
• Disadvantages and work arounds
• GPUs
Approach for
building Models
• p-value
• Feature Selection
• Forward Selection
• Backward Elimination
• Bidirectional Elimination
• Score Comparison
Further references
• EmergingTrends in Artificial Intelligence https://www.slideshare.net/gopikrishnanuti/modern-trends-
in-artificial-intelligence-a-deeper-review
• InferenceTrends in Industry https://www.slideshare.net/gopikrishnanuti/inferene-trends-in-industry
• ComputerVision – Old problems and New Solutions
https://www.slideshare.net/gopikrishnanuti/computer-vision-old-problems-new-solutions
• Classification vis-à-vis Ranking in Machine Learning
https://www.slideshare.net/gopikrishnanuti/classification-vis-avis-ranking-gopi
Further Reading
• A book introducing Machine Learning from basics
through Supervised and Unsupervised learning for
beginners
https://www.amazon.in/Machine-Learning-Engineers-
Gopi-
Krishna/dp/9389024870/ref=sr_1_2?dchild=1&keywor
ds=machine+learning+for+engineers&qid=1616195333&s
r=8-2
MUST Research
MUST Research is dedicated to promote excellence and competence in the field of data science, cognitive
computing, artificial intelligence, machine learning, advanced analytics for the benefit of the mankind - it’s
a must.
Our vision is to build an ecosystem that enables interaction between academia and enterprise, help them in
resolving problems and make them aware of the latest developments in the cognitive era to provide
solutions, guidance or training, organize lectures, seminars and workshops, collaborate on scientific
programs and societal missions.
•India’s largest AI community with 500+ data scientists
•Award winning robots – Softie built in collaboration with
Microsoft®
https://www.youtube.com/watch?v=jQ8Gq2HWxiA
•Multiple demonstrations of our robots MUSTie and MUSTani
https://www.youtube.com/watch?v=AewM3TsjoBk
•Letter of appreciation from Govt of Telangana for our
contributions

Ml - A shallow dive

  • 1.
    Machine Learning – A Shallowdive Gopi Krishna Nuti Lead Data Scientist,Autodesk Vice President, MUST Research ngopikrishna.public@gmail.com gopi.nuti@autodesk.com
  • 2.
    The Digital IndustryForces Social Media Mobility Analytics Cloud Robotics Automation IoT
  • 3.
    How did DataScience Start? • Statistics describes past and present • A necessity to predict future based on the knowledge of the past and present • Mathematically verifiable decision making as opposed to “hunch” and “gut feel” • Result is – Applied Statistics. • Combination of factors • Advent of high performance computers • Exponential rise of digital data • Artificial Intelligence and Data mining techniques • Combined with marketing savvy, this became Machine Learning and Data Science.
  • 4.
    What is ArtificialIntelligence? • Unfortunately, there is no universally accepted definition. • A general description : A study of how to make computers do things which, at the moment, people do better. MundaneTasks • Perception • Vision, Speech • Natural Language • Understanding • Generation • Translation FormalTasks • Games • Chess • Checkers • Backgammon • Mathematics • Geometry • Logic • Calculus ExpertTasks • Engineering • ScientificAnalysis • Financial Analysis Easy for Humans Easy for Computers
  • 5.
    So,What’s the difference? •Are you concerned with Decision Making? – Artificial Intelligence • Are you only predicting future or describing the present? – Machine Learning • Are you doing machine learning in a way that emulates human mind? – Deep Learning
  • 6.
    What is Analytics •Artificial Intelligence • Machine Learning • Data Science • Image/VideoAnalytics • Speech Analytics • Natural Language Processing • Statistics • Big Data • Big DataAnalytics Artificial Intelligence Techniques to enable a computer to mimic human intelligence Machine Learning Using Algorithms to learn from and make predictions about data without having to explicitly code for it Deep Learning Emulate the learning approach of human beings to gain certain types of knowledge
  • 7.
    Machine Learning Machine Learning Data Analytics DescriptivePredictive Prescriptive Image/Video Analytics Speech Analytics Natural Language Processing
  • 8.
    Data AnalyticsVs Statistics(Data Science) Image courtesy of Datascientistinsights.com Data Science Data Analytics Mathematics of explaining population relationships based on samples. • Extracting valuable information out of data • Predict values for new data Scarcity of Data Abundance of Data Hypothesis comes first Data comes first Macro Decisioning Micro Decisioning
  • 9.
    Classical ProgrammingVs MachineLearning Rules Data Answers Answers Data Rules Classical Programming Machine Learning
  • 10.
    Classical Programming StartTeaching Wait fora reasonable time Is lesson over Continue Class Are students interested in listening Tell a joke Run away
  • 11.
    Machine Learning • Startwith Historical Data • Formulate the problem as a mathematical equation. • Feed data and equation to the machine and let it come up with values. S. No Time elapsed Interest Level ActionTaken Resultant Interest Level 1 0 High Start the class High 2 0 High Tell a joke High 3 0 Medium Shout on students Low 4 0 Medium Tell a joke High 5 .. .. 15 Low Tell a joke Medium .. 15 Medium Continue the class Low .. 15 High Scold the students Low .. 15 Low Tell a joke Low .. .. .. .. 60 Low Continue the class Low .. 60 Low Run away High
  • 12.
    Machine Learning –Mathematical Formulation • y = f(x) • Action to take = f(Time Elapsed, Historical Interest levels, actions taken, resultant interest level) • y – DependentVariable • x - IndependentVariables
  • 13.
  • 14.
    Roles & Activities Data Scientist DataEngineer DBA Performance Engineer Hardware Developer
  • 15.
    Machine Learning Pipeline ImagecourtesyWestern Digital
  • 16.
    Data Analytics –Basic concepts Gopi Krishna Nuti
  • 17.
    Data Analytics • InformationTypes •Structured Data • Semi structured Data • Unstructured Data • Examples
  • 18.
    What to seewhen you see data S. No Character Id Name Creator of the character Year of First publication Number of Films made (until 2019 Dec) 1 1123 Superman Jerry Siegel 1938 11 2 7856 Ironman Stan Lee 1963 8 3 3614 Captain America Stan Lee 1941 7 4 1578 Albus Dumbledore JK Rowling 1997 9 5 15725 Chacha Chowdary Pran 1971 0 6 007 James Bond Ian Fleming 1953 27
  • 19.
    Levels of Data Nominal •Algebraicoperations are not possible Ordinal •Logical operations are possible but not mathematical operations . Ex: Account Number Interval •Addition/Subtraction is possible but not multiplication/division •Interval between two continuous elements is always same and meaningful •Zero is arbitrary •Ex: Temperature Ratio: •Zero makes sense and negative values are not possible •Mean, Median, Mode etc can be calculated •Account Balance
  • 20.
    Data Analytics –Machine LearningTypes Supervised Learning Unsupervised Learning Predictive Analytics Descriptive Analytics
  • 21.
    Learning modes • Regression •Linear Regression, Polynomial Regression, SupportVector Regression, DecisionTree, Random Forest, • Classification • Logistic Regression, k-Nearest Neighbours, SupportVector Machines, Naïve Bayes, DecisionTree, Random Forest • Predict outcome for new data Supervised Learning - Predictive • Clustering • k-Means, Hierarchical Clustering • Affinity Analysis • Market Basket Analysis • Association Rule Mining Unsupervised Learning - Descriptive
  • 22.
    RegressionVs Classification Regression Classification Dependent variable ContinuousCategorical Purpose Predict output value using training data Group the output to a class Output level Ratio or Interval Ordinal or Nominal
  • 23.
    ClusteringVs Affinity Analysis ClusteringAffinity Analysis Purpose Identify similarities across rows of a table Identify similarities across columns of a table
  • 24.
    Error in MachineLearning • Error – an unavoidable mathematical fact in ML • Is this true? 1 3 ∗ 3 = 1 • Error is the difference between predicted value and actual value.
  • 25.
    ErrorVs Bug • Erroris not same as bugs. • Both can’t be completely eliminated. • A particular bug might be fixed. A particular error might be minimized. But eliminating as a whole is very difficult. • But they are still not same
  • 26.
    Why does erroroccur • Imagine a world where Sir Isaac Newton was never born. • Today we are building the relationship between Force, Mass and Acceleration • Machine Learning formulation F = f(m, a) • Linear Regression • 𝑓 = 𝛽0 + 𝛽𝑚𝑚 + 𝛽𝑎𝑎 • Polynomial Regression • 𝑓 = 𝛽0 + 𝛽𝑚𝑚𝑥 + 𝛽𝑎𝑎𝑦 • Error is ∈ = 𝑓 − 𝑓.This is rarely Zero. S. No Mass Acceleration Force 1 1 4 4 2 3243 2 6486 3 2 1 2 4 5231 6 31386 5 446 3 1338
  • 27.
    Important Concepts • Feature Engineering •Dimensionality Reduction • Principal Component Analysis • Training Data,Validation Data,Testing Data • Outliers and Missing value treatment • Overfit & Underfit • Precision & Recall • Feature Scaling • Manhattan Distance, Mahalanobis Distance, Euclidean Distance
  • 28.
    Model performance comparison Regression Models • R-Squareand Adjusted R-Square Classification Models • True Positives, False Positives,True Negatives, False Negatives – Confusion Matrix • Precision, Recall, F1 Score • Specificity, Sensitivity, ROC, AUC, Gini Index
  • 29.
  • 30.
    Neural Networks • ArtificialNeural Network • Perceptron output= 0 if ∑wjxj≤ threshold 1 if ∑wjxj> threshold
  • 31.
  • 32.
    Neural Networks • Activation Functions •Gradient Descent & Loss • Advantages of Neural Networks • With enough training data, can represent any function. NAND Gate representation. • In words of Elon Musk, “It’s quite simple, really”. • UniversalApproximationTheory • But why do we need a deep network? • Disadvantages and work arounds • GPUs
  • 33.
    Approach for building Models •p-value • Feature Selection • Forward Selection • Backward Elimination • Bidirectional Elimination • Score Comparison
  • 35.
    Further references • EmergingTrendsin Artificial Intelligence https://www.slideshare.net/gopikrishnanuti/modern-trends- in-artificial-intelligence-a-deeper-review • InferenceTrends in Industry https://www.slideshare.net/gopikrishnanuti/inferene-trends-in-industry • ComputerVision – Old problems and New Solutions https://www.slideshare.net/gopikrishnanuti/computer-vision-old-problems-new-solutions • Classification vis-à-vis Ranking in Machine Learning https://www.slideshare.net/gopikrishnanuti/classification-vis-avis-ranking-gopi
  • 36.
    Further Reading • Abook introducing Machine Learning from basics through Supervised and Unsupervised learning for beginners https://www.amazon.in/Machine-Learning-Engineers- Gopi- Krishna/dp/9389024870/ref=sr_1_2?dchild=1&keywor ds=machine+learning+for+engineers&qid=1616195333&s r=8-2
  • 37.
    MUST Research MUST Researchis dedicated to promote excellence and competence in the field of data science, cognitive computing, artificial intelligence, machine learning, advanced analytics for the benefit of the mankind - it’s a must. Our vision is to build an ecosystem that enables interaction between academia and enterprise, help them in resolving problems and make them aware of the latest developments in the cognitive era to provide solutions, guidance or training, organize lectures, seminars and workshops, collaborate on scientific programs and societal missions. •India’s largest AI community with 500+ data scientists •Award winning robots – Softie built in collaboration with Microsoft® https://www.youtube.com/watch?v=jQ8Gq2HWxiA •Multiple demonstrations of our robots MUSTie and MUSTani https://www.youtube.com/watch?v=AewM3TsjoBk •Letter of appreciation from Govt of Telangana for our contributions