3. SIT1305- MACHINE LEARNING
TEXT / REFERENCE BOOKS
Ethem Alpaydin, “Introduction to Machine Learning”, MIT Press, 2004.
Tom Mitchell, “Machine Learning”, McGraw Hill, 1997.
February 21, 2023 3
SIT1305 Machine Learning
4. UNIT 1 INTRODUCTION TO MACHINE LEARNING
• Machine learning - examples of machine learning applications - Learning associations - Classification - Regression - Unsupervised learning - Supervised learning - Learning a class from examples - PAC learning - Noise, model selection and generalization - Dimensions of a supervised machine learning algorithm.
5. What is machine learning?
• A branch of artificial intelligence, concerned
with the design and development of
algorithms that allow computers to evolve
behaviors based on empirical data.
• As intelligence requires knowledge, computers must be able to acquire knowledge.
8. Machine Learning
• “Field of study that gives computers the ability to learn without being explicitly programmed” (commonly attributed to Arthur Samuel)
• “Learning is any process by which a system improves performance from experience” (Herbert Simon)
9. Traditional Programming vs Machine Learning
10. How a software developer creates a solution
11. How a data engineer develops a solution using machine learning
12. What is Machine Learning?
Aspect of AI: creates knowledge
Definition:
“changes in [a] system that ... enable [it] to do the same task or tasks drawn from the same population more efficiently and more effectively the next time.” (Simon 1983)
There are two ways that a system can improve:
1. By acquiring new knowledge
– acquiring new facts
– acquiring new skills
2. By adapting its behavior
– solving problems more accurately
– solving problems more efficiently
13. Tom Mitchell provides a more modern
definition:
“A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”
14. What is Learning?
• Herbert Simon: “Learning is any process by
which a system improves performance from
experience.”
• What is the task?
– Classification
– Categorization/clustering
– Problem solving / planning / control
– Prediction
– others
15. Example
• Imagine you have a set of number pairs.
• Give the machine only the first number of a pair and have it predict the second.
(2,4), (3,6), (4,8)
The computer program has to predict the second number for (5,?).
The program first needs to find the logic that relates the two numbers in each pair, and then apply the same logic to predict the missing number.
Finding that logic from the examples is called “machine learning”.
Once the logic has been found, the program can apply it to predict the second number for any new first number.
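The pair example can be sketched in Python as fitting a straight line y = w·x + w0 to the three pairs by ordinary least squares. This is a minimal illustration that assumes the hidden "logic" is linear; the function name `fit_line` is hypothetical.

```python
def fit_line(pairs):
    """Fit y = w*x + w0 to (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # slope = covariance(x, y) / variance(x); intercept makes the line pass through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


pairs = [(2, 4), (3, 6), (4, 8)]
w, w0 = fit_line(pairs)
print(w, w0)        # learned "logic": 2.0 0.0, i.e. y = 2x
print(w * 5 + w0)   # prediction for (5, ?): 10.0
```

The "learning" here is the computation of w and w0 from the examples; the prediction is just applying the learned line to a new input.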
20. When Should You Use Machine Learning?
• Consider using machine learning when you have a
complex task or problem involving a large amount of
data and lots of variables, but no existing formula or
equation.
• For example, machine learning is a good option if you
need to handle situations like these:
– Hand-written rules and equations are too complex—as in
face recognition and speech recognition.
– The rules of a task are constantly changing—as in fraud
detection from transaction records.
– The nature of the data keeps changing, and the program
needs to adapt—as in automated trading, energy
demand forecasting, and predicting shopping trends.
21. • Machine Learning is used when:
– Human expertise does not exist (navigating on
Mars),
– Humans are unable to explain their expertise
(speech recognition)
– Solution changes in time (routing on a computer
network)
– Solution needs to be adapted to particular cases
(user biometrics)
22. Multidisciplinary Field
Machine learning is primarily concerned
with the accuracy and effectiveness of
the computer system.
(Figure: machine learning linked to related fields: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, and control theory.)
23. Why now?
• Flood of available data (especially with the advent of the Internet)
• Increasing computational power.
• Growing progress in available algorithms and theory
developed by researchers.
• Increasing support from industries.
26. Applications of Machine Learning
• Machine learning is a buzzword in today's technology, and it is growing rapidly.
• We use machine learning in daily life, often without knowing it, through services such as Google Maps, Google Assistant, and Alexa.
27. Real-World Applications
• With the rise in big data, machine learning has become particularly
important for solving problems in areas like these:
– Computational finance, for credit scoring and algorithmic
trading
– Image processing and computer vision, for face recognition,
motion detection, and object detection
– Computational biology, for tumor detection, drug discovery, and
DNA sequencing
– Energy production, for price and load forecasting
– Automotive, aerospace, and manufacturing, for predictive
maintenance
– Natural language processing
29. Machine Learning Trends
“Telephone took 75 years to reach 50 million users, radio 38 yrs, television 13 yrs, Internet 4 yrs, Facebook 19 months, Pokemon Go 19 days. AarogyaSetu, India’s app to fight COVID-19 has reached 50 mn users in just 13 days-fastest ever globally for an App,” Kant said in his tweet.
The app estimates a user's risk of COVID-19 infection based on their interaction with others, using cutting-edge Bluetooth technology, algorithms and artificial intelligence.
30. What Machine Learning Can Do
• Finding which category an object belongs to: classification algorithms
• Finding what is strange: anomaly detection algorithms
• Finding how much and how many: regression algorithms
• Finding how data is arranged: clustering algorithms
• Deciding what to do next: reinforcement learning algorithms
31. How Do You Decide Which Algorithm to Use?
• Choosing the right algorithm can seem overwhelming—there
are dozens of supervised and unsupervised machine learning
algorithms, and each takes a different approach to learning.
• There is no best method or one size fits all. Finding the right
algorithm is partly just trial and error—even highly
experienced data scientists can’t tell whether an algorithm
will work without trying it out.
• But algorithm selection also depends on the size and type of
data you’re working with, the insights you want to get from
the data, and how those insights will be used.
32. Questions to Consider Before You Start
• Every machine learning workflow begins with three questions:
– What kind of data are you working with?
– What insights do you want to get from it?
– How and where will those insights be applied?
• Your answers to these questions help you decide whether to
use supervised or unsupervised learning.
34. Understanding Machine Learning
Machine Learning vs Statistical Inference vs Pattern
Recognition vs Data Mining
Perspective 1: the same concepts evolving in different scientific traditions
• Statistical Inference (SI): field of Applied Mathematics
• Machine Learning (ML): field of AI
• Pattern Recognition (PR): branch of Computer Science focused on perception problems (image processing, speech recognition, etc.)
• Data Mining (DM): field of Database Engineering
35. Perspective 2: slight conceptual differences
• Statistical Inference: inference based on probabilistic
models built on data. Located at the intersection of
Mathematics and Artificial Intelligence (AI)
• Machine Learning: methods tend to be more heuristic in
nature
• Pattern Recognition: most authors consider it the same thing as machine learning
• Data Mining: applied machine learning. It involves issues such as data pre-processing, data cleaning, transformation, integration and visualization, and combines machine learning with computer science and database systems.
36. Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned, i.e. the target function.
• Choose how to represent the target function.
• Choose a learning algorithm to infer the target
function from the experience.
(Figure: block diagram connecting Environment/Experience, Learner, Knowledge, and Performance Element)
37. Types of Machine Learning
38. Learning Paradigms
• Supervised learning
– Generates a function that maps inputs to desired outputs.
• For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function.
– Probably the most common paradigm
– E.g., decision trees, support vector machines, Naïve Bayes, k-Nearest Neighbors, …
40. Learning Paradigms
• Unsupervised learning
– Labels are not known during training
– E.g., clustering, association learning
• Semi-supervised learning
– Combines both labeled and unlabeled examples to generate an appropriate function or classifier
– E.g., Transductive Support Vector Machine
42. Machine learning structure
• Semi-supervised learning
The goal of a semi-supervised model is to classify unlabeled data using the information in the labeled set. Example applications:
• Speech analysis
• Protein sequencing
• Web content analysis
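One simple way to use the labeled set to classify unlabeled data is self-training: a classifier built from the labeled examples pseudo-labels the unlabeled example it is most confident about, adds it to the labeled set, and repeats. The sketch below assumes a 1-nearest-neighbour rule on one-dimensional points, where "most confident" means "closest to an already labeled point"; the data and the function name `self_train` are hypothetical. Self-training is only one semi-supervised technique; the Transductive Support Vector Machine is another.

```python
def self_train(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbour rule on 1-D points.

    labeled:   list of (x, label) pairs
    unlabeled: list of x values
    Returns the labeled list extended with pseudo-labeled points.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # the unlabeled point closest to any labeled point is the most "confident" one
        _, u, label = min(
            ((abs(u - x), u, y) for u in pool for x, y in labeled),
            key=lambda t: t[0],
        )
        labeled.append((u, label))  # accept its pseudo-label
        pool.remove(u)
    return labeled


# hypothetical data: two labeled points and five unlabeled ones
result = dict(self_train([(1.0, "A"), (9.0, "B")], [2.0, 3.0, 4.0, 7.0, 8.0]))
print(result)
```

Here the labels spread outward from the two labeled points: 2.0, 3.0 and 4.0 end up labeled A, while 8.0 and then 7.0 are labeled B.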
43. Reinforcement Learning
• In the absence of a training dataset, the agent must learn from its own experience.
• We have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward.
Types of reinforcement: positive, negative, punishment, and extinction.
44. Machine learning structure
• Reinforcement learning
– It is concerned with how an agent should take actions in an
environment so as to maximize some notion of cumulative
reward.
• Reward given if some evaluation metric improves
• Punishment in the reverse case
– E.g., Q-learning, SARSA
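Q-learning can be sketched on a tiny, hypothetical environment: a corridor of five states in which the agent starts at state 0 and receives a reward of 1 only when it reaches state 4. The agent keeps a table Q(s, a) and, after each step, moves Q(s, a) toward r + γ·max Q(s', a'). The corridor, the learning rate α = 0.5, the discount γ = 0.9 and the exploration rate ε = 0.1 are illustrative assumptions, not values from the slides.

```python
import random

# Hypothetical corridor: states 0..4, the agent starts at 0, state 4 is the goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate


def step(state, action):
    """Environment dynamics: walls at both ends, reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action choice, breaking ties at random
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            # Q-learning update: the target uses the best action in the next state
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy read from the table should be "move right" (+1) in every non-goal state. SARSA differs only in the update target: it uses the Q-value of the action the agent actually takes next instead of the maximum over actions.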
46. Algorithms
• Supervised learning
– Prediction
– Classification (discrete labels), Regression (real values)
• Unsupervised learning
– Clustering
– Probability distribution estimation
– Finding association (in features)
– Dimension reduction
• Semi-supervised learning
• Reinforcement learning
– Decision making (robot, chess machine)
47. ML Algorithms
• Classification
– Learn a way to classify unseen examples, based on a set of labeled examples, e.g., classify songs by emotion categories. E.g., decision trees (e.g., C4.5)
• Regression
– Learn a way to predict continuous output values, based on a set of labeled examples, e.g., predict software development effort in person-months
– Sometimes regarded as numeric classification (outputs are continuous instead of discrete)
– E.g., Support Vector Regression
48. ML Algorithms
• Association
– Find any association among features, not just input-output associations (e.g., in a supermarket, find that customers who buy apples also buy cereal)
– E.g., Apriori
• Clustering
– Find natural groupings among data
– E.g., K-means clustering, DBSCAN, hierarchical clustering
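K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch on hypothetical 2-D points, assuming k = 2 and hand-picked initial centroids (practical implementations usually initialise them randomly or with k-means++):

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical data
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The two natural groups are found after the first iteration: the three points near (1, 1) form one cluster and the three near (8.5, 8.5) the other, and later iterations leave the centroids unchanged.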
50. Training and testing
• Training is the process of making the system able to learn.
• No free lunch rule:
– Assume the training set and testing set come from the same distribution
– Learning requires making some assumptions (bias)
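A common way to make the training and testing sets come from the same distribution is to shuffle the data once and hold out a fraction of it for testing. A minimal sketch; the 20% test fraction, the fixed seed and the name `holdout_split` are assumptions for illustration (scikit-learn provides a comparable `train_test_split` helper).

```python
import random


def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle once, then split into (train, test) so both parts follow the same distribution."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10))
print(len(train), len(test))  # 8 2
```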
51. Performance
• Several factors affect performance:
– Types of training provided
– The form and extent of any initial background knowledge
– The type of feedback provided
– The learning algorithms used
• Two important factors:
– Modeling
– Optimization
52. Machine Learning Traditions
• Different ML traditions propose different approaches inspired by real-world analogies
– Neural network researchers: emphasize analogies to neurobiology
– Case-based learning: human memory
– Genetic algorithms: evolution
– Rule induction: heuristic search
– Analytic methods: reasoning in formal logic
• Again, different notation and terminology
54. Learning Paradigms
• Black-box
– Learned model internals are practically incomprehensible
• E.g., Neural Networks, Support Vector Machines
• Transparent-box
– Learned model internals are understandable and interpretable
• E.g., explicit rules, decision trees
• Instance-based or case-based learning
– Represents knowledge in terms of specific cases or experiences
– Relies on flexible matching methods to retrieve these cases and apply them to new situations
– E.g., k-Nearest Neighbors
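Instance-based learning stores the training cases and defers the work to prediction time. A minimal k-Nearest Neighbors sketch, assuming Euclidean distance, a majority vote among the k = 3 closest stored cases, and hypothetical two-feature data:

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k stored cases closest to the query (Euclidean distance)."""
    nearest = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# hypothetical stored cases: (features, label)
train = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"), ((0.9, 1.1), "small"),
         ((5.0, 5.0), "large"), ((5.2, 4.9), "large"), ((4.8, 5.1), "large")]
print(knn_predict(train, (1.1, 1.0)))  # small
print(knn_predict(train, (4.9, 5.0)))  # large
```

No model is built during training; the "flexible matching" is the distance function, and changing it (or k) changes which cases are retrieved.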
56. Machine Learning touching our Daily Life
Walmart uses robots in its stores for inventory management, packing, and price checks.
Restaurants have robot chefs and waiters.
57. Machine Learning touching our Daily Life
Michelangelo, Uber's internal ML-as-a-service platform, democratizes machine learning and makes scaling AI to meet the needs of business as easy as requesting a ride.
58. Machine Learning touching our Daily Life
• Song recommendations based on mood and interest
• Data acquisition from Tamr, an enterprise data unification company
• Content-specific vaccines for children
59. Amazon – Game Changer of the Decade
60. Machine Learning in Civil Engineering
• Design of construction management systems
• Prediction of the severity of earthquakes
• Better monitoring and analysis of the structural health of constructions
• Analysis in environmental engineering
• Highway and transportation engineering: prediction of transport arrivals and pedestrian movement analysis
• Use of machine learning in surveying, geotechnical and geospatial engineering
61. Machine Learning in Mechanical Engineering
• Cognitive science of a machine
• Use of IoT and big data analytics
• On-site performance of devices
• Non-linear root cause analysis
• Tools for analytics and operations
66. Predicting mechanical failure
• By continuously monitoring data (power plant,
manufacturing unit operations) and providing
them to smart decision support systems,
manufacturers can predict the probability of
failure.
• Predictive maintenance is an emerging field in
industrial applications that helps in determining
the condition of in-service equipment to estimate
the optimum time of maintenance.
• ML-based predictive maintenance saves cost
and time on routine or preventive maintenance.
67. AI for automatically segmenting brain tumors
Artificial Intelligence has a broad scope in healthcare devices and applications.
It makes the analysis, treatment, and monitoring of tumors more effective.
NVIDIA has developed 3D brain tumor segmentation using deep learning on 3D magnetic resonance imaging (MRI).
73. Robotics and ML
Areas where robots are used:
Industrial robots
Military, government and space robots
Service robots for home, healthcare, laboratory
Why are robots used?
Dangerous tasks or in hazardous environments
Repetitive tasks
High precision tasks or those requiring high quality
Labor savings
Control technologies:
Autonomous (self-controlled), tele-operated (remote
control)
74. Industrial Robots
• Uses for robots in manufacturing:
– Welding
– Painting
– Cutting
– Dispensing
– Assembly
– Polishing/Finishing
– Material Handling
• Packaging, Palletizing
• Machine loading
75. Industrial Robots
• Uses for robots in industry/Manufacturing
– Automotive
– Packaging
77. Military/Government Robots
Soldiers in Afghanistan being trained how to defuse a landmine using a PackBot.
78. Military Robots
• Aerial drones (UAV)
• Military suit
79. Space Robots
• Mars Rovers – Spirit and Opportunity
– Autonomous navigation features with human
remote control and oversight
81. Medical/Healthcare Applications
• da Vinci surgical robot by Intuitive Surgical. St. Elizabeth Hospital is one of the local hospitals using this robot; you can see it in person during an open house.
• Japanese health care assistant suit (HAL - Hybrid Assistive Limb)
• Also: mind-controlled wheelchair using NI LabVIEW
83. AI vs Machine Learning vs Deep Learning
• Artificial Intelligence: programs with the ability to learn and reason like humans
• Machine Learning: algorithms with the ability to learn without being explicitly programmed
• Deep Learning: subset of machine learning in which artificial neural networks adapt and learn from vast amounts of data
91. Top Machine Learning Software Tools
Software | Platform | Written in (language) | Algorithms or Features
Scikit Learn | Linux, Mac OS, Windows | Python, Cython, C, C++ | Classification, Regression, Clustering, Preprocessing, Model Selection, Dimensionality reduction
PyTorch | Linux, Mac OS, Windows | Python, C++, CUDA | Autograd module, Optim module, nn module
TensorFlow | Linux, Mac OS, Windows | Python, C++, CUDA | Provides a library for dataflow programming
Weka (Waikato Environment for Knowledge Analysis) | Linux, Mac OS, Windows | Java | Data preparation, Classification, Regression, Clustering, Visualization, Association rules mining
92. Top Machine Learning Software Tools
Software | Platform | Written in (language) | Algorithms or Features
KNIME (Konstanz Information Miner) | Linux, Mac OS, Windows | Java | Can work with large data volumes; supports text mining and image mining through plugins
Colab | Cloud service | - | Supports libraries of PyTorch, Keras, TensorFlow, and OpenCV
Apache Mahout | Cross-platform | Java, Scala | Preprocessors, Regression, Clustering, Recommenders, Distributed Linear Algebra
Accord.NET | Cross-platform | C# | Classification, Regression, Distribution, Clustering, Hypothesis Tests & Kernel Methods, Image, Audio & Signal & Vision
93. Top Machine Learning Software Tools
Software | Platform | Written in (language) | Algorithms or Features
Shogun | Windows, Linux, UNIX, Mac OS | C++ | Regression, Classification, Clustering, Support vector machines, Dimensionality reduction, Online learning, etc.
Keras.io | Cross-platform | Python | API for neural networks, supports CNN
RapidMiner | Cross-platform | Java | Data loading & transformation, Data preprocessing & visualization
Oryx2 | Cross-platform | Java | Collaborative filtering, classification, regression, DL, CNN
95. Supervised Learning
• Train the algorithm using well-labelled data, that is, data already tagged with the correct answers.
Ex: Given a basket filled with different kinds of fruit, train the algorithm with labelled examples of all the different fruits.
96. Supervised Learning
• Classification: the output variable is a category, such as “Red” or “Blue”, or “disease” and “no disease”.
• Regression: the output variable is a continuous value, such as “dollars” or “weight”.
Ex: continuous marks of 70 students in a particular subject:
Ass1 | Ass2
80 | 70
83 | 72
……
98. Unsupervised Learning
• Train the algorithm, without any guidance, on information that is neither classified nor labelled.
• The given data is grouped based on similarities, patterns, and differences.
Ex: Given an image containing both animals and birds, group them without labels.
Technique: Clustering
101.
Parameters | Supervised machine learning technique | Unsupervised machine learning technique
Process | In a supervised learning model, input and output variables are given. | In an unsupervised learning model, only input data is given.
Input Data | Algorithms are trained using labeled data. | Algorithms are used on data that is not labeled.
Algorithms Used | Support vector machine, neural network, linear and logistic regression, random forest, and classification trees. | Unsupervised algorithms can be divided into different categories, such as clustering algorithms (K-means, hierarchical clustering, etc.).
Computational Complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Use of Data | A supervised learning model uses training data to learn a link between the inputs and the outputs. | Unsupervised learning does not use output data.
Accuracy of Results | Highly accurate and trustworthy method. | Less accurate and trustworthy method.
Real Time Learning | Learning takes place offline. | Learning takes place in real time.
Number of Classes | Number of classes is known. | Number of classes is not known.
Main Drawback | Classifying big data can be a real challenge in supervised learning. | You cannot get precise information regarding data sorting and the output, as the data used in unsupervised learning is not labeled.
102. Questions
1. A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is E?
2. Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
3. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
103. 1. Take a collection of 1000 essays written on the US Economy, and find a
way to automatically group these essays into a small number of groups of
essays that are somehow "similar" or "related". :=
This is an unsupervised learning/clustering problem (similar to the
Google News example in the lectures).
2. Given a large dataset of medical records from patients suffering from
heart disease, try to learn whether there might be different clusters of
such patients for which we might tailor separate treatments. :=
This can be addressed using an unsupervised learning, clustering,
algorithm, in which we group patients into different clusters.
3. Given genetic (DNA) data from a person, predict the odds of him/her
developing diabetes over the next 10 years. :=
This can be addressed as a supervised learning, classification,
problem, where we can learn from a labeled dataset comprising different
people's genetic data, and labels telling us if they had developed
diabetes.
4. Given 50 articles written by male authors, and 50 articles written by
female authors, learn to predict the gender of a new manuscript's author
(when the identity of this author is unknown). :=
This can be addressed as a supervised learning, classification,
problem, where we learn from the labeled data to predict gender.
104. 5. In farming, given data on crop yields over the last 50 years, learn to
predict next year's crop yields. :=
This can be addressed as a supervised learning problem, where we
learn from historical data (labeled with historical crop yields) to predict
future crop yields.
6. Examine a large collection of emails that are known to be spam email, to
discover if there are sub-types of spam mail. :=
This can be addressed using a clustering (unsupervised learning)
algorithm, to cluster spam mail into sub-types.
7. Examine the statistics of two football teams, and predict which team
will win tomorrow's match (given historical data of teams' wins/losses to
learn from). :=
This can be addressed using supervised learning, in which we learn
from historical records to make win/loss predictions.
106. Learning Class by Example
• Class C of a “family car”
– Prediction: Is car x a family car?
– Knowledge extraction: What do people expect from a
family car?
• Output:
Positive (+) and negative (–) examples
• Input representation:
x1: price, x2: engine power
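In Alpaydin's treatment of this example, a hypothesis h is an axis-aligned rectangle in the (price, engine power) plane: h(x) = 1 when p1 ≤ price ≤ p2 and e1 ≤ engine power ≤ e2. The most specific hypothesis S is the tightest rectangle that encloses all positive examples. The sketch below computes S; the example cars (price in thousands, engine power in hp) and the function names are hypothetical.

```python
def most_specific_hypothesis(examples):
    """Tightest axis-aligned rectangle (p1, p2, e1, e2) around the positive examples.

    examples: list of ((price, engine_power), is_family_car) pairs
    """
    positives = [x for x, is_positive in examples if is_positive]
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return min(prices), max(prices), min(powers), max(powers)


def h(rect, x):
    """Hypothesis: x is a family car iff it lies inside the rectangle."""
    p1, p2, e1, e2 = rect
    price, power = x
    return p1 <= price <= p2 and e1 <= power <= e2


# hypothetical training cars: ((price in thousands, engine power in hp), label)
examples = [((15, 90), True), ((18, 110), True), ((20, 100), True),
            ((8, 60), False), ((40, 250), False), ((17, 300), False)]
S = most_specific_hypothesis(examples)
print(S)                # (15, 20, 90, 110)
print(h(S, (17, 95)))   # True
print(h(S, (30, 100)))  # False
```

S makes no errors on these training cars. Any rectangle lying between S and the most general consistent rectangle G would also fit them; Alpaydin calls the set of such consistent hypotheses the version space.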
111. Examples of Machine Learning Applications
Tasks are classified into two categories:
1. Descriptive: characterize the properties of the data
2. Predictive: perform inference on current data to make predictions
Functionalities:
1. Class/Concept Description
2. Associations
3. Classification and Prediction
4. Clustering
5. Outliers
112. 1. Class/Concept Description
• Class/Concept refers to the data to be associated with classes or concepts.
– Data Characterization: summarizing the data of the class under study. The class under study is called the target class.
– Data Discrimination: mapping or classifying a class with respect to some predefined group or class.
113. 2. Associations
• Find frequent elements/items
• Find the associations (relationships) between them
• Single-dimensional association rule:
Buys(‘X’, ‘Computer’) ⇒ Buys(‘X’, ‘Software’)
• Multidimensional association rule:
Age(‘X’, ‘40-60’) AND Income(‘X’, ‘50-60 lakhs’) ⇒ Buys(‘X’, ‘Computer’)
Algorithms:
1. Apriori
2. Frequent pattern growth (FP-growth) tree
114. Classification
• Example: credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
Discriminant (model): IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
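The discriminant has two parameters, θ1 and θ2. One simple, brute-force way to learn them is to try observed income and savings values as thresholds and keep the pair that makes the fewest mistakes on the training customers. The customer data and function names below are hypothetical, and the exhaustive search is chosen for clarity rather than efficiency.

```python
from itertools import product


def credit_risk(income, savings, theta1, theta2):
    """Discriminant: IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"


def fit_thresholds(customers):
    """Try observed values as (theta1, theta2) and keep the pair with the fewest training errors."""
    incomes = sorted({inc for inc, _, _ in customers})
    savings = sorted({sav for _, sav, _ in customers})

    def errors(thetas):
        return sum(credit_risk(inc, sav, *thetas) != label for inc, sav, label in customers)

    return min(product(incomes, savings), key=errors)


# hypothetical customers: (income, savings, label), amounts in thousands
customers = [(60, 40, "low-risk"), (75, 55, "low-risk"), (80, 30, "low-risk"),
             (20, 10, "high-risk"), (35, 50, "high-risk"), (70, 5, "high-risk")]
theta1, theta2 = fit_thresholds(customers)
print(theta1, theta2)                       # 35 5
print(credit_risk(90, 60, theta1, theta2))  # low-risk
```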
115. Prediction: Regression
• Example: price of a used car
• x: car attributes, y: price
• y = g(x | θ), where g(·) is the model and θ its parameters
• Linear model: y = wx + w0
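Learning the parameters θ = (w, w0) of the linear model means choosing them to minimise the error on the training cars. A minimal batch gradient-descent sketch for the mean squared error; the (car age, price) data, the learning rate and the number of epochs are hypothetical choices, and the data lie exactly on a line so the result is easy to check.

```python
def fit_gd(xs, ys, lr=0.01, epochs=5000):
    """Learn theta = (w, w0) of g(x | theta) = w*x + w0 by batch gradient descent on MSE."""
    w, w0 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + w0 for x in xs]
        # partial derivatives of (1/n) * sum((pred - y) ** 2) with respect to w and w0
        grad_w = 2 / n * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_w0 = 2 / n * sum(p - y for p, y in zip(preds, ys))
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0


ages = [1, 2, 3, 4, 5]      # hypothetical car ages in years (x)
prices = [9, 8, 7, 6, 5]    # hypothetical prices in thousands (y)
w, w0 = fit_gd(ages, prices)
print(round(w, 3), round(w0, 3))  # -1.0 10.0
print(round(w * 6 + w0, 3))       # predicted price of a 6-year-old car: 4.0
```

For a linear model the least-squares parameters also have a closed-form solution; gradient descent is shown because the same update rule carries over to models without one.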
117. Learning Associations
• Imagine that you are a sales manager at AllElectronics, and
you are talking to a customer who recently bought a PC and a
digital camera from the store. What should you recommend
to her/him next?
• Frequent patterns and association rules are the knowledge
that you want to mine in such a scenario.
• Finding frequent patterns plays an essential role in mining
associations, correlations, and many other interesting
relationships among data.
• Moreover, it helps in data classification, clustering, and other
data mining tasks.
118. What is Association Mining?
• Motivation: finding regularities in data
– What products were often purchased together?
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?
119. • Frequent patterns are patterns that appear frequently in a
data set.
– a set of items, such as milk and bread, that appear
frequently together in a transaction data set is a frequent
itemset.
– A subsequence, such as buying first a PC, then a digital
camera, and then a memory card, if it occurs frequently in
a shopping history database, is a (frequent) sequential
pattern.
– A substructure can refer to different structural forms, such
as subgraphs, subtrees, or sublattices, which may be
combined with itemsets or subsequences.
• If a substructure occurs frequently, it is called a (frequent)
structured pattern.
120. Association Rules
• An association rule is an implication of the form X → Y, where X is the antecedent and Y is the consequent of the rule.
• It captures the dependency between two items X and Y.
• P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services.
121. Example Association Rule
90% of transactions that purchase bread and butter also purchase milk
“IF” part = antecedent
“THEN” part = consequent
“Item set” = the items (e.g., products) comprising the antecedent or consequent
• Antecedent and consequent are disjoint (i.e., have no items in common).
Antecedent: bread and butter
Consequent: milk
Confidence factor: 90%
122. Three measures
• Support of the association rule X → Y:
$\text{Support}(X, Y) = P(X, Y) = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$
• Confidence of the association rule X → Y:
$\text{Confidence}(X \rightarrow Y) = P(Y \mid X) = \dfrac{P(X, Y)}{P(X)} = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$
• Lift or interest of the association rule X → Y:
$\text{Lift}(X \rightarrow Y) = \dfrac{P(X, Y)}{P(X)\,P(Y)} = \dfrac{P(Y \mid X)}{P(Y)}$
• Goal: Find all rules that satisfy the user-specified minimum support (min.sup) and minimum confidence (min.conf).
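Support, confidence and lift can be computed directly from a list of transactions by counting. The five baskets below are hypothetical and the rule {Milk, Diaper} → {Beer} is only an example; probabilities are kept as exact fractions so the results are easy to check by hand.

```python
from fractions import Fraction

# hypothetical transactions, one set of items per customer basket
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def prob(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))


def support(x, y):
    return prob(x | y)                  # P(X, Y)


def confidence(x, y):
    return prob(x | y) / prob(x)        # P(Y | X) = P(X, Y) / P(X)


def lift(x, y):
    return confidence(x, y) / prob(y)   # P(Y | X) / P(Y)


x, y = {"Milk", "Diaper"}, {"Beer"}
print(support(x, y), confidence(x, y), lift(x, y))  # 2/5 2/3 10/9
```

A lift above 1 (here 10/9) means that buying X makes buying Y more likely than it is on its own.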
124. Market Basket Analysis: A Motivating Example
125. Market Basket Analysis: A Motivating
Example
• Market basket analysis is the earliest form of frequent pattern mining for association rules.
• Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.
• With massive amounts of data continuously being collected and stored, many industries are becoming interested in mining such patterns from their databases.
• These patterns help in many business decision-making processes, such as catalog design, cross-marketing, and customer shopping behavior analysis.
126. Apriori Algorithm
• Proposed by Agrawal and Srikant in 1994.
• Initially used for market basket analysis to find how items purchased by customers are related.
• Two steps:
1. Finding frequent itemsets, that is, those which have enough support.
2. Converting them to rules with enough confidence, by splitting the items into two, as items in the antecedent and items in the consequent.
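The second step, turning frequent itemsets into rules, can be sketched as follows. It assumes the first step has already produced a dictionary of frequent itemsets (as sorted tuples) with their support counts; the counts below are hypothetical. Confidence is computed as support count(itemset) / support count(antecedent), and every antecedent is guaranteed to be in the dictionary because every subset of a frequent itemset is itself frequent.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Split each frequent itemset into antecedent -> consequent and keep confident rules.

    frequent: dict mapping sorted item tuples to support counts (output of step 1).
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):             # antecedent sizes 1 .. len-1
            for antecedent in combinations(itemset, r):
                conf = count / frequent[antecedent]   # support(X u Y) / support(X)
                if conf >= min_conf:
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent, conf))
    return rules


# hypothetical step-1 output: frequent itemsets and their support counts
frequent = {
    ("bread",): 4, ("butter",): 3, ("milk",): 4,
    ("bread", "butter"): 3, ("bread", "milk"): 3, ("butter", "milk"): 2,
    ("bread", "butter", "milk"): 2,
}
for antecedent, consequent, conf in generate_rules(frequent, min_conf=0.7):
    print(antecedent, "->", consequent, f"confidence={conf:.2f}")
```

With a minimum confidence of 0.7 this keeps five rules, for example {butter, milk} → {bread} with confidence 1.00, and drops {butter} → {milk} (confidence 2/3).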
127. Association rule mining
• Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction.
128. Transaction data: supermarket data
• Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}...
...
tn: {biscuit, eggs, milk}
• Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it may have TID
(transaction ID)
• A transactional dataset: A set of transactions
129. Definition: Frequent Itemset
• Itemset
– A collection of one or more items
• Example: {Milk, Bread, Diaper}
– k-itemset
• An itemset that contains k items
• Support count (σ)
– Frequency of occurrence of an itemset, i.e., the number of transactions that contain it
– E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
– An itemset whose support is greater than or equal to a minimum support threshold (minsup)
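Support count and support can be computed by scanning the transactions once. The slide's transaction table is not reproduced in this text. The five transactions below are therefore an assumed stand-in, chosen so that σ({Milk, Bread, Diaper}) = 2 and s = 2/5. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed five-transaction table (hypothetical stand-in for the slide's table),
# consistent with sigma({Milk, Bread, Diaper}) = 2 and s = 2/5.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, data):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in data if itemset <= t)


def support(itemset, data):
    """s(X): fraction of transactions containing X, kept as an exact fraction."""
    return Fraction(support_count(itemset, data), len(data))


def is_frequent(itemset, data, minsup):
    """An itemset is frequent if its support is at least the minsup threshold."""
    return support(itemset, data) >= minsup


x = {"Milk", "Bread", "Diaper"}
print(support_count(x, transactions))                # 2
print(support(x, transactions))                      # 2/5
print(is_frequent(x, transactions, Fraction(2, 5)))  # True
print(is_frequent(x, transactions, Fraction(1, 2)))  # False
```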
131. Example
• A database has nine transactions, that is, |D| = 9. The minimum support count is 2 and the minimum confidence is 60%.
a) Find all frequent itemsets using Apriori.
b) List all the strong association rules (with support s and
confidence c).
132. • Step-1: K=1
– Create a table containing the support count of each item present in the dataset; this is called C1 (the candidate set).
– Compare each candidate item's support count with the minimum support count (given min_support = 2). The items that meet it form the frequent itemset L1.
[Tables: candidate set C1 and frequent itemset L1]
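Step 1 is a single counting pass over the data. The nine transactions of the slide's table are not reproduced in this text. The sketch below assumes the widely used nine-transaction example from Han and Kamber's *Data Mining: Concepts and Techniques*, with items I1 to I5, which is consistent with the itemsets discussed in Steps 3 and 4. Treat the data as an assumption.

```python
from collections import Counter

# Assumed stand-in for the slide's nine-transaction table (items I1..I5).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUPPORT_COUNT = 2

# C1: support count of every individual item (one pass over the data).
c1 = Counter(item for t in transactions for item in t)

# L1: the items of C1 whose support count meets the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT_COUNT}

for item in sorted(l1):
    print(item, l1[item])
# I1 6
# I2 7
# I3 6
# I4 2
# I5 2
```

Under this data every item reaches the minimum support count, so L1 is the same as C1.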
133. • Step-2: K=2
– Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets should have (K-2) elements in common; for K = 2 this means any two items of L1 can be joined.
– Check whether all subsets of each candidate itemset are frequent; if not, remove that itemset.
– Now find the support count of these itemsets by searching the dataset.
[Tables: L1 and candidate set C2]
134. • Compare the support count of each candidate in C2 with the minimum support count; this gives us itemset L2.
[Tables: C2 and frequent itemset L2]
135. • Step-3:
– Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets should have (K-2) elements in common, so for L2 the first element should match.
– Check whether all subsets of these itemsets are frequent; if not, remove that itemset. (Here the subsets of {I1, I2, I3} are {I1, I2}, {I2, I3} and {I1, I3}, which are all frequent. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so remove it. Similarly check every itemset.)
– Find the support count of the remaining itemsets by searching the dataset.
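[Tables: L2, candidate set C3 and L3. C3 itemsets listed: {I1, I2, I3}, {I1, I2, I5}, {I1, I2, I4}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}. After pruning and support counting, L3 = {{I1, I2, I3}, {I1, I2, I5}}.]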
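The join and prune steps can be written as one candidate-generation function. This is a sketch, and `apriori_gen` is a hypothetical name. The L2 below is an assumption, namely what the standard nine-transaction example yields, not a table copied from the slides. Itemsets are kept as sorted tuples, so "the first k−2 items match" becomes a simple prefix comparison. A strict prefix join never generates {I1, I2, I4}, because {I1, I4} is not in this L2; it would be pruned in any case.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Candidate k-itemsets from L(k-1): join step followed by prune step.

    Itemsets are sorted tuples, so "first k-2 items match" is a prefix check.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for first, second in combinations(prev, 2):
        # Join step: two (k-1)-itemsets sharing their first k-2 items.
        if first[:k - 2] != second[:k - 2]:
            continue
        cand = tuple(sorted(set(first) | set(second)))
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        if all(sub in prev_set for sub in combinations(cand, k - 1)):
            candidates.append(cand)
    return candidates


# Assumed L2 for the nine-transaction example (not copied from the slide).
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

c3 = apriori_gen(L2, 3)
print(c3)                  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
# Both survivors have support count 2 in the assumed data, so L3 equals c3 here.
print(apriori_gen(c3, 4))  # []: {I1, I3, I5} is not frequent, so C4 is empty
```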
136. • Step-4:
• Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K = 4) is that the two itemsets should have (K-2) elements in common, so here, for L3, the first 2 elements (items) should match.
• Check whether all subsets of these itemsets are frequent. (Here the itemset formed by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent.)
• So there is no itemset in C4.
• Stop, because no further frequent itemsets can be found.
137. Apriori Algorithm
• Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
A two-step process is followed:
1. The Join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself. The set of candidates is denoted Ck.
2. The Prune step: Ck is a superset of Lk, that is, its members may or may not be frequent, but all of the frequent k-itemsets are included in Ck. To reduce the size of Ck, the Apriori property is used: if any (k-1)-subset of a candidate is not in Lk-1, the candidate cannot be frequent and is removed from Ck.
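Putting the join, prune and counting steps together gives a compact level-wise Apriori. The following is a minimal sketch, not an optimized implementation. It rescans all transactions at every level and represents itemsets as sorted tuples. It runs on an assumed nine-transaction dataset (the standard Han and Kamber example) standing in for the slide's table, and the function name `apriori` is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Level-wise Apriori sketch: return {itemset (sorted tuple): support count}."""
    baskets = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # One scan of the data per level: keep candidates meeting the minimum support count.
        counts = {c: sum(1 for basket in baskets if basket.issuperset(c)) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = sorted({i for basket in baskets for i in basket})
    current = count_and_filter([(i,) for i in items])  # L1
    frequent = dict(current)
    k = 2
    while current:
        prev = sorted(current)
        prev_set = set(prev)
        candidates = []
        for first, second in combinations(prev, 2):
            if first[:k - 2] != second[:k - 2]:
                continue  # join step: the first k-2 items must match
            cand = tuple(sorted(set(first) | set(second)))
            # prune step (Apriori property): every (k-1)-subset must be in L(k-1)
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
        current = count_and_filter(candidates)  # Lk
        frequent.update(current)
        k += 1
    return frequent


# Assumed stand-in for the slide's nine-transaction table.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
frequent = apriori(transactions, min_support_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

The loop stops as soon as a level produces no frequent itemsets, which here happens at k = 4.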
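139. • Generation of strong association rules: for each frequent itemset l and each nonempty proper subset s of l, form the rule s → (l − s).
• Calculate the confidence of each rule as support_count(l) / support_count(s); the rule is strong if this is at least the minimum confidence.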
140. Generating Association Rules
Rule generation for Itemset {I1, I2, I5} from L3
141. Rule generation for Itemset {I1, I2, I3} from L3
Rule              Confidence
{I1 ^ I2} → I3    2/4 = 0.5 = 50%
{I2 ^ I3} → I1    2/4 = 0.5 = 50%
{I1 ^ I3} → I2    2/4 = 0.5 = 50%
I3 → {I1 ^ I2}    2/6 = 0.33 = 33.33%
I1 → {I2 ^ I3}    2/6 = 0.33 = 33.33%
I2 → {I1 ^ I3}    2/7 = 0.29 = 28.57%
As the given minimum confidence is 60%, none of the rules generated from {I1, I2, I3} can be considered a strong association rule.
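The confidence test can be automated for every split of a frequent itemset. The sketch below uses an assumed nine-transaction dataset (the standard Han and Kamber example, a stand-in for the slide's table) and applies the 60% threshold with integer arithmetic, so no rounding is involved. It also checks {I1, I2, I5}, whose rule table is not reproduced in this text. Under the assumed data, three of its rules (I5 → {I1 ^ I2}, {I1 ^ I5} → I2 and {I2 ^ I5} → I1) reach 100% confidence and would therefore be strong. The helper names are hypothetical.

```python
from itertools import combinations

# Assumed stand-in for the slide's nine-transaction table.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_CONF_PERCENT = 60


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from(itemset):
    """All rules s -> (l - s) for nonempty proper subsets s of the itemset l."""
    items = tuple(sorted(itemset))
    n_l = support_count(items)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            n_s = support_count(antecedent)
            # confidence = sigma(l) / sigma(s); compared in integers to avoid rounding
            strong = 100 * n_l >= MIN_CONF_PERCENT * n_s
            rules.append((antecedent, consequent, n_l, n_s, strong))
    return rules


for itemset in [("I1", "I2", "I3"), ("I1", "I2", "I5")]:
    for a, c, n_l, n_s, strong in rules_from(itemset):
        label = "strong" if strong else ""
        print(f"{'^'.join(a)} -> {'^'.join(c)}: {n_l}/{n_s} = {n_l / n_s:.2%} {label}")
```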
146. Vapnik–Chervonenkis (VC) dimension
• Assume we have a dataset containing N points.
• These N points can be labeled in $2^N$ ways as positive and negative examples.
• Therefore, $2^N$ different learning problems can be defined by N data points.
• If for every one of these problems we can find a hypothesis h ∈ H that separates the positive examples from the negative ones, then we say H shatters the N points.
• That is, any learning problem definable by the N examples can be learned with no error by a hypothesis drawn from H.
• The maximum number of points that can be shattered by H is called the Vapnik–Chervonenkis (VC) dimension of H; it is denoted VC(H) and measures the capacity of H.
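Shattering can be checked by brute force for small point sets: enumerate all 2^N labelings and ask whether some hypothesis realizes each one. The sketch below uses axis-aligned rectangles as H. This is the running example in Alpaydin's textbook, where the VC dimension of rectangles in the plane is shown to be 4. The function names and point coordinates are hypothetical. For a rectangle it is enough to test the tightest box around the positive points. The second call only shows that this particular 5-point set is not shattered; proving VC(H) = 4 needs an argument that covers every 5-point set.

```python
from itertools import product


def rectangle_can_realize(points, labels):
    """True if some axis-aligned rectangle contains exactly the positive points.

    The tightest box around the positives is the best candidate: if it contains
    a negative point, every rectangle that contains all positives does too.
    """
    pos = [p for p, y in zip(points, labels) if y]
    if not pos:
        return True  # a rectangle away from all points labels everything negative
    x_lo, x_hi = min(p[0] for p in pos), max(p[0] for p in pos)
    y_lo, y_hi = min(p[1] for p in pos), max(p[1] for p in pos)
    return not any(
        x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
        for p, y in zip(points, labels)
        if not y
    )


def shatters(points):
    """H shatters the points if every one of the 2**N labelings is realizable."""
    return all(
        rectangle_can_realize(points, labels)
        for labels in product([False, True], repeat=len(points))
    )


diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(shatters(diamond))             # True: rectangles shatter these 4 points
print(shatters(diamond + [(0, 0)]))  # False: the centre point cannot be excluded
```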
147. Vapnik–Chervonenkis (VC) dimension
• The Vapnik–Chervonenkis (VC) dimension is a measure of the
capacity (complexity, expressive power, richness, or flexibility)
of a set of functions that can be learned by a statistical binary
classification algorithm.
• It is defined as the cardinality of the largest set of points that
the algorithm can shatter.
• It was originally defined by Vladimir Vapnik and Alexey
Chervonenkis.
148. • When choosing a classifier for your data, an obvious question
to ask is “What kind of data can this classifier classify?”.
For example,
– if you know your points can easily be separated by a single
line, you may opt to choose a simple linear classifier,
– whereas if you know your points will be in many separate
groups, you may opt to choose a more powerful classifier
such as a random forest or multilayer perceptron.
• This fundamental question can be answered using a
classifier’s VC dimension, which is a concept from
computational learning theory that formally quantifies the
power of a classification algorithm.
149. Example
• The VC dimension of a linear classifier in the plane is at least 3, since it can shatter this configuration of 3 points.
• In each of the 2³ = 8 possible assignments of positive and negative labels, the classifier is able to perfectly separate the two classes.
151. • Now, we show that the VC dimension of a linear classifier in the plane is lower than 4.
• In this configuration of 4 points, the classifier is unable to separate the positive and negative classes in at least one assignment.
• Two lines would be necessary to separate the two classes in this situation.
• Strictly, we need to prove that no configuration of 4 points can be shattered; the same logic applies to other configurations, so, for brevity's sake, this example is good enough. Hence the VC dimension of a linear classifier in the plane is 3.
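The same brute-force idea applies to a linear classifier in the plane. Deciding linear separability exactly needs a linear program or a convex-hull test. The sketch below swaps in the simpler perceptron rule as a heuristic. The perceptron is guaranteed to converge when a labeling is linearly separable, so non-convergence within `max_epochs` is taken as evidence, not proof, of non-separability. The point sets are hypothetical: three non-collinear points, and the four corners of a unit square, where the XOR labeling defeats any single line.

```python
from itertools import product


def perceptron_separable(points, labels, max_epochs=1000):
    """Heuristic 2-D linear-separability test using the perceptron rule.

    Labels are -1/+1. Returns True once a full epoch has no mistakes; returns
    False if that has not happened after max_epochs (evidence, not proof).
    """
    w1 = w2 = b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # misclassified or on the line
                w1 += y * x1
                w2 += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def lines_shatter(points):
    """True if every -1/+1 labeling of the points is (heuristically) separable."""
    return all(
        perceptron_separable(points, labels)
        for labels in product([-1, 1], repeat=len(points))
    )


three = [(0, 0), (1, 0), (0, 1)]
four = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(lines_shatter(three))  # True
print(lines_shatter(four))   # False: the XOR labeling cannot be separated
```

With integer inputs and a step size of 1, all weights stay integers, so the mistake test involves no rounding.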
153. Applications of VC dimension
• In most cases, the exact VC dimension of a classifier is not so
important.
• Rather, it is used to group different types of algorithms by their complexity.
• For example, the class of simple classifiers could include basic
shapes like lines, circles, or rectangles, whereas a class of
complex classifiers could include classifiers such as multilayer
perceptrons, boosted trees, or other nonlinear classifiers.
• The complexity of a classification algorithm, which is directly
related to its VC dimension, is related to the trade-off
between bias and variance.
154. Bias and Variance
• Machine learning models are an incredibly powerful and useful tool for data scientists.
• When building a model, it is important to remember that with predictions come prediction errors.
• These errors are due to a combination of bias and variance, which have a trade-off relationship, plus an irreducible error that no model can remove.
• Understanding these fundamentals is just the first step to
building an accurate model and avoiding the pitfalls of under-
fitting and over-fitting.
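155. • In supervised machine learning, we are approximating a target function (f) that maps input variables (X) to an output variable (Y). The relationship is mathematically expressed as:
$$Y = f(X) + e$$
• where e represents the error (noise) term. The expected prediction error of a learned model $\hat{f}$ at an input x can be further split into three parts, squared bias, variance and irreducible error, where $\sigma_e^2$ is the variance of e:
$$E\big[(Y - \hat{f}(x))^2\big] = \big(E[\hat{f}(x)] - f(x)\big)^2 + E\Big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\Big] + \sigma_e^2$$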
156. Bias
• Bias, or bias error, can be defined as the difference between
the expected prediction of our model and the correct value
which we are trying to predict.
• High bias can cause our model to miss significant relations
between our features (X) and target outputs (Y) so it cannot
learn the training data or generalize to new data.
• This is also known as under-fitting. Under-fitted models are
forced to make a lot of assumptions which can cause
inaccurate predictions.
157. Variance
• Variance is the variability of a model prediction for a given
data point.
• It is the error from sensitivity to small fluctuations in the
training data.
• When there is high variance, the model fits the random noise (e) in the training data rather than the intended outputs (Y).
• High variance is also known as over-fitting. When the data is over-fitted, the model essentially learns the training data too well and therefore cannot generalize to new data.
• The last error term is the irreducible error. Irreducible error is
essentially the amount of noise from factors outside of our
control and cannot be removed.
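The bias and variance terms can be estimated by simulation. Draw many training sets, fit a model on each, and measure how the predictions at fixed test points scatter around the true function. In the sketch below every choice is an assumption made for illustration: the true function sin(2πx), Gaussian noise with standard deviation 0.3, and 20 training points per set. The two models are deliberately extreme: a constant (mean) predictor serves as the simple, high-bias model, and a 1-nearest-neighbour predictor serves as the flexible, high-variance model.

```python
import math
import random


def true_f(x):
    return math.sin(2 * math.pi * x)


def make_training_set(rng, n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]


def constant_model(train, x):
    """Very simple model: predicts the mean training output everywhere."""
    return sum(y for _, y in train) / len(train)


def one_nn_model(train, x):
    """Very flexible model: copies the output of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def bias2_and_variance(model, trials=500, seed=0):
    """Average squared bias and variance of `model` over fixed test points."""
    rng = random.Random(seed)
    test_xs = [i / 10 for i in range(1, 10)]
    preds = {x: [] for x in test_xs}
    for _ in range(trials):
        train = make_training_set(rng)
        for x in test_xs:
            preds[x].append(model(train, x))
    bias2 = variance = 0.0
    for x, ps in preds.items():
        mean_pred = sum(ps) / len(ps)
        bias2 += (mean_pred - true_f(x)) ** 2
        variance += sum((p - mean_pred) ** 2 for p in ps) / len(ps)
    return bias2 / len(test_xs), variance / len(test_xs)


for name, model in [("constant", constant_model), ("1-NN", one_nn_model)]:
    b2, var = bias2_and_variance(model)
    print(f"{name:8s} bias^2={b2:.3f} variance={var:.3f}")
```

The constant model should show the larger squared bias and the 1-NN model the larger variance. The exact numbers depend on the seed and the assumed settings.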
158. Example: line of best fit
• In the left graph below, we can see that the line is simple and
does not follow many of the data points, thus showing high
bias.
• The right graph below shows a line that follows almost every
data point, even ones that may be noise or outliers, showing
high variance.
• Our goal is to find a balance between these two extremes so
that the majority of data points are explained with an
appropriate amount of noise.
159. • The relationship between bias and variance can also be
visualized using a target example.
161. Prevent Underfitting and Overfitting
Underfitting:
• Make sure the model is flexible enough, and trained on enough data for long enough, that the error/cost function (e.g. MSE or SSE) is sufficiently minimized.
Overfitting:
• Limit the number of features or adjustable parameters in the
model. As the number of features increases, the complexity of
the model also increases, thus creating a higher chance of
overfitting.
• Shorten the training (early stopping) so the model doesn’t “over-learn” the training data.
• Add some form of regularization term to the error/cost
function to encourage smoother network mappings (Ridge or
Lasso regression are commonly used techniques)
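The effect of a ridge penalty is easiest to see with a single feature. For y ≈ w·x without an intercept, minimizing Σ(y − wx)² + λw² gives w = Σxy / (Σx² + λ), so increasing λ shrinks the weight toward zero. The sketch below uses made-up data, and the function name is hypothetical. Lasso is not shown.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge estimate for y ~ w * x (one feature, no intercept).

    Minimizes sum((y - w*x)**2) + lam * w**2, giving w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]  # made-up data lying roughly on y = x
for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:>5}: w={ridge_weight(xs, ys, lam):.4f}")
```

With λ = 0 this is ordinary least squares; larger λ trades a little extra bias for a smoother, lower-variance fit.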
162. Modelling supervised learning
• Given training set of labelled examples, learning algorithm
generates a hypothesis. Run hypothesis on test set to check
how good it is.
• But how good really? Maybe the training and test data consist of unrepresentative examples, so the hypothesis doesn't generalize well.
• Insight: Introduce probabilities to measure the degree of certainty and correctness.
• With high probability, an efficient learning algorithm will find a hypothesis that is approximately identical to the hidden target function.
• Intuition: A hypothesis that is consistent with a large amount of training data is unlikely to be seriously wrong, i.e., it is Probably Approximately Correct (PAC).
165. • In PAC learning, given a class C and examples drawn from some unknown but fixed probability distribution p(x), we want to find the number of examples, N, such that with probability at least 1 − δ, the hypothesis h has error at most ϵ, for arbitrary δ ≤ 1/2 and ϵ > 0:
$$P\{C \,\Delta\, h \le \epsilon\} \ge 1 - \delta$$
where CΔh is the region of difference between C and h, and its probability mass under p(x) is the error of h.
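For a concrete hypothesis class, the PAC condition turns into a sample-size bound. Alpaydin's textbook works this out for axis-aligned rectangles. Take h as the tightest rectangle around the positive examples, and require each of the four strips between C and h to have probability at most ε/4. The chance that all N examples miss one strip is then at most (1 − ε/4)^N ≤ e^(−εN/4). Requiring 4e^(−εN/4) ≤ δ gives N ≥ (4/ε) ln(4/δ). The helper below (a hypothetical name) only evaluates that bound; it does not apply to other hypothesis classes.

```python
import math


def rectangle_sample_size(epsilon, delta):
    """Smallest integer N with N >= (4 / epsilon) * ln(4 / delta).

    Bound for the axis-aligned rectangle example (tightest-rectangle hypothesis);
    it is not a general bound for other hypothesis classes.
    """
    if not (epsilon > 0 and 0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil((4 / epsilon) * math.log(4 / delta))


print(rectangle_sample_size(0.1, 0.05))   # 176
print(rectangle_sample_size(0.05, 0.05))  # 351
```

Halving ε roughly doubles N, while tightening δ only adds a logarithmic amount.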
183. Model Selection & Generalization
• Consider the case of learning a Boolean function from
examples.
• In a Boolean function, all inputs and the output are binary.
• There are $2^d$ possible ways to write d binary values and therefore, with d inputs, the training set has at most $2^d$ examples.
• As shown in the table, each of these can be labeled as 0 or 1, and therefore, there are $2^{2^d}$ possible Boolean functions of d inputs.
184. • Each distinct training example removes half the hypotheses,
namely, those whose guesses are wrong.
• For example, let us say we have x1 = 0, x2 = 1 and the output
is 0; this removes h5, h6, h7, h8, h13, h14, h15, h16.
• This is one way to interpret learning: we start with all possible
hypothesis and as we see more training examples, we remove
those hypotheses that are not consistent with the training
data.
• In the case of a Boolean function, to end up with a single hypothesis we need to see all $2^d$ training examples.
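The elimination view of learning can be reproduced for d = 2. The sketch below enumerates all 16 Boolean functions of two inputs. It numbers them h1 to h16 so that h_i outputs the binary digits of i − 1 for the inputs (0,0), (0,1), (1,0), (1,1). This numbering is an assumption, chosen because it matches the hypotheses removed in the example above. The XOR target used at the end is a hypothetical example.

```python
from itertools import product

d = 2
inputs = list(product([0, 1], repeat=d))  # (0,0), (0,1), (1,0), (1,1)

# Hypothetical numbering: h_i outputs the binary digits of i - 1, first digit for (0,0).
hypotheses = {
    i + 1: dict(zip(inputs, outputs))
    for i, outputs in enumerate(product([0, 1], repeat=len(inputs)))
}


def consistent(hyps, x, label):
    """Keep only the hypotheses that agree with the training example (x, label)."""
    return {i: h for i, h in hyps.items() if h[x] == label}


remaining = consistent(hypotheses, (0, 1), 0)
print(len(hypotheses))                           # 16
print(sorted(set(hypotheses) - set(remaining)))  # [5, 6, 7, 8, 13, 14, 15, 16]

# Seeing all 2**d examples of a target function leaves exactly one hypothesis.
xor_examples = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
hyps = hypotheses
for x, label in xor_examples.items():
    hyps = consistent(hyps, x, label)
print(list(hyps))                                # [7]
```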
185. • Ill-posed Problem:
– If the given training set contains only a small subset of all possible instances, so that we know what the output should be for only a small percentage of the cases, the solution is not unique.
– This is an example of an ill-posed problem where the data
by itself is not sufficient to find a unique solution.
• Inductive Bias:
– Because learning is ill-posed, and data by itself is not
sufficient to find the solution, we should make some extra
assumptions to have a unique solution with the data we
have.
– The set of assumptions made to have learning possible is
called the inductive bias of the learning algorithm.
186. • Model Selection:
– Learning is not possible without inductive bias, and now
the question is how to choose the right bias. This is called
model selection, which is choosing between possible H.
• Generalization:
– How well a model trained on the training set predicts the
right output for new instances is called generalization.
Editor's Notes
The aim is to get the model to generalize and classify new inputs appropriately. We can reduce the bias by having a more complex model, but then we risk having more variance. Conversely, if we have a model with lower complexity, the variance may be low, but then we risk having higher bias. Bias has a negative slope in response to model complexity while variance has a positive slope.