DSCI 552 MACHINE LEARNING FOR
DATA SCIENCE
Ke-Thia Yao
Lecture 1, 12 January 2023
Textbook
 Ethem Alpaydin
 Introduction to Machine
Learning, Fourth Edition
 MIT Press
 ISBN 9780262043793
Optional Textbook for Scikit-Learn
 Aurélien Géron
 Hands-On Machine Learning with
Scikit-Learn, Keras, and
TensorFlow, 3rd Edition
 Available online through USC
library
https://libraries.usc.edu/databases/safari-books
 Scikit-Learn website provides
excellent documentation and user
guides
 https://scikit-learn.org/stable/index.html
Office Hours
 USC ISI Office:
 4676 Admiralty Way, Suite 835
 Marina del Rey, CA 90292
 (310) 448-8297
 kyao@isi.edu
 USC Marina del Rey Shuttle
 http://transnet.usc.edu/index.php/bus-map-schedules/
 Office Hours:
 Tuesdays 2-4PM on Zoom
https://usc.zoom.us/j/95896335860?pwd=MkhtMEsvR1BsUThvU3hMYjNHZE5Gdz09&from=addon
 Thursdays 2-4PM on campus, location TBD
Grading
 Homework / Programming Assignments: 35%
 Class participation: 5%
 Midterm: 20%
 Final Exam: 20%
 Semester Project: 20%
Viterbi Code of Academic Integrity
"A Community of Honor"
We are the USC Viterbi School of Engineering, a community of
academic and professional integrity. As students, faculty, and staff, our
fundamental purpose is the pursuit of knowledge and truth. We
recognize that ethics and honesty are essential to this mission and
pledge to uphold the highest standards of these principles. As
responsible men and women of engineering, our lifelong commitment
is to respect others and be fair in all endeavors. Our actions will reflect
and promote a community of honor.
Schedule
Date Topic
12-Jan-23 Introduction to ML, Supervised learning, Bias, K-nearest neighbor vs. linear least squares
19-Jan-23 Bayesian decision theory, Naïve Bayes, Jupyter, SciKit Learn
26-Jan-23 Parametric Methods, Bias/Variance Trade-off
2-Feb-23 Nonparametric methods, Decision Trees
9-Feb-23 Dimension reduction
16-Feb-23 Clustering
23-Feb-23 Linear Discrimination, Multilayer Perceptrons
2-Mar-23 Midterm
9-Mar-23 Deep Learning
16-Mar-23 Spring Recess
23-Mar-23 Local Models, Kernel Machines
30-Mar-23 Graph Models, Boltzmann Machines, Quantum Adiabatic Annealer
6-Apr-23 Hidden Markov Models
13-Apr-23 Combining Multiple Learners
20-Apr-23 Reinforcement Learning
27-Apr-23 Presentation
What is Machine Learning
 Machine learning is the science (and art) of programming computers so they
can learn from data
 Here is a slightly more general definition:
[Machine learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
Arthur Samuel, 1959
 And a more engineering-oriented one:
A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P,
improves with experience E.
Tom Mitchell, 1997
Why Use Machine Learning
[Figure: traditional approach vs. machine learning approach]
Why “Learn”?
 Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user biometrics)
Big Data
 Widespread use of personal computers and wireless communication
leads to “big data”
 We are both producers and consumers of data
 Data is not random; it has structure, e.g., customer behavior
 We need “big theory” to extract that structure from data for
(a) Understanding the process
(b) Making predictions for the future
 Cheaper computational power (e.g., GPUs).
Why Mine Data? Scientific Viewpoint
 Data collected and stored at
enormous speeds (GB/hour)
 remote sensors on a satellite
 telescopes scanning the skies
 microarrays generating gene expression data
 scientific simulations generating terabytes of data
 Traditional techniques infeasible for raw data
 Data mining may help scientists
 in classifying and segmenting data
 in hypothesis formation
Why Mine Data? Commercial Viewpoint
 Lots of data is being collected
and warehoused
 Web data, e-commerce
 purchases at department/grocery stores
 Bank/Credit Card
transactions
 Computers have become cheaper and more powerful
 Competitive Pressure is Strong
 Provide better, customized services for an edge (e.g. in Customer
Relationship Management)
Big Data Opportunity
 Unlock significant value by making information transparent and usable
 Collect and store more accurate and detailed data in digital form
 Allows ever-narrower segmentation of customers and precisely tailored
products & services
 Sophisticated analytics to substantially improve decision making
 Improve the next generation of products and services
Source: McKinsey & Company
Big Data Opportunity (cont.)
McKinsey Report
 Data have swept into every industry and business function and are
now an important factor of production, alongside labor and capital.
 The use of big data will become a key basis of competition and
growth for individual firms.
 The use of big data will underpin new waves of productivity growth
and consumer surplus.
 There will be a shortage of talent necessary for organizations to take
advantage of big data.
Data Mining
 Retail: Market basket analysis, Customer relationship management
(CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Control, robotics, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Spam filters, intrusion detection
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
 ...
What We Talk About When We Talk About
“Learning”
 Learning general models from data of particular examples
 Data is cheap and abundant (data warehouses, data marts);
knowledge is expensive and scarce.
 Example in retail: Customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
 Build a model that is a good and useful approximation to the data.
What is Machine Learning?
 Optimize a performance criterion using example data or past
experience
 Role of Statistics: Inference from a sample
 Role of Computer science: Efficient algorithms to
 Solve the optimization problem
 Represent and evaluate the model for inference
 Role of domain knowledge
 Selecting the right attributes, representation and datasets
Machine Learning Tasks
 Supervised Learning
 Classification
 Regression
 Unsupervised Learning
 Association
 Reinforcement Learning
Supervised Learning: Classification
 Given training set with labels
 Predict label for a new instance that is not in the training set
Classification
 Example: Credit scoring
 Differentiating between
low-risk and high-risk
customers from their
income and savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
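A minimal sketch of this discriminant in Python; the threshold values and customer records below are made up for illustration (in practice θ1 and θ2 would be learned from labeled data):

# Hypothetical thresholds: theta1 for income, theta2 for savings.
THETA1, THETA2 = 40_000, 10_000

def credit_risk(income, savings):
    # The slide's discriminant: low-risk iff income > theta1 AND savings > theta2.
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

customers = [(55_000, 15_000),   # passes both tests  -> low-risk
             (55_000,  5_000),   # fails savings test -> high-risk
             (30_000, 20_000)]   # fails income test  -> high-risk
for income, savings in customers:
    print(income, savings, credit_risk(income, savings))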
Classification: Applications
 Also known as pattern recognition
 Face recognition: Pose, lighting, occlusion (glasses, beard), make-up,
hair style
 Character recognition: Different handwriting styles.
 Speech recognition: Temporal dependency.
 Medical diagnosis: From symptoms to illnesses
 Biometrics: Recognition/authentication using physical and/or
behavioral characteristics: face, iris, signature, etc.
 Outlier/novelty detection
Face Recognition
[Figure: training examples of a person and test images; ORL dataset, AT&T Laboratories, Cambridge UK]
Supervised Learning: Regression
 Given training set with target numerical values
 Predict target for a new instance that is not in the training set
Regression
 Example: Price of a used car
 x : car attributes
y : price
y = g(x | θ)
g(·): model, θ: parameters
Linear model: y = wx + w0
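A minimal sketch of fitting the linear model y = wx + w0 by least squares; the car ages and prices below are invented for illustration:

import numpy as np

# Made-up data: x = car age in years, y = price in thousands of dollars.
x = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([28.0, 24.5, 21.0, 16.0, 11.5, 8.0])

# Least-squares fit of y = w*x + w0.
w, w0 = np.polyfit(x, y, deg=1)
print(f"y = {w:.2f}x + {w0:.2f}")

# Predict the price of a 4-year-old car, an instance not in the training set.
print("predicted price:", w * 4.0 + w0)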
Regression Applications
 Navigating a car: Angle of the steering wheel
 Kinematics of a robot arm
α1 = g1(x, y)
α2 = g2(x, y)
[Figure: two-joint robot arm with joint angles α1, α2 reaching the point (x, y)]
 Response surface design
Supervised Learning: Uses
 Prediction of future cases: Use the rule to predict the output for future
inputs
 Knowledge extraction: The rule is easy to understand
 Compression: The rule is simpler than the data it explains
 Outlier detection: Exceptions that are not covered by the rule, e.g.,
fraud
Unsupervised Learning
 Learning “what normally happens”
 Clustering: Grouping similar instances
 Example applications
 Customer segmentation in CRM
 Image compression: Color quantization
 Bioinformatics: Learning motifs (nucleotide or amino-acid sequence patterns)
Unsupervised Learning: Clustering
 Given training set with no labels
 Group similar instances into clusters
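A minimal clustering sketch using scikit-learn's KMeans; the two blobs of 2-D points are made up for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Made-up unlabeled data: two blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(3.0, 0.5, size=(50, 2))])

# Group similar instances into k = 2 clusters; no labels are involved.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("cluster of a new point:", kmeans.predict([[2.8, 3.1]]))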
Unsupervised Learning:
Anomaly Detection
 Given training set with no labels
 Assign new instance as either normal or anomaly
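A minimal sketch, assuming scikit-learn's IsolationForest as the detector (one of several possible anomaly detectors) and made-up data:

import numpy as np
from sklearn.ensemble import IsolationForest

# Train on made-up unlabeled data clustered around the origin ("normal").
rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(200, 2))

detector = IsolationForest(random_state=0).fit(X_train)

# predict() returns +1 for inliers (normal) and -1 for anomalies.
print(detector.predict([[0.1, -0.3],    # near the training cloud: likely +1
                        [6.0, 6.0]]))   # far from it:             likely -1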
Unsupervised Learning:
Dimension Reduction
 Given training set with high number of features (say images)
 Output training set with lower number of features
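A minimal sketch using scikit-learn's PCA; random data stands in for real images here:

import numpy as np
from sklearn.decomposition import PCA

# Made-up training set: 100 instances with 64 features (e.g., 8x8 pixel images).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 64))

# Project onto the 10 directions of highest variance: 64 features -> 10.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)    # (100, 64) -> (100, 10)
print("variance kept:", pca.explained_variance_ratio_.sum())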
Learning Associations
 Basket analysis
 Given training dataset containing baskets of products/services
 Find
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y
are products/services.
Example: P(chips | beer) = 0.7
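A minimal sketch estimating P(Y | X) from basket counts; the baskets are made up, so this toy data gives P(chips | beer) = 2/3 rather than the 0.7 of the slide:

# Made-up baskets; each basket is the set of products bought together.
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "diapers"},
    {"chips", "soda"},
]

def conditional_prob(baskets, x, y):
    # P(Y | X) = (# baskets containing both X and Y) / (# baskets containing X)
    with_x = [b for b in baskets if x in b]
    return sum(1 for b in with_x if y in b) / len(with_x) if with_x else 0.0

print("P(chips | beer) =", conditional_prob(baskets, "beer", "chips"))  # 2/3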
Reinforcement Learning
 Learning a policy: A sequence of
outputs
 No supervised output but delayed
reward
 Credit assignment problem
 Game playing
 Robot in a maze
 Multiple agents, partial
observability, ...
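A minimal sketch of one reinforcement learning algorithm, tabular Q-learning, on an invented 1-D corridor task; the environment and hyperparameters are illustrative only:

import numpy as np

# States 0..4 in a corridor; actions 0 = left, 1 = right. Only reaching
# state 4 pays reward 1, so earlier moves are credited only through the
# discounted return (the credit assignment problem).
N_STATES, GOAL = 5, 4
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration
Q = np.zeros((N_STATES, 2))
rng = np.random.default_rng(3)

def choose_action(s):
    # Epsilon-greedy; also pick randomly when the Q-values are tied.
    if rng.random() < eps or Q[s, 0] == Q[s, 1]:
        return int(rng.integers(2))
    return int(Q[s].argmax())

for _ in range(500):                  # episodes
    s = 0
    while s != GOAL:
        a = choose_action(s)
        s2 = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print("greedy policy:", Q.argmax(axis=1)[:GOAL])   # expect [1 1 1 1] (go right)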
The data mining process
Data Mining Process
Step                              Time to complete   Importance to success
1. Exploring the problem          20%                80%
2. Exploring the solution
3. Implementation specification
4. Data mining                    80%                20%
   a. Data preparation
   b. Data surveying
   c. Data modeling
Inductive Bias
 Important decisions in learning systems:
 Structure of the model (language)
 Order to search the space of structures
 Way that overfitting to the particular training data is avoided
 Type of inductive bias:
 Language bias
 Search bias
 Overfitting-avoidance bias
[Figure: example dataset with the line y = 0.5]
Linear Least Square Classification
 Predictive Method: Linear regression
 Find the best line f(x) that divides the space into positive and negative
regions
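A minimal sketch of this idea, assuming made-up 2-D data with labels coded as ±1; the analytic least-squares solution gives the dividing line:

import numpy as np

# Made-up 2-D data, labels coded as +1 / -1.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(+1.5, 1.0, size=(50, 2)),
               rng.normal(-1.5, 1.0, size=(50, 2))])
y = np.array([+1.0] * 50 + [-1.0] * 50)

# Analytic least-squares solution for f(x) = w1*x1 + w2*x2 + b;
# classify by the sign of f(x), so f(x) = 0 is the dividing line.
A = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
w = np.linalg.lstsq(A, y, rcond=None)[0]

pred = np.sign(A @ w)
print("weights:", w)
print("training accuracy:", (pred == y).mean())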
Linear Least Square Fit Bias
 Language bias
 The function f(x) is linear
 Search bias
 Analytical solution minimizing the error (sum of squared residuals)
 Overfitting-avoidance bias
 Not needed. Language is too simple.
K Nearest Neighbor
 Let the k nearest neighbors vote on the classification
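A minimal sketch of the k-nearest-neighbor vote written out by hand in numpy; the training points are made up:

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Labels of the k training points nearest to x (Euclidean distance)...
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    # ...vote; the majority label wins.
    values, counts = np.unique(nearest, return_counts=True)
    return values[counts.argmax()]

# Made-up training set: class 0 near (0, 0), class 1 near (3, 3).
X_train = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([3.5, 3.5])))  # -> 1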
[Figures: k-nearest-neighbor decision boundaries; low bias, high variance]
K Nearest Neighbor Bias
 Language bias
 Represent a point by its k nearest neighbors
 Search bias
 Deterministic
 Overfitting-avoidance bias
 Adjust k using validation/dev data set
Model Selection Using Holdout Validation
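A minimal sketch of holdout validation for choosing k, using scikit-learn and a synthetic two-moons dataset as a stand-in for real data:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)

# Hold out 25% of the data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

best_k, best_acc = None, -1.0
for k in [1, 3, 5, 9, 15, 25]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)          # accuracy on held-out data
    print(f"k = {k:2d}  validation accuracy = {acc:.3f}")
    if acc > best_acc:
        best_k, best_acc = k, acc
print("selected k:", best_k)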
Optimal Bayes Decision Boundary
Decision Boundaries
Generalization as search
 Inductive learning: find a concept description that fits the data
 Example: rule sets as description language
 Enormous, but finite, search space
 Simple solution:
 enumerate the concept space
 eliminate descriptions that do not fit examples
 surviving descriptions contain target concept
Bias and Learning Example
ID   Pump Type   Pump Size   Max Load   Pump Eff.   Class
1    A           Large       Low        High        Normal
2    B           Small       High       Low         Failure
3    B           Large       High       High        Normal
4    …           …           …          …           …
 Attributes
 ID is integer
 Pump Type is {A, B}
 Pump Size is {Large, Small}
 Max Load is {High, Low}
 Pump Eff. is {High, Low}
 Class is {Normal, Failure}
 Ignoring the ID and Class attributes, how many distinct instances are possible?
The size of the instance space is 2 × 2 × 2 × 2 = 16
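A quick check of this count with itertools, using the attribute values from the table above:

from itertools import product

# Enumerate the instance space: 2 * 2 * 2 * 2 = 16 distinct instances.
types, sizes = ["A", "B"], ["Large", "Small"]
loads, effs = ["High", "Low"], ["High", "Low"]
instances = list(product(types, sizes, loads, effs))
print(len(instances))   # 16
print(instances[0])     # ('A', 'Large', 'High', 'High'), i.e. i1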
Modeling Language: Hypothesis Space
 Suppose for this problem, the four attributes (instance language) precisely capture
the features of the domain
 Let the instance space be I
 Instances <type, size, load, efficiency>:
i1 = <A, Large, High, High>
i2 = <A, Large, High, Low>
i3 = <A, Large, Low, High>
…
i15 = <B, Small, Low, High>
i16 = <B, Small, Low, Low>

I = {i1, i2, i3, …, i16}
Power Set
 The power set of a set S is the set of all possible subsets of S.
 The power set of {a, b, c} is
{{}, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}
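A minimal power-set sketch in Python using itertools:

from itertools import chain, combinations

def power_set(s):
    # All subsets of s, from the empty set up to s itself: 2**|s| of them.
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

ps = power_set({"a", "b", "c"})
print(len(ps))   # 2**3 = 8
print(ps)        # [set(), {'a'}, {'b'}, {'c'}, {'a', 'b'}, ..., {'a', 'b', 'c'}]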
Hypothesis Space
 Let the modeling language H be the power set of I:

H = 2^I = {{}, {i1}, {i2}, …, {i16}, {i1, i2}, {i1, i3}, …, {i15, i16},
           {i1, i2, i3}, {i1, i2, i4}, …, {i14, i15, i16}, …, {i1, i2, …, i16}}

What is the size of the modeling language (hypothesis space)?

|H| = |2^I| = 2^16 = 65536;  H = {h1, h2, …, h65536}
Learning Algorithm
 A hypothesis h is consistent with a training instance i if:
 if i is labeled Normal, then h contains i
 if i is labeled Failure, then h does not contain i
 Learning the model
 Initially, let the candidate set C = H
 Remove from C every hypothesis that is not consistent with an instance in the
training set
 Classification: given a new instance i, each hypothesis h in C votes
 +1 if h contains i (predicting Normal)
 -1 otherwise (predicting Failure)
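A minimal sketch of this learning algorithm over the pump domain, assuming instances are coded as the integers 1..16 and each hypothesis is the set of instances it labels Normal:

from itertools import chain, combinations

INSTANCES = range(1, 17)

def power_set(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def consistent(h, instance, label):
    # h must contain Normal instances and exclude Failure instances.
    return (instance in h) if label == "Normal" else (instance not in h)

def learn(training_set):
    # Start with the full hypothesis space C = H (all 2**16 = 65536 subsets)
    # and discard every hypothesis inconsistent with a training instance.
    C = power_set(INSTANCES)
    for instance, label in training_set:
        C = [h for h in C if consistent(h, instance, label)]
    return C

def vote(C, instance):
    # Each surviving hypothesis votes +1 (Normal) if it contains the
    # instance and -1 (Failure) otherwise.
    return sum(+1 if instance in h else -1 for h in C)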
Example Training
 Training set: instance a is labeled Normal, instance c is labeled Failure
 Candidate hypotheses consistent with this training set must contain instance
a and must not contain instance c
Is Unbiased Learning Possible?
 There are only 16 unique instances
 Suppose the training set contains 15 instances (i1, i2, …, i15), and they
are all labeled failure
 What is the content of candidate set C?
 C = { {}, {i16} }
 What is the vote count for i16?
 Zero
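Running the sketch from the Learning Algorithm section on this scenario reproduces the collapse:

# 15 training instances, all labeled Failure: the candidate set collapses to
# { {}, {i16} } and the votes for i16 cancel out, so nothing is predicted.
training_set = [(i, "Failure") for i in range(1, 16)]
C = learn(training_set)
print([set(h) for h in C])   # [set(), {16}]
print(vote(C, 16))           # 0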
Summary
 Machine learning
 Analysis of (often large amounts of) data to find unsuspected patterns and to
summarize them in novel ways
 Machine learning process involves
 Exploring the problem, exploring the solution, implementation specification, data
preparation, data surveying, data modeling
 Machine learning task types
 Association
 Supervised Learning: Classification, Regression
 Unsupervised Learning
 Importance of inductive bias in data mining