Machine Learning Models
and Linear Regression
Department of Computer Science
What is machine learning?
• "Field of study that gives computers the ability to learn without
being explicitly programmed" -- Arthur Samuel (1959)
• "A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E." -- Tom Mitchell (1997)
Examples
• Database mining:
• Machine learning has recently become so big partly because of the huge amount of
data being generated
• Large datasets from the growth of automation and the web
• Sources of data include
• Web data (click-stream or click through data)
• Mine to understand users better
• Huge segment of silicon valley
• Medical records
• Electronic records -> turn records into knowledge
• Biological data
• Gene sequences, ML algorithms give a better understanding of human genome
• Engineering info
• Data from sensors, log reports, photos etc
• Applications that we cannot program by hand
• Autonomous helicopter
• Handwriting recognition
• This is very inexpensive because algorithms can read the handwritten address on an
envelope and automatically route it through the post
• Natural language processing (NLP)
• AI pertaining to language
• Computer vision
• AI pertaining to vision
• Self-customizing programs
• Netflix
• Amazon
• iTunes Genius
• Take users' info
• Learn based on their behavior
• Understand human learning and the brain
• If we can build systems that mimic (or try to mimic) how the brain works, this
may push our own understanding of the associated neurobiology
Types of learning algorithms
• Supervised learning
• Teach the computer how to do something, then let it use its newfound
knowledge to do it
• Unsupervised learning
• Let the computer learn how to do something, and use this to determine
structure and patterns in data
• Reinforcement learning
Supervised learning - introduction
• Probably the most common problem type in machine learning
• Starting with an example
• How do we predict housing prices
• Collect data regarding housing prices and how they relate to size in square feet
• Example problem: "Given this data, a friend has a 750-square-foot house -
how much can they expect to get for it?"
Unsupervised learning - introduction
• In unsupervised learning, we get unlabeled data
• We are just told: here is a data set, can you find structure in it?
• One way of doing this would be to cluster the data into groups
• This is what a clustering algorithm does
• Example of a clustering algorithm
• Google News
• Groups news stories into cohesive groups
• Used in many other problems as well
• Genomics
• Microarray data
• Have a group of individuals
• For each individual, measure the expression of a gene
• Run algorithm to cluster individuals into types of people
• Organize computer clusters
• Identify potential weak spots or distribute workload effectively
• Social network analysis
• Customer data
• Astronomical data analysis
• Algorithms give amazing results
• Basically
• Can you automatically generate structure
• Because we don't give it the answer, it's unsupervised learning
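As a minimal sketch of what a clustering algorithm does, the pure-Python k-means below groups unlabeled 2-D points into clusters. k-means is one common clustering method, used here only for illustration; the slides do not say which algorithm the examples above rely on. The function name `kmeans`, the sample points, and the initial centers are all assumptions.

```python
def kmeans(points, centers, iterations=10):
    """Cluster 2-D points around the given initial centers (illustrative only)."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers


# Unlabeled data: two visually separate groups (made-up values).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)   # cluster index assigned to each point
print(centers)  # final cluster centers
```

No labels are passed in; the grouping comes only from the distances between points, which is what makes this unsupervised.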
Reinforcement learning (RL) - introduction
• An RL agent learns by interacting with its environment and observing the
results of these interactions. This mimics the fundamental way in which
humans (and animals alike) learn.
• The idea is commonly known as "cause and effect", and this undoubtedly is
the key to building up knowledge of our environment throughout our
lifetime.
• The "cause and effect" idea can be translated into the following steps for
an RL agent:
• The agent observes an input state
• An action is determined by a decision making function (policy)
• The action is performed
• The agent receives a scalar reward or reinforcement from the environment
• Information about the reward given for that state / action pair is recorded
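The five steps above can be sketched as a small loop. This is an illustration under stated assumptions, not a full RL algorithm: the one-state environment, its fixed rewards, the names `run_agent`, `value` and `counts`, and the "explore every 10th step" policy are all hypothetical choices made to keep the example deterministic.

```python
def run_agent(steps=100):
    # Hypothetical environment: a single state and two actions with fixed rewards.
    rewards = {("s0", "left"): 0.0, ("s0", "right"): 1.0}
    actions = ["left", "right"]
    value = {key: 0.0 for key in rewards}  # recorded average reward per (state, action)
    counts = {key: 0 for key in rewards}   # how often each pair has been tried

    state = "s0"  # 1. the agent observes an input state
    for step in range(steps):
        # 2. the policy chooses an action: explore on every 10th step,
        #    otherwise pick the action with the best recorded reward
        if step % 10 == 0:
            action = actions[(step // 10) % len(actions)]
        else:
            action = max(actions, key=lambda a: value[(state, a)])
        # 3. and 4. perform the action and receive a scalar reward
        reward = rewards[(state, action)]
        # 5. record the reward for this state / action pair (running average)
        counts[(state, action)] += 1
        n = counts[(state, action)]
        value[(state, action)] += (reward - value[(state, action)]) / n
    return value


value = run_agent()
print(value)
```

After the first exploratory "right" action the recorded value for that pair exceeds the one for "left", so the policy prefers it from then on without anyone having told the agent which action is better.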
Uses for Reinforcement Learning
• Because RL agents can learn without expert supervision, the problems
best suited to RL are complex ones where there appears
to be no obvious or easily programmable solution.
• Game playing - determining the best move to make in a game often depends
on a number of different factors, hence the number of possible states that
can exist in a particular game is usually very large.
• Control problems - such as elevator scheduling. Again, it is not obvious what
strategies would provide the best, most timely elevator service. For control
problems such as this, RL agents can be left to learn in a simulated
environment and eventually they will come up with good controlling policies.
Linear Regression with one variable
• Housing price data example used earlier
• Supervised learning regression problem
Linear regression
• (x, y) - single training example
• (x(i), y(i)) - specific example (the ith training example)
• i is an index into the training set
• With our training set defined - how do we use it?
• Take training set
• Pass into a learning algorithm
• Algorithm outputs a function (denoted h ) (h = hypothesis)
• This function takes an input (e.g. size of new house)
• Tries to output the estimated value of y
How do we represent hypothesis h ?
• We represent h as:
• hθ(x) = θ0 + θ1x
• h(x) (shorthand)
• What does this mean?
• It means y is modeled as a linear function of x
• θi are parameters
• θ0 is the intercept (the value of h when x = 0)
• θ1 is the gradient (slope) of the line
• This kind of function is a linear regression with one variable
• Also called univariate linear regression
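As a small sketch, the hypothesis can be written as a one-line Python function. The function name `hypothesis` and the parameter values below are made up for illustration; they are not fitted to any real housing data.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters: a base price of 50 plus 0.1 per square foot.
predicted = hypothesis(50.0, 0.1, 750)  # prediction for a 750 square foot house
print(predicted)
```

Changing `theta0` shifts the line up or down; changing `theta1` changes its slope.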
Linear regression - implementation (cost function)
• A cost function lets us figure out how to fit the best straight line to our data
• Choosing values for θi (parameters)
• Different values give you different functions
• If θ0 is 1.5 and θ1 is 0 then we get a horizontal line, parallel to the x-axis, at y = 1.5
• If θ1 > 0 then we get a positive slope
• Based on our training set we want to generate parameters which
make the straight line fit the data
• Choose these parameters so hθ(x) is close to y for our training examples
• Basically, use the xs in the training set with hθ(x) to give output which is as close to the actual y
value as possible
• Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and considering we
already have y we can evaluate how well hθ(x) does this
• To formalize this:
• We want to solve a minimization problem
• Minimize (hθ(x) - y)^2
• i.e. minimize the squared difference between h(x) and y for every example
• Sum this over the training set
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
• Minimize the squared difference between predicted house price and actual
house price
• 1/m - means we determine the average
• 1/2m - the 2 makes the math a bit easier, and doesn't change the parameters we
determine at all (i.e. half the smallest value is still the smallest value!)
• Minimizing J over θ0 and θ1 means we get the values of θ0 and θ1 which give, on
average, the minimal deviation of hθ(x) from y when we use those parameters
in our hypothesis function
• And we want to minimize this cost function
• Our cost function is (because of the summation term) inherently looking at ALL the
data in the training set at any time
• Hypothesis - is like your prediction machine, throw in an x value, get a
putative y value
• Cost - is a way to use your training data to determine the θ
values which make the hypothesis as accurate as possible
• This cost function is also called the squared error cost function
• This cost function is a reasonable choice for most regression problems
• Probably the most commonly used cost function
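The squared error cost can be sketched in a few lines of Python. The function name `compute_cost` and the three-point training set are assumptions chosen so the result is easy to check by hand.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost: J = 1/(2m) * sum of (h(x) - y)^2 over the training set."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = theta0 + theta1 * x  # h_theta(x)
        total += (prediction - y) ** 2
    return total / (2 * m)


# Made-up training set lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
perfect = compute_cost(0.0, 1.0, xs, ys)  # line passes through every point
worse = compute_cost(0.0, 0.5, xs, ys)    # flatter line, larger errors
print(perfect, worse)
```

For the second call the errors are -0.5, -1 and -1.5, so the sum of squares is 3.5 and the cost is 3.5 / 6. A better-fitting line always gives a smaller J, which is why minimizing J fits the line.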
Linear Regression Problem
A deeper insight into the cost function
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
• Plotting J generates a 3D surface plot where the axes are
• X = θ1
• Z = θ0
• Y = J(θ0,θ1)
• The height (Y) indicates the value of the cost function, so we look for the point where
Y is at a minimum
• Instead of a surface plot we can use contour plots
• Set of ellipses in different colors
• Each ellipse (color) connects points with the same value of J(θ0, θ1), which plot to different
locations because θ1 and θ0 vary
• Imagine a bowl-shaped function coming out of the screen, so the minimum is at the center of the
concentric ellipses
• Each point on the plot (such as the red point in the slide's figure) represents a pair of parameter
values for θ1 and θ0
• Our example here puts the values at
• θ0 = ~800
• θ1 = ~-0.15
• What we really want is an efficient algorithm for finding
the values of θ0 and θ1 that minimize J
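One way to see why an efficient algorithm is wanted: a surface or contour plot amounts to evaluating J at every point of a grid of (θ0, θ1) pairs. The sketch below does exactly that and picks the lowest point. The data, the grid range, and the helper name `cost` are assumptions for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) for a univariate linear hypothesis."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Made-up training set lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Evaluate J on a coarse grid of parameter values: -2.0, -1.5, ..., 4.0.
grid = [i * 0.5 for i in range(-4, 9)]
best = min((cost(t0, t1, xs, ys), t0, t1) for t0 in grid for t1 in grid)
print(best)  # (lowest J found, theta0, theta1)
```

This already needs 13 x 13 = 169 cost evaluations for two parameters, only finds the minimum if it happens to lie on the grid, and grows exponentially with the number of parameters.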
Gradient descent algorithm
• Minimize cost function J
• Gradient descent
• Used all over machine learning for minimization
• Start by looking at a general J() function
• Problem
• We have J(θ0, θ1)
• We want to get min J(θ0, θ1)
• Gradient descent applies to more general functions
• J(θ0, θ1, θ2 .... θn)
• min J(θ0, θ1, θ2 .... θn)
How does Gradient Descent work?
• Start with initial guesses
• Start at 0,0 (or any other value)
• Keep changing θ0 and θ1 a little bit to try and reduce J(θ0,θ1)
• Each time you change the parameters, you step in the direction (the negative gradient) which reduces J(θ0,θ1) the most
• Repeat
• Do so until you converge to a local minimum
• Has an interesting property
• Where you start can determine which minimum you end up in
• In the slide's figure, one initialization point led to one local minimum
• The other led to a different one
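The dependence on the starting point can be reproduced with a one-parameter toy function. The function J(θ) = θ^4 - 2θ^2 is an assumption chosen because it has two minima (at θ = -1 and θ = 1); it is not the cost surface shown in the slides, and the name `descend_from` is hypothetical.

```python
def descend_from(theta, alpha=0.05, steps=200):
    """Gradient descent on the non-convex toy function J(theta) = theta**4 - 2*theta**2."""
    for _ in range(steps):
        gradient = 4 * theta ** 3 - 4 * theta  # dJ/dtheta
        theta = theta - alpha * gradient       # step downhill
    return theta


right = descend_from(0.5)   # starts to the right of the hump at theta = 0
left = descend_from(-0.5)   # starts to the left of it
print(right, left)
```

Both runs use the same function and the same learning rate; only the initial guess differs, yet they settle in different minima.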
Formal Definition
• Repeat the following until convergence
• θj := θj - α (∂/∂θj) J(θ0, θ1), for j = 0 and j = 1
• What does this all mean?
• Update θj by subtracting α times the partial derivative of the
cost function with respect to θj
• Here α is the learning rate.
• If α is large, gradient descent takes aggressive (large) steps
• If α is small, it takes tiny steps
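The effect of the learning rate can be sketched on a one-parameter toy cost, J(θ) = θ^2, whose derivative is 2θ. The toy cost, the three α values, and the name `run_descent` are assumptions for illustration.

```python
def run_descent(alpha, theta=1.0, steps=20):
    """Apply the update theta := theta - alpha * dJ/dtheta to J(theta) = theta**2."""
    for _ in range(steps):
        gradient = 2 * theta              # dJ/dtheta
        theta = theta - alpha * gradient
    return theta


tiny = run_descent(alpha=0.001)     # tiny steps: barely moves toward the minimum at 0
moderate = run_descent(alpha=0.1)   # reasonable steps: gets close to 0
too_big = run_descent(alpha=1.1)    # overly aggressive: overshoots and diverges
print(tiny, moderate, too_big)
```

Each update multiplies θ by (1 - 2α) here, so α = 0.1 shrinks θ by a factor of 0.8 per step, α = 0.001 by only 0.998, and α = 1.1 multiplies it by -1.2, so it grows in magnitude instead of converging.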
How is this gradient descent algorithm
implemented?
• Do this for θ0 and θ1
• Applying the rule for j = 0 and j = 1 means we simultaneously update both
• How do we do this?
• Compute the right hand side for both θ0 and θ1
• So we need a temp value
• Then, update θ0 and θ1 at the same time
• If you implement the non-simultaneous update it's not gradient descent,
and will behave weirdly
• But it might look sort of right - so it's important to remember this!
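The difference between the two update orders can be sketched as follows. The toy cost J(θ0, θ1) = θ0^2 + θ0·θ1 + θ1^2 is an assumption, chosen only because each partial derivative depends on both parameters; the function names are hypothetical.

```python
def d_theta0(theta0, theta1):
    return 2 * theta0 + theta1  # partial derivative of J with respect to theta0


def d_theta1(theta0, theta1):
    return theta0 + 2 * theta1  # partial derivative of J with respect to theta1


def simultaneous_step(theta0, theta1, alpha):
    # Compute both right-hand sides from the OLD values, then assign together.
    temp0 = theta0 - alpha * d_theta0(theta0, theta1)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1)
    return temp0, temp1


def sequential_step(theta0, theta1, alpha):
    # Not gradient descent: theta1's update already sees the new theta0.
    theta0 = theta0 - alpha * d_theta0(theta0, theta1)
    theta1 = theta1 - alpha * d_theta1(theta0, theta1)
    return theta0, theta1


sim = simultaneous_step(1.0, 1.0, alpha=0.1)
seq = sequential_step(1.0, 1.0, alpha=0.1)
print(sim, seq)
```

Starting from (1, 1), the simultaneous version moves both parameters to about 0.7, while the sequential version moves θ1 to about 0.73 because it used the already-updated θ0. The two paths look similar, which is why the bug is easy to miss.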
Linear regression with gradient descent
• Apply gradient descent to minimize the squared error cost function J(θ0, θ1)
• Now we have a partial derivative
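Differentiating the squared error cost J(θ0, θ1) with hθ(x) = θ0 + θ1x gives the partial derivatives below. This is the standard derivation, written out here as a sketch; the factor 1/2 in the cost cancels the 2 that comes from differentiating the square.

```latex
\begin{aligned}
\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
  &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
  &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\end{aligned}
```

Substituting these into the update rule gives θ0 := θ0 - α times the first expression and θ1 := θ1 - α times the second, applied simultaneously.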
• The linear regression cost function is always a convex function -
it has no local minima other than the global minimum
• Bowl shaped
• One global optimum
• So gradient descent, given a suitably small learning rate, converges to the global optimum
• Initialize values to
• θ0 = 900
• θ1 = -0.1
• End up at a global minimum
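Putting the pieces together, a minimal sketch of gradient descent for univariate linear regression might look like this. The training data, learning rate, iteration count, and starting point (0, 0) are assumptions for illustration; they are not the housing data or the initial values used in the slides.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x by minimizing the squared error cost."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction errors h(x) - y for the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update: both new values use the old parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up training set lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

Because the cost is convex, the result does not depend on the starting point, only on using a learning rate small enough for the updates to converge.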

Week 2 - ML models and Linear Regression.pptx


Editor's Notes

  1. The loss function captures the difference between the actual and predicted values for a single record, whereas the cost function aggregates that difference over the entire training dataset.