Machine Learning
Roughly speaking, for a given learning task, with a given finite amount of training data, the
best generalization performance will be achieved if the right balance is struck between the
accuracy attained on that particular training set, and the “capacity” of the machine, that is, the
ability of the machine to learn any training set without error. A machine with too much capacity
is like a botanist with a photographic memory who, when presented with a new tree,
concludes that it is not a tree because it has a different number of leaves from anything she
has seen before; a machine with too little capacity is like the botanist’s lazy brother, who
declares that if it’s green, it’s a tree. Neither can generalize well. The exploration and
formalization of these concepts has resulted in one of the shining peaks of the theory of
statistical learning.

(Vapnik, 1979)
What is machine learning?



  Data (examples) -> Model (training) -> Output

  Outputs: predictions, classifications, clusters, ordinals
Why: Face Recognition?
Categories of problems
• By output:
  – Clustering
  – Regression
  – Prediction
  – Classification
  – Ordinal regression
• By input:
  – Vector, X
  – Time series, x(t)
One size never fits all…
• Improving an algorithm:
  – First option: better features (explore them with WEKA or GGobi)
     • Visualize classes
     • Trends
     • Histograms
  – Next: make the algorithm smarter (more complicated)
     • Interaction of features
     • Better objective and training criteria
Categories of ML algorithms
• By training:
  – Supervised (labeled)
  – Unsupervised (unlabeled)
• By model:
  – Non-parametric: raw data only
  – Kernel methods
  – Parametric: model parameters only
[Figure: output vs. input plots for each model type; the parametric panel is labeled y = 1 + 0.5t + 4t^2 - t^3]
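The contrast between "model parameters only" and "raw data only" can be sketched as follows. This is a minimal illustration, not the lecture's own code: the coefficients come from the curve shown on the slide, while the nearest-neighbor predictor and the training grid are assumptions chosen for the example.

```python
# Parametric model: only the coefficients are stored.
# Coefficients of the slide's example curve, lowest order first:
# y = 1 + 0.5 t + 4 t^2 - t^3
COEFFS = [1.0, 0.5, 4.0, -1.0]


def parametric_predict(t, coeffs=COEFFS):
    """Evaluate the polynomial at t using Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * t + c
    return y


# Non-parametric model: the stored training pairs are the model.
def nearest_neighbor_predict(t, train_t, train_y):
    """Return the output of the stored sample whose input is closest to t."""
    best = min(range(len(train_t)), key=lambda i: abs(train_t[i] - t))
    return train_y[best]


# Hypothetical training set: the same curve sampled every 0.5 from -4 to 6.
train_t = [-4 + 0.5 * i for i in range(21)]
train_y = [parametric_predict(t) for t in train_t]
```

The parametric predictor needs four numbers at prediction time; the non-parametric one has to keep and search all 21 samples, which is the speed trade-off the later "Pitfalls" slide refers to.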
Training a ML algorithm
        • Choose data
        • Optimize model parameters according to:
          – Objective function
               • Regression: mean square error
               • Classification: max margin
[Figure: left, a regression fit labeled "Mean Square Error"; right, two classes (1 and 2) separated with maximum margin]
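The two objective functions named above can be written down in a few lines. This is a sketch under one assumption: the slide only says "max margin", and the hinge loss used here is the loss commonly associated with soft-margin classifiers, not something the slide specifies.

```python
def mean_square_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def hinge_loss(labels, scores):
    """Average hinge loss.

    labels are -1 or +1; scores are raw classifier outputs.
    A sample contributes zero loss once it is on the correct side
    with a margin of at least 1.
    """
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(labels, scores)) / len(labels)
```

For example, `hinge_loss([1, -1], [2.0, 0.5])` penalizes only the second sample, which sits on the wrong side of the boundary.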
Pitfalls of ML algorithms
• Clean your features:
   – Training volume: more is better
   – Outliers: remove them!
   – Dynamic range: normalize it!

• Generalization
   – Overfitting
   – Underfitting

• Speed: parametric vs. non-parametric

• What are you learning? …features, features, features…
Outliers
[Figure: three output vs. input plots illustrating outliers]
Keep a “good” percentile range!
5-95, 1-99: depends on your data
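Keeping a percentile range might look like the sketch below. The linear-interpolation percentile and the sample data are assumptions made for illustration; the 5-95 default is one of the two ranges the slide mentions.

```python
def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (p / 100.0) * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def keep_percentile_range(values, low=5, high=95):
    """Drop every value outside the [low, high] percentile range."""
    lo_cut = percentile(values, low)
    hi_cut = percentile(values, high)
    return [v for v in values if lo_cut <= v <= hi_cut]


# Hypothetical data: nineteen ordinary readings and one gross outlier.
data = list(range(1, 20)) + [1000]
kept = keep_percentile_range(data)
```

Note that the trim is symmetric: the outlier 1000 is removed, but so is the smallest legitimate reading, which is why the right range depends on the data.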
Dynamic range
[Figure: four scatter plots of feature f2 vs. feature f1 for classes 1 and 2, drawn at different feature scales (f1 spans 0 to 1000 in some panels and 0 to 1 in another)]
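Normalizing the dynamic range can be sketched with two common rescalings. Both functions and the sample columns are illustrative assumptions; the slides do not say which normalization was used.

```python
def min_max_scale(column):
    """Rescale a feature column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Rescale a feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


# Hypothetical features on very different scales.
f1 = [0.0, 250.0, 500.0, 1000.0]
f2 = [0.0, 1.0, 2.0, 4.0]
```

After `min_max_scale`, both columns become `[0.0, 0.25, 0.5, 1.0]`, so a distance-based method no longer lets f1 dominate f2 purely because of its units.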
Overfitting and comparing algorithms

• Early stop
• Regularization
• Validation Sets
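Regularization and validation sets work together: the penalty strength is picked by the error on data the model was not fitted to. The sketch below assumes a one-weight model y = w x with a ridge penalty, and made-up training and validation sets; early stopping is not shown.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    """Mean square error of the one-weight model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train, valid, candidates):
    """Fit on the training set for each penalty; keep the one with the
    lowest validation error. Returns (validation_error, lam, w)."""
    scored = []
    for lam in candidates:
        w = fit_ridge_slope(train[0], train[1], lam)
        scored.append((mse(valid[0], valid[1], w), lam, w))
    return min(scored)


# Hypothetical data: noisy training points, clean validation points (slope 2).
train = ([1.0, 2.0, 3.0], [2.5, 3.5, 6.5])
valid = ([4.0, 5.0], [8.0, 10.0])
candidates = [0.0, 0.1, 1.0, 10.0]
val_error, best_lam, best_w = select_lambda(train, valid, candidates)
```

The same held-out error is what makes a comparison between two different algorithms fair: each is judged on data neither has seen during fitting.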
Underfitting
Curse of dimensionality
K-Means clustering

            •Planar decision boundaries,
            depending on space you are in…

            •Highly Efficient

            •Not always great (but usually
            pretty good)

            •Needs good starting criteria
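A bare-bones version of the k-means loop is sketched below. The starting centers are passed in explicitly because, as noted above, the result depends on them; the function name, the stopping rule, and the sample points are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, [points[0], points[3]])
```

Each point goes to its nearest center, so the boundary between any two clusters is the flat (planar) set of points equidistant from both centers.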
K-Nearest Neighbor

       •Arbitrary decision boundaries

       •Not so efficient…

       •With enough data in each class…
       close to optimal (asymptotically, the 1-NN error is at most twice the Bayes error)

       •Easy to train, known as a lazy classifier
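The "lazy" nature of the classifier shows up in code: there is no training step, only a search through the stored samples at prediction time. This is a minimal sketch with made-up data; the brute-force search is an assumption and is the reason the method is "not so efficient".

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(query, train_points, train_labels, k=3):
    """Label a query by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_points)), key=lambda i: sq_dist(query, train_points[i])
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: "training" is just storing these lists.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
```

Every prediction sorts all stored points, so the cost grows with the size of the training set. An odd k avoids tied votes in the two-class case.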
Mixture of Gaussians
          •Arbitrary decision boundaries
          with enough Gaussians

          •Efficient, depending on number
          of models and Gaussians

          •Can represent more than just
          Gaussian distributions

          •Generative, sometimes tough to
          train up

          •Spurious singularities


          •Can get a distribution for a
          specific class and feature(s)… and
          get a Bayesian classifier
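The last point, turning per-class distributions into a Bayesian classifier, can be sketched in one dimension. To stay short this uses a single Gaussian per class, not a full mixture, and a variance floor as one simple guard against the singularities mentioned above; all names and data are illustrative assumptions.

```python
import math


def fit_gaussian(values, var_floor=1e-6):
    """Mean and variance of one feature; the floor keeps the variance
    from collapsing to zero when all samples coincide."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, var_floor)


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_classifier(samples_by_class):
    """Store a Gaussian and a prior (class frequency) for each class."""
    total = sum(len(vals) for vals in samples_by_class.values())
    return {
        label: (fit_gaussian(vals), len(vals) / total)
        for label, vals in samples_by_class.items()
    }


def classify(x, model):
    """Pick the class with the highest log prior plus log likelihood."""
    def log_posterior(label):
        (mean, var), prior = model[label]
        return math.log(prior) + log_gaussian(x, mean, var)
    return max(model, key=log_posterior)


# Hypothetical one-feature training data.
model = fit_classifier({"low": [0.0, 1.0, 2.0], "high": [9.0, 10.0, 11.0]})
```

Replacing each class's single Gaussian with a weighted sum of several would give the mixture model proper, usually fitted with an iterative procedure such as expectation-maximization.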
Components Analysis
(principal or independent)
           •Reduces dimensionality

           •All other classifiers work in a
           rotated space

           •Remember eigenvalues and
           eigenvectors?
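For the principal-components case, the link to eigenvalues and eigenvectors can be made concrete: the first principal component is the eigenvector of the data covariance matrix with the largest eigenvalue. The sketch below finds it by power iteration, which is an assumption chosen to avoid a linear-algebra library; the starting vector is arbitrary and would fail only if it were exactly orthogonal to the answer.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of the rows, plus the column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return cov, means


def first_principal_component(cov, iterations=100):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # zero covariance: no direction to find
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


# Hypothetical data lying exactly on the line y = x.
rows = [(-2, -2), (-1, -1), (0, 0), (1, 1), (2, 2)]
cov, means = covariance_matrix(rows)
eigenvalue, direction = first_principal_component(cov)
```

Projecting each centered row onto `direction` reduces the two features to one coordinate along the line, which is the rotated space other classifiers would then work in.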
Tree Classifiers


          •Arbitrary Decision boundaries

          •Can be quite efficient (or not!)

          •Needs good criteria for splitting

          •Easy to visualize
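One widely used splitting criterion is Gini impurity. The slide does not name a criterion, so treat this as an assumed example: the sketch scores every midpoint threshold on a single feature and keeps the one that leaves the purest children.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (weighted_impurity, threshold) of the best split on one
    feature, or None if the feature has a single distinct value."""
    ordered = sorted(set(values))
    best = None
    for a, b in zip(ordered, ordered[1:]):
        threshold = (a + b) / 2.0
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


# Hypothetical one-feature data with a clean gap between the classes.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
```

A full tree applies `best_split` recursively to each child over all features; the resulting chain of threshold tests is what makes trees easy to draw and read.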
Multi-Layer Perceptron

              •Arbitrary (but piecewise-linear) decision
              boundaries

              •Can be quite efficient (or not!)

              •What did it learn?
Support Vector Machines




   •Arbitrary Decision boundaries

   •Efficiency depends on the number of
   support vectors and the number of features
Hidden Markov Models




  •Arbitrary Decision boundaries

  •Efficiency depends on state
  space and number of models

  •Generalizes to incorporate
  features that change over time
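Classification with HMMs is commonly done by training one model per class and picking the model that gives an observed sequence the highest likelihood. The forward algorithm below is a minimal sketch of that scoring step; the two toy models are invented for illustration, and no scaling is applied, so very long sequences would underflow.

```python
def forward_likelihood(observations, start, trans, emit):
    """P(observations | model) via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
    return sum(alpha)


def classify_sequence(observations, models):
    """Return the name of the model with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(observations, *models[name]))


# Hypothetical two-state models over the symbols 0 and 1.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
models = {
    "mostly_zeros": (start, trans, [[0.9, 0.1], [0.9, 0.1]]),
    "mostly_ones": (start, trans, [[0.1, 0.9], [0.1, 0.9]]),
}
```

Each time step costs on the order of the number of states squared, and the whole pass is repeated for every candidate model, which is where the dependence on state space and number of models comes from.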
More sophisticated approaches
• Graphical models (like an HMM)
   – Bayesian network
   – Markov random fields

• Boosting
   – AdaBoost

• Voting

• Cascading

• Stacking…

More Related Content

What's hot

slide
slideslide
slidekoh-t
 
Hovedtrender for fremtidig reiseetterspørsel
Hovedtrender for fremtidig reiseetterspørselHovedtrender for fremtidig reiseetterspørsel
Hovedtrender for fremtidig reiseetterspørselRobin Stenersen
 
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]sasanhb
 
EwB Excel - What we do
EwB Excel - What we doEwB Excel - What we do
EwB Excel - What we doVinit Patel
 
Excel with Business Services Launch
Excel with Business Services LaunchExcel with Business Services Launch
Excel with Business Services LaunchVinit Patel
 
Nanotechnology in the Czech Republic
Nanotechnology in the Czech RepublicNanotechnology in the Czech Republic
Nanotechnology in the Czech Republichelikarv
 

What's hot (9)

slide
slideslide
slide
 
Hovedtrender for fremtidig reiseetterspørsel
Hovedtrender for fremtidig reiseetterspørselHovedtrender for fremtidig reiseetterspørsel
Hovedtrender for fremtidig reiseetterspørsel
 
Empty template
Empty templateEmpty template
Empty template
 
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]
شبكه هاي عصبي مصنوعي Ann farsi [www.matlabtrainings.blogfa.com]
 
Cômoda para Bebê Magia
Cômoda para Bebê MagiaCômoda para Bebê Magia
Cômoda para Bebê Magia
 
EwB Excel - What we do
EwB Excel - What we doEwB Excel - What we do
EwB Excel - What we do
 
Excel with Business Services Launch
Excel with Business Services LaunchExcel with Business Services Launch
Excel with Business Services Launch
 
367 peter binfield
367 peter binfield367 peter binfield
367 peter binfield
 
Nanotechnology in the Czech Republic
Nanotechnology in the Czech RepublicNanotechnology in the Czech Republic
Nanotechnology in the Czech Republic
 

Similar to Machine Learning Lecture

Improvement Projects 2008
Improvement Projects 2008Improvement Projects 2008
Improvement Projects 2008Marcelo Costa
 
Solucion
SolucionSolucion
SolucionIETI SD
 
Workspace analysis of stewart platform
Workspace analysis of stewart platformWorkspace analysis of stewart platform
Workspace analysis of stewart platformMarzieh Nabi
 
Q-Learning and Pontryagin's Minimum Principle
Q-Learning and Pontryagin's Minimum PrincipleQ-Learning and Pontryagin's Minimum Principle
Q-Learning and Pontryagin's Minimum PrincipleSean Meyn
 
Ponca City Economy Presentation - 3.15.12
Ponca City Economy Presentation - 3.15.12Ponca City Economy Presentation - 3.15.12
Ponca City Economy Presentation - 3.15.12KatLong68
 
CrossSectionDrawings
CrossSectionDrawingsCrossSectionDrawings
CrossSectionDrawingsP1666
 
22.02, Group 5 — Concept of sustainable development in built environment
22.02, Group 5 — Concept of sustainable development in built environment22.02, Group 5 — Concept of sustainable development in built environment
22.02, Group 5 — Concept of sustainable development in built environmentWDC_Ukraine
 
The Open Access movement gains momentum – should young scientists care?
The Open Access movement gains momentum – should young scientists care?The Open Access movement gains momentum – should young scientists care?
The Open Access movement gains momentum – should young scientists care?Martin Ballaschk
 
Ei09 Opposite Green
Ei09 Opposite GreenEi09 Opposite Green
Ei09 Opposite Greennmoroney
 
Regulations As a "Panacea": Exploring the Consequences
Regulations As a "Panacea": Exploring the ConsequencesRegulations As a "Panacea": Exploring the Consequences
Regulations As a "Panacea": Exploring the ConsequencesMercatus Center
 
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...TERN Australia
 
CUBOM Event Data Analysis
CUBOM Event Data AnalysisCUBOM Event Data Analysis
CUBOM Event Data Analysisrosey36v
 
Tabla de frecuencia tiempo
Tabla de frecuencia tiempoTabla de frecuencia tiempo
Tabla de frecuencia tiempoIETI SD
 
Villar ciasem 2007
Villar ciasem 2007Villar ciasem 2007
Villar ciasem 2007Karina Mello
 

Similar to Machine Learning Lecture (20)

Improvement Projects 2008
Improvement Projects 2008Improvement Projects 2008
Improvement Projects 2008
 
Solucion
SolucionSolucion
Solucion
 
Presentation sikin
Presentation sikinPresentation sikin
Presentation sikin
 
Gusa20101023
Gusa20101023Gusa20101023
Gusa20101023
 
Workspace analysis of stewart platform
Workspace analysis of stewart platformWorkspace analysis of stewart platform
Workspace analysis of stewart platform
 
Q-Learning and Pontryagin's Minimum Principle
Q-Learning and Pontryagin's Minimum PrincipleQ-Learning and Pontryagin's Minimum Principle
Q-Learning and Pontryagin's Minimum Principle
 
Ponca City Economy Presentation - 3.15.12
Ponca City Economy Presentation - 3.15.12Ponca City Economy Presentation - 3.15.12
Ponca City Economy Presentation - 3.15.12
 
CrossSectionDrawings
CrossSectionDrawingsCrossSectionDrawings
CrossSectionDrawings
 
The Green Delusion
The Green DelusionThe Green Delusion
The Green Delusion
 
22.02, Group 5 — Concept of sustainable development in built environment
22.02, Group 5 — Concept of sustainable development in built environment22.02, Group 5 — Concept of sustainable development in built environment
22.02, Group 5 — Concept of sustainable development in built environment
 
Akvo's Admin Features
Akvo's Admin FeaturesAkvo's Admin Features
Akvo's Admin Features
 
The Open Access movement gains momentum – should young scientists care?
The Open Access movement gains momentum – should young scientists care?The Open Access movement gains momentum – should young scientists care?
The Open Access movement gains momentum – should young scientists care?
 
Ei09 Opposite Green
Ei09 Opposite GreenEi09 Opposite Green
Ei09 Opposite Green
 
Ch099 a ch02-if-wkshts
Ch099 a ch02-if-wkshtsCh099 a ch02-if-wkshts
Ch099 a ch02-if-wkshts
 
Regulations As a "Panacea": Exploring the Consequences
Regulations As a "Panacea": Exploring the ConsequencesRegulations As a "Panacea": Exploring the Consequences
Regulations As a "Panacea": Exploring the Consequences
 
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...
Natalia Restrepo-Coupe_Remotely-sensed photosynthetic phenology and ecosystem...
 
CUBOM Event Data Analysis
CUBOM Event Data AnalysisCUBOM Event Data Analysis
CUBOM Event Data Analysis
 
Tabla de frecuencia tiempo
Tabla de frecuencia tiempoTabla de frecuencia tiempo
Tabla de frecuencia tiempo
 
Villar ciasem 2007
Villar ciasem 2007Villar ciasem 2007
Villar ciasem 2007
 
Hahaha
HahahaHahaha
Hahaha
 

More from Eric Larson

PupilWare Petra 2015
PupilWare Petra 2015PupilWare Petra 2015
PupilWare Petra 2015Eric Larson
 
Mobile healthforthemasses.2015
Mobile healthforthemasses.2015Mobile healthforthemasses.2015
Mobile healthforthemasses.2015Eric Larson
 
Flipping the clinic: in home health monitoring using mobile phones
Flipping the clinic: in home health monitoring using mobile phonesFlipping the clinic: in home health monitoring using mobile phones
Flipping the clinic: in home health monitoring using mobile phonesEric Larson
 
First world problems: education, options, and impact
First world problems: education, options, and impactFirst world problems: education, options, and impact
First world problems: education, options, and impactEric Larson
 
Recognizing mHealth through phone-as-a-sensor technology
Recognizing mHealth through phone-as-a-sensor technologyRecognizing mHealth through phone-as-a-sensor technology
Recognizing mHealth through phone-as-a-sensor technologyEric Larson
 
Consumer Centered Calibration End Use Water Monitoring
Consumer Centered Calibration End Use Water MonitoringConsumer Centered Calibration End Use Water Monitoring
Consumer Centered Calibration End Use Water MonitoringEric Larson
 
Big Data, Small Data
Big Data, Small DataBig Data, Small Data
Big Data, Small DataEric Larson
 
Phone As A Sensor Technology: mHealth and Chronic Disease
Phone As A Sensor Technology: mHealth and Chronic Disease Phone As A Sensor Technology: mHealth and Chronic Disease
Phone As A Sensor Technology: mHealth and Chronic Disease Eric Larson
 
Commercialization and Broader Impact: mirroring research through commercial d...
Commercialization and Broader Impact: mirroring research through commercial d...Commercialization and Broader Impact: mirroring research through commercial d...
Commercialization and Broader Impact: mirroring research through commercial d...Eric Larson
 
Creating the Dots: Computer Science and Engineering for Good
Creating the Dots: Computer Science and Engineering for GoodCreating the Dots: Computer Science and Engineering for Good
Creating the Dots: Computer Science and Engineering for GoodEric Larson
 
Mobilizing mHealth: interdisciplinary computer science and engineering
Mobilizing mHealth: interdisciplinary computer science and engineeringMobilizing mHealth: interdisciplinary computer science and engineering
Mobilizing mHealth: interdisciplinary computer science and engineeringEric Larson
 
Applications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingApplications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingEric Larson
 
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and Water
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and WaterSensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and Water
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and WaterEric Larson
 
Ubicomp2012 spiro smartpresentation
Ubicomp2012 spiro smartpresentationUbicomp2012 spiro smartpresentation
Ubicomp2012 spiro smartpresentationEric Larson
 
Open cv tutorial
Open cv tutorialOpen cv tutorial
Open cv tutorialEric Larson
 
Accurate and Privacy Preserving Cough Sensing from a Low Cost Microphone
Accurate and Privacy Preserving Cough Sensing from a Low Cost MicrophoneAccurate and Privacy Preserving Cough Sensing from a Low Cost Microphone
Accurate and Privacy Preserving Cough Sensing from a Low Cost MicrophoneEric Larson
 

More from Eric Larson (20)

PupilWare Petra 2015
PupilWare Petra 2015PupilWare Petra 2015
PupilWare Petra 2015
 
Mobile healthforthemasses.2015
Mobile healthforthemasses.2015Mobile healthforthemasses.2015
Mobile healthforthemasses.2015
 
Flipping the clinic: in home health monitoring using mobile phones
Flipping the clinic: in home health monitoring using mobile phonesFlipping the clinic: in home health monitoring using mobile phones
Flipping the clinic: in home health monitoring using mobile phones
 
First world problems: education, options, and impact
First world problems: education, options, and impactFirst world problems: education, options, and impact
First world problems: education, options, and impact
 
Recognizing mHealth through phone-as-a-sensor technology
Recognizing mHealth through phone-as-a-sensor technologyRecognizing mHealth through phone-as-a-sensor technology
Recognizing mHealth through phone-as-a-sensor technology
 
Consumer Centered Calibration End Use Water Monitoring
Consumer Centered Calibration End Use Water MonitoringConsumer Centered Calibration End Use Water Monitoring
Consumer Centered Calibration End Use Water Monitoring
 
Big Data, Small Data
Big Data, Small DataBig Data, Small Data
Big Data, Small Data
 
Phone As A Sensor Technology: mHealth and Chronic Disease
Phone As A Sensor Technology: mHealth and Chronic Disease Phone As A Sensor Technology: mHealth and Chronic Disease
Phone As A Sensor Technology: mHealth and Chronic Disease
 
Commercialization and Broader Impact: mirroring research through commercial d...
Commercialization and Broader Impact: mirroring research through commercial d...Commercialization and Broader Impact: mirroring research through commercial d...
Commercialization and Broader Impact: mirroring research through commercial d...
 
Creating the Dots: Computer Science and Engineering for Good
Creating the Dots: Computer Science and Engineering for GoodCreating the Dots: Computer Science and Engineering for Good
Creating the Dots: Computer Science and Engineering for Good
 
Mobilizing mHealth: interdisciplinary computer science and engineering
Mobilizing mHealth: interdisciplinary computer science and engineeringMobilizing mHealth: interdisciplinary computer science and engineering
Mobilizing mHealth: interdisciplinary computer science and engineering
 
Applications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive CodingApplications and Derivation of Linear Predictive Coding
Applications and Derivation of Linear Predictive Coding
 
BreatheSuite
BreatheSuiteBreatheSuite
BreatheSuite
 
Job Talk
Job TalkJob Talk
Job Talk
 
Larson.defense
Larson.defenseLarson.defense
Larson.defense
 
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and Water
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and WaterSensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and Water
Sensing for Sustainability: Disaggregated Sensing of Electricity, Gas, and Water
 
Ubicomp2012 spiro smartpresentation
Ubicomp2012 spiro smartpresentationUbicomp2012 spiro smartpresentation
Ubicomp2012 spiro smartpresentation
 
ACEEE 2012
ACEEE 2012ACEEE 2012
ACEEE 2012
 
Open cv tutorial
Open cv tutorialOpen cv tutorial
Open cv tutorial
 
Accurate and Privacy Preserving Cough Sensing from a Low Cost Microphone
Accurate and Privacy Preserving Cough Sensing from a Low Cost MicrophoneAccurate and Privacy Preserving Cough Sensing from a Low Cost Microphone
Accurate and Privacy Preserving Cough Sensing from a Low Cost Microphone
 

Recently uploaded

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Machine Learning Lecture

  • 1. Machine Learning Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the “capacity” of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree. Neither can generalize well. The exploration and formalization of these concepts has resulted in one of the shining peaks of the theory of statistical learning. (Vapnik, 1979)
  • 2. What is machine learning? Data (examples) → Model (training) → Output (predictions, classifications, clusters, ordinals). Why: Face Recognition?
  • 3. Categories of problems By output: Clustering, Regression, Prediction, Classification, Ordinal Reg. By input: Vector, X; Time Series, x(t)
  • 4. One size never fits all… • Improving an algorithm: – First option: better features (visualize classes, trends, histograms; WEKA or GGOBI) – Next: make the algorithm smarter (more complicated): interaction of features, better objective and training criteria
  • 5. Categories of ML algorithms By training: Supervised (labeled), Unsupervised (unlabeled). By model: Non-parametric (raw data only), Kernel methods, Parametric (model parameters only). [Figure: three plots of output against input, labelled with y = 1 + 0.5t + 4t^2 - t^3]
  • 6. [Figure: plots of output against input]
  • 7. Training an ML algorithm • Choose data • Optimize model parameters according to: – Objective function: Mean Square Error (regression), Max Margin (classification) [Figure: one regression plot and one classification plot]
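As an illustration of optimizing model parameters against an objective function, here is a minimal sketch that fits a straight line by minimizing mean square error in closed form. The function names and the toy data are assumptions made for this example, not part of the lecture.

```python
def fit_line_mse(xs, ys):
    """Return (slope, intercept) of the line that minimizes mean square error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input feature.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_square_error(xs, ys, slope, intercept):
    """The objective function: average squared prediction error."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line_mse(xs, ys)
error = mean_square_error(xs, ys, slope, intercept)
```

A max-margin objective for classification would replace `mean_square_error` with a different criterion; the overall pattern of choosing data and then optimizing parameters stays the same.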
  • 8. Pitfalls of ML algorithms • Clean your features: – Training volume: more is better – Outliers: remove them! – Dynamic range: normalize it! • Generalization – Overfitting – Underfitting • Speed: parametric vs. non-parametric • What are you learning? …features, features, features…
  • 9. Outliers • Keep a “good” percentile range! • 5-95, 1-99: depends on your data [Figure: plots of output against input]
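Keeping a percentile range can be sketched as follows. This is one possible reading of the slide, assuming linear interpolation between ranks and that values outside the range are dropped; the helper names and the sample data are hypothetical.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (0 <= p <= 100) of a sorted list."""
    if not sorted_vals:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = rank - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def keep_percentile_range(values, low=5.0, high=95.0):
    """Drop values outside the [low, high] percentile range."""
    ordered = sorted(values)
    lo_cut = percentile(ordered, low)
    hi_cut = percentile(ordered, high)
    return [v for v in values if lo_cut <= v <= hi_cut]


# Nineteen ordinary values plus one obvious outlier (hypothetical data).
data = list(range(1, 20)) + [1000]
kept = keep_percentile_range(data, 5, 95)
```

With the 5-95 range both tails are trimmed, so the smallest value is dropped along with the outlier. Whether 5-95 or 1-99 is appropriate depends on the data, as the slide notes.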
  • 10. Dynamic range [Figure: four scatter plots of f2 against f1 for classes 1 and 2, drawn at different axis ranges]
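One common way to normalize dynamic range is min-max scaling, which maps each feature onto [0, 1] so that a feature spanning 0 to 1000 does not dominate one spanning -1 to 6. The sketch below assumes that technique; the lecture does not say which normalization it uses, and the feature values are made up.

```python
def min_max_normalize(column):
    """Rescale one feature column linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Two features with very different ranges (hypothetical values).
f1 = [0.0, 250.0, 500.0, 1000.0]
f2 = [-1.0, 0.0, 2.5, 6.0]
f1_scaled = min_max_normalize(f1)
f2_scaled = min_max_normalize(f2)
```

Outliers should be handled first, since a single extreme value sets `lo` or `hi` and compresses everything else.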
  • 11. Overfitting and comparing algorithms • Early stop • Regularization • Validation Sets
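Regularization and validation sets work together: the model is fitted on training data for several regularization strengths, and the strength with the lowest error on held-out data is kept. The sketch below assumes a one-parameter ridge model through the origin, chosen only because its solution fits in one line. The data and candidate strengths are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Weight w minimizing sum((w*x - y)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    """Mean square error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_lambda(train, valid, candidates):
    """Return (lam, validation_error, w) with the lowest validation error."""
    best = None
    for lam in candidates:
        w = fit_ridge_1d(train[0], train[1], lam)
        err = mse(valid[0], valid[1], w)
        if best is None or err < best[1]:
            best = (lam, err, w)
    return best


# Noisy training data around y = 2x, and a clean validation set (hypothetical).
train = ([1.0, 2.0, 3.0], [2.2, 3.9, 6.3])
valid = ([4.0, 5.0], [8.0, 10.0])
best_lam, best_err, best_w = select_lambda(train, valid, [0.0, 0.1, 0.5, 5.0])
```

The same held-out error can be used to compare different algorithms, or to stop an iterative trainer early once validation error starts to rise.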
  • 12. Underfitting; curse of dimensionality
  • 13. Underfitting; curse of dimensionality
  • 14. K-Means clustering • Planar decision boundaries, depending on space you are in… • Highly Efficient • Not always great (but usually pretty good) • Needs good starting criteria
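A minimal sketch of K-means, assuming the standard alternation between assigning points to the nearest center and moving each center to the mean of its points. The initial centers are passed in explicitly because the result depends on them, which is the "good starting criteria" point above. The names and data are illustrative.

```python
def kmeans(points, initial_centers, max_iterations=100):
    """Cluster points (tuples of floats); return (centers, labels)."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


# Two well-separated groups of points (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Assigning a new point to its nearest center splits the space along planes between centers, which is where the planar decision boundaries come from.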
  • 15. K-Nearest Neighbor • Arbitrary decision boundaries • Not so efficient… • With enough data in each class… optimal • Easy to train, known as a lazy classifier
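The "lazy" label can be seen in a small sketch: there is no training step beyond storing the data, and all the work happens at prediction time, which is also why it is not so efficient. This example assumes squared Euclidean distance and a simple majority vote; the function name and data are hypothetical.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Every prediction scans and sorts the whole training set.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two small classes in the plane (hypothetical data).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(train_points, train_labels, (0.5, 0.5))
near_b = knn_predict(train_points, train_labels, (5.5, 5.5))
```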
  • 16. Mixture of Gaussians • Arbitrary decision boundaries with enough boundaries • Efficient, depending on number of models and Gaussians • Can represent more than just Gaussian distributions • Generative, sometimes tough to train up • Spurious singularities • Can get a distribution for a specific class and feature(s)… and get a Bayesian classifier
  • 17. Components Analysis (principal or independent) • Reduces dimensionality • All other classifiers work in a rotated space • Remember Eigen-values and Vectors?
  • 18. Tree Classifiers • Arbitrary Decision boundaries • Can be quite efficient (or not!) • Needs good criteria for splitting • Easy to visualize
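One widely used splitting criterion is Gini impurity. The sketch below assumes that criterion and a single numeric feature, and searches for the threshold that gives the purest two-way split. The lecture does not name a specific criterion, and the helper names and data are illustrative.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the best single split."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# One feature that separates the two classes cleanly (hypothetical data).
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(values, labels)
```

A full tree applies the same search recursively to each side of the split, over every feature.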
  • 19. Multi-Layer Perceptron • Arbitrary (but linear) Decision boundaries • Can be quite efficient (or not!) • What did it learn?
  • 20. Support Vector Machines • Arbitrary Decision boundaries • Efficiency depends on support vector size and feature size
  • 21. Hidden Markov Models • Arbitrary Decision boundaries • Efficiency depends on state space and number of models • Generalizes to incorporate features that change over time
  • 22. More sophisticated approaches • Graphical models (like an HMM) – Bayesian network – Markov random fields • Boosting – Adaboost • Voting • Cascading • Stacking…
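Of the combination schemes listed, voting is the simplest to sketch: several classifiers each predict a label and the most common label wins. The three threshold rules below are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Three weak rules that disagree near the class boundary (hypothetical).
classifiers = [
    lambda x: 1 if x > 3 else 0,
    lambda x: 1 if x > 4 else 0,
    lambda x: 1 if x > 6 else 0,
]
high = majority_vote(classifiers, 5)  # two of the three rules say 1
low = majority_vote(classifiers, 2)   # all three rules say 0
```

Boosting methods such as AdaBoost go one step further by weighting each classifier's vote and training later classifiers on the examples earlier ones got wrong.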