COMP 875: Introductions
- Name, year, research area/group
- Why are you interested in machine learning and how does it relate to your research?
- What topics would you like to see covered in this course?

What is Machine Learning?
- Using past experiences to improve future performance (on a particular task)
- For a machine, experiences come in the form of data
- What does it mean to improve performance? Learning is guided by a quantitative objective, associated with a particular notion of loss to be minimized (or gain to be maximized)

Why machine learning?
- Often it is too difficult to design a set of rules “by hand”
- Machine learning is about automatically extracting relevant information from data and applying it to analyze new data
Source: G. Shakhnarovich
Machine Learning Steps
- Data collection: start with training data for which we know the correct outcome, provided by a “teacher”
- Representation: decide how to encode the input to the learning program
- Modeling: choose a hypothesis class – a set of possible explanations for the data
- Estimation: find the best hypothesis you can in the chosen class
- Model selection: we may reconsider the class of hypotheses given the outcome
- Each of these steps can make or break the learning outcome
Source: G. Shakhnarovich
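As a concrete illustration of these steps, here is a minimal sketch assuming scikit-learn is available; the synthetic data, the hypothesis class (RBF-kernel SVMs), and the parameter grid are arbitrary choices for illustration, not part of the lecture.

```python
# Minimal sketch of the learning steps, with synthetic data standing in for
# teacher-labeled training data.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Data collection + representation: 2-D feature vectors X with known outcomes y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Modeling: choose a hypothesis class (here, RBF-kernel SVMs with different C values)
# Estimation + model selection: fit each candidate and keep the best by cross-validation
search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

print("chosen hypothesis:", search.best_params_)
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```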
Learning and Probability
- There are many sources of uncertainty with which learning algorithms must cope:
  - Variability of the data
  - Dataset collection
  - Measurement noise
  - Labeling errors
- Probability and statistics provide an appropriate framework to deal with uncertainty
- Some basic statistical assumptions:
  - Training data is sampled from the “true” underlying data distribution
  - Future test data will be sampled from the same distribution
Source: G. Shakhnarovich
Example of a learning problem
- Given: training images and their categories
- What are the categories of these test images?
- Possible representation: an image of n×n pixels -> a vector of length n² (or 3n² if color)
Source: G. Shakhnarovich
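A minimal sketch of this flattening step, assuming the images are already loaded as NumPy arrays (the image sizes below are placeholders):

```python
# Turn an n×n (or n×n×3) image into a fixed-length feature vector.
import numpy as np

def image_to_vector(img: np.ndarray) -> np.ndarray:
    """Grayscale n×n image -> length-n² vector; color n×n×3 image -> length-3n² vector."""
    return img.reshape(-1).astype(np.float64)

gray = np.zeros((32, 32))        # placeholder grayscale image
color = np.zeros((32, 32, 3))    # placeholder color image
print(image_to_vector(gray).shape)   # (1024,)
print(image_to_vector(color).shape)  # (3072,)
```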
The Importance of Representation
- Dimensionality
- Beyond vectors: complex or heterogeneous input objects
  - Web pages
  - Program traces
  - Images with captions or metadata
  - Video with sound
  - Proteins
- Feature extraction and feature selection: what measurements/information about the input objects are the most useful for solving the given problem?
- Successful representation requires domain knowledge!
- If we could find the “ideal” feature representation, we would not even need learning!
Types of learning problems
- Supervised
  - Classification
  - Regression
- Unsupervised
- Semi-supervised
- Reinforcement learning
- Active learning
- …
Supervised learning
- Given training examples of inputs and corresponding outputs, produce the “correct” outputs for new inputs
- Two main scenarios:
  - Classification: outputs are discrete variables (category labels). Learn a decision boundary that separates one class from the other
  - Regression: also known as “curve fitting” or “function approximation.” Learn a continuous input-output mapping from examples (possibly noisy)
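For instance, a minimal classification sketch, assuming scikit-learn and toy 2-D data (the two-blob dataset and the logistic-regression model are illustrative choices, not part of the lecture):

```python
# Learn a linear decision boundary separating two classes of toy 2-D points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Two Gaussian blobs, one per class
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)), rng.normal(1.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0, 2.0], [-2.0, -2.0]]))   # predicted category labels
```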
Regression: example 1
- Suppose we want to predict the gas mileage of a car based on some characteristics: number of cylinders or doors, weight, horsepower, year, etc.
Source: G. Shakhnarovich
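A minimal regression sketch along these lines; the car features and mileage target below are synthetic stand-ins, not real data:

```python
# Fit a linear input-output mapping from synthetic car features to mileage (mpg).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_cars = 300
# Columns: cylinders, weight (lbs), horsepower, model year (all synthetic)
X = np.column_stack([
    rng.integers(4, 9, n_cars),
    rng.uniform(1500, 5000, n_cars),
    rng.uniform(60, 230, n_cars),
    rng.integers(70, 83, n_cars),
])
# Synthetic target: heavier, more powerful cars get worse mileage, plus noise
y = 50 - 0.006 * X[:, 1] - 0.05 * X[:, 2] + rng.normal(0, 2, n_cars)

reg = LinearRegression().fit(X, y)
print(reg.predict([[4, 2200, 90, 80]]))   # predicted mpg for a hypothetical car
```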
Regression: example 2
- Training set: faces (represented as vectors of distances between keypoints) together with experimentally obtained attractiveness rankings
- Learn: a function to reproduce the attractiveness ranking based on training inputs and outputs
- (Figure: attractiveness score f(v) predicted from the vector of distances v)
T. Leyvand, D. Cohen-Or, G. Dror, and D. Lischinski, “Data-Driven Enhancement of Facial Attractiveness,” SIGGRAPH 2008
Regression: example 3
- Input: scalar (attractiveness score)
- Output: vector-valued object (face)
B. Davis and S. Lazebnik, “Analysis of Human Attractiveness Using Manifold Kernel Regression,” ICIP 2008
Regression: example 4
- Input: scalar (age)
- Output: vector-valued object (3D brain image)
B. C. Davis, P. T. Fletcher, E. Bullitt, and S. Joshi, “Population Shape Regression from Random Design Data,” ICCV 2007
Structured Prediction
- Image -> word
- Sentence -> parse tree
- Sentence in two languages -> word alignment
- Amino-acid sequence -> bond structure
Source: B. Taskar

Structured Prediction
- Many image-based inference tasks can loosely be thought of as “structured prediction”
- Example: the data association problem (figure: model)
Source: D. Ramanan
Other supervised learning scenarios
- Learning similarity functions from relations between multiple input objects
  - Pairwise constraints
  - Triplet constraints
Source: X. Sui, K. Grauman
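As one way to make a triplet constraint concrete, it can be written as a hinge-style loss that is zero when the anchor is closer to the “similar” example than to the “dissimilar” one by some margin; the sketch below is a plain-NumPy illustration with made-up embeddings, not the formulation used in the cited work:

```python
# Triplet (relative similarity) constraint:
# distance(anchor, positive) + margin <= distance(anchor, negative).
import numpy as np

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)   # zero when the constraint is satisfied

# Made-up 3-D embeddings, purely for illustration
a = np.array([0.0, 0.0, 0.0])
p = np.array([0.1, 0.0, 0.0])
n = np.array([2.0, 0.0, 0.0])
print(triplet_hinge_loss(a, p, n))   # 0.0 -> constraint already satisfied
```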
Unsupervised Learning
- Given only unlabeled data as input, learn some sort of structure
- The objective is often more vague or subjective than in supervised learning; this is more of an exploratory/descriptive data analysis
Unsupervised Learning
- Clustering: discover groups of “similar” data points
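For instance, a minimal clustering sketch, assuming scikit-learn; the synthetic blobs and the choice of three clusters are illustrative assumptions:

```python
# Group unlabeled points into clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Three synthetic blobs; no labels are given to the algorithm
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (-2.0, 0.0, 2.0)])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # discovered group assignments
print(kmeans.cluster_centers_)    # one center per discovered group
```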
Unsupervised Learning
- Quantization: map a continuous input to a discrete (more compact) output
Unsupervised Learning
- Dimensionality reduction, manifold learning: discover a lower-dimensional surface on which the data lives
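A minimal dimensionality-reduction sketch, assuming scikit-learn; the synthetic data below is generated to lie near a 2-D subspace of a 10-D space:

```python
# Project 10-D data onto its two principal components with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Data driven by 2 latent factors, embedded in 10 dimensions plus small noise
latent = rng.normal(size=(200, 2))
embedding = rng.normal(size=(2, 10))
X = latent @ embedding + 0.01 * rng.normal(size=(200, 10))

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                   # 200 x 2 low-dimensional coordinates
print(pca.explained_variance_ratio_)   # nearly all variance in the first 2 components
```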
Unsupervised Learning
- Density estimation: find a function that approximates the probability density of the data (i.e., the value of the function is high for “typical” points and low for “atypical” points)
- Can be used for anomaly detection
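A minimal density-estimation sketch, assuming scikit-learn; the Gaussian kernel, bandwidth, and test points are arbitrary illustrative choices:

```python
# Score how "typical" new points are under an estimated density.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(5)
X = rng.normal(0.0, 1.0, size=(500, 1))     # typical data: standard normal samples

kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)
log_density = kde.score_samples(np.array([[0.0], [5.0]]))
print(log_density)   # high log-density for 0.0 (typical), very low for 5.0 (atypical)
```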
Other types of learning
- Semi-supervised learning: lots of data is available, but only a small portion is labeled (e.g., since labeling is expensive)
- Why is learning from labeled and unlabeled data better than learning from labeled data alone?
Other types of learning
- Active learning: the learning algorithm can choose its own training examples, or ask a “teacher” for an answer on selected inputs
S. Vijayanarasimhan and K. Grauman, “Cost-Sensitive Active Visual Category Learning,” 2009
Other types of learning
- Reinforcement learning: an agent takes inputs from the environment and takes actions that affect the environment. Occasionally, the agent gets a scalar reward or punishment. The goal is to learn to produce action sequences that maximize the expected reward (e.g., driving a robot without bumping into obstacles)
- Apprenticeship learning: learning from demonstrations when the reward function is initially unknown
- Autonomous helicopter flight: Pieter Abbeel, http://heli.stanford.edu/
Generalization
- The ultimate goal is to do as well as possible on new, unseen data (a test set), but we only have access to labels (“ground truth”) for the training set
- What makes generalization possible?
- Inductive bias: the set of assumptions a learner uses to predict the target value for previously unseen inputs
  - This is the same as modeling or choosing a target hypothesis class
- Types of inductive bias:
  - Occam’s razor
  - Similarity/continuity bias: similar inputs should have similar outputs
  - …
Achieving good generalization
- Consideration 1: Bias
  - How well does your model fit the observed data?
  - It may be a good idea to accept some fitting error, because it may be due to noise or other “accidental” characteristics of one particular training set
- Consideration 2: Variance
  - How robust is the model to the selection of a particular training set?
  - To put it differently, if we learn models on two different training sets, how consistent will the models be?
Bias/variance tradeoff
- Models with too many parameters may fit the training data well (low bias), but are sensitive to the choice of training set (high variance). The resulting generalization error is due to overfitting.
- Models with too few parameters may not fit the data well (high bias), but are consistent across different training sets (low variance). The resulting generalization error is due to underfitting.

Underfitting and overfitting
- How to recognize underfitting? High training error and high test error
- How to deal with underfitting? Find a more complex model
- How to recognize overfitting? Low training error, but high test error
- How to deal with overfitting?
  - Get more training data
  - Decrease the number of parameters in your model
  - Regularization: penalize certain parts of the parameter space or introduce additional constraints to deal with a potentially ill-posed problem
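To make these diagnostics concrete, here is a minimal sketch, assuming scikit-learn, that compares training and test error for models of increasing complexity; the sine-plus-noise data and the polynomial degrees are arbitrary illustrative choices:

```python
# Compare train/test error for polynomial models of increasing degree.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(6)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    return x.reshape(-1, 1), np.sin(x) + rng.normal(0, 0.3, size=n)

X_train, y_train = make_data(20)    # deliberately small training set
X_test, y_test = make_data(200)     # held-out data from the same distribution

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d}: "
          f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f}, "
          f"test MSE {mean_squared_error(y_test, model.predict(X_test)):.3f}")
# Typical pattern: degree 1 underfits (high train and test error),
# degree 15 overfits (very low train error, much higher test error).
```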
Methodology
- The distinction between training and testing is crucial
- Correct performance on the training set is just memorization!
- Strictly speaking, the researcher should never look at the test data when designing the system
- Generalization performance should be evaluated on a hold-out or validation set
- This raises some troubling issues for learning “benchmarks”
Source: R. Parr
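A minimal sketch of this discipline, assuming scikit-learn; the synthetic data, the SVM hypothesis class, and the split proportions are illustrative assumptions:

```python
# Use a validation set for design decisions; touch the test set only once, at the end.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)    # synthetic labels

X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

best_C, best_acc = None, -1.0
for C in (0.1, 1.0, 10.0):       # model selection uses only the validation set
    acc = SVC(C=C).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

final = SVC(C=best_C).fit(X_train, y_train)
print("test accuracy (reported once):", final.score(X_test, y_test))
```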
Next time
- The math begins…
- Guest lecturer: Max Raginsky (Duke EE)
- Reading lists due to me by email by the end of next Thursday, September 3rd:
  - A couple of sentences describing your topic
  - A list of ~3 papers (doesn’t have to be final)
  - Date constraints/preferences
  - If you have more than one idea, send them all (will help with conflict resolution)