Introduction to Machine Learning Algorithms
What is Artificial Intelligence (AI)?
  • Design and study of computer programs that behave intelligently.
  • Designing computer programs to make computers smarter.
  • Study of how to make computers do things at which, at the moment, people are better.
Research Areas and Approaches
  • Artificial Intelligence
      • Research: Learning Algorithms, Inference Mechanisms, Knowledge Representation, Intelligent System Architecture
      • Paradigm: Rationalism (Logical), Empiricism (Statistical), Connectionism (Neural), Evolutionary (Genetic), Biological (Molecular)
      • Application: Intelligent Agents, Information Retrieval, Electronic Commerce, Data Mining, Bioinformatics, Natural Language Processing, Expert Systems
Concept of Machine Learning
                                              
Context
  • Machine Learning draws on Information Theory, Computer Science (AI), Cognitive Science, and Statistics.
Why Machine Learning?
  • Recent progress in algorithms and theory
  • Growing flood of online data
  • Computational power is available
  • Budding industry
  • Three niches for machine learning
      • Data mining: using historical data to improve decisions (medical records --> medical knowledge)
      • Software applications we can't program by hand (autonomous driving, speech recognition)
      • Self-customizing programs (newsreader that learns user interests)
Learning: Definition
  • Learning is the improvement of performance in some environment through the acquisition of knowledge resulting from experience in that environment.
  • Learning is the improvement of behavior on some performance task through acquisition of knowledge based on partial task experience.
A Learning Problem: EnjoySport
  What is the general concept?

  Sky   | Temp | Humid  | Wind   | Water | Forecast | EnjoySport
  Sunny | Warm | Normal | Strong | Warm  | Same     | Yes
  Sunny | Warm | High   | Strong | Warm  | Same     | Yes
  Rainy | Cold | High   | Strong | Warm  | Change   | No
  Sunny | Warm | High   | Strong | Cool  | Change   | Yes
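One way to answer "What is the general concept?" for the EnjoySport table is to look for the most specific conjunction of attribute values that covers every positive example. The sketch below uses the Find-S procedure for this; the slide does not name an algorithm, so the choice of Find-S, the `"?"` wildcard notation, and the function names are assumptions made for illustration.

```python
# Find-S sketch (an assumed choice of algorithm, not named on the slide):
# keep the most specific hypothesis that still covers all positive examples.
EXAMPLES = [
    # (Sky, Temp, Humid, Wind, Water, Forecast), EnjoySport
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def find_s(examples):
    """Return the most specific hypothesis consistent with the positives."""
    hypothesis = None
    for attributes, label in examples:
        if not label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive
        else:
            # Generalise every attribute that disagrees to the wildcard "?"
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


def matches(hypothesis, attributes):
    """True if the hypothesis classifies the attribute tuple as positive."""
    return all(h == "?" or h == a for h, a in zip(hypothesis, attributes))


general_concept = find_s(EXAMPLES)
print(general_concept)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The learned hypothesis happens to reject the single negative example as well, although Find-S itself never checks negatives.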
Metaphors and Methods
  • Metaphors: Neurobiology, Biological Evolution, Heuristic Search, Statistical Inference, Memory and Retrieval
  • Methods: Connectionist Learning, Genetic Learning, Tree / Rule Induction, Case-Based Learning, Probabilistic Induction
What is the Learning Problem?
  • Learning = improving with experience at some task
      • Improve over task T,
      • with respect to performance measure P,
      • based on experience E.
  • E.g., learn to play checkers
      • T: play checkers
      • P: % of games won in world tournament
      • E: opportunity to play against self
Machine Learning: Tasks
  • Supervised Learning
      • Estimate an unknown mapping from known input-output pairs
      • Learn f_w from training set D = {(x, y)} s.t. [constraint not preserved in this transcript]
      • Classification: y is discrete
      • Regression: y is continuous
  • Unsupervised Learning
      • Only input values are provided
      • Learn f_w from D = {(x)} s.t. [constraint not preserved in this transcript]
      • Compression
      • Clustering
  • Reinforcement Learning
Machine Learning: Strategies
  • Rote learning
  • Concept learning
  • Learning from examples
  • Learning by instruction
  • Inductive learning
  • Deductive learning
  • Explanation-based learning (EBL)
  • Learning by analogy
  • Learning by observation
Supervised Learning
  • Given a sequence of input/output pairs of the form <x_i, y_i>, where x_i is a possible input and y_i is the output associated with x_i.
  • Learn a function f that accounts for the examples seen so far, f(x_i) = y_i for all i, and that makes a good guess for the outputs of the inputs that it has not seen.
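The two requirements on f (reproduce the seen pairs, guess sensibly elsewhere) can be made concrete with a nearest-neighbour rule. This is only one possible learner, picked here because it is short; the toy data, the squared Euclidean distance, and the function names are all hypothetical.

```python
# 1-nearest-neighbour sketch: f memorises the pairs <x_i, y_i> and answers an
# unseen x with the output of the closest stored input. Assumes numeric inputs
# and that no two identical inputs carry different outputs.
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def learn_nearest_neighbor(pairs):
    """Return a function f built from the training pairs."""
    memory = list(pairs)

    def f(x):
        _, y = min(memory, key=lambda pair: squared_distance(pair[0], x))
        return y

    return f


# Hypothetical training set D = {(x, y)}
training_pairs = [
    ((0.0, 0.0), "no"),
    ((0.0, 1.0), "no"),
    ((1.0, 0.0), "yes"),
    ((1.0, 1.0), "yes"),
]
f = learn_nearest_neighbor(training_pairs)

# f accounts for every example seen so far ...
seen_ok = all(f(x) == y for x, y in training_pairs)
# ... and still produces a guess for an input it has not seen.
guess = f((0.9, 0.2))
print(seen_ok, guess)
```

Because the outputs here are discrete labels, this instance is a classification task; with numeric outputs the same idea would act as a crude regressor.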
Examples of Input-Output Pairs

  Task                  | Inputs                                         | Outputs
  Recognition           | Descriptions of objects                        | Classes that the objects belong to
  Action                | Descriptions of situations                     | Actions or predictions
  Janitor robot problem | Descriptions of offices (floor, prof's office) | Yes or No (indicating whether or not the office contains a recycling bin)
Unsupervised Learning
  • Clustering: a clustering algorithm partitions the inputs into a fixed number of subsets or clusters so that inputs in the same cluster are close to one another.
  • Discovery learning: the objective is to uncover new relations in the data.
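A minimal sketch of the clustering idea, using k-means as the example algorithm. The slide does not name k-means; it is assumed here as a common method that partitions inputs into a fixed number of clusters. The points and the initial centres are made up for illustration.

```python
# k-means sketch: the number of clusters is fixed by the number of initial
# centres supplied by the caller (an assumption made to keep the run
# deterministic; practical implementations usually pick centres randomly).
def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def k_means(points, initial_centers, iterations=10):
    centers = [tuple(c) for c in initial_centers]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each input joins the cluster of its closest centre.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical inputs: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, clusters = k_means(points, initial_centers=[points[0], points[3]])
print(clusters)
```

Inputs that end up in the same cluster are close to one another in the sense of the squared Euclidean distance used above; a different distance would give a different notion of "close".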
Online and Batch Learning
  • Batch methods: process large sets of examples all at once.
  • Online (incremental) methods: process examples one at a time.
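The difference between the two modes shows up even in the simplest estimation task. The sketch below estimates a mean in both styles; the task and the names `batch_mean` and `OnlineMean` are illustrative assumptions, not taken from the slides.

```python
# Batch vs. online sketch on a deliberately simple "learning" task:
# estimating the mean of a stream of numbers.
def batch_mean(examples):
    """Batch method: needs the whole set of examples at once."""
    return sum(examples) / len(examples)


class OnlineMean:
    """Online method: updates its estimate one example at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # Move the estimate towards x by a step that shrinks with experience.
        self.mean += (x - self.mean) / self.count
        return self.mean


examples = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical data

online = OnlineMean()
for x in examples:
    online.update(x)  # no need to store earlier examples

print(batch_mean(examples), online.mean)
```

Both routes reach the same estimate here; the online version never holds more than one example in memory, which is the practical reason to prefer it for long data streams.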
Machine Learning Algorithms and Applications
Machine Learning Algorithms
  • Neural Learning
      • Multilayer Perceptrons (MLPs)
      • Self-Organizing Maps (SOMs)
  • Evolutionary Learning
      • Genetic Algorithms
  • Probabilistic Learning
      • Bayesian Networks (BNs)
  • Other Machine Learning Methods
      • Decision Trees (DTs)
Neural Nets for Handwritten Digit Recognition
  [Figure: pre-processed digit images feed the input units, followed by hidden units and output units for the digits 0 to 9; shown once for training and once for a test image whose digit is unknown ("?").]
ALVINN System:  Neural Network Learning to Steer an Autonomous Vehicle
Learning to Navigate a Vehicle by Observing a Human Expert (1/2)
  • Inputs: the images produced by a camera mounted on the vehicle.
  • Outputs: the actions taken by the human driver to steer the vehicle or adjust its speed.
  • Result of learning: a function mapping images to control actions.
Learning to Navigate a Vehicle by Observing a Human Expert (2/2)
Data Recorrection by a Hopfield Network
  • Original target data
  • Corrupted input data
  • Recorrected data after 10 iterations
  • Recorrected data after 20 iterations
  • Fully recorrected data after 35 iterations
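How a Hopfield network pulls corrupted data back towards a stored target can be sketched in a few lines. The version below assumes bipolar (+1/-1) units, Hebbian storage, and asynchronous updates; the 8-unit pattern is hypothetical and far smaller than the data on the slide, so the iteration counts 10, 20, and 35 are not reproduced.

```python
# Hopfield sketch: store one bipolar pattern with the Hebbian rule, then
# recover it from a corrupted copy by repeated asynchronous unit updates.
def train_hopfield(patterns):
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            # Each unit takes the sign of its weighted input from the others.
            activation = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if activation >= 0 else -1
    return state


target = [1, -1, 1, -1, 1, -1, 1, -1]  # hypothetical "original target data"
corrupted = list(target)
corrupted[1] = -corrupted[1]            # flip two units to simulate corruption
corrupted[6] = -corrupted[6]

weights = train_hopfield([target])
recovered = recall(weights, corrupted)
print(recovered == target)
```

With a single stored pattern and only a few flipped units, one sweep already restores the target; larger networks holding several patterns need more iterations, as on the slide.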
ANN for Face Recognition
  • A 960 x 3 x 4 network is trained on gray-level images of faces to predict whether a person is looking to their left, right, ahead, or up.
Data Mining
  • Selection & Sampling: Database/data warehouse --> Target data
  • Preprocessing & Cleaning: Target data --> Cleaned data
  • Transformation & Reduction: Cleaned data --> Transformed data
  • Data Mining: Transformed data --> Patterns/model
  • Interpretation/Evaluation: Patterns/model --> Knowledge
  • Performance system
Hot Water Flashing Nozzle with Evolutionary Algorithms
  • Start: hot water entering
  • At throat: Mach 1 and onset of flashing
  • At exit: steam and droplets
  • Hans-Paul Schwefel performed the original experiments.
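The evolutionary idea behind this kind of design optimisation can be sketched as a mutate-and-select loop. The code below assumes a simple (1+1) scheme with Gaussian mutation and a made-up numeric objective; it does not model the nozzle, whose quality was measured in physical experiments, and every name and parameter value is hypothetical.

```python
import random


# (1+1) evolutionary sketch: one parent, one mutated child per generation,
# and the better of the two survives. The objective is minimised.
def one_plus_one_es(fitness, parent, step_size=0.5, generations=200, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = list(parent)
    best_fitness = fitness(best)
    for _ in range(generations):
        # Mutation: perturb every design parameter with Gaussian noise.
        child = [gene + rng.gauss(0.0, step_size) for gene in best]
        child_fitness = fitness(child)
        # Selection: keep the child only if it is at least as good.
        if child_fitness <= best_fitness:
            best, best_fitness = child, child_fitness
    return best, best_fitness


def toy_objective(design):
    """Hypothetical stand-in for a measured loss: squared distance to zero."""
    return sum(x * x for x in design)


start = [3.0, -2.0, 4.0]
initial_fitness = toy_objective(start)
design, final_fitness = one_plus_one_es(toy_objective, start)
print(initial_fitness, final_fitness)
```

Because the parent is only ever replaced by a child that is no worse, the objective value cannot get worse over the run; how quickly it improves depends on the step size.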
Machine Learning Applications in Bioinformatics
Bayesian Networks for Gene Expression Analysis
  • Learning: Data --> Preprocessing --> Processed data --> Learning algorithm --> network over Gene A, Gene B, Gene C, Gene D, and Target
  • Inference: the values of Gene C and Gene B are given; belief propagation computes the probability for the target.
Multilayer Perceptrons for Gene Finding and Prediction
  [Figure labels: bases, discrete exon score (0 to 1), sequence, score, coding potential value, GC composition, length, donor, acceptor, intron vocabulary]
Self-Organizing Maps for DNA Microarray Data Analysis
  [Figure: an input connected through a bundle of synaptic connections to a two-dimensional array of postsynaptic neurons, with the winning neurons marked.]
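The role of the winning neuron can be illustrated with a single self-organizing map training step. The sketch assumes a tiny 2 x 2 grid, squared Euclidean matching, and a hard neighbourhood of grid radius 1; these choices, the learning rate, and the input vector are hypothetical and much simpler than a map used on microarray data.

```python
# One SOM training step: find the winning neuron for an input, then pull the
# winner and its grid neighbours towards that input.
def find_winner(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)),
    )


def som_step(weights, x, learning_rate=0.5, radius=1):
    winner = find_winner(weights, x)
    for pos, w in list(weights.items()):
        # Neighbourhood is measured on the grid, not in input space.
        grid_distance = abs(pos[0] - winner[0]) + abs(pos[1] - winner[1])
        if grid_distance <= radius:
            weights[pos] = [wi + learning_rate * (xi - wi) for wi, xi in zip(w, x)]
    return winner


# Two-dimensional array of neurons, keyed by (row, column).
weights = {
    (0, 0): [0.0, 0.0],
    (0, 1): [0.0, 1.0],
    (1, 0): [1.0, 0.0],
    (1, 1): [1.0, 1.0],
}
winner = som_step(weights, x=[0.9, 0.8])
print(winner, weights)
```

Repeating this step over many inputs, while shrinking the learning rate and radius, is what lets neighbouring neurons come to represent similar inputs.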
Biological Information Extraction
  [Figure labels: Text Data; Data Analysis & Field Identification; Data Classification & Field Extraction; Information Extraction; Field Property Identification & Learning; Database Template Filling; DB Record with fields Location, Date, DB]
Biomolecular Computing
  • Binary string: 011001101010001
  • DNA sequence: ATGCTCGAAGCT

Editor's Notes

  • #13 Machine Learning (형주)
      - Definitions of Supervised Learning and Unsupervised Learning
      - What if the data are (preprocessed) text documents?
  • #33
      - Text Classification
      - Filtering
      - Extraction