Grammar Induction Through Machine Learning
Part 1
Linguistic Nativism Reconsidered

Alexander Clark (1) and Shalom Lappin (2)

(1) Department of Computer Science
    Royal Holloway, University of London
(2) Department of Philosophy
    King's College, London

February 5, 2008
Outline


  1    The Machine Learning Paradigm

  2    Supervised vs. Unsupervised Learning

  3    Supervised Learning with a Probabilistic Grammar

  4    A Bayesian Reply to the APS




Machine Learning Algorithms and Models of the
Learning Domain
       A machine learning system implements a learning
       algorithm that defines a function from a domain of input
       samples to a range of output values.
       A corpus of examples is divided into a training set and a
       test set.
       The learning algorithm is specified in conjunction with a
       model of the phenomenon to be learned.
       This model defines the space of possible hypotheses that
       the algorithm can generate from the input data.
       When the values of its parameters are set through training
       of the algorithm on the training set, an element of the
       hypothesis space is selected.
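As an illustration, a minimal sketch of this workflow in Python; the corpus, the toy threshold-classifier hypothesis space, and the accuracy score are illustrative assumptions, not part of the original slides.

```python
import random

def train_test_split(corpus, test_fraction=0.2, seed=0):
    """Divide a corpus of examples into a training and a test set."""
    examples = list(corpus)
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - test_fraction))
    return examples[:cut], examples[cut:]

# Toy hypothesis space: classifiers of the form "label is True iff
# value >= t"; training selects one element (one t) of that space.
def fit_threshold(train_set):
    best_t, best_acc = 0.0, -1.0
    for t, _ in train_set:                     # candidate thresholds
        acc = sum((v >= t) == y for v, y in train_set) / len(train_set)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t                              # the selected hypothesis

corpus = [(x, x > 0.5) for x in (random.random() for _ in range(100))]
train_set, test_set = train_test_split(corpus)
t = fit_threshold(train_set)                   # parameters set on the training set
test_acc = sum((v >= t) == y for v, y in test_set) / len(test_set)
print(t, test_acc)
```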
Evaluating a Parsing Algorithm
Supervised Learning

         If one has a gold standard of correct parses in a corpus,
         then it is possible to compute the percentage of correct
         parses that the algorithm produces for a blind test subpart
         of this corpus.
         A more common procedure for scoring an ML algorithm on
         a test set is to determine its performance for recall and
         precision.
         The recall of a parsing algorithm A is the percentage of
         labelled brackets of the test set that it correctly identifies.
         A’s precision is the percentage of the brackets that it
         returns which correspond to those in the gold standard.
         A unified score for A, known as an F-score, can be
         computed as the harmonic mean of its recall and its precision.
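In code, these measures reduce to set comparisons between the brackets the parser returns and those in the gold standard; a minimal sketch, with the bracket representation and toy parses assumed for illustration.

```python
def precision_recall_f1(predicted, gold):
    """predicted, gold: sets of labelled brackets, e.g. ("NP", 0, 2)."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
predicted = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
print(precision_recall_f1(predicted, gold))  # approx. (0.667, 0.667, 0.667)
```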
Learning Biases and Priors


       The choice of parameters and their possible values defines
       a bias for the language model by imposing prior constraints
       on the set of learnable hypotheses.
       All learning requires some sort of bias to restrict the set of
       possible hypotheses for the phenomenon to be learned.
       This bias can express strong assumptions about the nature
       of the domain of learning.
       Alternatively, it can define comparatively weak
       domain-specific constraints, with learning driven primarily
       by domain-general procedures and conditions.



Prior Probability Distributions on a Hypothesis Space



       One way of formalising this learning bias is a prior
       probability distribution on the elements of the hypothesis
       space that favours some hypotheses as more likely than
       others.
       The paradigm of Bayesian learning in cognitive science
       implements this approach.
       The simplicity and compactness measure that Perfors et al.
       (2006) use is an example of a very general prior.




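One way to make such a prior concrete is to score grammars by description length, so that smaller grammars receive higher prior probability. A hedged sketch; the grammar encoding and the bit cost are illustrative assumptions, not the Perfors et al. model.

```python
import math

def simplicity_log_prior(grammar):
    """Log-prior favouring compact grammars: cost grows with the
    number of rules and the number of symbols they mention."""
    n_rules = len(grammar)
    n_symbols = sum(1 + len(rhs) for _, rhs in grammar)
    return -(n_rules + n_symbols) * math.log(2)  # ~ description length in bits

g_small = [("S", ("NP", "VP"))]
g_large = [("S", ("NP", "VP")), ("S", ("NP", "VP", "PP")), ("S", ("VP",))]
assert simplicity_log_prior(g_small) > simplicity_log_prior(g_large)
```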
Learning Bias and the Poverty of Stimulus

       The poverty of stimulus issue can be formulated as follows:

       What are the minimal domain-specific linguistic biases that
       must be assumed for a reasonable learning algorithm to
       support language acquisition on the basis of the training
       set available to the child?


       If a model with relatively weak language-specific biases
       can sustain effective grammar induction, then this result
       undermines poverty of stimulus arguments for a rich theory
       of universal grammar.


Supervised Learning


      When the samples of the training set are annotated with
      the classifications and structures that the learning
      algorithm is intended to produce as output for the test set,
      learning is supervised.
      Supervised grammar induction involves training an ML
      procedure on a corpus annotated with the parse structures
      of the gold standard.
      The learning algorithm infers a function for assigning input
      sentences to appropriate parse outputs on the basis of a
      training set of sentence–parse (argument–value) pairs.



Unsupervised Learning



      If the training set is not marked with the properties to be
      returned as output for the test set, then learning is
      unsupervised.


      Unsupervised learning involves using clustering patterns
      and distributional regularities in the training set to identify
      structure in the data.




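To make "distributional regularities" concrete, a minimal sketch of grouping words by the similarity of their context distributions; the toy sentences and the cosine measure are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def context_vectors(sentences):
    """Map each word to a bag of its left and right neighbours."""
    vecs = {}
    for s in sentences:
        for i, w in enumerate(s):
            ctx = vecs.setdefault(w, Counter())
            if i > 0:
                ctx[s[i - 1]] += 1
            if i + 1 < len(s):
                ctx[s[i + 1]] += 1
    return vecs

def cosine(c1, c2):
    """Cosine similarity between two context distributions."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    norm = (sum(v * v for v in c1.values()) ** 0.5) * \
           (sum(v * v for v in c2.values()) ** 0.5)
    return dot / norm if norm else 0.0

sents = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
v = context_vectors(sents)
# Words with similar contexts ("dog", "cat") are candidates for the
# same induced category; "barks" is not.
print(cosine(v["dog"], v["cat"]), cosine(v["dog"], v["barks"]))
```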
Supervised Learning and Language Acquisition

      It could be argued that supervised grammar induction is
      not directly relevant to poverty of stimulus arguments.
      It requires that target parse structures be represented in
      the training set, while children have no access to such
      representations in the data they are exposed to.
      If negative evidence of the sort identified by Saxton (1997)
      and Chouinard and Clark (2003) is available and plays a
      role in grammar induction, then it is possible to model the
      acquisition process as a type of supervised learning.
      If, however, children acquire language solely on the basis
      of positive evidence, then it is necessary to treat
      acquisition as unsupervised learning.

Probabilistic Context-Free Grammars

       A Probabilistic Context-Free Grammar (PCFG) conditions
       the probability of a sequence of child nonterminals on the
       parent nonterminal.
       It provides conditional probabilities of the form
       P(X1 · · · Xn | N) for each nonterminal N and sequence
       X1 · · · Xn of items from the vocabulary of the grammar.
       It also specifies a probability distribution Ps(N) over the
       label of the root of the tree.
       The conditional probabilities P(X1 · · · Xn | N) correspond to
       probabilistic parameters that govern the expansion of a
       node in a parse tree according to a context-free rule
       N → X1 · · · Xn.

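A minimal sketch of how such a grammar might be represented; the rules and probabilities are toy assumptions for illustration.

```python
# Each nonterminal N maps to a distribution over right-hand sides,
# i.e. the parameters P(X1 ... Xn | N) of the PCFG.
pcfg = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("Det", "N"): 0.7, ("N",): 0.3},
    "VP": {("V", "NP"): 0.6, ("V",): 0.4},
}
root_dist = {"S": 1.0}  # Ps(N): distribution over the root label

# Each conditional distribution must sum to one.
for lhs, rhs_dist in pcfg.items():
    assert abs(sum(rhs_dist.values()) - 1.0) < 1e-9, lhs

# The probability of a derivation is the product of the probabilities
# of the rules used to expand each node.
p_tree = root_dist["S"] * pcfg["S"][("NP", "VP")] \
         * pcfg["NP"][("Det", "N")] * pcfg["VP"][("V",)]
print(p_tree)
```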
Probabilistic Context-Free Grammars


       The probabilistic parameter values of a PCFG can be
       learned from a parse-annotated training corpus by
       computing the relative frequencies of CFG rules in
       accordance with a Maximum Likelihood Estimation (MLE)
       condition:

       P(A → β1 · · · βk) = c(A → β1 · · · βk) / Σγ c(A → γ)

       Statistical models of this kind have achieved F-measures in
       the low 70% range against the Penn Treebank.



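The estimator is just relative-frequency counting over the rules of a treebank; a minimal sketch, with the nested-tuple tree encoding and toy trees assumed for illustration.

```python
from collections import Counter, defaultdict

def mle_rule_probs(trees):
    """Estimate P(A -> beta) = c(A -> beta) / sum_gamma c(A -> gamma)
    from rule occurrences in a parse-annotated corpus.
    Each tree is a nested tuple: (label, child, child, ...)."""
    counts = defaultdict(Counter)

    def collect(node):
        if isinstance(node, tuple):            # internal node
            lhs, children = node[0], node[1:]
            rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
            counts[lhs][rhs] += 1
            for c in children:
                collect(c)

    for t in trees:
        collect(t)
    return {lhs: {rhs: n / sum(cs.values()) for rhs, n in cs.items()}
            for lhs, cs in counts.items()}

trees = [("S", ("NP", "she"), ("VP", ("V", "runs"))),
         ("S", ("NP", "he"), ("VP", ("V", "sees"), ("NP", "her")))]
print(mle_rule_probs(trees)["VP"])  # {('V',): 0.5, ('V', 'NP'): 0.5}
```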
Lexicalized Probabilistic Context-Free Grammars


       It is possible to significantly improve the performance of a
       PCFG by building additional bias into the language model
       that it defines.
       Collins (1999) constructs a Lexicalized Probabilistic
       Context-Free Grammar (LPCFG) in which the probabilities
       of the CFG rules are conditioned on the lexical heads of the
       phrases that nonterminal symbols represent.
       In Collins' LPCFGs nonterminals are replaced by
       nonterminal/head pairs.



Lexicalized Probabilistic Context-Free Grammars



       The probability distributions of the model are of the form
       Ps(N/h) and P(X1/h1 · · · H/h · · · Xn/hn | N/h).
       Collins' LPCFG achieves an F-measure performance of
       approximately 88%.
       Charniak and Johnson (2005) present an LPCFG with an
       F-score of approximately 91%.




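A sketch of the lexicalization step: each nonterminal is annotated with the head word percolated up from one of its children, yielding the nonterminal/head pairs N/h. The naive first-child head rule below is an assumption for illustration; Collins' actual head-finding rules are considerably more elaborate.

```python
def lexicalize(tree):
    """Turn (label, children...) into ((label, head), children...),
    naively taking the head from the first child."""
    if isinstance(tree, str):
        return tree                              # terminal: the word itself
    children = [lexicalize(c) for c in tree[1:]]
    first = children[0]
    head = first if isinstance(first, str) else first[0][1]
    return ((tree[0], head), *children)

t = ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "her")))
print(lexicalize(t))
# e.g. (('VP', 'sees'), (('V', 'sees'), 'sees'), (('NP', 'her'), 'her'))
# appears as a subtree: rule probabilities can now be conditioned
# on events like VP/sees rather than bare VP.
```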
Bias in an LPCFG


      Rather than encoding a particular categorical bias into his
      language model by excluding certain context-free rules,
      Collins allows all such rules.
      He incorporates bias by adjusting the prior distribution of
      probabilities over the lexicalized CFG rules.
      The model imposes the requirements that
              sentences have hierarchical constituent structure,
              constituents have heads that select for their siblings, and
              this selection is determined by the head words of the
              siblings.



LPCFG as a Weak Bias Model


      The biases that Collins and Charniak and Johnson specify
      for their respective LPCFGs do not express the complex
      syntactic parameters that have been proposed as
      elements of a strong bias view of universal grammar.
      So, for example, these models do not contain a
      head-complement directionality parameter.
      However, they still learn the correct generalizations
      concerning head-complement order.
      The bias of a statistical parsing model has implications for
      the design of UG.



Bayesian Learning


      Let D be the data and H a hypothesis.
      Maximum likelihood chooses the H that makes D
      most likely: argmaxH P(D|H).
      The posterior probability is proportional to the prior
      probability times the likelihood:
      P(H|D) ∝ P(H)P(D|H)
      The maximum a posteriori (MAP) approach chooses the H
      that maximises the posterior: argmaxH P(H)P(D|H).
      The bias is explicitly represented in the prior P(H).



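A minimal sketch of MAP hypothesis selection, computed in log space; the hypothesis space (a coin bias), the prior, and the data are toy assumptions.

```python
import math

def map_hypothesis(hypotheses, log_prior, log_likelihood, data):
    """argmax_H P(H) P(D|H), computed as argmax of log P(H) + log P(D|H)."""
    return max(hypotheses,
               key=lambda h: log_prior(h) + log_likelihood(data, h))

# Toy example: H is a coin bias; the prior favours values near 0.5.
hypotheses = [i / 10 for i in range(1, 10)]

def log_prior(h):
    return -abs(h - 0.5)                        # mild bias toward fair coins

def log_likelihood(data, h):
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(h) + tails * math.log(1 - h)

data = [1, 1, 1, 0, 1, 1, 0, 1]                 # 6 heads, 2 tails
print(map_hypothesis(hypotheses, log_prior, log_likelihood, data))  # 0.7
```

The prior pulls the chosen hypothesis slightly toward 0.5, away from the pure maximum-likelihood value 0.75: the bias is visible as an explicit term in the objective.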
Acquisition with a Weak Bias
Perfors et al. (2006)




          Perfors et al. define a very general prior that does not
          have a bias towards constituent structure.
          It includes both grammars that impose hierarchical
          constituent structure and those that do not.
          In general it favours smaller, simpler grammars, as
          measured by the number of rules and symbols.
          Perfors et al. compute the posterior probability of three
          types of grammar for a subset of the CHILDES corpus.




Acquisition without a Constituent Structure Bias




       The three types of grammar that Perfors et al. consider are:
               a flat grammar that generates strings directly from S without
               intermediate non-terminal symbols,
               a probabilistic regular grammar (PRG), and
               a probabilistic context-free grammar (PCFG).




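A hedged sketch of this model comparison: score each grammar type by log prior (simplicity) plus log likelihood on the corpus, and pick the MAP grammar. The sizes and likelihoods below are placeholder numbers for illustration, not results from Perfors et al. (2006).

```python
candidates = {
    # name: (number of rules + symbols, log-likelihood on the corpus)
    "flat":    (5000, -90000.0),   # one rule per sentence type: huge grammar
    "regular": (300,  -42000.0),
    "pcfg":    (120,  -40000.0),
}

def log_posterior(size, log_lik, bits_per_symbol=1.0):
    log_prior = -size * bits_per_symbol    # smaller grammar -> higher prior
    return log_prior + log_lik

best = max(candidates, key=lambda g: log_posterior(*candidates[g]))
print(best)   # "pcfg" under these placeholder scores
```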
Acquisition without a Constituent Structure Bias


       The PCFG receives a higher posterior probability and covers
       significantly more sentence types in the corpus than either
       the PRG or the flat grammar.
       The grammar with the maximum a posteriori probability thus
       makes the correct generalisation.
       This result suggests that it may be possible to decide among
       radically distinct types of grammar on the basis of a
       probabilistic model with relatively weak priors, applied to
       a realistic data set (a schematic version of the comparison
       is sketched below).
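
Schematically, the comparison works as follows: each candidate grammar G is scored by log P(G) + log P(D | G), and the grammar with the highest score is the MAP hypothesis. The sketch below uses the simplicity prior sketched earlier; the grammar sizes and corpus log-likelihoods are invented placeholders, not the values reported by Perfors et al.

    import math

    def log_prior(num_rules, num_symbols, penalty=math.log(2)):
        """Simplicity prior sketched earlier: every rule and every
        symbol costs a constant number of nats."""
        return -penalty * (num_rules + num_symbols)

    # name: (rules, symbols, log-likelihood of the corpus under the
    # grammar) -- every number here is an invented placeholder.
    candidates = {
        "flat": (2000, 9000, -50000.0),
        "PRG":  (300,  1000, -48500.0),
        "PCFG": (150,  400,  -47000.0),
    }

    log_posterior = {
        name: log_prior(rules, symbols) + loglik
        for name, (rules, symbols, loglik) in candidates.items()
    }

    map_grammar = max(log_posterior, key=log_posterior.get)
    print(map_grammar)  # "PCFG" under these toy numbers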


