SHALLOW PARSING
DEPARTMENT OF LINGUISTICS
"Shallow" is an adjective meaning "of little depth", as in
"serve the noodles in a shallow bowl".
"Parsing" is a gerund, a noun formed from the verb "to parse".
To parse means to divide into parts and describe the relations
among the parts.
A parser is a program that parses, i.e. divides the given input
into parts and describes the relations among them.
It resolves a sentence into its component parts and describes their
syntactic roles.
A parser can take either a word or a sentence as input.
When the input is a word, the program is usually known as a
morphological analyzer.
When the input is a sentence, it is usually known as a syntactic
parser.
The term "parser" is typically restricted to the sentence-level
analyzer.
Shallow parsing is partial parsing: it assigns partial syntactic
structures to sentences.
It is not full parsing. In full parsing, a grammar is used to assign a
complete syntactic structure to sentences.
Parsed corpora are sometimes known as treebanks.
S
  NP
    N   Daniel
  VP
    V   sat
    PP
      P   on
      NP
        AT  the
        N   throne
[S
[NP DANIEL NP]
[VP SAT
[PP ON
[NP THE THRONE NP]
PP]
VP]
S]
[S [NP Daniel] [VP sat [PP on [NP the throne]]]]
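The compact labeled bracketing above can be read back into a tree mechanically. The following is a minimal Python sketch, not a standard tool: the function names parse_brackets and leaves are hypothetical, and the sketch assumes the compact notation only (no closing labels such as "NP]") and well-formed input.

```python
import re

def parse_brackets(text):
    """Parse a compact labeled bracketing into nested (label, children) tuples.

    Children are either sub-constituents (tuples) or plain word strings.
    Assumes well-formed input in the compact notation, e.g. "[NP the throne]".
    """
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '[' at token %d" % pos)
        pos += 1
        label = tokens[pos]            # the first token after "[" is the category label
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                       # consume the closing "]"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the closing bracket")
    return tree

def leaves(node):
    """Return the words of a tree from left to right."""
    _label, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

tree = parse_brackets("[S [NP Daniel] [VP sat [PP on [NP the throne]]]]")
print(tree)
print(leaves(tree))
```

Reading the leaves from left to right gives back the original word order, which shows that the bracketing and the tree diagram carry the same information.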
Approaches to NLP:
1. Shallow approach to NLP
2. Deep approach to NLP
Shallow NLP is the main approach. The main reasons are:
1. Robustness to noise
2. Low need for training resources (such as tagged corpora)
3. Computational efficiency, which is important when dealing with
large amounts of text.
CONSTITUENT STRUCTURE ANALYSIS
Thus a parser takes a sentence as input, analyzes it in
terms of its constituent parts, and describes the relations between
these parts.
For example,
“Daniel sat on the throne.” is analyzed as follows:
[S
[NP DANIEL NP]
[VP SAT
[PP ON
[NP THE THRONE NP]
PP]
VP]
S]
[S [NP Daniel] [VP sat [PP on [NP the throne]]]]
A shallow parser may identify some phrasal constituents, such as noun
phrases, without indicating their internal structure and their function in
the sentence.
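To make the contrast with a full parse concrete, here is a small Python sketch that flattens a full tree into the kind of output such a shallow parser might give, keeping only noun phrase brackets. It is an illustration of the output format only: a real shallow parser does not build the full tree first, and the names FULL_TREE, words_of, shallow_np_view and render are hypothetical.

```python
# Full parse of "Daniel sat on the throne" as nested (label, children) tuples.
FULL_TREE = ("S", [("NP", ["Daniel"]),
                   ("VP", ["sat", ("PP", ["on", ("NP", ["the", "throne"])])])])

def words_of(node):
    """Return the words covered by a constituent, left to right."""
    _label, children = node
    words = []
    for child in children:
        words.extend(words_of(child) if isinstance(child, tuple) else [child])
    return words

def shallow_np_view(node):
    """Flatten a tree into a flat sequence that keeps only the outermost NP brackets."""
    label, children = node
    if label == "NP":
        return [("NP", words_of(node))]    # keep the phrase, hide its internal structure
    flat = []
    for child in children:
        if isinstance(child, tuple):
            flat.extend(shallow_np_view(child))
        else:
            flat.append(child)
    return flat

def render(flat):
    """Print NP items in brackets and all other words as they are."""
    return " ".join("[NP " + " ".join(item[1]) + "]" if isinstance(item, tuple) else item
                    for item in flat)

print(render(shallow_np_view(FULL_TREE)))   # [NP Daniel] sat on [NP the throne]
```

The result marks the two noun phrases but says nothing about the VP or PP, and nothing about which noun phrase is the subject.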
Another type of shallow analysis identifies the functional role of some
of the words, such as the main verb, and its direct arguments.
Systems for shallow parsing normally work on top of
morphological analysis and disambiguation.
The basic purpose is to infer as much syntactic structure as possible
from the lemma, morphological information, and word order
configuration at hand.
Typically, shallow parsing aims at detecting phrases and basic
head/modifier relations.
A shared concern of many shallow parsers is the application to
large text corpora.
Partial analyses are frequently allowed when the parser is not powerful
enough to resolve all problems.
Church designed a stochastic program for locating simple noun
phrases, which are identified by inserting appropriate brackets, [...].
Abney (1991) is credited with being the first to argue for the
relevance of shallow parsing, both from the point of view of
psycholinguistic evidence and from the point of view of practical
applications.
His own approach used hand-crafted cascaded finite state
transducers to get at a shallow parse.
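The idea of a cascade can be sketched in Python as follows. This is not Abney's grammar or implementation: regular expressions over the sequence of category labels stand in for the finite-state transducers, the two levels (NP, then PP) are assumptions chosen for the running example, and the names LEVELS, apply_level, cascade and render are hypothetical. Each level reads the output of the level before it and groups matching stretches into a new, larger unit.

```python
import re

# Each level is (new label, pattern over the current label sequence).
# Every label in the sequence is followed by one space, so patterns are
# written the same way. These two levels are illustrative assumptions.
LEVELS = [
    ("NP", r"(?:AT )?N "),   # level 1: optional article followed by a noun
    ("PP", r"P NP "),        # level 2: preposition followed by a noun phrase
]

def apply_level(items, label, pattern):
    """Group every match of `pattern` over the label sequence into one `label` item."""
    labels = "".join(item[0] + " " for item in items)
    regex = re.compile(r"(?<!\S)(?:" + pattern + ")")   # match only at label boundaries
    result, cursor = [], 0
    for match in regex.finditer(labels):
        start = labels[:match.start()].count(" ")       # index of the first matched item
        length = match.group().count(" ")               # number of matched items
        result.extend(items[cursor:start])
        result.append((label, items[start:start + length]))
        cursor = start + length
    result.extend(items[cursor:])
    return result

def cascade(tagged, levels=LEVELS):
    """Run the levels in order; each level works on the output of the previous one."""
    items = list(tagged)
    for label, pattern in levels:
        items = apply_level(items, label, pattern)
    return items

def render(item):
    """Show a word as itself and a phrase as a labeled bracket."""
    label, payload = item
    if isinstance(payload, str):
        return payload
    return "[" + label + " " + " ".join(render(child) for child in payload) + "]"

tagged = [("N", "Daniel"), ("V", "sat"), ("P", "on"), ("AT", "the"), ("N", "throne")]
rendered = " ".join(render(item) for item in cascade(tagged))
print(rendered)   # [NP Daniel] sat [PP on [NP the throne]]
```

Material that no level matches, such as the verb here, is simply passed through unchanged, which is what makes the resulting parse partial.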
Typical modules within shallow parser architecture include the
following:
1. Part-of-speech tagging. Given a word and its context, decide what
the correct morphosyntactic class of that word is (noun, verb, etc.).
POS tagging is a well-understood problem in NLP, to which machine
learning approaches are routinely applied.
2. Chunking. Given the words and their morphosyntactic classes, decide
which words can be grouped as chunks (noun phrases, verb phrases,
complete clauses, etc.).
3. Relation finding. Given the chunks in a sentence, decide which
relations they have with the main verb (subject, object, location,
etc.).
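The three modules can be chained as in the following toy Python sketch. Everything in it is an assumption made for illustration: the lexicon LEXICON, the function names pos_tag, chunk and find_relations, and the hand-written rules. In particular, the tagger only looks words up and ignores context, and every preposition plus noun phrase after the verb is simply called a location.

```python
# Hypothetical toy lexicon covering only the running example.
LEXICON = {"daniel": "N", "sat": "V", "on": "P", "the": "AT", "throne": "N"}

def pos_tag(words):
    """Module 1: assign a tag to each word (lookup only; a real tagger also uses context)."""
    return [(word, LEXICON[word.lower()]) for word in words]

def chunk(tagged):
    """Module 2: group (AT) N into NP chunks and V into VP chunks; other words stay alone."""
    chunks, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "AT" and i + 1 < len(tagged) and tagged[i + 1][1] == "N":
            chunks.append(("NP", [word, tagged[i + 1][0]]))
            i += 2
        elif tag == "N":
            chunks.append(("NP", [word]))
            i += 1
        elif tag == "V":
            chunks.append(("VP", [word]))
            i += 1
        else:
            chunks.append((tag, [word]))
            i += 1
    return chunks

def find_relations(chunks):
    """Module 3: relate NP chunks to the main verb using their position only."""
    verb_index = next(i for i, (label, _) in enumerate(chunks) if label == "VP")
    relations = {"verb": " ".join(chunks[verb_index][1])}
    for i, (label, words) in enumerate(chunks):
        if label != "NP":
            continue
        phrase = " ".join(words)
        if i < verb_index:
            relations["subject"] = phrase
        elif chunks[i - 1][0] == "P":
            # Crude assumption: preposition + NP after the verb is a location.
            relations["location"] = " ".join(chunks[i - 1][1]) + " " + phrase
        else:
            relations["object"] = phrase
    return relations

tagged = pos_tag("Daniel sat on the throne".split())
chunks = chunk(tagged)
relations = find_relations(chunks)
print(chunks)
print(relations)
```

For the example sentence this yields "Daniel" as the subject of "sat" and "on the throne" as its location. Each module consumes only the output of the one before it, which is why errors made early in the pipeline carry through to the later stages.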
Because shallow parsers have to deal with natural languages in their
entirety, they are large, and frequently contain thousands of rules.
For example, a rule might state that determiners (words such as the)
are good predictors of noun phrases.
Building shallow parsers is therefore a labor-intensive task.
These rule sets also tend to be largely ‘soft’, in that exceptions
abound.
Shallow parsers are therefore usually built automatically, using techniques
originating within the machine learning (or statistical) community.
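As a rough illustration of building such a rule set from data rather than by hand, the Python sketch below counts, for each part-of-speech tag, how often it occurs inside a noun phrase in a tiny annotated sample, and then uses the majority decision to chunk new text. The training sentences are invented for this illustration, the names TRAINING, train and predict_chunks are hypothetical, and real systems use far richer features and much more data.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated sample: (word, POS tag, is the word inside an NP?).
TRAINING = [
    [("Daniel", "N", True), ("sat", "V", False), ("on", "P", False),
     ("the", "AT", True), ("throne", "N", True)],
    [("the", "AT", True), ("king", "N", True), ("slept", "V", False)],
]

def train(sentences):
    """Learn, for each tag, whether it is more often inside or outside an NP."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for _word, tag, inside in sentence:
            counts[tag][inside] += 1
    return {tag: counter.most_common(1)[0][0] for tag, counter in counts.items()}

def predict_chunks(model, tagged):
    """Group consecutive words whose tag was learned as 'inside an NP'."""
    chunks, current = [], []
    for word, tag in tagged:
        if model.get(tag, False):      # unseen tags are treated as outside
            current.append(word)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

model = train(TRAINING)
new_sentence = [("the", "AT"), ("queen", "N"), ("stood", "V"),
                ("on", "P"), ("the", "AT"), ("floor", "N")]
print(predict_chunks(model, new_sentence))   # [['the', 'queen'], ['the', 'floor']]
```

The learned table plays the role of a soft rule set: it records tendencies, so two adjacent noun phrases would wrongly be merged into one, which is the kind of exception that richer learned models are meant to handle.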
The analysis of a sentence into its constituent parts is known as
constituent structure analysis; it is usually represented as a labeled
bracketing or the corresponding tree diagram.
Another type of analysis shows the relations between the individual
words in the sentence. This kind of analysis is known as dependency
analysis.
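For comparison with the bracketed form, a dependency analysis of "Daniel sat on the throne" can be stored as one head per word. The Python sketch below shows one possible analysis, in which the preposition heads its noun; other dependency schemes attach these words differently. The names WORDS, HEADS, arcs and path_to_root are hypothetical.

```python
WORDS = ["Daniel", "sat", "on", "the", "throne"]
# HEADS[i] is the index of the head of WORDS[i]; -1 marks the root of the sentence.
# Assumed analysis: "sat" is the root, "on" depends on "sat",
# "throne" depends on "on", and "the" depends on "throne".
HEADS = [1, -1, 1, 4, 2]

def arcs(words, heads):
    """List the (head, dependent) word pairs of the analysis."""
    return [(words[head], words[dep]) for dep, head in enumerate(heads) if head >= 0]

def path_to_root(words, heads, index):
    """Follow head links from one word up to the root."""
    path = [words[index]]
    while heads[index] >= 0:
        index = heads[index]
        path.append(words[index])
    return path

print(arcs(WORDS, HEADS))
print(path_to_root(WORDS, HEADS, 3))   # ['the', 'throne', 'on', 'sat']
```

Unlike the constituent analysis, this representation has no phrase labels such as NP or VP: every relation holds directly between two words.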
Chunk Tagset
NP marks a chunk involving nouns, nouns modified by adjectives
and other noun phrases and postpositional phrases.
VP marks a verb group, which includes the main verb and its
auxiliaries, if any.
JJP marks an adjectival chunk consisting of all adjectives, excluding
the prenominal modifiers.
RBP includes all pure adverbial phrases.
BLK marks elements such as expressives, interjections etc.
CCP marks conjunct or disjunct structures
NEGP usually marks a negative that is not included in any other
phrase.