Treebank Annotation 
By – 
Mohit Jasapara – 2012EEB1059 
Aashish Kholiya – 2012MEB1083 
Treebank
• The term treebank was coined by the linguist Geoffrey Leech in the 1980s; the name
reflects the fact that both syntactic and semantic structure are commonly represented
compositionally as trees.
• In linguistics, a treebank is a parsed text corpus that annotates syntactic or
semantic sentence structure.
• In simple words, treebanks are collections of manually checked syntactic analyses
of sentences.
Construction
• Treebanks are often created on top of a corpus that has already been annotated
with part-of-speech tags.
• Treebanks are sometimes enhanced with semantic or other linguistic information.
• Treebanks can be created completely manually, where linguists annotate each
sentence with syntactic structure, or semi-automatically, where a parser assigns
some syntactic structure which linguists then check and, if necessary, correct.
Construction
• In practice, fully checking and completing the parsing of natural language corpora
is a labour-intensive project that can take teams of graduate linguists several years.
• The level of annotation detail and the breadth of the linguistic sample determine
the difficulty of the task and the length of time required to build a treebank.
Construction
• Some treebanks follow a specific linguistic theory in their syntactic annotation
(e.g. the BulTreeBank follows HPSG), but most try to be less theory-specific.
However, two main groups can be distinguished:
treebanks that annotate phrase structure (for example the Penn Treebank or ICE-GB)
and
those that annotate dependency structure (for example the Prague Dependency
Treebank or the Quranic Arabic Dependency Treebank).
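To make the dependency side of this distinction concrete, the sketch below stores the sentence John loves Mary as a list of tokens, each pointing at its head word. This is only an illustration: the `Token` fields, the helper names and the relation labels (`nsubj`, `obj`, `root`) are assumptions chosen for readability and do not reproduce the annotation scheme of the Prague or Quranic Arabic treebanks.

```python
from collections import namedtuple

# Hypothetical token record: position, word form, index of the head word
# (0 means "no head", i.e. the root), and a relation label.
Token = namedtuple("Token", "idx form head rel")

sentence = [
    Token(1, "John", 2, "nsubj"),
    Token(2, "loves", 0, "root"),
    Token(3, "Mary", 2, "obj"),
]


def root_of(tokens):
    """Return the single token whose head is 0."""
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        raise ValueError("a dependency tree needs exactly one root")
    return roots[0]


def dependents(tokens, head_idx):
    """Return the tokens directly governed by the token at head_idx."""
    return [t for t in tokens if t.head == head_idx]


def is_tree(tokens):
    """True if there is exactly one root and every token reaches it without a cycle."""
    by_idx = {t.idx: t for t in tokens}
    for t in tokens:
        seen = set()
        cur = t
        while cur.head != 0:
            if cur.idx in seen or cur.head not in by_idx:
                return False
            seen.add(cur.idx)
            cur = by_idx[cur.head]
    return len([t for t in tokens if t.head == 0]) == 1


print(root_of(sentence).form)
print([t.form for t in dependents(sentence, 2)])
```

In a phrase-structure treebank the same sentence would instead be grouped into labelled constituents such as NP and VP; here there are no phrase nodes at all, only word-to-word links.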
Construction
• It is important to clarify the distinction between the formal representation and the
file format used to store the annotated data.
• Treebanks are necessarily constructed according to a particular grammar. The same
grammar may be implemented by different file formats.
Construction
For example, the syntactic analysis for "John loves Mary", shown in the figure on the
right, may be represented by simple labelled brackets in a text file, like this (following
the Penn Treebank notation):

(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
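A labelled bracketing of this kind can be read back into a tree with a few lines of code. The following is a minimal sketch in Python and is not part of any Penn Treebank tooling: the function names `parse_brackets` and `leaves` are hypothetical, and the bracketed string for John loves Mary is written with the standard Penn tags NNP and VBZ.

```python
import re


def parse_brackets(text):
    """Parse a labelled bracketing into nested (label, children) tuples.

    Words are kept as plain strings; every bracketed node becomes
    (label, [children]). Unbalanced input raises an exception.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """Collect the words of the tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets("(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))")
print(tree[0])        # S
print(leaves(tree))   # ['John', 'loves', 'Mary', '.']
```

Because the format is just nested parentheses, a small recursive reader like this is enough for simple cases, which is part of why the notation is described as light on resources.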
Construction
• This type of representation is popular because it is light on resources, and the tree
structure is relatively easy to read without software tools. However, as corpora
become increasingly complex, other file formats may be preferred. Alternatives
include treebank-specific XML schemes, numbered indentation and various types
of standoff notation.
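As a rough illustration of what an XML alternative might look like, the sketch below serialises a small phrase-structure tree with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`node`, `word`, `cat`, `pos`) are invented for this example and do not correspond to any particular treebank's schema; the tree is a hand-written nested tuple for John loves Mary.

```python
import xml.etree.ElementTree as ET

# Hand-written tree: each node is (label, [children]); words are plain strings.
tree = ("S", [
    ("NP", [("NNP", ["John"])]),
    ("VP", [("VBZ", ["loves"]),
            ("NP", [("NNP", ["Mary"])])]),
])


def to_xml(node):
    """Convert a (label, children) tuple into an XML element (hypothetical schema)."""
    label, children = node
    # A node with a single string child is a tagged word.
    if len(children) == 1 and isinstance(children[0], str):
        elem = ET.Element("word", {"pos": label})
        elem.text = children[0]
        return elem
    elem = ET.Element("node", {"cat": label})
    for child in children:
        elem.append(to_xml(child))
    return elem


xml_text = ET.tostring(to_xml(tree), encoding="unicode")
print(xml_text)

# The words can be recovered from the XML by iterating over <word> elements.
print([w.text for w in ET.fromstring(xml_text).iter("word")])
```

Category labels go into attributes rather than element names because labels such as "." are not valid XML names. The XML form is more verbose than brackets, but it leaves room for extra attributes when further annotation layers are added.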
Applications
Computational perspective
• From a computational perspective, treebanks have been used to engineer
state-of-the-art natural language processing systems such as part-of-speech
taggers, parsers, semantic analyzers and machine translation systems.
• Most computational systems utilize gold-standard treebank data.
• However, an automatically parsed corpus that is not corrected by human linguists
can still be useful.
Applications
• An automatically parsed corpus can provide evidence of rule frequency for a parser.
• A parser may be improved by applying it to large amounts of text and gathering
rule frequencies.
• However, only by correcting and completing a corpus by hand is it possible to
identify rules absent from the parser's knowledge base. Frequencies drawn from a
hand-corrected corpus are also likely to be more accurate.
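Gathering rule frequencies from parsed text can be sketched as follows. This is a toy illustration under stated assumptions: the two-sentence corpus is invented, the trees are hand-written nested tuples rather than real treebank data, and the helper names `productions` and `relative_frequencies` are hypothetical.

```python
from collections import Counter

# Invented toy corpus: each node is (label, [children]); words are plain strings.
corpus = [
    ("S", [("NP", [("NNP", ["John"])]),
           ("VP", [("VBZ", ["loves"]), ("NP", [("NNP", ["Mary"])])])]),
    ("S", [("NP", [("NNP", ["Mary"])]),
           ("VP", [("VBZ", ["sleeps"])])]),
]


def productions(node):
    """Yield (parent, child-labels) rules, skipping tag-to-word nodes."""
    label, children = node
    if all(isinstance(c, tuple) for c in children):
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from productions(c)


def relative_frequencies(counts):
    """Divide each rule count by the total count of rules with the same parent."""
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


counts = Counter(rule for tree in corpus for rule in productions(tree))
freqs = relative_frequencies(counts)

for (lhs, rhs), n in counts.most_common():
    print(f"{lhs} -> {' '.join(rhs)}: count={n}, relative={freqs[(lhs, rhs)]:.2f}")
```

Counts like these are only as good as the trees they come from, which is why frequencies taken from a hand-corrected corpus are expected to be more reliable than those from raw parser output.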
Applications
Corpus linguistics
• In corpus linguistics, treebanks are used to study syntactic phenomena;
for example, diachronic corpora can be used to study the time course of syntactic
change.
• Once parsed, a corpus will contain frequency evidence showing how common
different grammatical structures are in use.
• Treebanks also provide evidence of coverage and support the discovery of new,
unanticipated grammatical phenomena.
Applications
• Interaction research is particularly fruitful as further layers of annotation, e.g.
semantic or pragmatic, are added to a corpus.
• It is then possible to evaluate the impact of non-syntactic phenomena on
grammatical choices.
Applications
Theoretical linguistics and psycholinguistics
• Another use of treebanks in theoretical linguistics and psycholinguistics is
interaction evidence.
• A completed treebank can help linguists carry out experiments on how the
decision to use one grammatical construction tends to influence the decision to
form others, and to try to understand how speakers and writers make decisions as
they form sentences.
Penn Treebank Project
• The Penn Treebank Project annotates naturally occurring text for linguistic
structure.
• Most notably, it produces skeletal parses showing rough syntactic and semantic
information, that is, a bank of linguistic trees.
• It also annotates text with part-of-speech tags and, for the Switchboard corpus of
telephone conversations, dysfluency annotation.
• It is located in the LINC Laboratory of the Computer and Information Science
Department at the University of Pennsylvania.
Penn Treebank Project
• The Linguistic Data Consortium (LDC) provides tools and formats for creating and
managing linguistic annotations.
• 'Linguistic annotation' covers any descriptive or analytic notations applied to raw
language data.
• The Penn Treebank is a human-annotated and partially 'skeletally' parsed corpus
consisting of over 4.5 million words of American English.
• It includes the Brown Corpus (retagged) and the Wall Street Journal Corpus, as well
as Department of Energy abstracts, Dow Jones Newswire stories, Department of
Agriculture bulletins, Library of America texts, MUC-3 messages, IBM Manual
sentences, WBUR radio transcripts, and ATIS sentences.
