Corpus linguistics


  1. FARAH DIYANA BINTI AHMAD JEFIRUDDIN
  2. DEFINITION: Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text.
  3. HISTORY:
     HENRY KUČERA AND W. NELSON FRANCIS
     - published Computational Analysis of Present-Day American English (1967)
     - the work contains a variety of computational analyses, combining elements of linguistics, language teaching, psychology, statistics, and sociology
     RANDOLPH QUIRK
     - published "Towards a Description of English Usage" (1960), in which he introduced the Survey of English Usage
  4. HOUGHTON-MIFFLIN
     - published the American Heritage Dictionary (AHD), the first dictionary to be compiled using corpus linguistics
     - approached Kučera to supply a million-word, three-line citation base for the dictionary
     - the AHD combines prescriptive elements with descriptive information
     COLLINS
     - published the COBUILD monolingual learner's dictionary, compiled using the Bank of English
     - designed for users learning English as a foreign language
     SURVEY OF ENGLISH USAGE
     - the Survey of English Usage Corpus was used in the development of A Comprehensive Grammar of the English Language
  5. MONTREAL FRENCH PROJECT
     - the first computerized corpus of transcribed spoken language
     - contains one million words
     ANDERSEN-FORBES
     - a computerized corpus: a database of the Hebrew Bible
     - every clause is parsed using graphs representing seven levels of syntax, and each segment is tagged with seven fields of information
     THE QURANIC ARABIC CORPUS
     - an annotated corpus for the Classical Arabic language of the Quran
     - a recent project with multiple layers of annotation, including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar
  6. METHODS: 1) Annotation 2) Abstraction 3) Analysis
  7. 1) ANNOTATION: Annotation consists of applying a scheme to texts. Annotations may include structural mark-up, part-of-speech tagging, parsing, and numerous other representations.
  8. 2) ABSTRACTION: Abstraction consists of translating (mapping) terms in the annotation scheme to terms in a theoretically motivated model or dataset. It typically involves linguist-directed search but may also include, e.g., rule-learning for parsers.
  9. 3) ANALYSIS: Analysis consists of statistically probing, manipulating and generalising from the dataset. It might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods.
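
The three methods above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the slides: the two-sentence corpus, the toy lexicon `TOY_LEXICON`, the tag-to-class mapping `TAG_TO_CLASS`, and the function names. A real project would use a trained tagger and a far larger corpus; the dictionary lookup here only stands in for the annotation step.

```python
from collections import Counter

# Hypothetical toy lexicon: the "scheme" applied during annotation.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "park": "NOUN",
    "chased": "VERB", "sleeps": "VERB",
    "in": "PREP",
}

# Hypothetical mapping from scheme tags to the categories of a simpler model.
TAG_TO_CLASS = {
    "DET": "function", "PREP": "function",
    "NOUN": "content", "VERB": "content",
}


def annotate(text):
    """Annotation: attach a part-of-speech tag to every token."""
    tokens = text.lower().replace(".", " ").split()
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in tokens]


def abstract(annotated):
    """Abstraction: map scheme tags onto the model's categories."""
    return [(tok, TAG_TO_CLASS.get(tag, "other")) for tok, tag in annotated]


def analyse(dataset):
    """Analysis: compute the relative frequency of each category."""
    counts = Counter(cls for _, cls in dataset)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


corpus = "The dog chased a cat. The cat sleeps in the park."
annotated = annotate(corpus)
dataset = abstract(annotated)
proportions = analyse(dataset)

print(annotated[:3])
print(proportions)
```

Keeping the three stages as separate functions mirrors the slides: the annotation scheme can be swapped out without touching the analysis, and the abstraction step is the only place where the scheme and the model meet.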
