Corpus linguistics
Transcript

  • 1. FARAH DIYANA BINTI AHMAD JEFIRUDDIN
  • 2. DEFINITION: The study of language as expressed in samples (corpora) of "real world" text.
  • 3. HISTORY
    HENRY KUCERA AND W. NELSON FRANCIS
    - published Computational Analysis of Present-Day American English (1967)
    - the book contains a variety of computational analyses, combining elements of linguistics, language teaching, psychology, statistics, and sociology
    RANDOLPH QUIRK
    - published "Towards a Description of English Usage" (1960), in which he introduced the Survey of English Usage
  • 4. HOUGHTON-MIFFLIN
    - published the American Heritage Dictionary (AHD), the first dictionary to be compiled using corpus linguistics
    - approached Kucera to supply a million-word, three-line citation base for the dictionary
    - the AHD combines prescriptive elements with descriptive information
    COLLINS
    - published the COBUILD monolingual learner's dictionary
    - designed for users learning English as a foreign language and compiled using the Bank of English
    SURVEY OF ENGLISH USAGE CORPUS
    - was used in the development of A Comprehensive Grammar of the English Language
  • 5. MONTREAL FRENCH PROJECT
    - the first computerized corpus of transcribed spoken language
    - contains one million words
    ANDERSEN-FORBES
    - a computerized corpus: a database of the Hebrew Bible
    - every clause is parsed using graphs representing seven levels of syntax, and each segment is tagged with seven fields of information
    THE QURANIC ARABIC CORPUS
    - an annotated corpus for the Classical Arabic language of the Quran
    - a recent project with multiple layers of annotation, including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar
  • 6. METHODS: 1) Annotation 2) Abstraction 3) Analysis
  • 7. 1) ANNOTATION: Annotation consists of the application of a scheme to texts. Annotations may include structural mark-up, part-of-speech tagging, parsing, and numerous other representations.
  • 8. 2) ABSTRACTION: Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. It typically includes linguist-directed search but may also include, e.g., rule-learning for parsers.
  • 9. 3) ANALYSIS: Analysis consists of statistically probing, manipulating and generalising from the dataset. It might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods.
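
The three methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the slides: the two-sentence corpus, the hand-written lexicon `TOY_LEXICON` standing in for a real part-of-speech tagger, the tag-to-class mapping `TAG_TO_CLASS`, and the function names `annotate`, `abstract` and `analyse`. A real project would use a trained tagger and a far larger corpus.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "sat": "VERB", "chased": "VERB",
    "on": "PREP",
}

# Hypothetical mapping from tags in the scheme to terms in a simpler model.
TAG_TO_CLASS = {
    "NOUN": "content", "VERB": "content",
    "DET": "function", "PREP": "function",
}


def annotate(text):
    """Annotation: apply the tagging scheme to each token of a text."""
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in text.lower().split()]


def abstract(tagged):
    """Abstraction: map each tag onto a class in the simpler model."""
    return [TAG_TO_CLASS.get(tag, "other") for _, tag in tagged]


def analyse(classes):
    """Analysis: compute the relative frequency of each class."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


corpus = ["The cat sat on the mat", "A dog chased the cat"]
tagged = [pair for sentence in corpus for pair in annotate(sentence)]
proportions = analyse(abstract(tagged))
print(proportions)
```

In this sketch each step only consumes the output of the previous one, so the lexicon lookup could be swapped for a real tagger without changing the abstraction or analysis steps.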