Corpus Linguistics
Analytical Tools
Prepared By
Mr. Jitendra B. Patil
Assistant Professor of English
Pratap College Amalner
Dist – Jalgaon (Maharashtra)
Pin-425401 Mob.- 919421655091
Email- jitendrapca@gmail.com
 WORDCRUNCHER
 Used widely since 1980
 Produced originally by Brigham Young University, Utah
 Can provide fast retrieval of large corpora
 Has two separate programs
 WC Index batch process – to index a text file or corpus
 Produces a series of annotated files
 Runs on plain ASCII file
 Early versions took about 20 minutes to index 100k files
 WC View runs as a menu to locate pre-indexed data
 Provides fast retrieval of all tokens of morphemes
 WC can provide many options for the amount of context
 From a single line to about fifty lines
 Good for rapid exploration of text
 Not flexible for sorting and formatting output of analyses
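The index-then-view workflow described above can be illustrated with a minimal sketch. This is not WordCruncher's actual code or file format; the function names `build_index` and `view` and the sample text are assumptions made for illustration. It shows why pre-indexing makes retrieval fast: lookups go to the index rather than rescanning the text.

```python
from collections import defaultdict

def build_index(lines):
    """Batch step: map each lowercased word to the line numbers where it occurs."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines):
        for word in line.lower().split():
            word = word.strip(".,;:!?\"'()")
            # Record each line only once per word.
            if word and (not index[word] or index[word][-1] != line_no):
                index[word].append(line_no)
    return index

def view(lines, index, word, context=1):
    """View step: return every hit with `context` lines before and after it."""
    hits = []
    for line_no in index.get(word.lower(), []):
        start = max(0, line_no - context)
        end = min(len(lines), line_no + context + 1)
        hits.append(lines[start:end])
    return hits

# Hypothetical four-line "corpus".
text = [
    "The cat sat on the mat.",
    "A dog barked at the cat.",
    "The mat was red.",
    "Nobody saw the dog again.",
]
index = build_index(text)
hits = view(text, index, "mat", context=1)
```

Raising `context` widens the window around each hit, in the spirit of the single-line to fifty-line range mentioned above.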
TACT
(TEXT ANALYSIS COMPUTING TOOLS)
 Research oriented software for corpus analyses
 Developed at the University of Toronto
 First released in 1989
 a system of 15 programs for MS-DOS
 supports the extended ASCII character set of the IBM PC
 The TACT system is multilingual
 is designed to do text-retrieval and analysis on literary works
 is used to retrieve occurrences of a word, word pattern, or word combination
 Output is in the form of a concordance, a list, or a table
 can do simple kinds of analysis, such as sorted frequencies of letters, words
or phrases, type-token statistics
 is intended for individual literary texts, or small to mid-size groups of such
texts
 Processing a text with TACT normally begins with tagging or marking up an
ASCII copy of the text
 a text editor is used to insert these tags, usually within diamond-bracket delimiters
 mark-up helps one to refine word-selections
 mark proper names (of people and places), episodes, date, location, audience,
narrative mode, theme, etc.
 four programs can be used: Preproc, Makedct, Tagtext, and Satdct, to add tags
to each word of the ASCII text
 with other font-editing tools, its capabilities can be extended to other modern
European languages, such as French, German, and Greek.
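The simple analyses listed above (sorted word frequencies and type-token statistics on a marked-up text) can be sketched as follows. This is not TACT itself: the tag `<speaker Hamlet>` is a made-up example of diamond-bracket markup, not TACT's real tag syntax, and the function names are assumptions.

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]*>")      # anything inside diamond brackets
WORD = re.compile(r"[A-Za-z']+")  # a crude word pattern, enough for a sketch

def tokens(marked_up_text):
    """Strip the tags, then return the lowercased word tokens."""
    plain = TAG.sub(" ", marked_up_text)
    return [w.lower() for w in WORD.findall(plain)]

def sorted_frequencies(words):
    """Word counts, most frequent first; ties are broken alphabetically."""
    counts = Counter(words)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def type_token_ratio(words):
    """Number of distinct words (types) divided by total words (tokens)."""
    return len(set(words)) / len(words) if words else 0.0

sample = "<speaker Hamlet> To be, or not to be, that is the question."
words = tokens(sample)
freqs = sorted_frequencies(words)
ttr = type_token_ratio(words)
```

Here the sample has 10 tokens and 8 types, so the type-token ratio is 0.8.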
LEXA: Corpus Processing Software
 A set of programmes to process linguistically relevant data
 is divided into several groups which perform typical functions
 the first of these is lexical analysis
 Lexa allows one to tag and lemmatize any text or series of texts with a
minimum of effort.
 the user specifies what (possible) words are to be assigned to what lemmas
 flexibility in design is given highest priority
 flexibility:
 the number of items is user-determinable
 the structure of each programme is user-friendly
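The idea of user-specified lemma assignment can be shown with a small sketch. It does not reproduce Lexa's programmes or file formats; the function `lemmatize` and the table `lemma_table` are hypothetical names, and the assumption here is that words missing from the table are left as their own lowercased form.

```python
def lemmatize(words, lemma_table):
    """Pair each word with its lemma from a user-supplied table.

    Words not listed in the table fall back to their lowercased form.
    """
    return [(w, lemma_table.get(w.lower(), w.lower())) for w in words]

# The user decides which word forms belong to which lemma.
lemma_table = {"went": "go", "goes": "go", "mice": "mouse"}

tagged = lemmatize("The mice went home".split(), lemma_table)
```

Because the table is ordinary data supplied by the user, it can be extended or corrected without touching the program, which is one way to read the emphasis on flexibility.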
CWB: Corpus Workbench
 a widely-used architecture for corpus analysis
 originally designed at the IMS, University of Stuttgart
 consists of a set of tools for indexing, managing and querying very large corpora
with multiple layers of word-level annotation.
 CWB’s central component is the Corpus Query Processor (CQP)
 CQP is an extremely powerful and efficient concordance system implementing a
flexible two-level search
 CQP allows complex query patterns to be specified
 at the level of an individual word or annotation
 at the level of a fully- or partially-specified pattern of tokens
 Several key improvements were made to the CWB core:
 (i) support for multiple character sets, including Unicode (in the form of UTF-8)
 (ii) support for powerful Perl-style regular expressions in CQP queries, based
on the open-source PCRE library
 (iv) support for larger corpus sizes of up to 2 billion words on 64-bit
platforms.
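The two-level search described above can be illustrated with a small sketch. It is neither CQP nor its query language: it uses Python's `re` module rather than PCRE, and the corpus layout (one dict per token), the attribute names `word` and `pos`, and the functions `token_matches` and `find_pattern` are all assumptions. Level one tests a single token's word form or annotation against regular expressions; level two matches a sequence of such token tests.

```python
import re

def token_matches(token, constraints):
    """Level 1: every named attribute of one token must fully match its regex."""
    return all(
        re.fullmatch(regex, token.get(attr, "")) is not None
        for attr, regex in constraints.items()
    )

def find_pattern(corpus, pattern):
    """Level 2: return start positions where consecutive tokens match the pattern."""
    n = len(pattern)
    return [
        i
        for i in range(len(corpus) - n + 1)
        if all(token_matches(corpus[i + k], pattern[k]) for k in range(n))
    ]

# Hypothetical annotated corpus: each token carries a word form and a POS tag.
corpus = [
    {"word": "The", "pos": "DT"},
    {"word": "old", "pos": "JJ"},
    {"word": "houses", "pos": "NNS"},
    {"word": "and", "pos": "CC"},
    {"word": "a", "pos": "DT"},
    {"word": "new", "pos": "JJ"},
    {"word": "house", "pos": "NN"},
]

# An adjective followed by "house" or "houses" tagged as some kind of noun.
pattern = [{"pos": "JJ"}, {"word": "houses?", "pos": "NN.*"}]
starts = find_pattern(corpus, pattern)
```

A partially specified pattern simply leaves attributes out of a token test, so `[{"pos": "DT"}, {}]` would match any determiner followed by any token.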
 CWB, the IMS Open Corpus Workbench, is somewhat misleadingly named
 as it is not in any sense a comprehensive or general “workbench” for corpus
linguistics
 Instead, it is a powerful and flexible system for indexing and searching corpus
data
 CWB actually consists of three different software packages:
 (i) the CWB core, including the low-level Corpus Library (CL), the CWB
utilities, and the Corpus Query Processor (CQP)
 (ii) the CWB/Perl interface, itself divided into three separate Perl packages,
namely CWB, CWB-CL and CWB-Web
 (iii) CQPweb, the most recent addition
MICROCONCORD
 MicroConcord is a concordance program developed specifically for the language
teacher/learner
 The type of computer-generated concordance it produces (the KWIC, or
"keyword-in-context" index) evolved in the late 1950s
 Searches the text of five plays in under a minute
 A well-designed basic concordancer, useful for a variety of applications and
notable for its robustness and simplicity
 Suitable for novices and for classroom use
 The user interface is simple and intuitive
 The user specifies the search word(s), a directory containing texts to be searched, and
the text files, with an option to select up to 500 files from 963 directories
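A KWIC ("keyword-in-context") concordance of the kind mentioned above can be sketched in a few lines. This is an illustration of the general technique, not MicroConcord's implementation; the function name `kwic`, the `width` parameter (counted in words, not characters) and the sample sentence are assumptions.

```python
def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    words = text.split()
    rows = []
    for i, w in enumerate(words):
        # Compare case-insensitively, ignoring trailing punctuation.
        if w.strip(".,;:!?").lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, w, right))
    return rows

rows = kwic("To be or not to be that is the question", "be", width=2)

# Print with the keyword aligned in a centre column, as in a KWIC display.
for left, key, right in rows:
    print(f"{left:>10}  {key}  {right}")
```

Aligning the keyword in one column is what lets a teacher or learner scan the contexts on either side at a glance.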
THANK YOU !!
