Corpus Linguistics
Analytical Tools
Prepared By
Mr. Jitendra B. Patil
Assistant Professor of English
Pratap College Amalner
Dist – Jalgaon (Maharashtra)
Pin-425401 Mob.- 919421655091
Email- jitendrapca@gmail.com
 WORDCRUNCHER
 Used widely since 1980
 Produced originally by Brigham Young University, Utah
 Can provide fast retrieval of large corpora
 Has two separate programs
 WC Index batch process – to index a text file or corpus
 Produces a series of annotated files
 Runs on plain ASCII file
 Early versions took about 20 minutes to index 100k files
 WC View runs as a menu to locate pre-indexed data
 Provides fast retrieval of all tokens of morphemes
 WC offers many options for the amount of context displayed
 From a single line to about fifty lines
 Good for rapid exploration of text
 Not flexible for sorting and formatting output of analyses
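
The index-then-view workflow described above can be sketched in a few lines of Python. This is only an illustration of the general idea (index once, then retrieve hits with a chosen number of context lines); it is not WordCruncher's actual file format or algorithm, and the names build_index and view are hypothetical.

```python
import re
from collections import defaultdict

def build_index(lines):
    """Map each lowercased word to the line numbers where it occurs."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines):
        # set() so a word repeated on one line is recorded once for that line
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append(line_no)
    return index

def view(lines, index, word, context=1):
    """Return each hit together with `context` lines before and after it."""
    hits = []
    for line_no in index.get(word.lower(), []):
        start = max(0, line_no - context)
        end = min(len(lines), line_no + context + 1)
        hits.append(lines[start:end])
    return hits

text = [
    "The corpus is indexed once.",
    "Queries then run against the index.",
    "Each query returns lines of context.",
    "The corpus itself is never rescanned.",
]
index = build_index(text)
for hit in view(text, index, "corpus", context=1):
    print(" | ".join(hit))
```

Because lookups go through the prebuilt index, the text is scanned only once, which is the reason pre-indexed retrieval stays fast on large corpora.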
TACT
(TEXT ANALYSIS COMPUTING TOOLS)
 Research oriented software for corpus analyses
 Developed at the University of Toronto
 First released in 1989
 a system of 15 programs for MS-DOS
 supports the extended ASCII character set of the IBM PC
 The TACT system is multilingual
 is designed to do text-retrieval and analysis on literary works
 is used to retrieve occurrences of a word, word pattern, or word combination
 Output takes the form of a concordance, a list, or a table
 can do simple kinds of analysis, such as sorted frequencies of letters, words
or phrases, type-token statistics
 is intended for individual literary texts, or small to mid-size groups of such
texts
 Processing a text with TACT normally begins with tagging or marking up an
ASCII copy of the text
 a text editor is used to insert these tags, usually within diamond-bracket (angle-bracket) delimiters
 mark-up helps one to refine word-selections
 mark proper names (of people and places), episodes, date, location, audience,
narrative mode, theme, etc.
 four programs can be used: Preproc, Makedct, Tagtext, and Satdct, to add tags
to each word of the ASCII text
 with other font-editing tools, its capabilities can be extended to other modern
European languages, such as French, German, and Greek.
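
The simple analyses listed above (sorted word frequencies and type-token statistics) can be sketched as follows. This is a minimal Python illustration, not TACT itself; the tag in the sample text is a made-up example of angle-bracket mark-up, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(marked_up_text):
    """Drop <...> tags, then return lowercased word tokens."""
    plain = re.sub(r"<[^>]*>", " ", marked_up_text)
    return re.findall(r"[a-z']+", plain.lower())

def frequency_list(tokens):
    """Word frequencies, most frequent first, ties broken alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical mark-up: the <speaker ...> tag is an assumed example only.
sample = "<speaker Hamlet> To be, or not to be, that is the question."
tokens = tokenize(sample)
print(frequency_list(tokens)[:3])
print(round(type_token_ratio(tokens), 2))
```

The sample has ten running words (tokens) and eight distinct forms (types), so the ratio printed is 0.8.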
LEXA: Corpus Processing Software
 A set of programmes to process linguistically relevant data
 is divided into several groups which perform typical functions
 the first of these is lexical analysis
 Lexa allows one to tag and lemmatize any text or series of texts with a
minimum of effort.
 the user specifies what (possible) words are to be assigned to what lemmas
 flexibility in design is given highest priority
 flexibility:
 the number of items is user-determinable
 the structure of each programme is user-friendly
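
The idea of a user-specified assignment of word forms to lemmas can be sketched as below. This is an illustrative Python example only, not Lexa's own input format; LEMMA_TABLE and the function names are hypothetical.

```python
import re

# Hypothetical user-supplied table: which word forms belong to which lemma.
LEMMA_TABLE = {
    "go": ["go", "goes", "went", "gone", "going"],
    "be": ["be", "is", "are", "was", "were", "been"],
}

def invert(lemma_table):
    """Turn {lemma: [forms]} into {form: lemma} for direct lookup."""
    return {form: lemma for lemma, forms in lemma_table.items() for form in forms}

def lemmatize(text, lemma_table):
    """Return (word, lemma) pairs; unlisted words are kept as their own lemma."""
    form_to_lemma = invert(lemma_table)
    words = re.findall(r"[a-z']+", text.lower())
    return [(word, form_to_lemma.get(word, word)) for word in words]

print(lemmatize("She went where they were going", LEMMA_TABLE))
```

Keeping the table outside the code mirrors the flexibility point: the user, not the program, decides which forms count as the same lemma.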
CWB: Corpus Workbench
 a widely-used architecture for corpus analysis
 originally designed at the IMS, University of Stuttgart
 consists of a set of tools for indexing, managing and querying very large corpora
with multiple layers of word-level annotation.
 CWB’s central component is the Corpus Query Processor (CQP)
 CQP is an extremely powerful and efficient concordance system implementing a
flexible two-level search
 CQP allows complex query patterns to be specified
 at the level of an individual word or annotation
 at the level of a fully- or partially-specified pattern of tokens
 Several key improvements were made to the CWB core:
 (i) support for multiple character sets, including Unicode (in the form of UTF-8)
 (ii) support for powerful Perl-style regular expressions in CQP queries, based
on the open-source PCRE library
 (iv) support for larger corpus sizes of up to 2 billion words on 64-bit
platforms.
 CWB, the IMS Open Corpus Workbench, is somewhat misleadingly named
 as it is not in any sense a comprehensive or general “workbench” for corpus
linguistics
 Instead, it is a powerful and flexible system for indexing and searching corpus
data
 CWB actually consists of three different software packages:
 (i) the CWB core, including the low-level Corpus Library (CL), the CWB
utilities, and the Corpus Query Processor (CQP)
 (ii) the CWB/Perl interface, itself divided into three separate Perl packages,
namely CWB, CWB-CL and CWB-Web
 (iii) CQPweb, the most recent addition
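
The two-level search described above (regular-expression constraints on a single token's annotations, combined into a pattern over a sequence of tokens) can be imitated in plain Python. This is a sketch of the concept only: it does not use CWB or CQP, and the corpus, tag set and function names are assumptions made for illustration. In CQP itself a comparable query would be written roughly as [pos="JJ"] [pos="NN.*"].

```python
import re

# Each token carries several word-level annotation layers (here: word and pos).
corpus = [
    {"word": "The", "pos": "DT"},
    {"word": "old", "pos": "JJ"},
    {"word": "houses", "pos": "NNS"},
    {"word": "were", "pos": "VBD"},
    {"word": "very", "pos": "RB"},
    {"word": "old", "pos": "JJ"},
]

def token_matches(token, constraints):
    """Level 1: every named attribute must fully match its regular expression."""
    return all(re.fullmatch(rx, token[attr]) for attr, rx in constraints.items())

def find_sequences(tokens, pattern):
    """Level 2: return start positions where consecutive tokens match the pattern."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(token_matches(t, c) for t, c in zip(window, pattern)):
            hits.append(start)
    return hits

# An adjective followed by any noun tag (NN, NNS, ...).
query = [{"pos": "JJ"}, {"pos": "NN.*"}]
for start in find_sequences(corpus, query):
    print(" ".join(t["word"] for t in corpus[start:start + len(query)]))
```

This linear scan is only for clarity; a system built for very large corpora relies on precomputed indexes rather than checking every position.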
MICROCONCORD
The type of computer-generated concordance produced by MicroConcord (the
KWIC, or "keyword-in-context" index) evolved in the late 1950s
MicroConcord searches the text of five plays in under a minute
a concordance program which has been developed specifically for the language
teacher/learner.
MicroConcord is a well-designed basic concordancer
useful for a variety of applications; robust and simple
Suitable for novices and for classroom use.
MicroConcord's user interface is simple and intuitive
the user specifies search word(s), a directory containing texts to be searched, and
the text files, with an option to select up to 500 files from 963 directories
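
A keyword-in-context (KWIC) display of the kind described above can be sketched in Python as follows. This is a minimal illustration, not MicroConcord's implementation; the function name kwic and the width parameter are hypothetical.

```python
import re

def kwic(text, keyword, width=20):
    """Return keyword-in-context lines with the keyword aligned in one column."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

sample = "To be, or not to be, that is the question."
for line in kwic(sample, "be"):
    print(line)
```

Aligning the keyword in a fixed column is what lets a teacher or learner scan the words to its left and right for recurring patterns.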
THANK YOU !!
