BNC and its Online Use
Corpus and Famous Corpora
Presenter
Memoona Butt
Roll No. 02
Corpus
• A corpus can be defined
as a systematic collection
of naturally occurring
text in electronic form.
Corpus linguistics
• Corpus linguistics is the study of
language/linguistic phenomena
through the analysis of data
obtained from a corpus.
• Corpus linguistics is the analysis of
text with the help of a computer,
i.e. with specialized software.
• A corpus is always designed for a
particular purpose, so the usefulness of a
ready-made corpus must be judged
with regard to the purpose to which a
user intends to put it.
Famous corpora
• The Brown Corpus
• The Lancaster-Oslo/Bergen Corpus
• The London-Lund Corpus
• The British National Corpus
The Brown Corpus
• The Brown Corpus of Standard
American English was the first of
the modern, computer readable,
general corpora. The corpus
consists of one million words of
American English texts printed in
1961.
The Lancaster-Oslo/Bergen Corpus
• The Lancaster-Oslo/Bergen Corpus
is a one-million-word collection of British
English texts which was compiled in
the 1970s in collaboration between
the University of Lancaster, the
University of Oslo, and the
Norwegian Computing Center for
the Humanities, Bergen.
The London-Lund Corpus
• The London-Lund Corpus of
English derives from two projects:
the Survey of English Usage at
University College London and the
Survey of Spoken English, which was
started at Lund University in 1975.
The corpus consists of 500,000 words
of spoken British English.
The British National Corpus
• The British National Corpus is a
100-million-word collection of samples of
written and spoken language from a
wide range of sources, designed to
represent a wide cross-section of
British English from the later part of
the 20th century.
Creation of BNC:
• The project was carried out by the
BNC Consortium, an industrial/academic
consortium led by Oxford University Press
whose members include other
dictionary publishers.
• The Consortium was formed in
1990 and started work in 1991 on
the three-year task of producing a
hundred-million-word corpus of
modern British English for use in
commercial and academic research.
All major decisions regarding the BNC
are still made by the Consortium.
• The BNC comprises
approximately 100 million
words:
• Written texts (90%)
• Transcripts of speech (10%)
Why we use BNC
• The BNC can be used to discover aspects of a word
that we did not know about and to check our
intuitions about its meaning. The corpus shows
how a word is actually used, not just what we
think it means. We can also use the BNC to check
whether a word occurs in the corpus at all.
Properties of British
National Corpus
Presented by:-
Hadia Tabassum
The BNC is a sample of 100 million
words of spoken and
written British English. It is a
balanced and finite corpus that
contains approximately 90%
written data and 10% spoken
data.
Features of British National Corpus
Spoken components in BNC:
• The conversational part
• The task-oriented part
The conversational part:
• This part is largely based on recordings of everyday
conversational interaction engaged in by some
127 adults aged 15 and over. Some additional
recordings of speakers under fifteen were included from
COLT. The volunteers were selected according to
the demographic parameters of age, social group, and sex,
with the aim of obtaining approximately equal
numbers in each group. The conversational part
makes up just over 40% of the spoken corpus.
Respondents in the "conversational part"
were selected according to the following
properties:
Age            Social group   Sex            Percentage
Under fifteen  Upper class    Male           41.14
15-24          Middle class   Female         58.47
24-34          Lower class    Unclassified   0.38
The task-oriented part:
This material was intended to represent
those types of task-oriented spoken activity
that were unlikely to be recorded by the
conversational volunteers during a typical day
in their lives, e.g. lectures, consultations,
sermons, and TV/radio broadcasts. This
part makes up about 60% of the spoken corpus.
The written components:
• Imaginative (mostly fiction)
• Informative (non-fiction)
Imaginative texts account for about 20% and
informative texts for about 80% of the written
component. The imaginative texts are divided
into further categories such as prose and poetry. The
informative texts are subdivided into
eight categories:
1. Arts 2. Natural sciences
3. Commerce 4. Applied sciences
5. Leisure 6. Social sciences
7. Belief and thought 8. World affairs
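The percentages quoted in these slides can be turned into rough word counts. The following is a minimal sketch that uses only the rounded shares given above (90/10, 40/60, 20/80) and a nominal total of 100 million words. The function name `word_counts` is an illustrative assumption, and the results are approximations, not official BNC figures.

```python
# Rough word counts implied by the rounded percentages in the slides.
# These are back-of-the-envelope figures, not official BNC statistics.

TOTAL_WORDS = 100_000_000  # "approximately 100 million words"


def word_counts(total, percentages):
    """Split `total` words according to whole-number percentage shares."""
    return {name: total * pct // 100 for name, pct in percentages.items()}


# Top level: written vs. spoken.
top_level = word_counts(TOTAL_WORDS, {"written": 90, "spoken": 10})

# Spoken component: conversational vs. task-oriented parts.
spoken = word_counts(top_level["spoken"],
                     {"conversational": 40, "task_oriented": 60})

# Written component: imaginative vs. informative texts.
written = word_counts(top_level["written"],
                      {"imaginative": 20, "informative": 80})

for label, parts in (("BNC", top_level), ("spoken", spoken), ("written", written)):
    for name, words in parts.items():
        print(f"{label:<8}{name:<16}{words:>12,d}")
```

On these assumptions the conversational part comes to roughly 4 million words and the informative written texts to roughly 72 million.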
Abbreviations and acronyms:
In the BNC the same abbreviated
sequence can appear in several different forms, such as
P.C. and PC, and the same form can
reflect different origins (police
constable, postcard, personal computer).
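When searching a corpus, one way to retrieve such spelling variants together is a single regular expression. The sketch below is an illustration only: the pattern `PC_VARIANTS`, the helper `find_variants` and the sample sentence are assumptions made for this example, not part of any BNC tool. It finds the forms but cannot tell which meaning (police constable, postcard, personal computer) is intended.

```python
import re

# Hypothetical pattern covering "PC", "P.C" and "P.C." as whole tokens.
# The negative lookahead stops it from matching inside words such as "PCs".
PC_VARIANTS = re.compile(r"\bP\.?C\.?(?!\w)")


def find_variants(text):
    """Return every spelling variant of the abbreviation found in `text`."""
    return PC_VARIANTS.findall(text)


# Invented sample sentence, not taken from the corpus.
sample = "P.C. Jones bought a PC, and another P.C wrote a report about PCs."
print(find_variants(sample))
```

Deciding which sense each hit carries still requires reading the surrounding context.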
Monolingual:
Although the BNC includes many
different styles, varieties and genres,
it deals only with modern British
English and not with other
languages used in Britain.
Synchronic:
The BNC covers British English of the late twentieth
century, rather than the historical development
which produced it. As a finite corpus, it is not
extended with new texts as time passes.
Editions of BNC
Presenter
Kinza Asghar
First edition
• The first edition of BNC was
completed in 1994.
• The first general release of the corpus
for European researchers was
announced in February 1995.
BNC World
• BNC World, a slightly revised version, was
made available in 2001. The name indicates that the
corpus is now available under license
worldwide.
• BNC World is available in two flavors:
1. Under the single-user license (cost £50)
you can install the whole corpus
and the SARA software on a single
machine for personal use.
2. Alternatively, for the same price, you can
install just the corpus itself and use
whatever software you want.
BNC XML
• BNC XML is the latest version of the
British National Corpus.
• XML stands for Extensible Markup
Language.
• XML is a set of rules for encoding
documents in machine readable form.
• The main differences between this version
and BNC World are:
1. Errors and inconsistencies have been
removed.
2. Lemma information has been added.
3. Simplified part-of-speech information
has been added.
• BNC XML can be accessed in three ways:
1. Online use.
2. Download the corpus and XAIRA.
3. Download just the corpus and use it with
any software you want.
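If you download just the corpus, the lemma and simplified part-of-speech information can be read with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` on a short hand-written fragment in the style of BNC XML word markup, where each `<w>` element carries a `c5` tag, an `hw` headword (lemma) and a simplified `pos` class. The fragment itself and the helper name `read_words` are assumptions made for illustration, so check the corpus documentation before relying on the exact attribute values.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of BNC XML markup (illustrative only,
# not copied from the corpus). <w> = word, <c> = punctuation.
SAMPLE = """
<s n="1">
  <w c5="PNP" hw="she" pos="PRON">She </w>
  <w c5="VVD" hw="sit" pos="VERB">sat </w>
  <w c5="PRP" hw="on" pos="PREP">on </w>
  <w c5="AT0" hw="the" pos="ART">the </w>
  <w c5="NN1" hw="couch" pos="SUBST">couch</w>
  <c c5="PUN">.</c>
</s>
"""


def read_words(xml_text):
    """Return (word form, lemma, simplified POS) for every <w> element."""
    root = ET.fromstring(xml_text.strip())
    return [(w.text.strip(), w.get("hw"), w.get("pos")) for w in root.iter("w")]


for form, lemma, pos in read_words(SAMPLE):
    print(f"{form:<8}{lemma:<8}{pos}")
```

Because the lemma is stored on each word, a search for the lemma "sit" would also find the form "sat".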
• Two subsets of the BNC have
been produced separately:
• BNC Baby.
• BNC Sampler.
BNC Baby
• BNC Baby is a subset of
the BNC. It consists of
four one-million-word
samples, each compiled
as an example of a
particular genre: fiction,
newspapers, academic
writing and spoken
conversation.
BNC Sampler
• The BNC Sampler is a subset of the full BNC. It
comprises two samples, one of written and one of spoken
material, of one million words each, compiled to
mirror the composition of the full BNC as far as
possible.
• The Sampler was first created at Lancaster University
during the creation of the BNC.
Online use of BNC
• Go to the home page.
• Type the word into the search bar and then click on the
search button.
• It will show the contexts in which the word is
used.
• For instance, if we search for the word “couch”, the
corpus will show us its collocations, frequency and
KWIC (keyword in context) lines.
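What the online search returns (frequency, KWIC lines and collocations) can be illustrated on a toy text. This is a minimal sketch, not the method used by the BNC web interface. The sample sentences, the function names and the window of three words on each side are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Invented sample text standing in for corpus data.
TEXT = ("She sat on the couch. The old couch was by the window. "
        "He fell asleep on the couch again.")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, width=3):
    """Return one keyword-in-context line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>20} [{tok}] {right}")
    return lines


def collocates(tokens, node, width=3):
    """Count the words found within `width` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - width):i])
            counts.update(tokens[i + 1:i + 1 + width])
    return counts


tokens = tokenize(TEXT)
print("frequency of 'couch':", tokens.count("couch"))
print("\n".join(kwic(tokens, "couch")))
print(collocates(tokens, "couch").most_common(3))
```

Raw co-occurrence counts like these favour very frequent words such as "the". Corpus tools usually rank collocates with an association measure instead, which this sketch leaves out.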
Corpus and bnc

  • 1.
  • 2. BNC and its Online Use
  • 4. corpus •A corpus can be defined as a systematic collection of naturally occurring text in electronic form.
  • 5.
  • 6. Corpus linguistics • Corpus linguistics is the study of language/linguistic phenomena through the analysis of data obtained from a corpus. • Corpus linguistic is the analysis of text with the help of computer, i.e. with specialized software.
  • 7. •A corpus is always designed for a particular purpose, the usefulness of a ready made corpus must be judged with regard to the purpose to which a user intends to put it.
  • 8. Famous corpora •The Brown Corpus •The Lancaster-Oslo/Bergen •The London Lund Corpus •The British National Corpus
  • 9.
  • 10. The Brown Corpus • The Brown Corpus of Standard American English was the first of the modern, computer readable, general corpora. The corpus consists of one million words of American English texts printed in 1961.
  • 11. The Lancaster- Oslo/Bergen • The Lancaster-Oslo/Bergen Corpus is a million word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, The University of Oslo, and the Norwegian Computing Center for the Humanities, Bergen.
  • 12. The London Lund Corpus • The London Lund Corpus of English derives from two projects: the Survey of English Usage at University College London and the Survey of Spoken English, which was started at Lund University in 1975. the corpus consists of 500,000 words of spoken British English.
  • 13.
  • 14. The British National Corpus • The British National Corpus is a 100 million collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century.
  • 15. Creation of BNC: • The project was developed by an academic consortium called BNC consortium. • An industrial/academic consortium lead by Oxford University press of which the members are more dictionary publishers.
  • 16. • The Consortium was formed in 1990 and started work in 1991 on the three year task of producing a hundred million word corpus of modern British English for use in commercial and academic research. All major decisions regarding BNC are still made by them.
  • 17. •The BNC comprises approximately 100 million words of •Written texts (90%) •Transcripts of speech (10%)
  • 18. Why we use BNC • BNC can be used to know about aspects we did not know about a word and to check our thoughts about its meaning. Moreover, the corpus can help to find out the meaning of a word not just what we think it means. We can use BNC to check either a word is a part of BNC or not.
  • 19. Properties of British National Corpus Presented by:- Hadia Tabassum
  • 20. Features of British National Corpus: BNC is a sample of 100 million words of spoken and written British English. It is a balanced and finite corpus that contains approximately 90% written data and 10% spoken data.
  • 21. Spoken components in BNC: The spoken component has two parts: the conversational part and the task-oriented part.
  • 22. The conversational part: • This part is largely based on recordings of everyday conversational interaction engaged in by some 127 adults aged 15 and over. Some additional recordings of speakers under fifteen were included from COLT. The volunteers were selected according to the demographic parameters of age, social group, and sex, with the aim of obtaining approximately equal numbers in each group. The conversational part makes up just over 40% of the spoken corpus.
  • 23. Respondents in the “conversational part” were selected according to the following properties: Age: under fifteen, 15-24, 24-34. Social group: upper class, middle class, lower class. Sex (percentage): male 41.14, female 58.47, unclassified 0.38.
  • 24. The task-oriented part: This material was intended to represent those types of task-oriented spoken activity that were unlikely to be recorded by the conversational volunteers during a typical day in their lives, e.g. lectures, consultations, sermons, TV/radio broadcasts, etc. This part makes up about 60% of the spoken corpus.
  • 26. Continued... Imaginative texts account for about 20% and informative texts about 80% of the written component. The imaginative texts are divided into further categories such as prose and poetry. The informative texts are subdivided into eight categories: 1. Arts 2. Natural sciences 3. Commerce 4. Applied sciences 5. Leisure 6. Social sciences 7. Belief and thought 8. World affairs
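The percentages quoted on the preceding slides can be turned into rough word counts. The following is a minimal sketch, assuming the rounded figures from the slides (100 million words in total, 90%/10% written/spoken, 40%/60% conversational/task-oriented, 20%/80% imaginative/informative); the real corpus only approximates these shares, and the helper name `share` is hypothetical.

```python
# Rough BNC composition, using only the rounded percentages quoted on the slides.
# These are approximations, not official word counts.
TOTAL_WORDS = 100_000_000  # "approximately 100 million words"


def share(total, percent):
    """Return percent% of total as a whole number of words."""
    return total * percent // 100


written = share(TOTAL_WORDS, 90)
spoken = share(TOTAL_WORDS, 10)
conversational = share(spoken, 40)   # slides say "just over 40%"
task_oriented = share(spoken, 60)
imaginative = share(written, 20)
informative = share(written, 80)

composition = {
    "written": written,
    "  imaginative": imaginative,
    "  informative": informative,
    "spoken": spoken,
    "  conversational": conversational,
    "  task-oriented": task_oriented,
}

for name, words in composition.items():
    print(f"{name:<20}{words:>14,}")
```

On these assumptions the spoken component, although only a tenth of the corpus, still amounts to about ten million words.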
  • 27. Abbreviations and acronyms: BNC records the same abbreviation in several different written forms, such as P.C and PC, and the same form can reflect different origins (police constable, postcard, personal computer).
  • 28. Monolingual: Although BNC includes many different styles, varieties and genres, it deals only with modern British English and not with other languages used in Britain.
  • 29. Synchronic: BNC covers British English of the late twentieth century rather than the historical development that produced it. The corpus has been reissued in revised editions over time, but it is not extended with newer texts.
  • 31. First edition • The first edition of BNC was completed in 1994. • The first general release of the corpus for European researchers was announced in February 1995.
  • 32. BNC World • BNC World, a slightly revised version, was made available in 2001. The name indicates that the corpus is now available under license worldwide.
  • 33. BNC is available in two flavors: 1. Under the single-user license (cost: 50 pounds) you can install the whole corpus and the SARA software on a single machine for personal use. 2. Alternatively, for the same price, you can install just the corpus itself and use whatever software you want.
  • 34. BNC XML • BNC XML is the latest version of the British National Corpus. • XML stands for Extensible Markup Language. • XML is a set of rules for encoding documents in machine-readable form.
  • 35. • The main differences between this version and BNC World are: 1. Errors and inconsistencies have been removed. 2. Lemma information has been added. 3. Simplified part-of-speech information has been added.
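To make the idea of lemma and part-of-speech markup concrete, here is a minimal sketch that reads word annotations from an XML sentence with Python's standard library. The sample sentence is hand-written for illustration and is not taken from the corpus. The element and attribute names (`w` for a word, `hw` for the lemma, `pos` for the simplified word class, `c5` for the detailed tag) are assumed to follow the BNC XML edition and should be checked against the corpus reference guide; `read_words` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style assumed for BNC XML (not real corpus data).
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="child" pos="SUBST">children </w>'
    '<w c5="VVD" hw="sit" pos="VERB">sat </w>'
    '<w c5="PRP" hw="on" pos="PREP">on </w>'
    '<w c5="AT0" hw="the" pos="ART">the </w>'
    '<w c5="NN1" hw="couch" pos="SUBST">couch</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def read_words(xml_text):
    """Return (word form, lemma, simplified POS, detailed tag) for each <w> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for w in root.iter("w"):
        form = (w.text or "").strip()
        rows.append((form, w.get("hw"), w.get("pos"), w.get("c5")))
    return rows


rows = read_words(SAMPLE)
for form, lemma, pos, tag in rows:
    print(f"{form:<10}{lemma:<8}{pos:<7}{tag}")
```

With lemma information in the markup, a search for the lemma "child" can find "children" without any extra processing by the user.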
  • 36. • BNC XML can be accessed in three ways: 1. Online use. 2. Download the corpus and XAIRA. 3. Download just the corpus and use it with any software you want.
  • 37. •Two subsets of BNC have been produced separately: • BNC Baby. • BNC Sampler.
  • 38. BNC Baby • BNC Baby is a subset of the BNC. It consists of four one-million-word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation.
  • 39. BNC Sampler • The BNC Sampler is a subset of the full BNC. It comprises two samples, one of written and one of spoken material, of one million words each, compiled to mirror the composition of the full BNC as far as possible. • The Sampler was first created at Lancaster University during the creation of the BNC.
  • 40. Online use of BNC • Go to the home page. • Put the word into the search bar and then click on the search button. • It will show the contexts in which the word is used. • For instance, if we look for the word “couch”, the corpus will show us its collocations, frequency and KWIC (key word in context) lines.
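The three kinds of output mentioned above (frequency, KWIC lines and collocations) can be illustrated on a tiny text. This is a minimal sketch, not the method used by the online BNC interface: the sample sentences are invented, the tokenizer is deliberately crude, and the function names `tokenize`, `kwic` and `collocates` are hypothetical.

```python
import re
from collections import Counter

# Invented sample text (not corpus data).
SAMPLE_TEXT = (
    "She sat on the couch and read. The old couch was comfortable. "
    "He fell asleep on the couch again."
)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=3):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


tokens = tokenize(SAMPLE_TEXT)
freq = tokens.count("couch")
hits = kwic(tokens, "couch")
neighbours = collocates(tokens, "couch")

print("frequency:", freq)
for left, key, right in hits:
    print(f"{left:>20}  [{key}]  {right}")
print("most common neighbours:", neighbours.most_common(3))
```

A real corpus tool works the same way in principle but over millions of words, and usually ranks collocates with a statistical measure rather than raw counts.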