This document discusses corpus linguistics and what can be learned from corpora. It defines corpus linguistics as using large computerized collections of natural language texts to explore patterns of language use. Corpora can be general collections covering a wide range of language or specialized for a particular domain. Basic tools allow analyzing word frequencies, while more advanced tools like concordancers examine collocations and provide context for search words. Overall, corpora provide both quantitative and qualitative insights into real language patterns.
Corpus linguistics is the study of language based on large collections of real-world language samples stored electronically. A corpus is a large collection of written or spoken language samples, stored electronically, that can be analyzed using specialized software. Computer-assisted analysis of such collections allows reliable, accurate, and replicable study of language usage at a scale, and in ways, not previously possible.
Applied linguistics uses knowledge about language, how it is learned, and how it is used to solve real-world problems. It includes areas like second language teaching, literacy, speech pathology, and translation. Applied linguistics has developed over the 20th century through different language teaching methods like the direct method, grammar translation, and audiolingualism. More recently, it views language in holistic and integrative ways rather than discrete skills, and considers the language learner's perspective. It also takes new approaches to teaching the four language skills of listening, speaking, reading and writing. Applied linguistics often lacks definitive answers because language occurs between people and in the mind.
This document provides an introduction to corpus linguistics, including definitions of key terms and concepts. It discusses the history and development of corpus linguistics from early generational corpora in the 1960s-1990s to current applications. The document outlines different types of corpora as well as software tools used for corpus analysis. Both advantages and limitations of using corpora are presented. Finally, examples of famous corpora and applications to translation studies and research methods in corpus linguistics are briefly mentioned.
Corpus linguistics involves the collection and analysis of large bodies of text, known as corpora. It uses empirical methods to discover patterns of language use by examining naturally occurring texts. Key aspects of corpus linguistics include using large, representative text samples; analyzing frequency, collocation, and concordance; and applying findings to fields like lexicography, language teaching, and theoretical linguistics. Corpus analysis tools allow researchers to investigate features of syntax, semantics, and pragmatics that traditional intuition-based methods cannot.
Discourse analysis (Schmitt's book, chapter 4) by Samira Rahmdel
The document discusses discourse analysis and its approaches. It covers conversational analysis, ethnography, speech act theory, structural functional linguistics, and systemic functional linguistics. Conversational analysis examines patterns in turn-taking, adjacency pairs, and back-channel responses in natural conversations. Ethnography uses the speaking grid and analyzes speech events and genres. Structural functional linguistics developed models to analyze classroom discourse with transactions, exchanges, moves, and acts.
Introductory lecture on Corpus Linguistics. Contents: Corpus linguistics: past and present, What is a corpus?, Why use computers to study language? Corpus-based vs. Intuition-based approach, Theory vs. Methodology.
This lecture was based on McEnery et al. (2006), Corpus-Based Language Studies: An Advanced Resource Book, Routledge.
This document provides an overview of discourse analysis and different approaches to analyzing discourse. It discusses how discourse analysis examines both spoken and written language in their social contexts. Several key approaches are described, including conversation analysis, variation theory, systemic functional linguistics, and critical discourse analysis. The document also compares differences between spoken and written language at the levels of grammar and vocabulary choice.
This document provides an overview of contrastive analysis between English and Arabic. It begins with the objectives of familiarizing trainee teachers with contrastive analysis and its pedagogical implications. The document then defines contrastive analysis and outlines its emergence. Key points of contrast between English and Arabic phonology, grammar, and other linguistic features are described. Finally, the interference of an Arabic mother tongue on learning English is discussed through case studies of errors related to redundancy, prepositions, syntax, and other areas. The document aims to help teachers address challenges English learners face due to their native language.
The document discusses corpus linguistics and different types of corpora. It defines corpus linguistics as the study of language based on large collections of electronic texts, known as corpora. It describes general corpora, specialized corpora, historical/diachronic corpora, regional corpora, learner corpora, multilingual corpora, comparable corpora, and parallel corpora. It also discusses corpus annotation, concordancing, frequency and keyword lists, collocation, and software used for corpus analysis.
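The concordancing and collocation analysis mentioned above can be sketched in plain Python. This is a minimal keyword-in-context (KWIC) display plus a windowed co-occurrence count; the sample text is invented for illustration, and real work would use a dedicated tool such as a concordancer over a full corpus:

```python
import re
from collections import Counter

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(f"{left} [{keyword}] {right}")
    return hits

def collocates(text, keyword, window=2):
    """Count words co-occurring with `keyword` within `window` tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(span)
    return counts

# Invented sample text standing in for a real corpus.
sample = ("The corpus provides evidence. The corpus reveals patterns. "
          "A small corpus still provides useful evidence.")
print(kwic(sample, "corpus", width=2))
print(collocates(sample, "corpus").most_common(2))
```

The same two operations (show every hit in context, count the company a word keeps) underlie the concordance and collocation features of full corpus-analysis software.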
Applied linguistics is the interdisciplinary study of language and its applications in real world contexts. It draws on linguistic theories and research to solve practical language-related problems. Key areas include second language acquisition, teaching methodology, testing, and the relationships between language and society, technology, and other fields. Throughout the 20th century, applied linguistics influenced the development of language teaching methods, shifting the focus from grammar translation to more communicative, meaning-based approaches grounded in theories of language acquisition and use.
This presentation answers questions such as: How are languages planned in multilingual countries? What is the role of the TDK in Turkish language reform? What are the processes of language planning? Language planning in Switzerland, Canada, India, and the USA is also discussed.
The document discusses four major theories of second language acquisition:
1) The behaviorist perspective which focuses on habit formation through practice and reinforcement.
2) The innatist perspective which posits that humans have an innate Universal Grammar that facilitates language learning.
3) The cognitive/developmental perspective which explains language learning through general theories of learning like information processing and interaction.
4) The sociocultural perspective which views language development as arising through social interaction, such as interacting within one's Zone of Proximal Development.
Corpus linguistics is the analysis of large collections of machine-readable texts called corpora. It utilizes computers to analyze patterns of language use in natural texts. Corpus linguistics is an empirical approach that uses quantitative and qualitative techniques on representative text samples to study topics like lexicography, grammar, dialects and language acquisition. It provides consistent, reliable analyses of complex language patterns not possible through manual analysis alone.
This document discusses discourse structure and conversation analysis. It defines conversation as a less formal type of discourse involving small numbers of participants where turns are short. Conversation analysis examines patterns in natural conversation data and how participants negotiate turn-taking through linguistic and non-linguistic signals. Turn-taking involves adjacency pairs, insertion sequences where other topics are briefly discussed, and repairs to clarify meaning. The document presents discourse as a process that is constructed through participant interaction and turn-taking signals.
The document discusses applied linguistics and interdisciplines. It defines applied linguistics as using linguistic theories and methods to solve language problems in other fields. The history of applied linguistics is discussed, along with its aims to study language learning and teaching and solve related problems. Interdisciplines that applied linguistics interacts with are sociolinguistics, psycholinguistics, neurolinguistics, and various applied areas like education, speech therapy, computing, and international relations.
The document discusses language standardization, including how and why languages become standardized. It notes that standardization is a prescriptive process that develops a standard variety of a language. Languages typically become standardized through resources like dictionaries, grammars, and pronunciation guides from linguistic institutions; constitutional status as an official language; use in public domains like courts and schools; literary works; and popularity and acceptance in the community. Establishing a standard variety aims to promote national cohesion. The standard variety often reflects the language of higher socioeconomic groups. Examples are given of standardization processes and debates in countries like Brazil, Angola, Mozambique, and Cape Verde, along with related papers and books on topics such as the politics of standardization.
1. Corpus linguistics is the study of language using large collections of electronic texts called corpora.
2. A teacher conducted a corpus analysis of student writing to determine the most frequent words. The three most common words were "the", "for", and "it".
3. Corpora come in many types including speech, text, monolingual, parallel, and learner corpora. They are used for various linguistic analyses.
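The kind of frequency analysis the teacher in point 2 performed is a short script. The essays below are hypothetical stand-ins (a real analysis would read student files from disk), chosen so the top three words echo the ones reported above:

```python
import re
from collections import Counter

# Hypothetical student essays; a real analysis would read files from disk.
essays = [
    "The teacher asked for the report, and the class wrote it for homework.",
    "It was useful, for it showed the gaps in the writing.",
]

# Tokenize: lowercase, keep alphabetic words (and internal apostrophes).
tokens = re.findall(r"[a-z']+", " ".join(essays).lower())
freq = Counter(tokens)
print(freq.most_common(3))  # three most frequent words with counts
```

High-frequency function words like these dominate almost any frequency list, which is why keyword analysis (comparing against a reference corpus) is usually the next step.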
Applied linguistics is an interdisciplinary field that identifies and solves real-world language problems. It applies the knowledge of linguistics to improve practical tasks involving language. Some related fields are education, psychology, communication research, anthropology, and sociology. Applied linguistics investigates language learning and teaching problems, the role of language in culture and society, and finds solutions to language issues linguistics cannot solve alone. It covers domains like computational linguistics, sociolinguistics, psycholinguistics, neurolinguistics, and others.
Discourse analysis involves studying language beyond the sentence level, including conversations and written texts. There are various approaches to discourse analysis from different fields like sociology, linguistics, and philosophy. Sociological approaches include conversational analysis which examines turn-taking, openings/closings of conversations. Systemic functional linguistics views language as evolving based on its social functions and analyzes texts in relation to social contexts. Critical discourse analysis considers how power and social domination are reproduced through language.
This document discusses the definition and key aspects of a corpus in linguistics. It defines a corpus as a large collection of text samples that are selected and organized according to linguistic criteria. The corpus aims to represent a given language, dialect, or subset of language. It should contain a diverse range of authentic texts and be large enough to characterize different varieties and uses of the language. Important qualities of a corpus include quantity, quality, representativeness, simplicity, equality, retrievability, verifiability, augmentation, documentation, and management.
This document discusses language contact in sociolinguistics. It describes how bilingual speakers may use a third language, mix languages, or switch between languages when communicating. This can lead to lingua francas, pidgins, or code-switching. Pidgins develop between languages for trade purposes with simplified grammar and vocabulary. If passed to new generations, a pidgin may become a creole language. Code-switching refers to mixing or alternating between two languages in speech. It helps with expression or identifies mixed cultural identity. Pidgins integrate languages extensively while code-switching shifts are restricted to vocabulary within sentences.
Relationship between language, culture, and identity by Cool Chaandni
This document discusses the relationship between language, culture, and identity. It argues that language and culture influence each other mutually - language is shaped by culture but also shapes culture. Membership in a cultural group influences one's identity. The levels of identification conveyed through language include nationality, social class, gender, generation, and profession. Language determines ways of thinking through influencing cognition as proposed in the Sapir-Whorf hypothesis. Overall, the document presents language, culture and identity as intricately interconnected and constantly evolving.
Contrastive analysis is the systematic study of two languages to identify their structural differences and similarities. It was originally used to establish language families but was later applied to second language acquisition in the 1960s. The contrastive analysis hypothesis claimed that elements similar between a learner's first and second language would be easier to acquire, while differences would be more difficult. However, empirical evidence showed this could not predict all errors, and some uniform errors occurred regardless of first language. This led to the development of error analysis and the concept of interlanguage, seeing second language acquisition as its own rule-governed linguistic system rather than an imperfect version of the target language.
This document provides an overview of applied linguistics and how knowledge of linguistics can help teachers support English learners. It defines applied linguistics as investigating and addressing language-related problems in both first and second language acquisition. The document outlines key aspects of linguistics including phonology, morphology, syntax, semantics, and pragmatics. It explains that while teachers do not need the same depth of knowledge as applied linguistics experts, they should understand language acquisition theories and how knowledge of linguistics can help them teach English, support communication skills, evaluate students appropriately considering their backgrounds, and socialize students into the school culture.
This document provides an overview of language planning. It defines language planning as efforts to influence and modify a language's structure and function. It discusses key aspects of language planning including its goals, processes, types (status and corpus planning), ideologies, and issues. The summary focuses on language planning's aim to alter a language's role and how it is implemented through selection, codification, elaboration, and acceptance of a standardized variety.
This document defines and summarizes key terms in corpus linguistics. It discusses bootstrapping, the Brill tagger, the competence-performance dichotomy, computational linguistics, computer-assisted language learning, corpus linguistics, Extensible Markup Language (XML), the Penn Treebank, the Kolhapur Corpus, the Hyderabad Corpus, the Text Encoding Initiative, Unicode, the Linguistic Data Consortium, and alignment.
This document defines and describes various terms and concepts related to corpus linguistics and natural language processing. It defines acronyms for various corpora and projects. It also defines key concepts like alignment, annotation, ambiguity, balanced corpora, concordancing, part-of-speech tagging, and probabilistic tagging using n-grams.
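The probabilistic n-gram tagging mentioned above can be sketched as a bigram tagger with a unigram backoff: pick the tag most often seen for (previous tag, word), and fall back to the word's most frequent tag overall. The tiny hand-tagged training set here is invented; real taggers train on large annotated corpora such as the Penn Treebank:

```python
from collections import Counter, defaultdict

# Tiny hand-tagged training set (hypothetical; real taggers train on
# large annotated corpora).
tagged = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

unigram = defaultdict(Counter)  # word -> tag counts (backoff model)
bigram = defaultdict(Counter)   # (previous tag, word) -> tag counts

for sent in tagged:
    prev = "<S>"  # sentence-start marker
    for word, tag_ in sent:
        unigram[word][tag_] += 1
        bigram[(prev, word)][tag_] += 1
        prev = tag_

def tag(sentence):
    """Tag a token list: prefer the bigram model, back off to unigram."""
    out, prev = [], "<S>"
    for word in sentence:
        if (prev, word) in bigram:
            t = bigram[(prev, word)].most_common(1)[0][0]
        elif word in unigram:
            t = unigram[word].most_common(1)[0][0]
        else:
            t = "NOUN"  # crude default for unknown words
        out.append((word, t))
        prev = t
    return out

print(tag(["a", "cat", "barks"]))
```

Note that "cat barks" never occurs in training, yet the tagger resolves it correctly because the bigram model conditions on the previous *tag*, not the previous word.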
This document discusses how to design, acquire, and process a collection of linguistic data to form the raw material for a dictionary. It explains that a reliable dictionary requires evidence from language used in real communicative acts. While introspection and informant testing provide some evidence, they are limited due to subjectivity. Therefore, observation of language in use through large text corpora is indispensable. Key considerations in corpus design include size, inclusion of different text types and styles to avoid bias, and ensuring representativeness.
This document discusses corpus linguistics and quantitative research design. It defines a corpus as a large collection of texts used for linguistic analysis. Corpus linguistics allows researchers to empirically test hypotheses about language patterns and features based on large amounts of real-world data. Quantitative analysis of corpus data shows how frequently certain words, constructions, and patterns are used. Specialized corpora can focus on particular text types, languages, or learner language. Various software tools are used to analyze corpora through frequency lists, keyword lists, collocation analysis, and other methods.
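The keyword lists mentioned above are typically produced by comparing word frequencies in a target corpus against a reference corpus. A common statistic for this is Dunning's log-likelihood (G2); the sketch below uses two invented miniature "corpora" purely for illustration:

```python
import math
import re
from collections import Counter

def log_likelihood(a, b, total_a, total_b):
    """Dunning's log-likelihood (G2) keyness score for one word:
    a/b = word counts in target/reference, total_a/total_b = corpus sizes."""
    e1 = total_a * (a + b) / (total_a + total_b)  # expected count in target
    e2 = total_b * (a + b) / (total_a + total_b)  # expected count in reference
    g2 = 0.0
    if a:
        g2 += 2 * a * math.log(a / e1)
    if b:
        g2 += 2 * b * math.log(b / e2)
    return g2

def keywords(target, reference):
    """Rank target-corpus words by keyness against the reference corpus."""
    t = Counter(re.findall(r"[a-z']+", target.lower()))
    r = Counter(re.findall(r"[a-z']+", reference.lower()))
    nt, nr = sum(t.values()), sum(r.values())
    scored = {w: log_likelihood(t[w], r.get(w, 0), nt, nr) for w in t}
    return sorted(scored, key=scored.get, reverse=True)

# Invented miniature corpora.
target = "corpus corpus corpus tools analyse the corpus data"
reference = "the data and the tools in general use"
print(keywords(target, reference)[:2])
```

Words that are unusually frequent in the target relative to the reference rise to the top, which is how corpus tools surface the vocabulary that characterizes a specialized text type.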
The document discusses text corpora and provides information about the CORDE and BNC corpora. It summarizes a study comparing searches for the term "nación" in the CORDE and BNC corpora. The CORDE allowed for more in-depth analysis by providing statistics and contextual information, while the BNC only showed random results without additional data.
This document discusses corpus linguistics and methods of corpus analysis. It defines corpus linguistics as the study of language using large samples of authentic texts. It outlines the history of corpus linguistics from early manually created corpora to current large electronically stored corpora. It also discusses different types of annotation that can be applied to corpora, including part-of-speech tagging, syntactic analysis, semantic tagging, and discourse-level annotation. The document contrasts the corpus linguistics approach, which focuses on descriptive adequacy based on empirical data, with the generative grammar approach, which prioritizes explanatory adequacy through abstract principles.
1) The document discusses using blogs and other structured web data to develop linguistic corpora for research. It argues that structured web data provides large amounts of naturally occurring language data in various genres and languages.
2) Examples are given of how blog data in particular is well-structured with metadata like authorship, dates, and semantics. This structured data can be extracted and analyzed to study linguistic patterns and variation across different authors, registers, and languages.
3) One research example analyzed the distribution of future tense expressions ("will" vs. "be going to") in three English language blogs and found patterns relating to subject type that confirm theoretical assumptions.
We present a framework that combines machine learnt classifiers and taxonomies of topics to enable a more conceptual analysis of a corpus than can be accomplished using Vector Space Models and Latent Dirichlet Allocation based topic models which represent documents purely
in terms of words. Given a corpus and a taxonomy of topics, we learn a classifier per topic and annotate each document with the topics covered by it. The distribution of topics in the corpus can then be visualized as a function of the attributes of the documents. We apply this framework to the US State of the Union and presidential election speeches to observe how topics such as jobs and employment have evolved from being relatively unimportant to being the most discussed topic. We show that our framework is better than Vector Space Models and an Latent Dirichlet Allocation based topic model for performing certain kinds of analysis.
The document discusses the field of computational linguistics. It defines computational linguistics as the scientific study of language from a computational perspective, involving linguists, computer scientists, and others. The history of computational linguistics is closely tied to the development of digital computers and early applications included machine translation. Computational linguistics research includes areas like speech recognition, natural language processing, and machine translation.
This document discusses computational linguistics, including its origins, main application areas, and approaches. Computational linguistics originated from efforts in the 1950s to use computers for machine translation between languages. It aims to develop natural language processing applications like machine translation, speech recognition, and grammar checking. Research employs various approaches including developmental, structural, production-focused, and comprehension-focused methods. The field involves both computer science and linguistics.
STRUCTURED AND QUANTITATIVE PROPERTIES OF ARABIC SMS-BASED CLASSIFIED ADS SUB...ijnlc
In this paper we will present our work in studying the sublanguage of Arabic SMS-based classified ads.
This study is presented from the developer's point of view. We will use the corpus collected from an
operational system, CATS. We also compare the SMS-based and the Web-based messages. We also discuss
some quantitative properties of the studied text.
AUTOMATIC DETECTION AND LANGUAGE IDENTIFICATION OF MULTILINGUAL DOCUMENTSIRJET Journal
The document describes a study that aims to handle the task of detecting offensive language in multilingual documents using machine learning models. The proposed framework consists of three phases: preprocessing text, representing text using BERT models, and classifying text into offensive and non-offensive classes. The study examines different strategies for handling multilingualism, such as creating one classification model for multiple languages or using translation to convert all texts to one language before classification. Experimental results on a bilingual dataset show that the translation-based approach using Arabic BERT achieves over 93% F1-score and 91% precision for offensive language detection in multilingual texts.
The document discusses the British National Corpus (BNC), a 100 million word collection of samples of written and spoken British English from the late 20th century. It provides details on the BNC, including that it consists of 90% written texts and 10% transcripts of speech. Famous corpora that preceded the BNC are also mentioned, such as the Brown Corpus, Lancaster-Oslo/Bergen Corpus, and London-Lund Corpus. The origins and properties of the BNC are outlined, including its editions, spoken and written components, treatment of abbreviations, and focus on modern British English.
This document discusses syntactic variation research and provides examples of syntactic doubling. It notes four major developments in the field: 1) increased focus on dialect data; 2) incorporation of sociolinguistic methodology; 3) improved data accessibility through technology; 4) the Minimalist hypothesis that syntactic variation arises through interaction between universal principles and extra-syntactic factors. Large dialect syntax databases now allow investigation of correlations and patterns across languages. Examples are given of syntactic doubling phenomena like Wh-doubling and negative concord in various dialects.
Introduction to automated text analyses in the Political SciencesChristianRauh2
This document provides an introduction and overview of automated text analysis methods for political science research. It discusses the promises and pitfalls of automated analysis and outlines some common text analysis approaches, including corpus construction, dictionary-based analysis, text scaling, and briefly touches on topic modeling and machine learning. The document uses debates around climate change at the United Nations as a running example to demonstrate how these various methods can be applied to a research question and corpus of documents. It emphasizes that automated analyses require validation and should augment rather than replace human interpretation of texts.
Synonymy is an important yet intricate linguistic feature in the field of lexical semantics. Using the 100 million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as an analyzing tool, this paper explores the collocational behavior and semantic prosodies of near synonyms in virtue of, owing to, thanks to, as a result of, due to and because of. The results show that these near synonyms differ in their collocational behavior and semantic prosodies. The pedagogical implications of the findings are also discussed.
The British National Corpus (BNC) is a large text corpus of samples of written and spoken British English from the late 20th century. It was created between 1991 and 1994 by an academic consortium and contains over 100 million words. The BNC consists of both written texts that make up 90% of the corpus, as well as spoken texts that comprise the remaining 10%. It provides information about English usage and helps researchers study collocations and how words are used in different contexts.
This document summarizes key aspects of learner characteristics, styles, strategies and motivation from Schmitt's 2013 book on applied linguistics. It discusses how age, gender and language aptitude can impact learning. It describes different learning style preferences including sensory, cognitive and personality-related styles. It also outlines cognitive, metacognitive, affective and social learning strategies and how they relate to skill areas. Finally, it discusses motivation as a dynamic process and models of L2 motivation including integrative and instrumental orientation and the L2 Motivational Self System.
Applied linguistics has evolved over time from focusing primarily on teaching English as a foreign language to incorporating various subfields like second language acquisition, corpus linguistics, and critical applied linguistics. It now grapples with real-world issues and seeks to balance serving practical needs with intellectual inquiry, though there are criticisms of overly grand or narrowly practical approaches. The future of applied linguistics may require more interdisciplinary work and mediating various stakeholder interests.
This document discusses formal approaches to second language acquisition (SLA) with a focus on Universal Grammar (UG). It covers:
1) UG theory which posits that languages are constrained by innate, universal principles and parameters that vary across languages.
2) Evidence that SLA is constrained by UG principles like structure dependence and subjacency, suggesting the innate language faculty remains available.
3) Debate around the initial state in SLA, specifically whether the first language and UG are available, and how parameters can be reset, with studies examining properties of the pro-drop parameter.
The document discusses the relationship between semantics and pragmatics in language. It explores how pragmatic inferences contribute to meaning, including implicatures identified by Grice. While semantic meaning comes from words alone, pragmatic meaning requires context. Some implicatures, like those from words like "but" and "therefore", affect truth conditions. Other implicatures identified by Grice, like sequential uses of "and", are argued by Relevance Theory to be explicatures that determine truth value. Numerals are also discussed as having distinct semantic meanings of "at least" and "exactly". The document concludes that the boundary between semantics and pragmatics is blurred, with pragmatic inferences contributing to conventional and truth-conditional meaning.
This document provides an overview of two theories of meaning components: Jackendoff's theory of semantic primitives and conceptual structures, and Pustejovsky's Generative Lexicon theory. It discusses Jackendoff's use of conceptual primitives like EVENT and STATE to investigate the relationship between semantics and grammar. It also outlines Pustejovsky's four levels of semantic representation for lexical items, including event structure and qualia structure. For event structure, it explains Pustejovsky's classification of states, processes, and transitions, and how event structure is modified during semantic combination.
This document provides an overview of vowels and vowel-like articulations from Ladefoged's A Course in Phonetics. It discusses cardinal vowels as reference points for describing vowel quality, including their definition based on tongue height, backness, and lip rounding. It also covers secondary cardinal vowels, advanced tongue root, rhotacized vowels, nasalization, and semivowels. Finally, it discusses four types of secondary articulatory gestures that can be added to vowels: palatalization, velarization, pharyngealization, and labialization.
The document discusses three Yankunytjatjara words - pika, mirpan, and kuya - that can be used to translate the phrase "He got angry at me" into English. Pika refers to feelings of hostility and aggression. Mirpan describes an aggrieved or offended emotional state. Kuya indicates resentment and a lack of desire to oblige. Each word is explained through its roots, suffixes, and example usages to demonstrate how they differ in their nuances of expressing anger.
This document provides an overview of sociolinguistic concepts including social factors that correlate with language variation such as gender, age, audience, and social networks. It discusses methods for collecting and analyzing sociolinguistic data, including elicitation techniques. As an example, it summarizes a sociolinguistic study of /r/ variation in Middlesbrough, England which found evidence of dialect leveling and diffusion of new variants across age and gender groups. Finally, it outlines some applications of sociolinguistics, such as informing language education policy and training for film actors.
Forensic linguistics involves three overlapping areas: 1) investigative linguistics such as authorship analysis, 2) study of written legal language including readability and interpretation, and 3) communication in legal processes like interviews and courtrooms. Investigative linguistics analyzes disputed texts using both quantitative and qualitative methods to identify idiosyncratic linguistic features and determine authorship. The study of written legal language focuses on improving comprehension through plain language reforms. Communication in legal processes examines discourse in settings like police questioning and trials.
Cognitive semantics holds that language reflects human cognitive abilities and how people conceive of the world. It focuses on how language is acquired, contextual, and based on general cognitive resources. A key theory is Conceptual Metaphor Theory, which proposes that metaphors allow understanding one domain in terms of another through systematic mappings. Metaphors exhibit features like conventionality, asymmetry, systematicity, and abstraction. They also influence linguistic behaviors and the development of language over time. Similarly, metonymy connects concepts within a domain through associated features.
This document discusses semantic roles and valency. Semantic roles refer to the number of arguments a predicate takes, while valency refers to the number of arguments a predicate has. Predicates can have valency of one or two arguments. One-argument predicates include intransitive verbs and one-place adjectives. Two-argument predicates include transitive verbs and two-place adjectives or adverbs. The semantic roles of arguments depend on whether the predicate expresses an action, relationship, or other linking roles between the arguments. The meaning of predicates is determined partly by their valency and the semantic roles of their arguments.
This document discusses the concepts of linguistic competence and communicative competence in language learning. It defines linguistic competence as knowledge of a language's formal rules of grammar and phonology. Communicative competence goes beyond this to include social and cultural knowledge needed for effective communication. It identifies four components of communicative competence: possibility, feasibility, appropriateness, and attestedness. The notion of communicative competence has influenced applied linguistics, shifting approaches to teaching English and other languages from a focus solely on mechanics to developing broader communication abilities.
The document provides an overview of mental spaces theory and conceptual integration theory as proposed by Gilles Fauconnier. Some key points:
- Mental spaces are conceptual structures that describe how language users assign and manipulate references. Meaning arises from complex cognitive processes, not language itself.
- Connections between mental spaces include identification, metonymy, and belief contexts. Diagrams are used to illustrate relationships between spaces.
- Referential opacity and presupposition are explained in relation to belief contexts and how information is shared or blocked between spaces.
- Conceptual integration theory describes how analogies are created by combining elements from different input spaces into an integrated blended space. Principles of the theory are
The document discusses semantic roles and propositions in language. It explains that a sentence expresses a proposition, which can be expressed through different sentences. A proposition contains a predicate and arguments that play semantic roles, such as an agent that performs an action or an affected entity that undergoes it. Predicates have a valency that specifies how many arguments they can take and the roles of those arguments. Some predicates have a valency of zero and do not require any explicit arguments, as in sentences using an impersonal "it" as a placeholder subject.
This document discusses different approaches to understanding emotions from a semantic perspective. It describes the debate around whether emotions are essentially physical, physiological, or depend on cognitive processes. It also outlines work analyzing the semantic components of emotion words in Russian by Iordanskaja and cross-linguistically using semantic templates by Wierzbicka. Key components identified for emotion word definitions include the emotional state, its valence, and the reason or thoughts triggering the emotion.
This document discusses the history and evolution of different approaches to teaching English as a foreign language. It begins by explaining how applied linguistics and TEFL were initially considered the same field. It then outlines four main approaches chronologically: 1) Grammar translation focused on rules and vocabulary translation with little speaking practice. 2) The direct method banned first language use and focused on immersion. 3) The natural approach emphasized meaningful input with no error correction. 4) The communicative approach shifted focus to real-world tasks and communication over grammatical forms. Each brought benefits but also limitations for developing language skills.
This document discusses different types of grammar including prescriptive, descriptive, and pedagogical grammars. It addresses issues in describing grammar such as which rules to describe, varieties of language, and the relationship between form and function. The document also covers limitations of grammatical descriptions, including the interdependence of grammar and lexis. Finally, it discusses how grammar is learned and approaches to teaching grammar, such as input flooding, guided participation, and feedback.
This document provides an overview of key concepts in sociolinguistics. It discusses how sociolinguistics studies language variation and change in relation to social factors. Some key points covered include:
- Sociolinguistics examines how social factors like region, age, and gender correlate with linguistic differences.
- Languages have standardized and non-standard varieties, and sociolinguists look at issues of prestige and stigmatization.
- Researchers describe language variation through concepts like idiolects, sociolects, and linguistic variables.
- Phonological, grammatical, and lexical variation are all studied using descriptive tools from the different levels of language.
Survey research designs are procedures used in quantitative research where investigators administer surveys to describe populations. There are two main types of survey designs - cross-sectional and longitudinal. Cross-sectional designs collect data at one point in time to measure current attitudes or practices, while longitudinal designs collect data about trends over time within the same population. Key characteristics of survey research include sampling from a population, collecting data through questionnaires or interviews, designing instruments, and obtaining a high response rate.
This document discusses speaking and pronunciation from a discourse perspective. It addresses key questions about teaching these areas:
- Both sentences and texts have value, but a discourse focus helps students notice authentic language use and better prepares them for real communication.
- Classroom activities can raise awareness of discourse features like genre, exchange structure, and conversational moves to sensitize students.
- A variety of authentic and semi-authentic spoken materials on a continuum from sentences to natural speech can be used based on availability.
2. TABLE OF CONTENTS
01 What is corpus linguistics?
02 Corpus design and compilation
03 What can a corpus tell us?
04 Overview of different types of corpus studies
05 How can corpora inform language teaching?
(Schmitt, 2020)
4. ‘Corpus linguistics’ has enjoyed much greater popularity, both as a means to explore actual patterns of language use and as a tool for developing materials for classroom language instruction.
Corpus linguistics uses large collections of both spoken and written natural texts that are stored on computers.
One of the major contributions of corpus linguistics is in the area of exploring patterns of language use.
Today, corpus linguistics and the term ‘corpus’ are synonymous with computerized corpora and methods, but this was not always the case.
What is corpus linguistics?
5. An empirical approach to linguistic analysis is based on naturally occurring spoken or written data.
Advances in technology have brought a number of advantages for corpus linguists, including the collection of larger language samples and faster, more efficient text processing.
Characteristics of corpus-based analyses of language:
o It is empirical, analysing the actual patterns of use in natural texts.
o It utilizes a large and principled collection of natural texts.
o It makes extensive use of computers for analysis, using both automatic and interactive techniques.
o It depends on both quantitative and qualitative analytical techniques.
What is corpus linguistics?
6. A corpus is a large, principled collection of natural texts.
The use of natural texts means that the language has been collected from naturally occurring sources.
Examples of well-known corpora:
o The British National Corpus (BNC)
o The Corpus of Contemporary American English (COCA)
o The Brown Corpus
The text collection process for building a corpus needs to be principled to ensure representativeness and balance.
What is corpus linguistics?
7. The linguistic features or research questions being investigated will shape the collection of texts used in creating the corpus.
Although computers make possible a wide range of sophisticated statistical techniques, human analysts are still needed to decide what information is worth searching for, to extract that information from the corpus, and to interpret the findings.
Corpus linguistics brings together aspects of quantitative and qualitative techniques.
The quantitative analyses provide an accurate view of macro-level characteristics, whereas the qualitative analyses provide the complementary micro-level perspective.
What is corpus linguistics?
9. Although there is no minimum size for a text collection to be considered a corpus, an early standard size set by the creators of the Brown Corpus was one million words.
A number of well-known specialized corpora are much smaller than that, but there is a general assumption that, for most tasks within corpus linguistics, larger corpora are more valuable.
Many modern corpora are made available to other researchers free of charge.
They enable researchers all over the world to access the same sets of data, which encourages a higher degree of accountability in data analysis and permits collaborative studies by different researchers.
Corpus design and compilation
11. A. General corpora
o The BNC contains 100 million words and COCA 560 million words.
o Brown and LOB contain a mere one million words each.
General corpora are designed to be balanced and to include language samples from a wide range of registers or genres.
Most of the early general corpora were limited to written language, because written texts are vastly easier and cheaper to compile than transcripts of speech.
A few corpora are dedicated to spoken discourse:
o The Cambridge and Nottingham Corpus of Discourse in English (CANCODE).
Types of corpora
12. B. Specialized corpora
Specialized corpora are designed with more specific research goals in mind, and they are considered the most crucial ‘growth area’ for corpus linguistics.
Specialized corpora may include both spoken and written components:
o The International Corpus of English (ICE)
o The TOEFL-2000 Spoken and Written Academic Language Corpus
A specialized corpus focuses on a particular spoken or written variety of language:
• Historical corpora, such as the ARCHER Corpus (two million words of British and American English dating from 1650 to 1990).
• Learner corpora (spoken or written language samples produced by non-native speakers).
Types of corpora
14. One of the most important factors in corpus linguistics is the design of the corpus.
The design of the corpus impacts all of the analyses and results.
The composition of the corpus should reflect the anticipated research goals.
For example, to compare patterns of language found in spoken and written discourse:
• The corpus has to include a range of possible spoken and written texts.
• The information derived from the corpus must accurately reflect the variation possible in the patterns being compared across the two registers.
Issues in corpus design
15. A well-designed corpus should aim to be representative of the types of language included in it.
There are many different ways to conceive of and justify representativeness:
a. Representativeness of different registers (fiction, casual conversation) and topics (national vs. local news).
b. Representativeness of the demographics of the speakers or writers (nationality, gender, education level).
c. Representativeness based on production or reception (e-mail messages, newspapers).
All these issues must be weighed when deciding how much of each category to include.
In thinking about the research goals of a corpus, compilers must also bear in mind the intended distribution of the corpus.
Issues in corpus design
17. When creating a corpus, data collection involves obtaining or creating electronic versions of the target texts, and storing and organizing them.
Data collection for a written corpus may mean using a scanner and optical character recognition (OCR) software to turn paper documents into electronic text files.
OCR is not error-free, so manual proofreading and error correction are necessary.
Data collection for a spoken corpus is long and expensive, requiring:
• a transcription system (usually an orthographic transcription system);
• representation of the interactional characteristics of the speech in the transcripts.
An important issue for both spoken and written corpora during data collection is obtaining permission to use the data in the corpus.
Corpus compilation
19. A simple corpus could consist of raw text, with no additional information provided about the origins, authors, speakers, structure or contents of the texts themselves.
Encoding some of this information in the form of markup makes the corpus more useful.
Structural markup refers to the use of codes in the texts to identify structural features of the text:
o in a written corpus: titles, authors, chapters;
o in a spoken corpus: speakers, paralinguistic features.
Many corpora provide information about the contents and creation of each text in what is called a header.
Headers include classifications of the text into categories such as register, genre and topic domain.
Markup and annotation
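To make the idea of a header concrete, here is a small sketch in Python using an invented XML fragment; the element names (register, genre, topic) are illustrative only and not the actual TEI or BNC header scheme, which is far richer:

```python
import xml.etree.ElementTree as ET

# Hypothetical header for one text in a corpus; real header
# schemes record far more (authors, dates, permissions, etc.).
doc = """<text>
  <header>
    <register>newspaper</register>
    <genre>editorial</genre>
    <topic>politics</topic>
  </header>
  <body>Raw text of the sample goes here.</body>
</text>"""

root = ET.fromstring(doc)
# Collect the header fields into a dictionary for filtering texts.
header = {child.tag: child.text for child in root.find("header")}
print(header)
```

A corpus tool can use such headers to restrict a search to, say, only newspaper texts, which is what makes markup more useful than raw text.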
20. Some corpora are also encoded with certain types of linguistic annotation.
There are different kinds of linguistic processing or annotation:
A. Part-of-speech tagging, which involves assigning a grammatical category tag to each word in the
corpus.
o ‘A goat can eat shoes’ → A (indefinite article) goat (noun, singular) can (modal) eat (main verb) shoes (noun,
plural).
B. Prosodic and phonetic annotation, which are not uncommon.
C. Syntactic parsing, which is much less common.
A tagged corpus allows researchers to answer different types of questions, explore the
frequency of lexical items and grammatical structures, and address the problem of words that
have multiple meanings or functions.
Markup and annotation
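The ‘A goat can eat shoes’ example above can be sketched as a toy part-of-speech tagger: each word is simply looked up in a tiny hand-made lexicon. Real taggers use far larger lexicons plus contextual rules or statistical models; the lexicon and tag labels here are purely illustrative.

```python
# Toy POS tagger: look each word up in a tiny hand-made lexicon.
LEXICON = {
    "a": "indefinite article",
    "goat": "noun, singular",
    "can": "modal",
    "eat": "main verb",
    "shoes": "noun, plural",
}

def tag(sentence):
    """Return (word, tag) pairs for each word in the sentence."""
    return [(w, LEXICON.get(w.lower(), "unknown")) for w in sentence.split()]

print(tag("A goat can eat shoes"))
```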
21. What can a corpus tell us?
23. There are many levels of information that can be gathered from a corpus.
These levels range from simple word lists to complex grammatical structures and interactive
analyses.
Analyses can explore individual lexical or linguistic features or identify clusters.
The tools that are used for these analyses range from basic to complex computer programs.
The most basic information that we can get from a corpus is frequency-of-occurrence
information.
o Tools such as MonoConc, WordSmith Tools and AntConc provide it.
A word list is a list of all the words that occur in the corpus, arranged in alphabetical or
frequency order.
Word counts and basic corpus tools
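The kind of frequency word list described above can be sketched in a few lines of Python. The mini-corpus is invented for illustration; dedicated tools such as AntConc do far more:

```python
from collections import Counter
import re

# A one-sentence "corpus", invented for illustration.
corpus = "The goat ate the shoes because the shoes were there"
words = re.findall(r"[a-z']+", corpus.lower())

freq_order = Counter(words).most_common()  # word list in frequency order
alpha_order = sorted(set(words))           # word list in alphabetical order

print(freq_order)
print(alpha_order)
```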
26. Concordancing packages can provide additional information about lexical co-occurrence
patterns.
Once the search word is selected, the program searches the texts in the corpus and provides a
list of each occurrence of the target word in context; this is called ‘key word in context’ (KWIC).
A concordance program can also provide information about words that tend to occur together in
the corpus, called ‘collocates’, and the resulting sets of words are called ‘collocations’.
An analysis of collocations provides important information about grammatical and semantic
patterns of use for individual lexical items.
Corpus analysis can uncover patterns of use that previously went unnoticed.
For example, although the synonymous verbs begin and start have the same grammatical potential,
corpus analysis reveals differences in how they are actually used.
Word counts and basic corpus tools
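A bare-bones KWIC display and collocate count can be sketched as follows. The mini-corpus is invented for illustration; concordancing packages such as MonoConc or AntConc provide much richer output:

```python
from collections import Counter

# Tiny invented corpus, pre-tokenized for simplicity.
corpus = ("we begin the lesson today . "
          "they begin the meeting early . "
          "we start the engine now").split()

def kwic(tokens, node, span=2):
    """List each occurrence of `node` with `span` words of context."""
    hits = []
    for i, w in enumerate(tokens):
        if w == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            hits.append(f"{left} [{node}] {right}")
    return hits

def collocates(tokens, node, span=2):
    """Count words occurring within `span` words of `node`."""
    counts = Counter()
    for i, w in enumerate(tokens):
        if w == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts

for line in kwic(corpus, "begin"):
    print(line)
print(collocates(corpus, "begin").most_common(2))
```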
29. In order to carry out more sophisticated types of corpus analyses, it is often necessary to have
a tagged corpus.
When a corpus is tagged, each word in the corpus is given a grammatical label.
The process of assigning grammatical labels to words is complex.
For example, ‘can’ falls into two grammatical categories.
o It can be a modal ‘I can reach the book’.
o It can be used as a noun ‘Put the paper in the can’.
Computers can accurately identify the grammatical labels for many words, but certain features
remain elusive; in such cases the program brings the problematic item to the screen for the user
to select the correct classification.
Once texts have been tagged, they provide a fuller picture of the language of a register.
Working with tagged texts
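The ‘can’ ambiguity above can be illustrated with a toy contextual rule: after a determiner such as ‘the’, tag ‘can’ as a noun, otherwise as a modal. Real taggers use much richer contextual rules or probabilities; this heuristic is only a sketch.

```python
# Toy contextual disambiguation of 'can': noun after a determiner, else modal.
DETERMINERS = {"the", "a", "an", "this", "that"}

def tag_can(tokens):
    tags = []
    for i, w in enumerate(tokens):
        if w.lower() == "can":
            prev = tokens[i - 1].lower() if i > 0 else ""
            tags.append("noun" if prev in DETERMINERS else "modal")
        else:
            tags.append(None)  # other words are left untagged in this sketch
    return tags

print(tag_can("I can reach the book".split()))
print(tag_can("Put the paper in the can".split()))
```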
31. Over the years, corpora have been used to address a number of interesting issues such as the
question of language change.
The area of historical linguistics looks at how language has changed over the centuries.
Scholars have also looked into language development, in first and second language situations.
Corpora have also been used to explore similarities or differences across different national or
regional varieties of English (Australian English, American English, Indian English).
There are also studies that explore the differences between spoken and written language.
Before corpus linguistics it was difficult to note patterns of use, since observing and tracking
use patterns was a huge task.
Overview of different types of corpus studies
33. Corpus linguistic studies have had an impact on classroom language teaching practices.
Corpus-based studies of particular language features, such as The Longman Grammar of Spoken and
Written English, serve language teachers by providing a basis for deciding which language
features and structures are important.
Teachers and materials writers can have a basis for selecting the material that is being
presented.
Rather than basing pedagogical decisions on intuitions, these decisions can now be grounded
in actual patterns of language use in various situations.
How can corpora inform language teaching?
35. Corpus-based information can be brought to bear on language teaching in two ways:
1. Teachers can shape instruction based on corpus-based information.
• They can consult corpus studies to gain information about the features that they are teaching.
• For example:
o ‘Conversational English’ teachers could read corpus investigations on spoken language
to determine which features and grammatical structures are characteristic of conversational
English.
Bringing corpora into the language classroom
36. 2. Learners interact with corpora.
This can take place in one of two ways:
A. If computer facilities are adequate, learners can be actively involved in exploring corpora.
B. If adequate facilities do not exist, teachers can bring the results of corpus searches into
the classroom.
The use of concordancing tasks in the classroom is a matter of some controversy.
• It is strongly advocated by those who favour an inductive or data-driven approach to learning.
• It is criticized by others, who argue that it is difficult to guide students appropriately in the
analysis of vast numbers of linguistic examples.
Bringing corpora into the language classroom
38. The creation of appropriate, corpus-based teaching materials takes time, careful planning and
access to a few basic tools and resources.
The activities require access to a computer, texts and a concordancing package.
Several vocabulary activities can be generated through simple frequency lists and concordance
output.
The vocabulary frequency list can be used to identify vocabulary words that need to be taught.
Frequency lists can also be a starting point for students to group words by grammatical
category (verbs, nouns, etc.) or by semantic category.
Examples of corpus-based classroom activities
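The grouping activity described above can be sketched as follows. Both the frequency list and the word-class lexicon are hand-made for illustration; in practice the list would come from a concordancing package and the grouping would be done by students:

```python
# Group a (hand-made) frequency list by grammatical category.
freq_list = ["run", "teacher", "eat", "book", "walk", "student"]
WORD_CLASS = {
    "run": "verb", "eat": "verb", "walk": "verb",
    "teacher": "noun", "book": "noun", "student": "noun",
}

groups = {}
for word in freq_list:
    groups.setdefault(WORD_CLASS.get(word, "other"), []).append(word)

print(groups)
```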
39. Concordances of target words can be used to better understand those words’ meanings and
usage.
The way a word is used and its patterning characteristics also contribute to its meaning senses.
For example, words are often seen as synonymous when their actual use is not synonymous.
Dictionaries often list the ‘resulting copulas’ become, turn, go and come as synonyms, with
meanings like ‘to become’, ‘to get to be’, ‘to result’, ‘to turn out’.
Most dictionaries provide no clues to how these four words might differ in meaning.
Corpus research shows that these words differ dramatically in their typical contexts of use.
Examples of corpus-based classroom activities
40. o ‘turn’ describes a change of colour or physical appearance.
(The water turned grey)
o ‘go’ describes a change to a negative state.
(go crazy, go bad, go wrong)
o ‘come’ describes a change to a more active state.
(come awake, come alive)
If corpus activities are coupled with dictionary activities, they can provide a much richer
language-learning environment for students.
The patterns of language use that can be discovered through corpus linguistics will continue to
reshape the way we think of language.
Examples of corpus-based classroom activities
41. Schmitt, N. (2020). An introduction to applied linguistics. Routledge.
RESOURCES