Role of Language Engineering to Preserve Endangered Languages
Amit Kumar Jha, Sumit Kumar Gupta, Piyush Pratap Singh
Centre for Information and Language Engineering
Mahatma Gandhi Antarrashtriya Hindi Vishwvidyalaya, Wardha (MS)
Abstract:
An endangered language (EL) is a language whose community has only a small number of
speakers and which is likely to become extinct in the near future. Many languages are
falling out of use and being replaced by others that are more widely used in the
region or nation. Language Engineering (LE) is the subfield of computer science that
explores language-related software and the feasible development of corresponding hardware.
With the help of language engineering, man-machine interfaces can be designed that help
preserve such languages for a longer time.
This paper describes how the applications of language engineering play a significant
role in preserving endangered languages. Documentation is the primary task: a language
must be recorded in an appropriate form if it is to survive over time. The paper also
describes how LE eases the process of language documentation. With the help of digital
data we can preserve an EL, because digital data is more durable than other types of
data.
There are some languages that are linguistically very rich but have only a small number
of speakers, such as Nihali, which may also come to be counted as an EL in the future.
For preserving languages of this kind, documentation and digitalization are crucial,
because they make the language accessible to a much larger group of people.
Keywords: Language Engineering, Endangered Language, Language Documentation,
Man-machine interaction, Speech technology.
Introduction:
Language Engineering (LE) is the subfield of computer science that explores
language-related software and the feasible development of corresponding hardware.
[Figure: hierarchy relating Computer Science Engineering, Software Engineering and Hardware
Engineering to Language Engineering, via language-related software and language-related hardware]
In other words, Language Engineering is the application of knowledge of
language to the development of computer systems which can recognise, understand, interpret,
and generate human language in all its forms.
In practice, Language Engineering comprises a set of techniques and language
resources. The former are implemented in computer software and the latter are a repository of
knowledge which can be accessed by computer software.
The ultimate goal of LE is to develop a machine that is able to understand and
generate natural language. That language may be an endangered one. Two types of processing
are carried out in LE: Text Processing and Speech Processing.
As part of an important effort to document endangered languages before they
become extinct, a variety of endangered language repositories have emerged to provide
shared locations for field linguists to store data. These repositories vary greatly in the way
their collections are organized and in the metadata they collect from depositors. The
availability of collections of low-resource language data opens up some interesting
tasks in Natural Language and Speech Processing. However, the variety of formats in these
repositories makes it difficult to know how much of the available data is suitable for such a
task.
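One way to get a first impression of how much of a repository's data is usable for a given task is to tally its files by format. The sketch below is only illustrative: the extension-to-modality table and the function name `survey_formats` are assumptions made for this example and do not come from any particular repository's tooling.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to data modality. Real
# repositories hold many more formats than the few listed here.
MODALITY_BY_EXTENSION = {
    ".wav": "audio",
    ".mp3": "audio",
    ".txt": "text",
    ".xml": "text",
    ".eaf": "annotation",       # ELAN annotation files
    ".textgrid": "annotation",  # Praat TextGrid files
}


def survey_formats(file_names):
    """Count files per modality, given an iterable of file names."""
    counts = Counter()
    for name in file_names:
        extension = Path(name).suffix.lower()
        counts[MODALITY_BY_EXTENSION.get(extension, "unknown")] += 1
    return counts


# A made-up directory listing, used only to exercise the function.
sample_listing = ["story01.wav", "story01.eaf", "wordlist.txt", "notes.doc", "song.MP3"]
summary = survey_formats(sample_listing)
for modality, count in sorted(summary.items()):
    print(modality, count)
```

Files that land in the "unknown" bucket are the ones a researcher would need to inspect by hand before deciding whether the collection suits a speech or text processing task.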
The world is experiencing an unprecedented wave of language extinctions. There
are between 6,000 and 7,000 languages currently spoken, and between 50 and 90 per cent of
those are projected to be extinct by the year 2100. Language extinction results in loss of cultural
identities, knowledge systems, and the variety of data needed to understand the structure of
language in the mind. Documenting endangered languages preserves data and stimulates
language maintenance and revitalisation.
It is often argued that a person who knows several languages has a greater capacity
for thinking and reasoning. The loss of speakers in one language is the gain of speakers of
another language, except for cases of genocide. Languages are generally replaced when an
entire speech community shifts to another language. Replacing languages are very often
official state languages. A language becomes endangered when it is not passed on to the next
generation: if a language has a number of elderly speakers but has not been transferred to
the younger generation, that language becomes endangered.
Indian society is widely known to be multilingual. Every speaker of a language
should therefore pass that language on to the next generation.
Applications of Language Engineering:
The applications of Language Engineering are divided into two groups: Text Processing
applications and Speech Processing applications.
The applications of language engineering are as follows:
➢ Speech Generation – With the help of language engineering, a machine can generate
speech in an endangered language. If a machine is able to generate speech in an EL,
this helps to preserve that language. Speech generation is an application area of
LE that can be used to produce speech in any natural language.
➢ Language Translator – A language translator, or machine translation system, is an
LE application that translates from one language into another. The first language is
called the source language and the second is called the target language. If the source
language or the target language is an EL, the translation system helps to preserve it:
once a translator for an endangered language has been developed, that language may
remain in use for a long time.
➢ Speech-to-Text – This is the process of converting speech to text, which is a
documentation task. Converting speech recordings of an EL into text files helps to
preserve that language.
➢ Text-to-Speech – A text-to-speech system takes text data as input and returns
speech data as output. It plays an important role in man-machine interaction.
➢ Language Teaching – Language teaching is the process of teaching a language. With
the help of LE we can create a system for teaching a language, and if such a system
is created for an EL, that EL may be preserved. Some languages are spoken only by
elderly people and are not passed on to the next generation; after some time such a
language dies. A teaching system is important for preserving these languages.
➢ Transcription Tool – Transcription is the process of converting text from one script
to another. For a person who does not know a specific language, its script or its
pronunciation, a transcription tool plays an important role. If a transcription tool is
developed for an EL, the number of people who can understand that language
increases.
➢ Text or Document Summarization – This is one of the earliest applications of discourse
structure analysis. It is used to summarize a text or document; a summarized
text is easier to read and understand.
➢ Information Extraction – The task of information extraction (IE) is to extract from text
named entities, relations that hold between them, and event structures in which they
play a role. IE systems focus on specific domains (e.g., terrorist incidents) or specific
types of relations (e.g., people and their dates of birth, protein–protein interactions).
Event structures are often described by templates in IE, where the named entities to be
extracted fill in specific slots.
➢ Speaker Identification and Verification – Speaker identification and verification
means identifying and verifying who is speaking. This is an important application in
forensic science.
➢ Speech Recognition – Speech recognition is an application of language
engineering that identifies what is being spoken.
➢ Character Image Recognition – Character image recognition recognizes the
characters in an image file. This is an application area of LE; it is used, for
example, in CAPTCHA recognition.
➢ Segmentation – Segmentation is the task of dividing natural language into its
constituent parts, i.e. a paragraph of natural language can be segmented into sentences,
phrases, words and syllables.
➢ Question-Answering System – A question-answering system is an application of LE.
If we ask this system a question, it returns an appropriate answer. The question
can be asked in natural language. If the system accepts questions in an EL, speakers
of that EL will feel comfortable asking questions.
➢ Word Sense Disambiguation – All natural languages contain some words that have
more than one meaning; this is called ambiguity. Resolving it is called
disambiguation. LE tries to develop systems that are able to disambiguate the senses
of words; such a system is called a word sense disambiguation system.
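The segmentation task described in the list above can be sketched in a few lines. This is a minimal illustration rather than a production segmenter. It assumes Latin-script text in which ".", "!" and "?" end sentences; other scripts and punctuation conventions would need different rules, and syllable segmentation is language-specific and is left out. The function names are chosen for this example.

```python
import re


def split_sentences(paragraph):
    """Split on whitespace that follows '.', '!' or '?' (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def split_words(sentence):
    """Return runs of word characters; punctuation is dropped."""
    return re.findall(r"\w+", sentence)


paragraph = "Languages change. Some are lost! Can they be saved?"
sentences = split_sentences(paragraph)
words = [split_words(sentence) for sentence in sentences]

for sentence, tokens in zip(sentences, words):
    print(sentence, tokens)
```

A heuristic like this breaks on abbreviations such as "etc." and on languages written without spaces between words, which is one reason segmentation counts as an LE task in its own right.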
Language engineering plays a very important role in preserving ELs. Language
documentation is the main process for preserving an EL. Language documentation is the process
in which a speech and text corpus of the language is collected. To collect a speech corpus
of an endangered language, researchers go to the field where the speakers of that
language live and record speech in its natural environment. After recording the sound files,
they analyze them.
If linguists want to document one of the larger languages, such as
English, Chinese, Hindi etc., they can rely on already existing data and quite easily find
samples of written and spoken language from which they could build up their documentation:
books, newspapers and other written documents from the past and the present, many of these
already digitalized, television and radio shows that can be recorded or simply downloaded
from the Internet, language used in Internet forums and other social media, and many more.
Because computers, the Internet and recording devices are widely available, the amount of
such data and its accessibility is growing rapidly. For endangered languages, the situation is
often completely different. Many of these languages do not have a written tradition and
written data may be completely unavailable or sparse, the languages are not used in the
media, or their speakers do not use the Internet (and if they do, they often use another
language). In such cases, linguists must start from scratch and collect as much data as
possible by recording speakers of a given language. Ideally, language documentation contains
representative samples from different speakers – representing different age groups, different
professions, both sexes, and different origins, but in the case of endangered languages
this may not be possible, because the number of speakers is too small and/or there are only
elderly speakers.
An important issue apart from the number of speakers and amount of data
concerns the communication between the linguists or other researchers who want to
document a language, and the language community. In the case of endangered or minority
languages, the documenters often are outsiders, not members of the community. They may
not be fluent speakers of the language in question and may be able to communicate with the
speakers only in a second or third language. This often leads to an unnatural use of the language that is to be
documented.
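The point about representative samples can be made concrete by keeping simple metadata for each recording and checking which speaker groups are still missing. The sketch below is an assumption-laden illustration: the `Recording` fields, the age-group labels and the function names are hypothetical and not taken from any documentation standard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    speaker_id: str
    age_group: str  # hypothetical labels: "young", "adult", "elder"
    sex: str
    minutes: float


def coverage(recordings, field):
    """Total recorded minutes per value of the given metadata field."""
    totals = Counter()
    for rec in recordings:
        totals[getattr(rec, field)] += rec.minutes
    return dict(totals)


def missing_groups(recordings, field, expected):
    """Expected values of the field for which no recording exists yet."""
    present = {getattr(rec, field) for rec in recordings}
    return sorted(set(expected) - present)


# Made-up corpus, used only to exercise the functions.
corpus = [
    Recording("S1", "elder", "F", 42.0),
    Recording("S2", "elder", "M", 30.5),
    Recording("S3", "adult", "F", 12.0),
]
by_age = coverage(corpus, "age_group")
gaps = missing_groups(corpus, "age_group", ["young", "adult", "elder"])
print(by_age, gaps)
```

For an endangered language, a report like this may simply confirm that only elderly speakers remain, which is itself worth recording alongside the data.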
To preserve an endangered language, digitalization of the EL is as necessary as
language documentation. Digitalization is the process of storing data in digital
form. Digital data is more durable than other types of data. To preserve an EL through
digitalization, we convert its data (text, sound, images, etc.) into digital form and store it.
Researchers should also create study material for the EL in digital form.
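Digital data is durable only if silent corruption can be detected. A common practice in digital archiving is to store a checksum for every file and to re-check it later. The sketch below illustrates the idea with SHA-256 from Python's standard library. It works on in-memory bytes to stay self-contained, and the file names and function names are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files):
    """Map each file name to the SHA-256 digest of its content."""
    return {name: sha256_digest(content) for name, content in files.items()}


def changed_files(files, manifest):
    """Names whose current digest differs from the one in the manifest."""
    return sorted(
        name
        for name, content in files.items()
        if manifest.get(name) != sha256_digest(content)
    )


# Made-up archive contents standing in for real text and sound files.
archive = {"wordlist.txt": b"water\nfire\n", "story01.wav": b"\x00\x01\x02"}
manifest = build_manifest(archive)

# Simulate later corruption of one file and detect it.
damaged = dict(archive, **{"story01.wav": b"\x00\x01\x03"})
print(changed_files(damaged, manifest))
```

In practice the manifest would be stored separately from the data, and ideally in more than one place, so that a damaged copy can be replaced from an intact one.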
The systematic study of a system is called engineering. Engineering is the process
through which a task becomes easier and more efficient, and this is why we apply engineering to language.