Technological Approaches to Linguistic Documentation and Metadocumentation
Pankaj Dwivedi
Gulab Chand
Somdev Kar
Indian Institute of Technology Ropar
Rupnagar, Punjab 140001
India
27 March 2014 1
Language Documentation
Principles and methods used for the recording and analysis of primary language and cultural materials, and metadata about them.
Unlike in the past, the revolution in information technology has made it possible to maintain organized and long-lasting linguistic and cultural records.
Why is documenting languages IMPORTANT?
Half of the world’s languages may no longer exist after a few more generations, as they are not being learnt by children as first languages (Austin & Sallabank, 2011).
Crystal (2002) claims that the rate of language disappearance is as high as two languages each month.
How?
• Creating Dictionaries
• Preparing Language Teaching Materials
• Archiving
• Language Corpora (Written & Spoken)
What is needed?
A lot of language data and up-to-date technology:
Language data: text, audio and video
Technology: software and tools that can handle the language data, and platforms on which these data can be used effectively.
What do we need?
• Language data (not a problem)
• Platforms (discussed later)
• The latest TOOLS and SOFTWARE for:
1. Recording and capturing
2. Analysis
3. Archiving
4. Mobilization
ONE MOMENT!!!
Is ‘Latest’ the best?
or
Old is gold?
CHOOSE CAREFULLY!!!
Is ‘TECHNOLOGY’ adoption always good?
• Languages may live on without an orthography, but no language will be able to function as an administrative language in a modern society without developed language technology (Trosterud, 2006).
• Technology changes quickly, and uncritical adoption of new tools and technologies may compromise long-term sustainability, portability, usability and compatibility with other platforms (Bird & Simons, 2003).
Striking a balance
• Portability: operating systems, formats, software, encodings
• Sustainability: long-term preservation and usefulness
• Maintenance and Distribution: finances, space, tools and reach
• Access and protocols: paid or free, open or closed, research or business, full or partial
Capturing Audio Media
Why (or why not) WAV?
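The slide leaves the question open. One frequently cited argument for WAV is that it normally holds uncompressed linear PCM, so nothing is lost to compression and the format is simple and widely readable; the main argument against it is file size. The Python sketch below uses the standard-library `wave` module to write a one-second test tone and read its parameters back. The 44.1 kHz, 16-bit, mono settings and the helper names are illustrative assumptions, not recommendations from the slides.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (assumed setting)
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM
CHANNELS = 1          # mono (assumed)


def make_test_tone(seconds: float, freq: float = 440.0) -> bytes:
    """Return a complete WAV file (as bytes) containing a 16-bit PCM sine tone."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = bytearray()
    for i in range(n_frames):
        value = int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        frames += struct.pack("<h", value)  # little-endian signed 16-bit sample
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(bytes(frames))
    return buf.getvalue()


def describe(wav_bytes: bytes) -> dict:
    """Read back the header fields that matter for archiving decisions."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        rate = r.getframerate()
        width = r.getsampwidth()
        chans = r.getnchannels()
        n = r.getnframes()
    return {
        "sample_rate": rate,
        "bits_per_sample": width * 8,
        "channels": chans,
        "duration_s": n / rate,
        "bytes_per_second": rate * width * chans,
    }


if __name__ == "__main__":
    info = describe(make_test_tone(1.0))
    print(info)
    print("MB per hour:", info["bytes_per_second"] * 3600 / 1_000_000)
```

At these settings the data rate is 44,100 × 2 × 1 = 88,200 bytes per second, or about 318 MB per hour of mono audio, which is the storage cost weighed against lossless quality.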
Capturing Video Media
• CODECS
• CONTAINERS
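The two terms on this slide are easy to blur: a codec (for example H.264 for video or AAC for audio) determines how a stream is compressed, while a container (for example MP4, Matroska or AVI) packages one or more encoded streams together with metadata. Because a container can usually carry several different codecs, a file extension says little about how the content is actually encoded. As a rough illustration, the Python sketch below recognises a few common containers from their leading "magic" bytes; it is a simplified heuristic with hypothetical function names, and identifying the codec inside would require parsing the container itself.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a media file.

    Simplified heuristic: only a few signatures are checked, and some
    older QuickTime files do not begin with an 'ftyp' box.
    """
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska/WebM"            # EBML header
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"                      # RIFF container, AVI form type
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"                      # RIFF container, WAVE form type
    if header[:4] == b"OggS":
        return "Ogg"
    if header[4:8] == b"ftyp":
        return "MP4/MOV (ISO base media)" # 'ftyp' box after a 4-byte size field
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to check its container signature."""
    with open(path, "rb") as f:
        return sniff_container(f.read(16))
```

Note that the same check says nothing about the codec: an MP4 file, for instance, may hold video encoded with one of several codecs.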
Capturing Digital Text
• Character Encoding: Unicode, ASCII, Windows/ANSI, Big5, Latin 5 etc.
• Data Encoding: XML, SGML, MSWord etc.
• File Encoding: plain text, PDF, MSWord etc.
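The practical difference between these character encodings appears as soon as a text goes beyond one script. In the Python sketch below (the sample words are illustrative, and the codec names are Python's own), legacy encodings such as ASCII, Windows-1252 or Latin 5 (ISO 8859-9) cannot represent Devanagari at all, Big5 covers traditional Chinese but not Hindi, and UTF-8 can encode every sample. Decoding bytes with the wrong encoding produces garbled "mojibake".

```python
from __future__ import annotations

ENCODINGS = ["ascii", "cp1252", "iso8859_9", "big5", "utf-8"]

# Illustrative samples in different scripts.
SAMPLES = {
    "English": "language",
    "Hindi": "भाषा",
    "Turkish": "dil ağacı",
    "Chinese": "語言",
}


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def coverage_table() -> dict[str, list[str]]:
    """For each sample, list the encodings able to represent it."""
    return {
        name: [enc for enc in ENCODINGS if can_encode(text, enc)]
        for name, text in SAMPLES.items()
    }


def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being misread in a legacy encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


if __name__ == "__main__":
    for name, encs in coverage_table().items():
        print(f"{name:8} {', '.join(encs)}")
    print(mojibake("ğ"))  # Turkish letter read back as two wrong characters
```

Storing text as UTF-8 Unicode avoids having to record, and later guess, which legacy code page each file was written in.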
Digital text: An overview
Analysis tools
• Transcription
• Annotation
• Translation
• Metadata Management
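Transcription, annotation and translation are usually anchored to time spans in the recording, so that every line of text can be traced back to the media. The Python sketch below is a hypothetical, minimal data model for such time-aligned annotation; the class and field names are illustrative and do not reproduce the format of any particular tool. Exporting to tab-separated plain text keeps the result readable without special software.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """One time-aligned unit of a recording (times in milliseconds)."""
    start_ms: int
    end_ms: int
    transcription: str
    gloss: str = ""
    translation: str = ""


@dataclass
class AnnotatedRecording:
    media_file: str
    segments: list[Segment] = field(default_factory=list)

    def add(self, seg: Segment) -> None:
        """Insert a segment, rejecting empty or overlapping time spans."""
        if seg.end_ms <= seg.start_ms:
            raise ValueError("segment must have a positive duration")
        for other in self.segments:
            if seg.start_ms < other.end_ms and other.start_ms < seg.end_ms:
                raise ValueError("segments must not overlap")
        self.segments.append(seg)
        self.segments.sort(key=lambda s: s.start_ms)

    def at(self, t_ms: int) -> Optional[Segment]:
        """Return the segment covering time t_ms, if any."""
        for seg in self.segments:
            if seg.start_ms <= t_ms < seg.end_ms:
                return seg
        return None

    def to_tsv(self) -> str:
        """Export as tab-separated plain text, one segment per line."""
        rows = ["start_ms\tend_ms\ttranscription\tgloss\ttranslation"]
        for s in self.segments:
            rows.append(f"{s.start_ms}\t{s.end_ms}\t{s.transcription}\t{s.gloss}\t{s.translation}")
        return "\n".join(rows)
```

Keeping times in integer milliseconds avoids floating-point rounding when segments are compared or exported.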
Popular Tools
Metadata Management
• Cataloguing: title, speakers, collectors, time and place, language name etc.
• Descriptive: information about content, relationship to other content etc.
• Structural: structures and patterns
• Technical: description of formats, encoding, required tools and software
• Administrative: work log, access protocol etc.
(Nathan & Austin, 2004)
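The five categories above can be kept together in a single record per session. The Python sketch below groups hypothetical placeholder values under Nathan and Austin's (2004) categories and serialises them as XML with the standard library. The element names and the choice of required fields are illustrative assumptions, not an established archive schema such as OLAC's.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical record: every value is a placeholder for illustration.
RECORD = {
    "cataloguing": {
        "title": "Session title",
        "speaker": "Speaker A",
        "collector": "Collector B",
        "date": "YYYY-MM-DD",
        "place": "Place C",
        "language": "Language X",
    },
    "descriptive": {"content": "Short summary of the session", "related_to": "Other session ID"},
    "structural": {"tiers": "transcription; gloss; translation"},
    "technical": {"media_format": "WAV (PCM)", "text_encoding": "UTF-8", "tools": "Any XML-aware software"},
    "administrative": {"access": "Open", "work_log": "Recorded; transcribed"},
}

# Assumed minimum for a usable catalogue entry.
REQUIRED_CATALOGUING = ("title", "speaker", "collector", "date", "place", "language")


def missing_fields(record: dict[str, dict[str, str]]) -> list[str]:
    """List required cataloguing fields that are absent or empty."""
    cat = record.get("cataloguing", {})
    return [name for name in REQUIRED_CATALOGUING if not cat.get(name)]


def to_xml(record: dict[str, dict[str, str]]) -> str:
    """Serialise as <session><category><field>value</field>...</category></session>."""
    root = ET.Element("session")
    for category, fields in record.items():
        cat_el = ET.SubElement(root, category)
        for name, value in fields.items():
            ET.SubElement(cat_el, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(missing_fields(RECORD))
    print(to_xml(RECORD))
```

Because the output is plain XML text, it remains readable even if the software that produced it is no longer available.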
Platforms
1. Online Language Archives:
Examples: OLAC, ANLA, ELAR, CLA, The Language Archive, PARADISEC etc.
2. Social Media:
Facebook, Twitter, blogs etc.
Examples: ‘Indigenous Tweets’ and ‘Facebook in your language’ by Prof. Kevin Scannell
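Beyond browsing on the web, many online archives expose their catalogues for automatic harvesting through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), the protocol on which OLAC's aggregation of archive catalogues is built. The Python sketch below builds OAI-PMH `ListIdentifiers` requests and parses a response, including the `resumptionToken` used for paging. The base URL and sample response are hypothetical, the function names are illustrative, and no network request is made.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url: str, metadata_prefix: str = "oai_dc",
                         resumption_token: Optional[str] = None) -> str:
    """Build a ListIdentifiers request; a resumptionToken must be sent on its own."""
    if resumption_token is not None:
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_identifiers(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return (record identifiers, resumption token, or None on the last page)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical endpoint and response, for illustration only.
BASE_URL = "https://archive.example.org/oai"
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:archive.example.org:item-1</identifier></header>
    <header><identifier>oai:archive.example.org:item-2</identifier></header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_identifiers_url(BASE_URL))
    ids, token = parse_identifiers(SAMPLE_RESPONSE)
    print(ids, token)
    if token:
        print(list_identifiers_url(BASE_URL, resumption_token=token))
```

`oai_dc` (simple Dublin Core) is used as the default prefix because every OAI-PMH repository is required to support it.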
Conclusion
At a time when the rate of language death is at its peak, if we use moribund technologies to create and preserve language data, then when those technologies die, unique heritage is also lost or effectively encrypted (Bird & Simons, 2003).
We must keep in mind:
Purpose, Presentation, Portability and Preservation
References
• Austin, P., & Sallabank, J. (Eds.) (2011). The Cambridge handbook of endangered languages. Cambridge University Press.
• Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557-582.
• Crystal, D. (2002). Language death. Cambridge University Press.
• Nathan, D., & Austin, P. (2004). Reconceiving metadata: Language documentation through thick and thin. Language Documentation and Description, 2, 179-187.
• Trosterud, T. (2006). Grammatically based language technology for minority languages. Trends in Linguistics Studies and Monographs, 175, 293.
Thank You!
Questions and Feedback.