Presentation at 36th LAUD Symposium, 3rd April 2014

Language Documentation 20 years on

  1. Language documentation 20 years on. Peter K. Austin, Department of Linguistics, SOAS, University of London. 36th International LAUD Symposium, Landau, 3rd April 2014
  2. © 2014 Peter K. Austin. Creative Commons licence: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
  3. Outline • Language documentation in 1995 and today • Identifying developments and trends • Some current challenges • Documentation practice • Archiving • The output gap • Conclusions
  4. Note: Today’s presentation is an attempt at a critical analysis of experiences across the world over the past 20 years, not to criticise or blame anyone, but to understand developments and possible directions for the future. The analysis builds on work with colleagues at SOAS and elsewhere, but I alone am to blame for any errors or shortcomings.
  5. Language documentation • “concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties” (Himmelmann 1998) • has developed over the last 20 years in response to the urgent need to make an enduring record of the world’s many endangered languages and to support speakers of these languages in their desire to maintain them, fuelled also by developments in information, media and communication technologies • concerned with roles of language speakers and their rights and needs
  6. What documentary linguistics is not • it's not about collecting stuff to preserve it without analysing it • it's not = description + technology • it's not necessarily about endangered languages per se • it's not a fad
  7. Indicators that Lang Doc has ‘arrived’: graduate student interest • 140 students graduated from SOAS MA in Language Documentation and Description 2004-14 – currently 27 are enrolled • 10 graduates in PhD in Field Linguistics – 20 currently enrolled • other documentation programmes, e.g. UT Austin, have similar experience
  8. Publications: books and journals • Gippert et al. 2006 Essentials of Language Documentation. Mouton • Tsunoda 2006 Language endangerment and language revitalization: an introduction • Language Documentation and Description – 11 issues (2,000+ copies sold), 2 in prep • Language Documentation and Conservation – 6 issues (on-line only) • Cambridge Handbook of Endangered Languages 2011 • Routledge Essential Readings 2011 • Oxford Bibliography Online 2012
  9. Big money – DoBeS projects
  10. ELAR deposits
  11. Main features (Himmelmann 2006:15) • Primary data – collection and analysis of an array of primary language data to be made available for a wide range of users; • Accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected; • Long-term storage and preservation of primary data – includes a focus on archiving in order to ensure that documentary materials are made available to potential users now and into the distant future;
  12. Main features (cont.) • Interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to mainstream (“core”) linguistics alone • Cooperation with and direct involvement of the speech community – active and collaborative work with community members both as producers of language materials and as co-researchers • Outcome is an annotated and translated corpus of archived representative materials on a language
  13. LangDoc promised • To make linguistics what many have claimed it always wanted to be, i.e. “the scientific study of human language”, by: • Paying proper attention to data (making linguistics properly empirical) • Paying proper attention to analysis in relation to data (metadata, value-adding to the corpus) • To change the socio-political academic balance between “fieldworkers” and “armchair linguists” (typologists, theoreticians) by providing a foundation (theory, best practices) for data collection and analysis • To change the balance between “outsider” (linguist) and “insider” (speaker, community member) through empowerment, skills transfer and training
  14. • Language Documentation has failed to live up to its promises in all three areas, and in many ways continues what has been seen as “normal science” in Linguistics, especially in relation to outputs and evaluations of them • There are many challenges facing the field, but also exciting opportunities to be explored – we identify some of these later
  15. A 2010 example – Stuart McGill • 4-year PhD project at SOAS, plus 2-year post-doc • documentation of Cicipu (Niger-Congo, north-west Nigeria) in collaboration with native speaker researchers • outcomes: – a corpus of texts (video, ELAN, Toolbox) – 2,000-item lexicon – archive (956 files, 50 GB) – overview grammar (134 pages) – analysis of agreement (158 pages) – website, cassette tapes, books, orthography proposal and workshop
  16. Stuart McGill Cicipu corpus
  17. Cicipu Toolbox
  18. The documentation model 2000-2010: Noah’s arc(hive) – saving the morphemes 2-by-2
  19. Despite the rhetoric • lone wolf linguists primarily focussed on language • little interdisciplinary interest • the linguist decides what to deliver to communities (dictionaries, orthographies, story collections, etc.)
  20. Key concepts in this period • Standards (data, metadata, project designs) • Tools (transcription, glossing) • Preservation (“archival standard”)
  21. Consequences • objectification and commodification: “reduction of languages to common exchange values, particularly in competitive and programmatic contexts such as grant-seeking and standard-setting where languages are necessarily compared and ranked” (Dobrin, Austin, Nathan) • lack of audio skills: little or no knowledge about recording arts and microphone types, properties and placement (microphone choice and handling is the single greatest determiner of recording quality) • video madness: video recordings made without reference to hypotheses, goals, or methodology, simply because the technology is available, portable and relatively inexpensive • corpus taming: little ability at corpus and metadata management, file naming and bundle organisation
  22. ILG blindness • many documenters believe that interlinear glossing is the ‘gold standard’ of annotation but it is very time-consuming and illegible to non-linguists – overview annotations may be preferred as a primary goal: a ‘roadmap’ or index of a recording – approximately time-aligned information about what is in the recording, who is participating, and other interesting phenomena
  23. Holton 2014 • Item 408: Oral Literature Collection, Tape 343, Side B. Robert Zuboff (Kak’weidí clan, Kaakáakw Hít) and Susie James (Chookaneidi clan, T’akdeintaan yádi), July 27, 1972; interviewed by Nora Marks Dauenhauer, migrated from reel to CD. Length 60:14. Content by DK: story of how the Sea Otter came to be is told, 0-4:15; raven sounds are given by Zuboff, and their meaning/use, 4:16-11:10; Zuboff tells a story about a man who became an invisible man (tlékanáa) (13:24); 11:11-13:24; story of a man named Naawan that bit the tongue off of a raven, 13:25-16:09; general conversation and questions about Tlingit phrases, 16:10-19:57; story of a man named Gáneix, 19:58-21:40; discussion about language and storytelling, mention of the Salmon Boy story, 21:41-24:12; Zuboff tells the story about the Woman that Raised the Wood Worm, attributes the story’s people, 24:13-27:34; Susie and Nora talk, Susie speaks about the Man Who Commanded the Tides (Yookis kookeik) and his sister and raven. She then tells the story of bringing in the house that was way out on the ocean and how raven got the octopus tentacle to bring in the house. She then talks about the type of resources that were in the house but not in detail. She mentions the whale, cod etc. She then goes back to the man who commanded the tide and rescues his mother by placing her in the skin of a black duck, 27:35 to the end of the recording. Notes on file.
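An overview annotation of the kind quoted above can be held as a plain list of approximately time-aligned segments. The following is a minimal sketch in Python, not a format used by any archive or tool: the `Segment` class and the `parse_timestamp`, `make_segment` and `find_segment` helpers are hypothetical names introduced here for illustration, and the three sample entries are taken from the Holton 2014 item.

```python
from dataclasses import dataclass
from typing import List, Optional


def parse_timestamp(text: str) -> int:
    """Convert 'mm:ss' (or a bare minute count such as '0') to seconds."""
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(text) * 60


@dataclass
class Segment:
    """One entry of an overview annotation (hypothetical structure)."""
    start: int    # seconds from the start of the recording
    end: int      # seconds from the start of the recording
    summary: str  # free-text description of what happens in this stretch


def make_segment(span: str, summary: str) -> Segment:
    """Build a Segment from a 'start-end' span such as '4:16-11:10'."""
    start_text, end_text = span.split("-")
    return Segment(parse_timestamp(start_text), parse_timestamp(end_text), summary)


def find_segment(index: List[Segment], time: str) -> Optional[Segment]:
    """Return the segment covering the given 'mm:ss' time, if any."""
    t = parse_timestamp(time)
    for segment in index:
        if segment.start <= t <= segment.end:
            return segment
    return None


# Three entries copied from the catalogue description above.
overview = [
    make_segment("0-4:15", "story of how the Sea Otter came to be"),
    make_segment("4:16-11:10", "raven sounds given by Zuboff, and their meaning/use"),
    make_segment("13:25-16:09", "story of a man named Naawan"),
]

hit = find_segment(overview, "5:00")
print(hit.summary if hit else "no entry for this time")
```

Because the alignment is only approximate, a lookup like this is a way of finding the right stretch of a recording to listen to, not a substitute for transcription or glossing.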
  24. Files, files and more files • data – for the sake of data (mining) • archivism – quantifiable properties such as recording hours, data volume, and file parameters, and technical desiderata like ‘archival quality’ and ‘portability’ become reference points in assessing the aims and outcomes of language documentation – these are not measures of quality • [Graphic labels: documentary dog; archiving tail; X]
  25. Important concepts since 2010 • diversity: of goals, contexts, people, data, corpora, outcomes • move away from Noah’s Arc(hive) to more focused documentation, e.g. ELDP 2012 grant list: bark cloth making, libation rituals, fishing practices, child language, interactive speech, and ethnobotany • diverse inputs – field interviews, experiments and observations (traditionally the bread and butter of documentation and description) but also YouTube uploads, Twitter feeds, Facebook, blogs, email, chat, Skype, local pedagogy in revitalisation • diverse outputs – books, papers and archive deposits (the bread and butter of 1990s documentation) but also YouTube uploads, Twitter posts, Facebook, blogs, email, chat, Skype, local pedagogy in revitalisation, mobile apps
  26. • collaboration: working with communities to determine project goals and outcomes • Archiving 2.0: building on Web 2.0 models that link people (rather than documents or files) to create contexts for exchange and sharing, with language archives as a locus for interactivity • incremental documentation and archiving
  27. Archive 2.0: social media models • traditionally archiving focussed heavily on preservation • however documentation often deals with highly sensitive topics (sacred stories, gossip) • needs powerful but flexible access management • transparency – ease of understanding • use positively – social networking model • access through relationships • relationships and sharing produce new opportunities • ELAR URCS system
  28. ELAR URCS system • e.g. Trevor Johnston Auslan deposit • Logged in user displays
  29. OAIS model • OAIS archives define three types of ‘packages’: ingestion, archive, dissemination • [Diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
  30. ELAR archive 2.0 model
  31. Rethinking archive participation • users e.g. add bookmarks, negotiate access • depositors e.g. updating and editing content • negotiate access • monitoring usage • collaborations • exchange & share information • establish groups • community curation
  32. User xx has just applied for access to restricted material in the deposit johnston2012auslan. The following message was attached to the application: "Hello [depositor], xx here. I'm interested in having a look at some of your video deposit, including annotation files. I am working on a project documenting Central Australian Indigenous sign with yy (see …). If ok, I'd like to see how you do the annotation - we have worked out a template and annotation protocol, but this needs a lot of refinement. Regards, MC"
  33. This email is to inform you that user xx's application for access to restricted material in the deposit kunbarlang-389 has just been approved. The depositor included the following note to the user: "Hi xx I've approved your access to this collection, but you should know that there is an update in the material I've just deposited, with much more information on both music and texts. I'd be happy to give you access to that when it is processed. Next time I come to London (October or November this year) I'd be happy to meet up if you would like to discuss."
  34. User xx has just applied for access to restricted material in the deposit cappadocian-375. The following message was attached to the application: "Dear [depositor], I work as a research assistant in Nevsehir University in Cappadocia, Turkey. As you know, Cappadocian language has some relics in this region despite speakers of Cappadocian do not live anymore. In my university, there are few research on this subject with collaboration of Greek friends and local societies … I would like to access to your material … By the way, i would like to interview with you about Cappadocian language for our international journal of art and language. I hope you will have time for our journal. Thank you in advance."
  35. This email is to inform you that user xx's application for access to restricted material in the deposit johnston2012auslan has just been approved. The depositor included the following note to the user: "I am giving you user access which means you should be able to see the ELAN eaf annotation files for the topics "The boy who cried wolf" and for "The hare and the tortoise". You should also be able to see most other movies except those tagged "1a" "4a" and "5". If you cannot see the ELAN eaf annotations I hope the problem will be fixed soon. I told the ELAR team about this."
  36. Response with advice about usage • “I would have no objection to you getting the movies of these conversation(s) and the eafs from us. Please contact me directly at my work email … Remember however that the conversational material should not be shown publicly or in a publication if there is any suggestion the participants might feel embarrassed by being identified and people seeing what they have said. (They did give their permission for the corpus to be accessible and viewable, but sometimes people have said things they regret and would not like shown publicly. I made this restriction after seeing the videos and reconsidering their privacy issues.)”
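The exchanges on the preceding slides follow a simple pattern: a user applies for access to restricted material and attaches a message, and the depositor approves the application, optionally adding a note on usage. As a rough illustration only, that pattern could be modelled as below. This is not ELAR's implementation; the `Deposit` and `AccessRequest` classes and their methods are hypothetical, and only the deposit identifier is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessRequest:
    """A user's application for restricted material (hypothetical structure)."""
    user: str
    message: str
    status: str = "pending"   # becomes "approved" once the depositor agrees
    depositor_note: str = ""  # optional advice from the depositor


@dataclass
class Deposit:
    """A deposit whose restricted material is released by its depositor."""
    deposit_id: str
    depositor: str
    approved_users: Set[str] = field(default_factory=set)
    requests: Dict[str, AccessRequest] = field(default_factory=dict)

    def apply(self, user: str, message: str) -> AccessRequest:
        """Record an application together with the applicant's message."""
        request = AccessRequest(user=user, message=message)
        self.requests[user] = request
        return request

    def approve(self, acting_user: str, user: str, note: str = "") -> None:
        """Approve a pending application; only the depositor may do this."""
        if acting_user != self.depositor:
            raise PermissionError("only the depositor can approve access")
        request = self.requests[user]  # KeyError if no application exists
        request.status = "approved"
        request.depositor_note = note
        self.approved_users.add(user)

    def can_view_restricted(self, user: str) -> bool:
        """The depositor and approved users can see restricted material."""
        return user == self.depositor or user in self.approved_users


deposit = Deposit(deposit_id="johnston2012auslan", depositor="depositor")
deposit.apply("xx", "I'd like to see how you do the annotation.")
deposit.approve("depositor", "xx", note="Please do not show the conversations publicly.")
print(deposit.can_view_restricted("xx"))
```

The point of the sketch is that access is decided through a relationship between two people, with a message travelling in each direction, and is not a fixed property of the files themselves.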
  37. Rethinking the archive model • progressive archiving – a challenge to whole approach of documentary linguistics so far • establish user account at beginning of project – users add and manage/update resources over time • user accounts show access and usage/downloads analytics – cf.
  38. [Diagram contrasting “classical” archiving with progressive archiving. Labels: collect resources/data; archive them; Collect, process, publish; Archive; And hope that death does not intervene]
  39. Summary re Archiving 2.0 • flavour of archives changes from finality and completeness to open and evolutionary • questions for archives about what a “deposit” or “depositor” really is • archives recast as providers of services within a revised, ‘holistic’ documentation
  40. Meta-documentation • meta-documentation = documentation of language documentation models, processes and outcomes • the goals, methods and conditions (linguistic, social, physical, technical, historical, biographical) under which the data and analysis were produced • meta-documentation should be as rich and appropriate as the documentary materials themselves
  41. Why? • developing good ways of presenting and using language documentations • future preservation of the outcomes of current documentation projects • sustainability of field • helping future researchers learn from the successes and failed experiments of those presently grappling with issues in language documentation (Austin 2010) • documenting IP contributions and career trajectories (Conathan 2011)
  42. Meta-documentation categories • identity of stakeholders involved and their roles in the project • attitudes of language consultants, both towards their languages and towards the documenter and documentation project • relationships with consultants and community • goals and methodology of researcher, including research methods and tools (see Lüpke 2010), corpus theorisation (Woodbury 2011), theoretical assumptions embedded in annotation (abbreviations, glosses), potential for revitalisation
  43. • biography of the project, including background knowledge and experience of the researcher and main consultants (e.g. how much fieldwork the researcher had done at the beginning of the project and under what conditions, what training the researcher and consultants had received) • for funded projects, includes original grant application and any amendments, reports to the funder, email communications with the funder and/or any discussions with an archive
  44. Shifting the sociology of the academy? • The development of language documentation from 1995 looked like a possible avenue to legitimise data collection and analysis and shift the sociological power balance between ‘theoretical linguists’ and ‘fieldworkers’ (or ‘butterfly collectors’) as it developed its own theoretical and analytical machinery • This is the context that led in 2010 to the LSA Resolution Recognizing the Scholarly Merit of Language Documentation
  45. LSA Resolution Recognizing the Scholarly Merit of Language Documentation • “[a] shift in practice has broadened the range of scholarly work to include not only grammars, dictionaries, and text collections, but also archives of primary data, electronic databases, corpora, critical editions of legacy materials, pedagogical works designed for the use of speech communities, software, websites, or other digital media; the products of language documentation and work supporting linguistic vitality are of significant importance to the preservation of linguistic diversity, are fundamental and permanent contributions to the foundation of linguistics, and are intellectual achievements which require sophisticated analytical skills, deep theoretical knowledge, and broad linguistic expertise;
  46. “the Linguistic Society of America supports the recognition of these materials as scholarly contributions to be given weight in the awarding of advanced degrees and in decisions on hiring, tenure, and promotion of faculty. It supports the development of appropriate means of review of such works so that their functionality, import, and scope can be assessed relative to other language resources and to more traditional publications” • But this has not happened – why?
  47. There is an output gap
  48. The output gap • Outputs from language documentation projects have bifurcated into: • Published grammars, (bilingual) dictionaries and (glossed) texts – ‘revival’ of familiar genres linguists have been comfortable with for 100+ years • Archive deposits – hundreds or thousands of files, professionally curated by archivists, but often poorly organised or structured, with little if any contextualisation
  49. What is missing? • Meta-documentation – the documentation of documentation projects, goals, methods, IP contributions, outcomes • New (unfamiliar) genres that link and contextualise analytical outputs and the archival corpus: • ethnographies of documentation project designs • accounts of data collection (cf. archaeology ‘field report’) • finding-aids to corpus collections • ‘exhibitions’ or ‘guided tours’ of archival deposits • Evaluation measures that enable properly-based peer assessment of documentations, equivalent to the way traditional outputs are judged
  50. Open access refereed online publication • provides a new (relatively inexpensive) platform to shift power away from traditional publishers • unfortunately, current attempts to do this, e.g. Language Science Press, merely replicate in digital form existing and familiar genres of output, while pushing the costs of formatting etc. back to the author (and proofing to ‘volunteers’) • we need new genres, new experiments in publication and evaluation to bridge the output gap and to realise the potential that language documentation promised to rebalance the sociology of linguistics as a field
  51. EL Publishing • A new online venture to be launched soon which will: • have the infrastructure of familiar models of publication (editorial board, peer assessment, etc.) • provide a platform to encourage experiments in new genres of output • provide a space and an interface to move towards evaluations of these new outputs so that the underlying desire of the LSA statement might be realised
  52. Conclusions • 20 years ago Language Documentation promised a new approach to the study of human language that paid better attention to data collection and analysis • it appeared to be an opportunity to shift the socio-political academic balance between “fieldworkers” and “armchair linguists” (typologists, theoreticians) by providing a foundation (theory, best practices) for documentation, in contrast to language description • Over the past 20 years, and especially the last 10 years, we have seen shifts in the goals, methods, foci and contexts of Language Documentation to make it more pluralistic, open, and socially networked and responsive • However, challenges remain, including encouraging new genres that bridge the output gap, more reflexivity, and better engagement with interdisciplinarity and the ethnography of our research and its contexts
  53. Thank you!