Meta-documentary linguistics


Talk presented at Aboriginal Languages Workshop, Kioloa, NSW, Australia, 11-13 March 2010

Published in: Education, Technology

  1. 1. Meta-documentary linguistics. Prof Peter K. Austin, Linguistics Department, SOAS; RCLT, La Trobe University. 13 March 2010
  2. 3. Outline <ul><li>Defining documentary linguistics </li></ul><ul><li>Products and processes </li></ul><ul><li>Meta-documentary linguistics </li></ul><ul><li>Meta-documentation and legacy data </li></ul><ul><li>The Guwamu project </li></ul><ul><li>The Mantharta languages project </li></ul><ul><li>Conclusions </li></ul>
  3. 4. Defining documentary linguistics <ul><li>“documentary linguistics is the subfield of linguistics that is ‘concerned with the methods, tools, and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language or one of its varieties’ (Himmelmann 2006:v). A similar definition is given by Woodbury (2010) as ‘the creation, annotation, preservation, and dissemination of transparent records of a language’. Language documentation is by its nature multidisciplinary, and as Woodbury (2010) notes, it draws on ‘concepts and techniques from linguistics, ethnography, psychology, computer science, recording arts, and more’” (Austin 2010:12) </li></ul>
  4. 5. Himmelmann’s key features <ul><li>focus on primary data – language documentation concerns the collection and analysis of an array of primary language data to be made available for a wide range of users; </li></ul><ul><li>explicit concern for accountability – access to primary data and representations of it makes evaluation of linguistic analyses possible and expected; </li></ul><ul><li>concern for long-term storage and preservation of primary data – language documentation includes a focus on archiving in order to ensure that documentary materials are made available to potential users into the distant future; </li></ul>
  5. 6. <ul><li>work in interdisciplinary teams – documentation requires input and expertise from a range of disciplines and is not restricted to linguistics alone; </li></ul><ul><li>close cooperation with and direct involvement of the speech community – language documentation requires active and collaborative work with community members both as producers of language materials and as co-researchers. </li></ul><ul><li>Austin (2010) adds: </li></ul><ul><li>diversity – as researchers respond to the unique and particular social, cultural and linguistic contexts within which individual languages are spoken, documentation projects are showing a diversity of approaches, techniques, methodologies, skills and responses. </li></ul>
  6. 7. Products and processes <ul><li>“The sets of records, coherent or not, are often called language documentations; but since that is what we are calling the activity as a whole, I will call such sets language documentary corpora (or just corpora); and I will call the ideas according to which a corpus is said to cohere or ‘add up’ its (corpus) theorization. Corpus theorizations, and even principles for corpus theorization, can both offer a space for invention and become a matter of contention and debate.” (Woodbury 2010, emphasis added) </li></ul><ul><li>“of special interest is the range of concerted, programmed documentary activities motivated by impending language loss and aimed at creating a final record. These activities raise issues of corpus theorization; but in addition, they raise questions about the participants, their purposes, and the various stakeholders in the activity or program of activity or project: we may refer to this set of questions as the project design … of a language documentation activity” </li></ul>
  7. 8. Products and processes again <ul><li>“it does seem clear that documentary linguists have been on relatively comfortable ground in thinking about the products of linguistic research: conceptually distinguishing an annotated corpus or documentation of a language from a higher order description of its patterning …, reasserting the intellectual value of vocabulary … and oral discourse (as represented in texts) alongside grammar, extending the range of documentary outputs to include items like primers and orthographies that are targeted directly at non-academic audiences … They have also enriched the inventory of digital data models, formats, and software tools that facilitate documentary research and enable the preservation and dissemination of its results” </li></ul><ul><li>“linguists have also begun devoting attention to the social processes set in motion by their research, from the conceptualization of fieldwork to the dissemination of its products. This is a new development, so new, in fact, that even as recently as the late 1990s the editors of a volume exploring the practical and methodological issues raised by linguistic fieldwork (Newman and Ratliff 2001) found themselves hard pressed to find a publisher (Newman 2009). It is here that the discussions about language documentation taking place today are most exploratory and driven by tension” (Dobrin & Berson 2010, emphasis added) </li></ul>
  8. 9. Metadata <ul><li>‘[m]etadata is the additional information about data that enables the management, identification, retrieval and understanding of that data. The metadata should explain not only the provenance of the data (e.g. names and details of people recorded), but also the methods used in collecting and representing it.’ (Nathan 2010b, emphasis added) </li></ul><ul><li>metadata is required not only for archiving but also for the very management, identification, retrieval and understanding of the data within the documentation project once the transfer process (see above) is undertaken and value-adding is to be done. The way files are named and structured in folders is itself a type of metadata (see Nathan 2010b), and as Nathan and Austin (2004) argue, any data added to the recordings (including transcription, translation, annotation etc.) should be seen as ‘thick metadata’ (contrasted with the ‘thin’ cataloguing metadata often promoted in discussions of language documentation, e.g. by the E-MELD School of Best Practice). (Austin 2010, emphasis added) </li></ul>
  9. 10. Meta-documentation <ul><li>‘[a]nother way to think of metadata is as meta-documentation, the documentation of your data itself, and the conditions (linguistic, social, physical, technical, historical, biographical) under which it was produced. Such meta-documentation should be as rich and appropriate as the documentary materials themselves.’ (Nathan 2010) </li></ul>
  10. 11. Meta-documentary linguistics <ul><li>Meta-documentary linguistics would be (to adapt Himmelmann): the methods, tools, and theoretical underpinnings for setting up, carrying out and concluding a documentary linguistics research project. It would be the documentation of the documentation research itself. How could we arrive at a theory of meta-documentation? Possible avenues: </li></ul><ul><li>deductive approach: postulation of axioms and theorems; </li></ul><ul><li>inductive approach: examination of current and past documentations (so-called ‘legacy materials’) to analyse practices and identify operating principles (as well as lacunae); </li></ul><ul><li>comparative approach: examine what other relevant and related fields have done in their meta-documentation, and see what is applicable and what not to documentary linguistics. </li></ul>
  11. 12. Meta-documentation – deductive <ul><li>the identity of the stakeholders that were involved and their roles in the project (cf. Woodbury “project design”, Dobrin & Berson “social processes”); </li></ul><ul><li>the attitudes of language consultants, both towards their languages and towards the documentation project; </li></ul><ul><li>the methodology of the researcher, including research methods and tools, any theoretical assumptions encoded through things such as abbreviations or glosses, as well as relationships with the consultants and the community (Good 2010 mentions what he called ‘the 4 Cs’: ‘contact, consent, compensation, culture’) (cf. Woodbury “corpus theorization”); </li></ul>
  12. 13. <ul><li>the biography of the project, including background knowledge and experience of the researcher and main consultants (eg. how much fieldwork the researcher had done at the beginning of the project and under what conditions, what training the researcher and consultants had received). For a funded project, the project biography would include the original grant application and any amendments, reports to the funder, email communications with the funder and/or any discussions with an archive, such as reviews of sample data; </li></ul><ul><li>any agreements entered into, whether formal or informal (such as a Memorandum of Understanding, payment arrangements, and any promises and expectations issued to stakeholders) about all aspects of the project, eg. IPR, present/future outcomes, etc. </li></ul>
  13. 14. <ul><li>This kind of information is invaluable, not only for the researcher and others involved in a project, but also for any future parties wishing to make sense of the project and its history and context. </li></ul><ul><li>Unfortunately, linguists have typically been poor at recording and encoding this kind of information, which often makes it difficult to work with so-called ‘legacy data’, especially materials that only become available once the researcher has died (see Bowern 2003, Innes 2009, O’Meara & Good 2009). This is an area for further development within language documentation theory and practice. </li></ul>
  14. 15. Meta-documentation – inductive <ul><li>Guwamu project </li></ul><ul><li>Stephen Wurm’s fieldnotes of language elicitation (translations from English to Guwamu) collected from Willy Willis in Goodooga 1955; 100 double-sided pages of notes with phonetic transcription and glosses in Hungarian shorthand; short tape recording </li></ul><ul><li>glosses decoded by Wurm and recorded on tape in 1977; fieldnotes copied and glosses added by Austin 1977, 138 pages, copy deposited with AIATSIS Library </li></ul>
  15. 17. <ul><li>jama inda goammu ŋalgaŋanda? Do you speak Guwamu? </li></ul><ul><li>bađarinj ŋalla He is sick. </li></ul><ul><li>balgaru ŋunan ugwɛ:ilɛja A few days ago I camped there. </li></ul><ul><li>balunj ŋadju ilu iđamanjgija juraŋunda I will leave my axe here with you all. </li></ul>
  16. 18. Jeannie Bell (2009) Toolbox format – 651 sentences <ul><li>\ref Guwamu.010 </li></ul><ul><li>\wm balunj ŋadju ilu iđamanjgija i:balunda </li></ul><ul><li>\tx balunj ngadju ilu idhama-nj-gi=ya iibalu-nda </li></ul><ul><li>\mr axe 1SgPOSS here leave-INT-FUT=1Sg 2Du-DAT </li></ul><ul><li>\fg I will leave my axe here with you 2. </li></ul><ul><li>\so Wm-p64 (27B) </li></ul><ul><li>\dt 19/May/2008 </li></ul><ul><li>Cf. Austin analysis discussed below </li></ul>
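A record like the one on this slide can be read mechanically. The following is a minimal sketch, assuming the usual Toolbox convention that each field starts on its own line with a backslash-prefixed marker; the function name `parse_toolbox_record` is hypothetical and not part of any Toolbox software.

```python
def parse_toolbox_record(text: str) -> dict[str, str]:
    """Parse one Toolbox-style record into a marker -> value mapping."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything that is not a field line
        # Split "\marker value" into the marker name and the rest of the line.
        marker, _, value = line[1:].partition(" ")
        record[marker] = value.strip()
    return record


# The record shown on the slide, with backslash markers (an assumption).
RECORD = r"""
\ref Guwamu.010
\wm balunj ŋadju ilu iđamanjgija i:balunda
\tx balunj ngadju ilu idhama-nj-gi=ya iibalu-nda
\mr axe 1SgPOSS here leave-INT-FUT=1Sg 2Du-DAT
\fg I will leave my axe here with you 2.
\so Wm-p64 (27B)
\dt 19/May/2008
"""

record = parse_toolbox_record(RECORD)
print(record["ref"], "|", record["fg"])
```

This sketch keeps only the last value when a marker repeats within a record, which is enough for the single-valued fields shown here.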
  17. 19. Problems with form of original <ul><li>handwriting sometimes difficult to decipher </li></ul><ul><li>orthography – Wurm’s transcription is not documented but appears to be similar to IPA – it is quite low level phonetic but both overdifferentiates (eg. recording gemination for consonants) and underdifferentiates (eg. failing to distinguish apico-alveolar and lamino-dental nasals) </li></ul><ul><li>shorthand notations – Wurm’s glosses are mostly in Hungarian shorthand </li></ul><ul><li>word boundaries sometimes incorrect </li></ul><ul><li>sometimes cryptic glossing, or apparently wrong glossing </li></ul><ul><li>changing understandings over time of the language being recorded – Wurm clearly was working out the structure of Guwamu as he went along (and there are some comments in the fieldnotes which indicate his guesses about particular morphemes) so his transcription varies from the first page to the last </li></ul>
  18. 20. Compare <ul><li>Bowern 2003 mentions that Laves began to analyse his Bardi material while writing it down and made mistakes as a result, i.e. he wrote not what he heard but what he thought he heard. Also, Steele (2005:84), writing about William Dawes’ notebooks on the Sydney languages, comments: “In order to be in a position to make some assessment of the soundness of an interpretation of a word, expression or sentence provided by Dawes, it is useful to have an idea of at which stage of his language learning an entry was created.” </li></ul>
  19. 21. Other problems <ul><li>lack of context – we know nothing of how the material was recorded, what sessions took place, or the background of the speaker and his involvement in the project (on tape he sounds enthusiastic, at least when singing). No information is available about agreements entered into or any compensation arrangements. </li></ul><ul><li>lack of clarity about protocol, i.e. access and usage rights to the materials in their various forms. The copy of Austin’s notes at AIATSIS has access restrictions: “Closed access - Principal's permission. Closed copying & quotation Principal's permission. Not for Inter-Library Loan” </li></ul><ul><li>stakeholder issues – Jeannie Bell has begun this project at the request of Guwamu people; however, it is not clear (or documented?) what agreements have been negotiated between her as editor and the community, or even what ‘community’ means in this context (and its relation to Willy Willis). It is important to keep track of each person’s contribution to the project (Willy Willis, Wurm, Austin, Bell, the modern community members). When we set up the data structures we should make this clear in our metadata. </li></ul>
  20. 22. Austin 2010 data structures <ul><li>\ref unique ID for each sentence </li></ul><ul><li>\wm Wurm’s phonetic transcription of Guwamu </li></ul><ul><li>\tx-B Bell’s phonemicisation of Guwamu </li></ul><ul><li>\mg-B Bell’s morpheme-by-morpheme glossing </li></ul><ul><li>\tx-A Austin’s phonemicisation of Wurm’s data </li></ul><ul><li>\m-A Austin’s analysis of morpheme forms </li></ul><ul><li>\mg-A Austin’s morpheme-by-morpheme glossing </li></ul><ul><li>\cat Austin’s assignment of syntactic category information </li></ul><ul><li>\subcat Austin’s assignment of syntactic sub-category information </li></ul><ul><li>\lxnum unique ID numbers for morphemes in Austin’s Guwamu lexicon </li></ul><ul><li>\fg Wurm’s free gloss </li></ul><ul><li>\nt-B Bell’s notes </li></ul><ul><li>\nt-A Austin’s notes </li></ul><ul><li>\rec recorder – by default SAW </li></ul><ul><li>\sp speaker – by default WW </li></ul><ul><li>\creat creator of original Toolbox files – by default JB </li></ul><ul><li>\ed editor of Toolbox files – by default PA </li></ul><ul><li>\so source reference in Wurm’s fieldnotes </li></ul><ul><li>\dt date stamp for last edit of this record </li></ul>
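The point of the default values in this structure is that every record carries its own attribution even when nobody types it in. A minimal sketch of that idea follows, assuming a Python dataclass as the container; the class name `SentenceRecord` and the method `contributors` are hypothetical, and only a subset of the fields on the slide is modelled.

```python
from dataclasses import dataclass


@dataclass
class SentenceRecord:
    """A subset of the sentence fields listed on the slide (illustrative only)."""
    ref: str            # unique ID for the sentence
    wm: str             # Wurm's phonetic transcription
    fg: str = ""        # Wurm's free gloss
    so: str = ""        # source reference in Wurm's fieldnotes
    # Provenance fields, with the defaults stated on the slide.
    rec: str = "SAW"    # recorder
    sp: str = "WW"      # speaker
    creat: str = "JB"   # creator of the original Toolbox files
    ed: str = "PA"      # editor of the Toolbox files

    def contributors(self) -> dict[str, str]:
        """Map each provenance role to the ID of the person who filled it."""
        return {"recorder": self.rec, "speaker": self.sp,
                "creator": self.creat, "editor": self.ed}


r = SentenceRecord(ref="Guwamu.002", wm="bađarinj ŋalla",
                   fg="(He) is sick.", so="Wm-p5 (3B)")
print(r.contributors())
```

A record that departs from the defaults, for example one edited by someone else, would simply pass a different `ed` value when it is created.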
  21. 23. <ul><li>\ref Guwamu.002 </li></ul><ul><li>\wm bađarinj ŋalla </li></ul><ul><li>\tx-B badharinj nga=lla </li></ul><ul><li>\mg-B sick AuxV=3Sg </li></ul><ul><li>\tx-A badharinyngala </li></ul><ul><li>\m-A badhariny -nga -la </li></ul><ul><li>\mg-A be.sick -pres -3sg </li></ul><ul><li>\cat v -suff -suff </li></ul><ul><li>\subcat vi -vinfl -proagr </li></ul><ul><li>\lxnum 093 -048 -011 </li></ul><ul><li>\fg (He) is sick. </li></ul><ul><li>\nt-B note auxiliary </li></ul><ul><li>\nt-A -nga- is present tense verb inflection not auxiliary </li></ul><ul><li>\rec SAW </li></ul><ul><li>\sp WW </li></ul><ul><li>\creat JB </li></ul><ul><li>\ed PA </li></ul><ul><li>\so Wm-p5 (3B) </li></ul><ul><li>\dt 09/Mar/2010 </li></ul>
  22. 24. <ul><li>\ref Guwamu.003 </li></ul><ul><li>\wm balgaru ŋunan ugwɛ:ilɛja </li></ul><ul><li>\tx-B balgaru ngunan ugweei-le=ya </li></ul><ul><li>\mg-B few days there sleep-PST=1Sg </li></ul><ul><li>\tx-A balgaru ngunan wugarilaya </li></ul><ul><li>\m-A balgaru ngunan wugari -la -ya </li></ul><ul><li>\mg-A day.before.yesterday there sleep -past -1sg </li></ul><ul><li>\cat adv dem v -suff -suff </li></ul><ul><li>\subcat adv dem vi -vinfl -proagr </li></ul><ul><li>\lxnum 042 196 192 -010 -028 </li></ul><ul><li>\fg A few days ago I camped there. </li></ul><ul><li>\nt-B </li></ul><ul><li>\nt-A SAW apparently misheard the verb, cf. Guwamu.415 [ugarilgija] </li></ul><ul><li>\rec SAW </li></ul><ul><li>\sp WW </li></ul><ul><li>\creat JB </li></ul><ul><li>\so Wm-p27 (14A) </li></ul><ul><li>\dt 12/Feb/2010 </li></ul>
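In the two example records, Austin's morpheme-level tiers (morpheme forms, glosses, categories, sub-categories, lexicon IDs) line up token by token. A minimal sketch of checking that alignment is given below, assuming the tiers are whitespace-separated and that a leading hyphen marks a suffix; the function names `align_tiers` and `rebuild_words` are hypothetical.

```python
def align_tiers(tiers: dict[str, str]) -> list[dict[str, str]]:
    """Zip whitespace-separated tiers into one dict per morpheme."""
    columns = {name: value.split() for name, value in tiers.items()}
    counts = {name: len(tokens) for name, tokens in columns.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"tiers differ in length: {counts}")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


def rebuild_words(morphemes: list[str]) -> str:
    """Attach hyphen-initial suffixes to the preceding stem."""
    words: list[str] = []
    for morpheme in morphemes:
        if morpheme.startswith("-") and words:
            words[-1] += morpheme[1:]
        else:
            words.append(morpheme)
    return " ".join(words)


# Austin's tiers for Guwamu.003, as given on the slide.
TIERS = {
    "m-A": "balgaru ngunan wugari -la -ya",
    "mg-A": "day.before.yesterday there sleep -past -1sg",
    "cat": "adv dem v -suff -suff",
    "subcat": "adv dem vi -vinfl -proagr",
    "lxnum": "042 196 192 -010 -028",
}

rows = align_tiers(TIERS)
print(rows[2])
print(rebuild_words([row["m-A"] for row in rows]))
```

Rebuilding the words from the morpheme tier and comparing the result with the phonemicised line is a cheap consistency check between the two levels of analysis.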
  23. 25. Richer legacy meta-documentation <ul><li>Austin’s Mantharta languages project </li></ul><ul><li>Meta-documentation of stakeholders </li></ul><ul><li>Meta-documentation of texts </li></ul><ul><li>Meta-documentation of grammar </li></ul>
  24. 26. Stakeholder – meta-data data structure <ul><li>\id unique identifier </li></ul><ul><li>\name European name </li></ul><ul><li>\role role in project </li></ul><ul><li>\dofb date of birth </li></ul><ul><li>\pofb place of birth </li></ul><ul><li>\dofd date of death </li></ul><ul><li>\sect section </li></ul><ul><li>\totem totemic affiliation </li></ul><ul><li>\ctotem conception totem </li></ul><ul><li>\csite conception site </li></ul><ul><li>\kin kinship relations to other stakeholders </li></ul><ul><li>\lg primary language affiliation </li></ul><ul><li>\dialect dialect affiliation </li></ul><ul><li>\note notes </li></ul><ul><li>\bib bibliographical reference </li></ul><ul><li>\photo digital photograph </li></ul><ul><li>\dt date of last record update </li></ul>
  25. 27. Stakeholder – example <ul><li>\id JB </li></ul><ul><li>\name Jack Butler </li></ul><ul><li>\role speaker </li></ul><ul><li>\dofb 1901-05-04 </li></ul><ul><li>\pofb wilukampal Caraline Well </li></ul><ul><li>\dofd 1986-05-10 </li></ul><ul><li>\sect karimarra </li></ul><ul><li>\totem wariyarra </li></ul><ul><li>\ctotem papalhura </li></ul><ul><li>\csite pirtanngura </li></ul><ul><li>\kin brother of Joe Butler, son of Silver, step-son of yawartawari. </li></ul><ul><li>\lg Jiwarli </li></ul><ul><li>\dialect </li></ul><ul><li>\note </li></ul><ul><li>\bib </li></ul><ul><li>\photo butler1.jpg </li></ul><ul><li>\dt 27/Jun/2004 </li></ul>
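The example records mix two date notations: ISO dates such as 1901-05-04 for births and deaths, and Toolbox-style stamps such as 27/Jun/2004 for the last update. A minimal sketch of normalising both to ISO format follows, so that records can be sorted or compared by date; the helper name `to_iso` is hypothetical, and the month table assumes English three-letter abbreviations.

```python
from datetime import date

# English month abbreviations as they appear in the date stamps (assumption:
# all stamps use this three-letter form).
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def to_iso(stamp: str) -> str:
    """Return a YYYY-MM-DD string for either a dd/Mon/yyyy or an ISO date."""
    stamp = stamp.strip()
    if "/" in stamp:
        day, month, year = stamp.split("/")
        return date(int(year), MONTHS[month], int(day)).isoformat()
    # Already ISO: parse it anyway so that invalid dates raise an error.
    return date.fromisoformat(stamp).isoformat()


print(to_iso("27/Jun/2004"))
print(to_iso("1901-05-04"))
```

An explicit month table is used instead of locale-dependent parsing so that the result does not change with the machine's language settings.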
  26. 29. Text – metadata data structure <ul><li>\tnum text unique identifier </li></ul><ul><li>\genre genre of text </li></ul><ul><li>\rec ID of recorder [links to stakeholder meta-data] </li></ul><ul><li>\sp ID of speaker(s) [links to stakeholder meta-data] </li></ul><ul><li>\recdt date of recording </li></ul><ul><li>\tape reference number of tape </li></ul><ul><li>\dur duration of recording </li></ul><ul><li>\trandt date of transcription </li></ul><ul><li>\transc ID of transcriber [links to stakeholder meta-data] </li></ul><ul><li>\transrc transcription source notebook reference </li></ul><ul><li>\desc description of text content (in prose summary) </li></ul><ul><li>\lxnum lexicon ID reference for important referents </li></ul><ul><li>\cf cross-reference </li></ul><ul><li>\bib bibliography ID cross-reference </li></ul><ul><li>\nt note </li></ul><ul><li>\scount count of number of sentences </li></ul><ul><li>\snum ID numbers of constituent sentences </li></ul><ul><li>\date date record created </li></ul><ul><li>\dt date of last update </li></ul>
  27. 30. Text – metadata example <ul><li>\tnum ji38 </li></ul><ul><li>\genre mythology </li></ul><ul><li>\rec PA </li></ul><ul><li>\sp JB </li></ul><ul><li>\recdt 1983-11-03 </li></ul><ul><li>\tape SP31 </li></ul><ul><li>\dur </li></ul><ul><li>\trandt 1984-05-16 </li></ul><ul><li>\transc PA </li></ul><ul><li>\transrc N9p25-29 </li></ul><ul><li>\desc This is a well-known myth of Emu and Turkey (also occurring as Emu and Brolga in eastern Australia) found throughout the whole of Australia. The distribution of the myth is discussed in Austin and Tindale (1985:19); see also a discussion of Emu and Turkey in Berndt and Berndt (1989:400-401). The competition between these two birds is seen in stories recorded from all throughout Australia. Other versions to be found in Tonkinson (1974:73-74), McConnel (1958:91-94). </li></ul><ul><li>\lxnum 306, 307 </li></ul><ul><li>\cf </li></ul><ul><li>\bib 0019, 0060, 0069, 0076 </li></ul><ul><li>\nt </li></ul><ul><li>\scount 033 </li></ul><ul><li>\snum ji38s001, ji38s002, ji38s003, ji38s004, ji38s005, ji38s006, ji38s007, ji38s008, ji38s009, ji38s010, ji38s011, ji38s012, ji38s013, ji38s014, ji38s015, ji38s016, ji38s017, ji38s018, ji38s019, ji38s020, ji38s021, ji38s022, ji38s023, ji38s024, ji38s025, ji38s026, ji38s027, ji38s028, ji38s029, ji38s030, ji38s031, ji38s032, ji38s033 </li></ul><ul><li>\date 2001-02-10 </li></ul><ul><li>\dt 2010-03-10 </li></ul>
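Because the text record links to stakeholder records by ID and lists its constituent sentences, the links can be checked automatically. A minimal sketch of such a check follows, assuming records are held as plain dictionaries keyed by field name; the function names, the field name `tnum` for the text identifier, and the stakeholder entry for `PA` (no such record is shown on the slides) are assumptions made for illustration.

```python
def expected_sentence_ids(text_id: str, count: int) -> list[str]:
    """Build IDs such as ji38s001 ... ji38s033 for a text with `count` sentences."""
    return [f"{text_id}s{i:03d}" for i in range(1, count + 1)]


def check_text_record(text: dict[str, str],
                      stakeholders: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: list[str] = []
    # Every person ID in a linking field must resolve to a stakeholder record.
    for field in ("rec", "sp", "transc"):
        for person_id in text.get(field, "").split():
            if person_id not in stakeholders:
                problems.append(f"{field}: unknown stakeholder ID {person_id!r}")
    # The declared sentence count must match the listed sentence IDs.
    listed = [s.strip() for s in text.get("snum", "").split(",") if s.strip()]
    declared = int(text.get("scount", "0"))
    if len(listed) != declared:
        problems.append(f"scount says {declared} but snum lists {len(listed)} IDs")
    elif listed != expected_sentence_ids(text["tnum"], declared):
        problems.append("snum IDs do not follow the <text ID>s<NNN> sequence")
    return problems


STAKEHOLDERS = {
    "JB": {"name": "Jack Butler", "role": "speaker"},
    "PA": {"name": "Peter K. Austin", "role": "researcher"},  # hypothetical record
}
TEXT = {
    "tnum": "ji38", "rec": "PA", "sp": "JB", "transc": "PA", "scount": "033",
    # Shorthand for the 33 sentence IDs listed in full on the slide.
    "snum": ", ".join(f"ji38s{i:03d}" for i in range(1, 34)),
}

print(check_text_record(TEXT, STAKEHOLDERS))
```

Run over a whole corpus, a check of this kind would flag texts whose speaker or transcriber has no stakeholder record, which is the sort of gap that later becomes a legacy data problem.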
  28. 31. Grammar -- metadata <ul><li>A Grammar of the Mantharta Languages </li></ul><ul><li>Chapter 1 </li></ul><ul><li>1.2. Sources for this study </li></ul><ul><li>1.2.1. Background to research </li></ul><ul><li>1.2.2. Fieldwork </li></ul><ul><li>1.2.3. The nature of the material collected </li></ul><ul><li>1.2.4. Dying languages? </li></ul><ul><li>1.2.5. Language consultants </li></ul><ul><li>1.3. Previous linguistic research </li></ul>
  29. 32. Conclusions <ul><li>We need a new development within language documentation, namely meta-documentary linguistics, which aims to document the goals, processes, methods and structures of language documentation projects. We can develop this field by theorisation, by investigating current and past practices, and by exploring comparative approaches. By creating meta-documentation for projects now, we can hope to spare future researchers the legacy data problems that we face today, which arose because such meta-documentation was not created in the past. </li></ul>
  30. 33. References <ul><li>Austin, Peter K. 2010a. Current issues in language documentation. In Peter K. Austin (ed.) Language Documentation and Description, Volume 7, 12-33. London: SOAS. </li></ul><ul><li>Bowern, Claire. 2003. Laves’ Bardi texts. In Joe Blythe & M. Brown (eds.) Maintaining the links: Language, identity and the land. Proceedings of FEL VII, Broome, Western Australia. Foundation for Endangered Languages. </li></ul><ul><li>Dobrin, Lise & Josh Berson. 2010. Chapter 10: Speakers and language documentation. In Peter K. Austin & Julia Sallabank (eds.) Handbook of Endangered Languages. Cambridge: Cambridge University Press. </li></ul><ul><li>Good, Jeff. 2010. Documenting consent, access and rights. Presentation at LSA Annual Meeting OLAC workshop on archiving, Baltimore. </li></ul><ul><li>Innes, Pamela. 2009. Ethical problems in archival research: Beyond accessibility. Journal of Language and Communication. </li></ul><ul><li>Nathan, David. 2010b. Archiving and language documentation: from disk space to MySpace. In Peter K. Austin (ed.) Language Documentation and Description, Volume 7. London: SOAS. </li></ul><ul><li>Nathan, David & Peter K. Austin. 2004. Reconceiving metadata: language documentation through thick and thin. In Peter K. Austin (ed.) Language Documentation and Description, Volume 2, 179-187. London: SOAS. </li></ul><ul><li>Newman, Paul & Martha Ratliff (eds.). 2001. Linguistic Fieldwork. Cambridge: Cambridge University Press. </li></ul><ul><li>O’Meara, Carolyn & Jeff Good. 2009. Ethical issues in legacy language resources. Journal of Language and Communication. </li></ul><ul><li>Steele, Jeremy. 2005. The Aboriginal language of Sydney: a partial reconstruction of the indigenous language of Sydney based on the notebooks of William Dawes of 1790-91, informed by other records of the Sydney and surrounding languages to c.1905. Macquarie University MA thesis. </li></ul><ul><li>Woodbury, Anthony C. 2010. Chapter 9: Language documentation. In Peter K. Austin & Julia Sallabank (eds.) Handbook of Endangered Languages. Cambridge: Cambridge University Press. </li></ul>
  31. 34. Thank you <ul><li>The End </li></ul>