Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project
 


Paper at LREC2004 (May 2004, Lisbon)


    • Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project
      • Baden Hughes (1), David Penton (1), Steven Bird (1), Catherine Bow (1), Gillian Wigglesworth (1), Patrick McConvell (2) and Jane Simpson (3)
      • (1) University of Melbourne, (2) AIATSIS, (3) University of Sydney
    • Overview
      • Introduction
      • Requirements
      • Data Model
      • Implementation
        • Data Entry
        • Reports, Queries and Searches
        • Exports
        • Synchronisation
        • Administration
      • Conclusion
    • Introduction
      • A metadata creation and management tool for a multi-fieldworker, longitudinal child language acquisition research project
      • Addresses the need for principled metadata creation alongside best-practice data creation
      • A challenging deployment scenario, typical of many field-oriented linguistic research and language data collection projects
    • Requirements
      • Data Management
        • Metadata for complex multimodal data
        • Relational data for participants
        • Delineation between participant roles
        • Not just collection, but reports and queries
      • Research Methodology
        • Integration with tool of choice for analysis
        • Two-stage enquiry process: metadata first, then data
        • Extensible controlled vocabularies
        • User-defined fields (particularly lists)
      • Technology
        • Full support for data entry and enquiry in both online and offline modes
        • Metadata collection with maximum utility to the project, without precluding other renderings, e.g. as an OLAC or IMDI catalogue
        • Easy to install and use on multiple platforms
    • Data Model
      • Tools for modelling
        • DBDesigner (open source, XML based, multi-platform)
      • Challenges for modelling
        • Multiple interlinked media, sessions, and transcripts
        • Differentiating between participants and focus children in multiple contexts
        • Incomplete personal data, e.g. no date of birth
        • Non-linear progression through the educational system
        • Multiple types of anthropological relations
        • Non-standardised linguistic classification and nomenclature
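The modelling challenges above can be made concrete with a small sketch. It is not the ACLA schema: the class names, field names and role labels are hypothetical, chosen only to illustrate two of the listed problems. Roles are attached per session, so the same person can be a focus child in one recording and an ordinary participant in another, and the date of birth is optional, so age calculations have to tolerate missing data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Participant:
    participant_id: str
    name: str
    # Hypothetical field: the date of birth may be unknown in the field.
    date_of_birth: Optional[date] = None


@dataclass
class Session:
    session_id: str
    recorded_on: date
    # Role is stored per session (participant_id -> role label), so one
    # person can be the focus child here and a bystander elsewhere.
    roles: Dict[str, str] = field(default_factory=dict)
    media: List[str] = field(default_factory=list)
    transcripts: List[str] = field(default_factory=list)

    def focus_children(self) -> List[str]:
        """Return the ids of participants marked as focus children."""
        return sorted(pid for pid, role in self.roles.items()
                      if role == "focus_child")


def age_in_months(person: Participant, on: date) -> Optional[int]:
    """Age in completed months, or None when the date of birth is unknown."""
    dob = person.date_of_birth
    if dob is None:
        return None
    months = (on.year - dob.year) * 12 + (on.month - dob.month)
    if on.day < dob.day:
        months -= 1  # the current month is not yet complete
    return months
```

Keeping the role on the session rather than on the participant is one way to handle "participants and focus children in multiple contexts"; a relational schema would express the same idea with a link table.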
    • Implementation
      • Architecture
        • Networked client-server, with each installation fully independent
        • A single line of code differs between client and server installations
        • Underlying requirement: full functionality in both online and offline environments
      • Technology Platform
        • PHP scripting language with PEAR libraries
        • MySQL database engine
        • Apache HTTP server
        • fundamentally open source, cross-platform
    • Data Entry
      • Forms based data entry
        • Participant Form
        • Session Form
      • Both forms feature a “build your own list” interface, which lets the end user construct a list of parameters and then apply instances of those parameters within the parent form
        • educational progress
        • session-media-transcript
    • Reports, Queries and Searches
      • Simple Reports
        • for frequently used two-dimensional queries
          • e.g. participants by fieldworker
          • e.g. participants by gender
      • Advanced Reports
        • design your own query interface
      • Full Text Query
        • Boolean support
        • full database index query
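A simple report of the kind listed above ("participants by gender", "participants by fieldworker") amounts to a grouped count. The sketch below is illustrative only: the `participant` table, its columns and the demo rows are hypothetical, and it uses Python's built-in `sqlite3` so that it runs without a server, whereas the project itself used MySQL.

```python
import sqlite3

# Columns that a simple report may group by (hypothetical schema).
REPORT_COLUMNS = {"gender", "fieldworker"}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up participants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant ("
        "id INTEGER PRIMARY KEY, name TEXT, gender TEXT, fieldworker TEXT)"
    )
    conn.executemany(
        "INSERT INTO participant (name, gender, fieldworker) VALUES (?, ?, ?)",
        [("Child A", "F", "FW1"), ("Child B", "M", "FW1"), ("Child C", "F", "FW2")],
    )
    return conn


def participants_by(conn: sqlite3.Connection, column: str):
    """Count participants grouped by one whitelisted column."""
    if column not in REPORT_COLUMNS:
        # Column names cannot be bound as SQL parameters, so validate them.
        raise ValueError(f"unsupported report column: {column}")
    sql = (f"SELECT {column}, COUNT(*) FROM participant "
           f"GROUP BY {column} ORDER BY {column}")
    return conn.execute(sql).fetchall()
```

Calling `participants_by(build_demo_db(), "gender")` returns one `(value, count)` row per gender. A "design your own query" interface would generalise the same pattern to user-selected columns and filters.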
    • Exports
      • Generate headers for CLAN
        • e.g. @Participants
      • Generate Physical Media Labels
        • e.g. FM025.A.DV, FM025.A.MD
      • Generate File Names for Transcriptions
        • e.g. DEV00012004049.trn
      • XML-based database dump
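As an illustration of the CLAN header export, the sketch below builds an `@Participants` line from participant records. It assumes the usual CHAT layout of a speaker code, an optional name and a role per entry, separated by commas, with a tab after the header name; the function name and the sample records are hypothetical, and the exact layout should be checked against the CHAT manual before use.

```python
from typing import Iterable, Tuple


def clan_participants_header(participants: Iterable[Tuple[str, str, str]]) -> str:
    """Build a CHAT-style @Participants header line.

    Each participant is a (code, name, role) tuple; an empty name is omitted.
    """
    entries = []
    for code, name, role in participants:
        parts = [code, name, role] if name else [code, role]
        entries.append(" ".join(parts))
    # A tab separates the header name from its content.
    return "@Participants:\t" + ", ".join(entries)
```

For example, `clan_participants_header([("CHI", "", "Target_Child"), ("MOT", "Mary", "Mother")])` yields a single header line naming the child and the mother.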
    • Synchronisation
      • Client -> Server
        • SQL query identifies all data changed since the last sync
        • Export and serialise as XML
        • Compress, then compute checksum
        • Transfer over HTTP
        • Verify checksum, then uncompress
        • Deserialise XML to SQL
        • Import SQL into database
      • Server -> Client follows the same steps in the opposite direction
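The synchronisation steps can be sketched end to end as follows. This is a minimal illustration, not the project's PHP code: the XML layout, the use of zlib and SHA-256, and the `REPLACE INTO` statements are all assumptions, the changed rows are passed in rather than selected by a "changed since last sync" query, and the HTTP transfer between the two functions is left out.

```python
import hashlib
import xml.etree.ElementTree as ET
import zlib
from typing import Dict, List, Tuple


def export_changes(rows: List[Dict]) -> Tuple[bytes, str]:
    """Sender side: serialise changed rows as XML, compress, then checksum."""
    root = ET.Element("changes")
    for row in rows:
        record = ET.SubElement(root, "record",
                               table=row["table"], id=str(row["id"]))
        for name, value in row["fields"].items():
            ET.SubElement(record, "field", name=name).text = value
    payload = zlib.compress(ET.tostring(root, encoding="utf-8"))
    return payload, hashlib.sha256(payload).hexdigest()


def import_changes(payload: bytes, checksum: str) -> List[Tuple[str, List[str]]]:
    """Receiver side: verify checksum, uncompress, turn XML back into SQL."""
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise ValueError("checksum mismatch: payload corrupted in transit")
    root = ET.fromstring(zlib.decompress(payload))
    statements = []
    for record in root.findall("record"):
        fields = {f.get("name"): f.text or "" for f in record.findall("field")}
        columns = ", ".join(["id"] + list(fields))
        placeholders = ", ".join("?" for _ in range(len(fields) + 1))
        # A real importer should validate table and column names first.
        sql = (f"REPLACE INTO {record.get('table')} "
               f"({columns}) VALUES ({placeholders})")
        statements.append((sql, [record.get("id")] + list(fields.values())))
    return statements
```

The returned `(sql, parameters)` pairs would then be executed against the receiving database. Computing the checksum over the compressed payload lets the receiver reject a damaged transfer before attempting to uncompress it.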
    • Administration
      • User-facilitated editing of
        • System data
          • Synchronisation – server settings
        • Extensible controlled vocabularies
          • Languages – linked to Ethnologue and AIATSIS codes
          • Locations – geographical metadata
          • Activities/tasks – both locally and globally defined
        • User administration
          • Access (personal metadata)
          • Roles (fieldworker, administrator …)
        • Project administration
          • Fieldworker activity
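An extensible controlled vocabulary of the kind administered here can be sketched as a lookup table that users may add to but that rejects duplicates. The class and field names below are hypothetical, and the codes used in any example are placeholders rather than real Ethnologue or AIATSIS identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LanguageEntry:
    name: str
    ethnologue_code: Optional[str] = None  # external code, if one is known
    aiatsis_code: Optional[str] = None     # external code, if one is known


class ControlledVocabulary:
    """A user-extensible list of terms with case-insensitive lookup."""

    def __init__(self) -> None:
        self._entries: Dict[str, LanguageEntry] = {}

    def add(self, entry: LanguageEntry) -> None:
        key = entry.name.casefold()
        if key in self._entries:
            # Refuse duplicates so that the vocabulary stays controlled.
            raise ValueError(f"term already defined: {entry.name}")
        self._entries[key] = entry

    def lookup(self, name: str) -> Optional[LanguageEntry]:
        return self._entries.get(name.casefold())
```

Data entry forms would then offer only terms present in the vocabulary, while an administrator can extend it with `add` when a new language, location or activity is encountered in the field.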
    • Conclusion
      • Notable feature: complete online and offline operation
      • Research methodology is representative of many field linguistics projects
      • Available for other interested parties to build on and extend
      • http://www.cs.mu.oz.au/research/lt/projects/acla-db
    • Acknowledgements
      • The research reported here is supported by the Australian Research Council Discovery Project Grant DP0343189.