LREC 2010



- Adolphs, S., Carter, R. and Knight, D.

- Second phase multi-modal corpora: Heterogeneous datasets for linguistic analysis.


  1. Developing heterogeneous corpora using the Digital Replay System (DRS)
     Dawn Knight, Paul Tennent, Svenja Adolphs and Ronald Carter
  2. DRS: Digital Replay System (video demo)
  3. DReSS II: Research aims and objectives
     - To produce digital records that combine familiar forms of data with computational recordings of interaction
       - Not only to record novel forms of data, but to develop means whereby social scientists can inspect the opaque character of social interaction and communication in the digital society
       - Development will be driven by an experimental project that seeks to explore a day in the life of a member of the digital society
     - These studies will be complemented by new forms of corpus analysis that go beyond existing techniques to ‘pump prime’ the development of the population observatory
  4. Nottingham eLanguage Corpus (NeLC)
     - ‘Data’ types include (in progress):
       - SMS/MMS messaging
       - Blogging
       - Chat room and message board discourse
       - Email usage
       - Face-to-face situated discourse
       - GPS or manual map-based tracking
       - Web browsing activity (automated logging of sites)
       - Phone calls (home and/or mobile)
       - Text messaging
       - Video calls (mobile or online, e.g. Skype)
  5. Capturing extra-contextual information
     - Brown (1989: 98) notes that the following aspects of a discursive situation are important to consider in order to help conceptualise context:
       - Features of the external situation of utterance: speaker/hearer, place/time, etc.
       - Background knowledge (particularly socio-cultural knowledge) which the learner brings to the interpretation of the discourse
       - Linguistic features of the discourse, particularly the range of vocabulary and syntactic structures, cohesiveness and rhetorical structures
  6. E.g. SMS component: details include
     - Date and time sent/received
     - Identity of sender/receiver (age, occupation, basic relationship to participant)
     - Location when sent/received
     - Activity in location
     - Content of message
     - Semantic field of the message (i.e. content semantically coded)
     - Linguistic function of the message
  7. Key issues and challenges
     A. Refine the processes of data collection and collation:
        - Automated extraction of content and/or metadata information (whole or part)
        - Create methods for organising information (primary and secondary, from raw video to codes) within and across media types and specific datasets
     B. Develop methods for dataset interrogation and mining
     C. Compose system(s) for (re)presenting data
  8. Software requirements (1)
     - The ability to search data and metadata in a principled and specific way (encoded and/or transcribed text-based data), within and/or across the three global domains of data: devices/data type(s); time and/or location; and participants/given contributions.
  9. Software requirements (2)
     - Tools that allow for the frequency profiling of events/elements within and across domains (providing raw counts, basic statistical analysis tools and methods of graphing these).
     - New methods for drilling into the data by mining specific relationships within and between domain(s). These may be comparable to current social networking software, mind maps or more topologically based methods.
     - Graphing tools for mapping the incidence of words or events over time, for example, and for comparing sub-corpora and domain-specific characteristics.
  10. DRS demo: some new developments
  11. Acknowledgements
     - Research team
     - The Digital Records for e-Social Science (DReSS) Project is funded by the ESRC.
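The SMS details listed on slide 6 and the search requirement on slide 8 can be illustrated with a small Python sketch. The slides do not describe how DRS or NeLC actually store or query records, so the `SMSRecord` class, its field names, the `search` function and the sample messages are all hypothetical. The sketch only shows one way a record could carry the listed contextual fields and be filtered across the three global domains (data type, time and/or location, participant).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class SMSRecord:
    """One SMS message plus the contextual details listed on slide 6.

    Hypothetical structure: field names are illustrative, not taken from DRS/NeLC.
    """
    timestamp: datetime            # date and time sent/received
    direction: str                 # "sent" or "received"
    participant: str               # corpus participant who owns the phone
    other_party: str               # identity of sender/receiver
    location: str                  # location when sent/received
    activity: str                  # activity in location
    content: str                   # content of message
    other_party_age: Optional[int] = None
    other_party_occupation: Optional[str] = None
    relationship: Optional[str] = None                         # basic relationship to participant
    semantic_fields: List[str] = field(default_factory=list)  # semantic codes
    function: Optional[str] = None                             # linguistic function of the message
    data_type: str = "SMS"                                     # device/data-type domain


def search(records: Iterable[SMSRecord], *, data_type: Optional[str] = None,
           start: Optional[datetime] = None, end: Optional[datetime] = None,
           location: Optional[str] = None, participant: Optional[str] = None,
           term: Optional[str] = None) -> List[SMSRecord]:
    """Return records matching every criterion given; None means 'any'.

    Criteria span the three domains on slide 8: data type, time and/or
    location, and participant. `term` is a case-insensitive substring match.
    """
    hits = []
    for r in records:
        if data_type is not None and r.data_type != data_type:
            continue
        if start is not None and r.timestamp < start:
            continue
        if end is not None and r.timestamp > end:
            continue
        if location is not None and r.location != location:
            continue
        if participant is not None and r.participant != participant:
            continue
        if term is not None and term.lower() not in r.content.lower():
            continue
        hits.append(r)
    return hits


# Illustrative (made-up) records, not corpus data.
records = [
    SMSRecord(datetime(2010, 3, 1, 9, 15), "sent", "P01", "friend A",
              "home", "breakfast", "Running late, see you at 10"),
    SMSRecord(datetime(2010, 3, 1, 18, 40), "received", "P01", "colleague B",
              "office", "working", "Meeting moved to Tuesday",
              relationship="colleague"),
]
afternoon_at_office = search(records, start=datetime(2010, 3, 1, 12), location="office")
mentions_late = search(records, term="LATE")
```

Because every criterion defaults to "any", the same function answers queries within one domain (e.g. only `participant`) or across several at once.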
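Challenge A on slide 7 calls for organising information within and across media types. One simple approach, assuming each media stream is already sorted by timestamp, is to merge the streams into a single chronological timeline with the standard library's `heapq.merge`. The `(timestamp, media_type, payload)` tuple shape, the `collate` function and the sample events are illustrative assumptions, not the DRS data model.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp, media_type, payload).
Event = Tuple[datetime, str, str]


def collate(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge per-media streams into one chronological timeline.

    Assumes each input stream is already sorted by timestamp, as
    heapq.merge requires; it does not sort unsorted input.
    """
    return heapq.merge(*streams, key=lambda event: event[0])


# Illustrative (made-up) streams from two media types.
sms: List[Event] = [
    (datetime(2010, 3, 1, 9, 15), "SMS", "Running late, see you at 10"),
    (datetime(2010, 3, 1, 18, 40), "SMS", "Meeting moved to Tuesday"),
]
gps: List[Event] = [
    (datetime(2010, 3, 1, 9, 0), "GPS", "home"),
    (datetime(2010, 3, 1, 12, 30), "GPS", "campus"),
]

timeline = list(collate(sms, gps))
for ts, media, payload in timeline:
    print(ts.strftime("%H:%M"), media, payload)
```

Merging lazily keeps memory use low when streams are long, since only one pending event per stream is held at a time.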
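At their simplest, the frequency profiling and graphing requirements on slide 9 come down to counting. The sketch below assumes a hypothetical `(timestamp, data_type, participant, text)` event tuple. The hypothetical `profile` function returns raw counts within each domain, and `word_incidence_by_hour` tabulates a word's occurrences by hour of day for each data type, producing series that could be handed to a plotting library. Tokenisation is a naive regular expression, not a linguistic tokeniser.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp, data_type, participant, text).
Event = Tuple[datetime, str, str, str]


def profile(events: Iterable[Event]) -> Dict[str, Counter]:
    """Raw event counts within each domain: data type, participant, day."""
    by_type, by_participant, by_day = Counter(), Counter(), Counter()
    for ts, data_type, participant, _text in events:
        by_type[data_type] += 1
        by_participant[participant] += 1
        by_day[ts.date()] += 1
    return {"data_type": by_type, "participant": by_participant, "day": by_day}


def word_incidence_by_hour(events: Iterable[Event], word: str) -> Dict[str, Counter]:
    """Occurrences of `word` per hour of day, kept separate per data type."""
    target = word.lower()
    table: Dict[str, Counter] = defaultdict(Counter)
    for ts, data_type, _participant, text in events:
        # Naive tokenisation on \w+ runs, so "later" is not counted as "late".
        n = re.findall(r"\w+", text.lower()).count(target)
        if n:
            table[data_type][ts.hour] += n
    return table


# Illustrative (made-up) events from two sub-corpora.
events = [
    (datetime(2010, 3, 1, 9, 15), "SMS", "P01", "Running late, see you later"),
    (datetime(2010, 3, 1, 9, 40), "email", "P01", "Sorry for the late reply"),
    (datetime(2010, 3, 2, 18, 5), "SMS", "P02", "Home late tonight"),
]
counts = profile(events)
late = word_incidence_by_hour(events, "late")
```

Keeping the counts separate per data type, rather than pooling them, is what allows sub-corpora such as SMS and email to be compared directly.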