LREC 2010

- Adolphs, S., Carter, R. and Knight, D.

- Second phase multi-modal corpora: Heterogeneous datasets for linguistic analysis.

Transcript

  • 1. Developing heterogeneous corpora using the Digital Replay System (DRS)
      - Dawn Knight, Paul Tennent, Svenja Adolphs and Ronald Carter
  • 2. DRS: Digital Replay System (video demo)
  • 3. DReSS II: Research aims and objectives
      - To produce digital records which combine familiar forms of data with computational recordings of interaction
          - Not only to record novel forms of data, but to develop means whereby social scientists can inspect the opaque character of social interaction and communication in the digital society
          - Development will be driven by an experimental project that seeks to explore a day in the life of a member of the digital society
      - These studies will be complemented by new forms of corpus analysis that go beyond existing techniques to ‘pump prime’ the development of the population observatory
  • 4. Nottingham eLanguage Corpus (NeLC)
      - ‘Data’ types include (in progress):
          - SMS/MMS messaging
          - Blogging
          - Chat room and message board discourse
          - Email usage
          - Face-to-face situated discourse
          - GPS or manual map-based tracking
          - Web browsing activity (automated logging of sites)
          - Phone calls (home and/or mobile)
          - Text messaging
          - Video calls (mobile or online, e.g. Skype)
  • 5. Capturing extra-contextual information
      - Brown notes that the following aspects of a discursive situation are important to consider in order to help conceptualise context (1989: 98):
          - Features of the external situation of utterance: speaker/hearer, place/time, etc.
          - Background knowledge (particularly socio-cultural knowledge) which the learner brings to the interpretation of the discourse
          - Linguistic features of the discourse, particularly the range of vocabulary and syntactic structures, cohesiveness and rhetorical structures
  • 6. E.g. SMS component: details include
      - Date and time sent/received
      - Identity of sender/receiver (age, occupation, basic relationship to participant)
      - Location when sent/received
      - Activity in location
      - Content of message
      - Semantic field of the message (i.e. content semantically coded)
      - Linguistic function of the message
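The fields listed on slide 6 map naturally onto a record type. The following is only a minimal sketch of how one SMS entry might be represented; the class names, field names and sample values are all hypothetical and are not taken from DRS or NeLC.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class Correspondent:
    """The other party to the message (hypothetical structure)."""
    age: int
    occupation: str
    relationship: str  # basic relationship to the participant


@dataclass(frozen=True)
class SmsRecord:
    """One SMS entry carrying the metadata listed on the slide."""
    timestamp: datetime        # date and time sent/received
    direction: str             # "sent" or "received"
    correspondent: Correspondent
    location: str              # location when sent/received
    activity: str              # activity in that location
    content: str               # content of the message
    semantic_field: str        # semantic code assigned to the content
    linguistic_function: str   # linguistic function of the message


# Made-up values, for illustration only.
example = SmsRecord(
    timestamp=datetime(2010, 5, 19, 14, 30),
    direction="sent",
    correspondent=Correspondent(age=34, occupation="teacher", relationship="friend"),
    location="home",
    activity="cooking",
    content="Running late, see you at 8",
    semantic_field="arrangements",
    linguistic_function="informing",
)

# asdict() flattens the record (including the nested correspondent)
# into plain dictionaries, e.g. for export or indexing.
record_as_dict = asdict(example)
```

Keeping the contextual metadata alongside the message text in a single record is one way to make later searches by time, location or participant straightforward.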
  • 7. Key issues and challenges
      - A. Refine the processes of data collection and collation
          - Automated extraction of content and/or metadata information (whole or part)
          - Create methods for organising information (primary and secondary, from raw video to codes) within and across media types and specific datasets
      - B. Develop methods for dataset interrogation and mining
      - C. Compose system(s) for (re)presenting data
  • 8. Software requirements (1)
      - The ability to search data and metadata (encoded and/or transcribed text-based data) in a principled and specific way, within and/or across the three global domains of data: devices/data type(s), time and/or location, and participants/given contributions.
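As a rough illustration of the requirement on slide 8, a search could combine a text query with optional filters on each of the three domains. This is a sketch under stated assumptions, not the DRS implementation: the `search` function, the dictionary keys and the sample records are all hypothetical.

```python
from datetime import datetime


def search(records, text=None, data_type=None, participant=None,
           location=None, start=None, end=None):
    """Return the records that match every criterion that is not None."""
    hits = []
    for rec in records:
        if text is not None and text.lower() not in rec["text"].lower():
            continue
        if data_type is not None and rec["data_type"] != data_type:
            continue
        if participant is not None and rec["participant"] != participant:
            continue
        if location is not None and rec["location"] != location:
            continue
        if start is not None and rec["time"] < start:
            continue
        if end is not None and rec["time"] > end:
            continue
        hits.append(rec)
    return hits


# Made-up records spanning two data types, two participants, two locations.
records = [
    {"data_type": "sms", "participant": "P1", "location": "home",
     "time": datetime(2010, 5, 19, 9, 0), "text": "See you at lunch"},
    {"data_type": "email", "participant": "P1", "location": "office",
     "time": datetime(2010, 5, 19, 11, 0), "text": "Lunch meeting moved"},
    {"data_type": "sms", "participant": "P2", "location": "office",
     "time": datetime(2010, 5, 19, 15, 0), "text": "Thanks for lunch"},
]

# Search across data types for one participant.
across_types = search(records, text="lunch", participant="P1")

# Search within one data type, restricted by time.
morning_sms = search(records, data_type="sms", end=datetime(2010, 5, 19, 12, 0))
```

Leaving a filter unset searches across that domain, while setting it searches within it, which mirrors the "within and/or across" wording of the requirement.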
  • 9. Software requirements (2)
      - Tools that allow for the frequency profiling of events/elements within and across domains (providing raw counts, basic statistical analysis tools, and methods of graphing these).
      - New methods for drilling into the data, through mining specific relationships within and between domain(s). This may be comparable to current social networking software, mind maps or more topologically based methods.
      - Graphing tools for mapping the incidence of words or events, for example, over time and for comparing sub-corpora and domain-specific characteristics.
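The frequency profiling described on slide 9 can be sketched as a raw count of a word grouped by any one domain, such as data type or hour of day. The function name, record keys and sample texts below are hypothetical, and the tokeniser is deliberately naive; the resulting counts are the kind of series a graphing tool could then plot.

```python
import re
from collections import Counter


def frequency_profile(records, word, domain_key):
    """Count occurrences of `word`, grouped by the value of `domain_key`."""
    target = word.lower()
    counts = Counter()
    for rec in records:
        # Naive tokenisation: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", rec["text"].lower())
        counts[rec[domain_key]] += tokens.count(target)
    return counts


# Made-up records, for illustration only.
records = [
    {"data_type": "sms", "hour": 9, "text": "Running late, sorry!"},
    {"data_type": "sms", "hour": 18, "text": "Late again. So late."},
    {"data_type": "email", "hour": 9, "text": "The report is late"},
]

by_type = frequency_profile(records, "late", "data_type")  # compare sub-corpora
by_hour = frequency_profile(records, "late", "hour")       # incidence over time
```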
  • 10. DRS demo: some new developments
  • 11. Acknowledgements
      - Research team
      - The Digital Records for e-Social Science Project is funded by the ESRC.
