SPinTX Corpus-to-Classroom: A Teacher-Centered Pedagogical Interface for the Spanish in Texas Corpus

Presentation at CALICO 2013: Corpora provide a promising way of creating language learning materials that accurately depict languages, but corpus search interfaces typically aren't designed with this goal in mind. The SPinTX Corpus-to-Classroom project is developing a website for educators to search and adapt authentic video for the teaching of Spanish. This presentation will describe the main results to date: (1) a pedagogically friendly interface to search over 300 tagged video clips from the Spanish in Texas Corpus; (2) tools for educators to easily create lessons and activities based on the videos; (3) an open source model for developing video corpora for language learning.

  • Will introduce corpora in general, our source corpus, and the pedagogical corpus
  • Discuss examples briefly one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
  • Describe original corpus. This is similar to the other corpora we looked at earlier. Introduce SpinTX corpus and highlight differences.
  • We asked teachers how they use videos and how they would like to use videos (interviews and focus groups).
  • We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
  • 1. Anonymous user: Watch intro video. Show search criteria: topics, grammar, pragmatics, keywords, etc. Show video page: related items, transcripts with highlighting, sharing & downloading tabs. 2. Registered user: How to favorite and tag a video. Tagged video lists.
  • But that’s not all!
  • This will be an ongoing process that will hopefully eventually be taken over by the users.
  • 5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.
  • SPinTX Corpus-to-Classroom: A Teacher-Centered Pedagogical Interface for the Spanish in Texas Corpus

    1. SPinTX Corpus-to-Classroom: A Teacher-Centered Pedagogical Interface for the Spanish in Texas Corpus
        Barbara E. Bullock, Almeida Jacqueline Toribio, Rachael Gilg, Martí Quixal & Arthur Wendorf
    2. Who we are
        • Barbara E. Bullock & Almeida Jacqueline Toribio
            • Project Directors / Sociolinguistics Researchers
        • Rachael Gilg
            • Project Manager / Web Developer
        • Arthur Wendorf
            • Corpus Linguist / Developer
        • Martí Quixal
            • Computational Linguist / Developer
        • Carl Blyth
            • Director of COERLL
    3. Agenda
        • Part 1: Introduction to the Corpus-to-Classroom Project
        • Part 2: Project Results
            • The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus
            • Involving teachers in the development of open educational resources
            • A model for open source corpus development
    4. Corpus-to-Classroom
    5. Corpora in the Classroom: the promise
        • Corpus: a large, structured collection of language
        • Benefits:
            • Naturalistic language use
            • Motivation
            • "Real" language
            • Discovery learning
        • Examples:
    6. Corpora in the Classroom: the reality
        • Large linguistic corpora are of limited utility to untrained end users.
            • Designed for researchers, not educators.
        • Collections such as YouTube are popular for language classes, but can present problems.
            • Searching for appropriate content is time-consuming using available search methods.
            • Content is not necessarily openly licensed and can disappear without warning.
    7. Our two-pronged approach
        Spanish in Texas Corpus Project
        A project of COERLL, a National Foreign Language Resource Center (2010-2014)
        • Video interviews provide rich content
        SpinTX: Corpus-to-Classroom Project
        Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
        • Collection of pre-selected, corrected, annotated clips from the larger corpus
        • Open-source, pedagogically-friendly search and authoring tools
    8. Spanish in Texas Corpus: Goals
        • To make publicly available authentic data about variation in Spanish as spoken in Texas
            • for education
            • for research
        • To encourage teachers/students/public to view local varieties as a resource
    9. Corpus-to-Classroom: Goals
        • Develop a pedagogically friendly interface for using the Spanish in Texas corpus
        • Involve teachers and learners, via crowd-sourcing, social networking, and workshops, in the development of open educational resources
        • Create a model for using open source tools and a pedagogical interface that can be adapted for any language corpus collection
    10. Corpus Overview
        Spanish in Texas corpus
        • Approx. 92 videos of sociolinguistic interviews (avg. 30–45 min)
        • Transcribed (approx. 600,000 words)
        • Time-synced video caption files
        • Tagged for linguistic features
        SpinTX Video Archive corpus
        • Approx. 327 video clips from 33 speakers (avg. 1–4 min)
        • Transcribed (approx. 80,000 words)
        • Time-synced video caption files
        • Tagged for linguistic and pedagogical features
        • Completely open (no registration required, open CC license)
        • Teacher-friendly interface
    11. Corpus Tagging: Basic
        • Time-synced captions
        • Part-of-speech tags (dual language)
            • POS
            • POS, simplified
            • Gender
            • Tense
            • Aspect
            • Mood
        • Speaker identification
            • Age
            • Gender
            • Region
    12. Corpus Tagging: Pedagogical
        • Topics (manually added)
        • Automatic tags using custom rulesets
            • Grammatical
                • aggregated from textbooks
            • Pragmatics
                • discourse markers, place holders (“este”), attenuators
            • Vocabulary
                • concept words
            • Functional (planned)
                • greetings, ask for help, express opinions
            • Bilingual forms (planned)
                • code-switching (CS), loans, loan translations
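To make the idea of "automatic tags using custom rulesets" on slide 12 concrete, here is a minimal Python sketch. It is not the project's ruleset: the rule names, the row layout (a dict with a `lemma` key), and the `min_hits` threshold are all assumptions made for illustration. The grammar topics are borrowed from the ones named elsewhere in the slides (por vs. para, ser vs. estar, gustar).

```python
# Hypothetical ruleset: each pedagogical tag maps to a test on one tagged token.
# The tag names and the "lemma" field are illustrative assumptions.
RULES = {
    "por_vs_para": lambda row: row["lemma"] in {"por", "para"},
    "ser_vs_estar": lambda row: row["lemma"] in {"ser", "estar"},
    "gustar": lambda row: row["lemma"] == "gustar",
}


def pedagogical_tags(rows, min_hits=1):
    """Return the sorted names of all rules that fire at least min_hits times."""
    counts = {name: 0 for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            if rule(row):
                counts[name] += 1
    return sorted(name for name, hits in counts.items() if hits >= min_hits)


# A tiny hand-made clip: "Estoy aquí por ti"
clip_rows = [
    {"token": "Estoy", "lemma": "estar"},
    {"token": "aquí", "lemma": "aquí"},
    {"token": "por", "lemma": "por"},
    {"token": "ti", "lemma": "tú"},
]
clip_tags = pedagogical_tags(clip_rows)
print(clip_tags)
```

Keeping the rules in a plain dictionary means a new grammar point can be added without touching the tagging loop, which is one plausible way a textbook-derived list of topics could be maintained.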
    13. (no text on slide)
    14. Interview Metadata
    15. Original Transcript (from Automatic Sync)
    16. Upload Video and Transcript to YouTube
    17. Review Transcript in Google Docs
    18. Download SRT file
    19. Prepare Transcript for TreeTagger
    20. Run through TreeTagger
    21. Combine Data from SRT File and TreeTagger File, and add additional Tags
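The combining step on slide 21 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code (that is in the GitHub repository linked on slide 38). It assumes standard SRT caption blocks and TreeTagger's usual tab-separated output of token, tag, and lemma, one token per line. The function names, the output row layout, and the sample POS tags are hypothetical.

```python
import re

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text):
    """Parse SRT text into (index, start, end, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if match:
            cues.append((int(lines[0]), match.group(1), match.group(2),
                         " ".join(lines[2:])))
    return cues


def parse_treetagger(tagged_text):
    """Parse 'token<TAB>tag<TAB>lemma' lines into tuples."""
    return [tuple(line.split("\t")) for line in tagged_text.splitlines()
            if line.count("\t") == 2]


def merge_cues_with_tags(cues, tagged):
    """Attach each tagged token to the caption cue whose text it came from.

    Tokens are consumed in order until their concatenation equals the cue
    text with whitespace removed, so punctuation split off by the tagger
    still lands in the right cue.
    """
    rows, pos = [], 0
    for index, start, end, text in cues:
        target = re.sub(r"\s+", "", text)
        consumed = ""
        while pos < len(tagged) and len(consumed) < len(target):
            token, tag, lemma = tagged[pos]
            consumed += token
            rows.append({"cue": index, "start": start, "end": end,
                         "token": token, "pos": tag, "lemma": lemma})
            pos += 1
        if consumed != target:
            raise ValueError(f"token stream does not match cue {index}")
    return rows


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Yo nací en Texas.

2
00:00:03,500 --> 00:00:05,000
Hablo español
"""

# The tags below are illustrative values, not checked against a tagger run.
SAMPLE_TAGGED = ("Yo\tPPX\tyo\nnací\tVLfin\tnacer\nen\tPREP\ten\n"
                 "Texas\tNP\tTexas\n.\tFS\t.\n"
                 "Hablo\tVLfin\thablar\nespañol\tNC\tespañol\n")

rows = merge_cues_with_tags(parse_srt(SAMPLE_SRT), parse_treetagger(SAMPLE_TAGGED))
for row in rows:
    print(row)
```

Each resulting row carries both the caption timing and the linguistic annotation, so it could be written out with `csv.DictWriter` to give one CSV line per token.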
    22. Divide CSV Files and Videos into Clips and adjust Timings and Numberings
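The timing and numbering adjustment on slide 22 amounts to re-basing each caption so that the clip starts at zero and its captions count from 1. The sketch below shows one way to do that for SRT-style timestamps. The tuple layout, the function names, and the rule that a caption must fall entirely inside the clip are assumptions, not the project's documented behavior.

```python
def to_ms(timestamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hours, minutes, rest = timestamp.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def to_timestamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cut_clip(cues, clip_start, clip_end):
    """Keep cues that lie fully inside the clip, shift them so the clip
    starts at 00:00:00,000, and renumber them from 1."""
    start_ms, end_ms = to_ms(clip_start), to_ms(clip_end)
    clip = []
    for _, start, end, text in cues:
        cue_start, cue_end = to_ms(start), to_ms(end)
        if cue_start >= start_ms and cue_end <= end_ms:
            clip.append((len(clip) + 1,
                         to_timestamp(cue_start - start_ms),
                         to_timestamp(cue_end - start_ms),
                         text))
    return clip


interview_cues = [
    (1, "00:00:01,000", "00:00:03,500", "a"),
    (41, "00:10:00,250", "00:10:02,000", "b"),
    (42, "00:10:02,000", "00:10:05,750", "c"),
    (43, "00:12:00,000", "00:12:01,000", "d"),
]
clip_cues = cut_clip(interview_cues, "00:10:00,000", "00:11:00,000")
print(clip_cues)
```

Working in integer milliseconds avoids floating-point drift when many captions are shifted by the same offset.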
    23. The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus
    24. Needs assessment: teacher interviews
        • How do you use authentic video in your teaching?
        • Describe searches you have done in the past for video content. What were you looking for and were you able to find it?
        • How can you imagine using clips from the Spanish in Texas video corpus in your classes?
    25. Needs assessment results: primary goals
        • Enable teachers to easily find videos that suit the curriculum/work plan
            • Search by grammar, theme, vocabulary, etc.
        • Provide open, non-ephemeral content
            • Downloadable from open site with a license enabling remixing
        • Curating sets of videos for comparison and study
            • Favoriting and tagging videos
        • Provide access to supporting materials
            • Creating a “community of practice” around the videos so materials can be shared among educators
    26. Needs assessment results: secondary goals
        • Materials for teacher trainers
            • Teachers of heritage learners can learn about local variation
        • Video recording as a cross-competence task
            • Interviews collected by students can be contributed to the corpus
    27. (no text on slide)
    28. Ideas for future development
        • Advanced search capability
            • support for wildcards
            • improved phrase searching
            • improved “keyword in context” result view
        • Data visualizations
            • word and/or tag clouds
            • language maps
        • Enhanced word-level annotations
            • hover over a word in a transcript and see all annotations
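For readers unfamiliar with the "keyword in context" (KWIC) view and wildcard search mentioned on slide 28, the sketch below shows the general idea in Python. It only illustrates the concept and says nothing about how the SpinTX site implements or plans to implement search. The function name, the `width` parameter, and the use of shell-style wildcards via the standard `fnmatch` module are assumptions.

```python
from fnmatch import fnmatchcase


def kwic(tokens, pattern, width=3):
    """Return (left context, keyword, right context) for every token that
    matches a shell-style wildcard pattern, ignoring case."""
    hits = []
    for i, token in enumerate(tokens):
        if fnmatchcase(token.lower(), pattern.lower()):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


transcript = "Yo trabajo para mi familia y hablo español por teléfono".split()
hits = kwic(transcript, "p*", width=2)
for left, keyword, right in hits:
    print(f"{left:>20} [{keyword}] {right}")
```

Here the pattern `p*` finds both "para" and "por", which is the kind of query a teacher preparing a "por vs. para" lesson might run.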
    29. Formative evaluation of Beta version
        Data collection methods:
        • Online user survey
        • Web analytics (navigation patterns, popular content)
        • Search analytics
        • User observation and feedback through ongoing workshops and focus groups
        Results will drive future development of the interface.
    30. Involving Teachers in the Development of OER
    31. Workshops with Educators
        • Summer 2012 Workshop
            • ~100 secondary and college Spanish teachers
        • Fall 2012 Working Group
            • ~10 Univ. of Texas Spanish teachers
        • Spring 2013 Workshops
            • Multiple conferences & Univ. of Texas Spanish teachers
        • Summer 2013 Working Group
            • ~10 secondary and college Spanish teachers
    32. Sample materials from the community (1)
    33. (no text on slide)
    34. Sample materials from the community (2)
        • Idea from teacher workshop: Use videos for grammar lessons to develop the student's metalinguistic and critical thinking skills as they pertain to language.
        • Searched and selected clips for lesson on “por vs. para”.
        • Lesson tested in heritage learners class.
        • Anecdotal evidence that video lessons were effective and motivating to students.
    35. Template development ideas
        • Using video clips from the SpinTX video archive, create an activity for classroom use (at any level).
            • Focus on Topics: Familia, Idioma, Identidad
            • Focus on Grammar: Por vs. Para, Gustar, Ser vs. Estar
        • Four steps
            • Predict: Before watching
            • Observe: While watching
            • Discuss: After watching
            • Produce: Follow-up activity
    36. Publication of OER
        • Community-developed lesson plans will be available on the SpinTX website by August 2013
        • We encourage the publication of videos on third-party platforms for remixing educational content, such as TedEd (http://www.ed.ted.com)
    37. A Model for Open Source Corpus Development
    38. Open source development
        • Open Source Software
            • TreeTagger (part-of-speech tagger)
            • Drupal
        • Open APIs
            • YouTube Captioning API
            • Google Fusion Tables API
        • Custom code developed for the project
            • Freely available in our GitHub repository: http://github.com/coerll
    39. Enable sharing of content and data
        • With educators:
            • SpinTX interface allows embedding, downloading, & social sharing of videos and transcripts.
        • With researchers:
            • Source tagged data in our GitHub repository: https://github.com/coerll/SpinTXCorpusData
            • Documentation of data in our GitHub wiki: https://github.com/coerll/SpinTXCorpusData/wiki
    40. Open content licenses
        • Creative Commons provides licenses for Open Educational Resources
        • We use CC BY-NC-SA (Attribution, Non-Commercial, Share-Alike)
    41. Open Project Documentation
        • Research protocols, development processes and methodologies, and other project documentation publicly available:
            • Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
            • “For Researchers” page on spanishintexas.org: http://spanishintexas.org/for-researchers/
    42. Questions
    43. Links
        • SpinTX Video Archive: http://www.spintx.org
        • Spanish in Texas Corpus: http://www.spanishintexas.org
