A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project

Presentation at IALLT 2013

Notes
  • Results: still in progress!
  • Will introduce corpora in general, our source corpus, and the pedagogical corpus
  • Discuss examples briefly, one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
  • Considering the pros and cons of these types of corpus interfaces, we took a two-pronged approach to developing a pedagogically friendly corpus. On the one hand, there is the Spanish in Texas project, which has been collecting sociolinguistic video interviews since 2010. On the other, we recently received a grant focused on developing a pedagogically friendly interface to this existing corpus.
  • Dual purpose: both for research and for education. Show that language is alive, and encourage viewing local varieties positively rather than negatively.
  • To give you a sense of the scope of the corpus we are working with.
  • We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  • Teachers of heritage learners can learn about local variation. Interviews collected by students can be contributed to the corpus.
  • 1. Anonymous user: Watch intro video. Show search criteria: topics, grammar, pragmatics, keywords, etc. Show video page: related items, transcripts with highlighting, sharing & downloading tabs. 2. Registered user: How to favorite and tag a video. Tagged video lists.
  • We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
  • Observe how teachers are using the system to develop OER
  • But that’s not all!
  • This will be an ongoing process that will hopefully eventually be taken over by the users.
  • Pull up favorited video. Hide target words. Project video and cloze text in front of class.
  • Discuss prescriptive rules for target as a class. Students pull up worksheet (example). Students complete worksheet by finding and recording examples, and then indicating whether they think each is a standard or non-standard use.
  • 5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.

    1. June 14, 2013 | IALLT Conference
       A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project
    2. Who we are
       • Rachael Gilg: Project Manager / Web Developer
       • Arthur Wendorf: Educational Technologist / Developer / Spanish Instructor
       • Martí Quixal: Computational Linguist / Developer / Spanish Instructor
       • Almeida Jacqueline Toribio & Barbara E. Bullock: Project Co-Directors
       • Carl Blyth: Director of COERLL
    3. (no text on slide)
    4. Agenda
       1. Introduction to the Corpus-to-Classroom Project
       2. Project results:
          • The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus.
          • Involving teachers in the development of open educational resources.
          • A model for open source corpus development.
    5. Introduction to the Corpus-to-Classroom Project
    6. Corpora in the Classroom: the promise
       • Corpus = a large, structured collection of language
       • Benefits for language learning:
          • Naturalistic language use
          • Motivation
          • 'Real' language
          • Discovery learning
    7. Example: CORPUS DEL ESPAÑOL
    8. Example: CORPUS DEL ESPAÑOL
       Pros:
       • View examples of language in context.
       • Linguistic annotations enable searching by part-of-speech, etc.
    9. Example: CORPUS DEL ESPAÑOL
       Cons:
       • Designed for researchers, not educators.
       • Limited utility to untrained end users.
       • Content not openly licensed.
    10. Example: YouTube
    11. Example: YouTube
       Pros:
       • Engaging video content, many with captions.
       • Many videos are openly licensed (CC-BY).
    12. Example: YouTube
       Cons:
       • Searching is time-consuming.
       • Content can disappear without warning.
       • Sometimes blocked by K12 schools.
    13. Our two-pronged approach
       • SpinTX: Corpus-to-Classroom. Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
       • Spanish in Texas Video Corpus. A project of COERLL, a National Foreign Language Resource Center (2010-2014)
    14. Spanish in Texas Corpus
       A collection of sociolinguistic video interviews that provide rich content for language learning.
       • Goals:
          • make publicly available authentic data about variation in Spanish as spoken in Texas
             • for education
             • for research
          • encourage teachers/students/public to view local varieties as a resource
    15. Corpus-to-Classroom
       A searchable collection of pre-selected, corrected, annotated clips from the larger corpus.
       • Goals:
          • develop a pedagogically friendly interface for the Spanish in Texas Corpus
          • involve teachers and learners in the development of open educational resources based on the corpus
          • create a model for using open source tools and a pedagogical interface that can be adapted for any language corpus
    16. About the Corpus
       • Spanish in Texas Corpus:
          • 92 sociolinguistic interview videos (avg. 30–45 min)
          • Transcribed (approx. 650,000 words)
          • Time-synced video caption files
          • Tagged for linguistic features
          • Completely open (no registration required, open CC license)
       • SpinTX Video Archive:
          • 327 video clips from 33 speakers (avg. 1–4 min)
          • Transcribed (approx. 80,000 words)
          • Time-synced video caption files
          • Tagged for linguistic and pedagogical features
          • Teacher-friendly interface
    17. (no text on slide)
    18. The SpinTX Video Archive: a pedagogically-friendly interface to the Spanish in Texas Corpus
    19. Needs assessment with educators
    20. Needs assessment with educators
       • How do you use authentic video in your teaching?
       • How do you find videos to use? What problems do you encounter?
       • How can you imagine using the Spanish in Texas videos in your classes?
    21. Primary goals of the interface
       • Enable educators to easily find and use videos that suit the curriculum.
          • Search by grammar point, theme, vocabulary, etc.
       • Enable accessibility and content openness.
          • Downloadable from open site with a license enabling remixing
       • Enable educators to curate sets of videos for comparison and study.
          • Favoriting and tagging videos
       • Provide access to supporting materials (lesson plans, activity templates, etc.).
          • Develop a community to share ready-made materials and templates
    22. Secondary goals of the interface
       • Employ in the development of materials for teacher training.
       • Engage students as co-researchers.
    23. (no text on slide)
    24. Technical Overview of SpinTX Archive
       • Drupal 7
          • Taxonomy module integration
          • Community tags module
       • Apache Solr search engine
          • Keyword search
          • Faceted browsing
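The keyword search and faceted browsing listed on slide 24 can be illustrated with a small sketch of a Solr request. This is not the project's code: the core name `spintx`, the host, and the field names `topic` and `grammar` are hypothetical. Only the request parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, keyword, facet_fields, filters=None, rows=10):
    """Build a Solr /select URL for a keyword search with facet counts.

    facet_fields: fields for which Solr should return value counts.
    filters: optional {field: value} pairs applied as filter queries (fq),
             i.e. the facets the user has already clicked.
    """
    params = [
        ("q", keyword),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_solr_query(
    "http://localhost:8983/solr/spintx",
    "familia",
    facet_fields=["topic", "grammar"],
    filters={"topic": "food"},
)
print(url)
```

Each selected facet becomes an `fq` filter, so narrowing the result set does not change the keyword query itself.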
    25. Ideas for future development
       • Advanced search capability
          • support for wildcards
          • improved phrase searching
          • improved "keyword in context" result view
       • Data visualizations
          • word and/or tag clouds
          • language maps
       • Enhanced word-level annotations
          • hover over a word in a transcript and see all annotations
    26. Formative evaluation of Beta version
       Data collection methods:
       • Online user survey (http://goo.gl/4Lbbg)
       • Web analytics (navigation patterns, popular content)
       • Search analytics
       • User observation and feedback through ongoing workshops and focus groups
    27. Formative evaluation of Beta version
       Results of formative evaluation will drive future development of the interface.
    28. Involving Teachers in the Development of OER
    29. Workshops with Educators
       • Summer 2012 Workshop
          • ~100 secondary and college Spanish teachers
       • Fall 2012 Working Group
          • ~10 Univ. of Texas Spanish teachers
       • Spring 2013 Workshops
          • Multiple conferences & Univ. of Texas Spanish teachers
       • Summer 2013 Working Group
          • ~10 secondary and college Spanish teachers
    30. Sample materials from the community (1)
    31. (no text on slide)
    32. Sample materials from the community (2)
       • Idea from teacher workshop: Use videos for grammar lessons to develop the student's metalinguistic and critical thinking skills as they pertain to language.
       • Searched and selected clips for lesson on "por vs. para".
       • Lesson tested in heritage learners class.
       • Anecdotal evidence that video lessons were effective and motivating to students.
    33. Current Templates
       • Four templates:
          • Cloze
          • Data-Driven Learning (DDL)
          • Variation
          • Schema
    34. Cloze Template
    35. Cloze Template: Activity
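The speaker notes for the cloze activity mention hiding target words in a transcript. A minimal sketch of that step follows, assuming the transcript is available as plain text. The function name, the blank marker, and the sample sentence are invented for illustration and are not taken from the SpinTX templates.

```python
import re


def make_cloze(transcript, targets, blank="_____"):
    """Replace whole-word occurrences of the target words with a blank.

    Returns the cloze text and the hidden words in order of appearance,
    which can serve as an answer key.
    """
    if not targets:
        return transcript, []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in targets) + r")\b",
        re.IGNORECASE,
    )
    answers = []

    def hide(match):
        answers.append(match.group(0))
        return blank

    return pattern.sub(hide, transcript), answers


# Invented sample sentence for a "por vs. para" lesson.
text = "Trabajo para mi familia y paso por la tienda."
cloze, answers = make_cloze(text, ["por", "para"])
print(cloze)
print(answers)
```

The word-boundary anchors keep longer words such as "porque" intact while blanking the standalone prepositions.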
    36. Data-Driven Learning (DDL) Template
    37. Data-Driven Learning (DDL) Template: Activity
    38. Variation Template: Pre-class Preparation
    39. Variation Template: Activity
    40. Schema Template: Pre-class Preparation
    41. Schema Template: Activity
    42. Publication of OER
       • Templates and community-developed lesson plans will be available on the SpinTX website by August 2013.
       • We encourage the publication of videos on third-party platforms for remixing educational content.
    43. A Model for Open Source Corpus Development
    44. Sharing development practices and code
       • Use of open source software and open APIs
       • Custom code developed for the project
          • Public GitHub repository: http://github.com/coerll
       • Project documentation (research protocols, development processes and methodologies, etc.):
          • Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
          • "For Researchers" page on spanishintexas.org: http://spanishintexas.org/for-researchers/
    45. Recruit 'locally'
       • Recruit and train interns
          • Institutional Review Board (IRB) training
          • Video shooting and audio recording
          • Practice interviews on site
       • Recruit family, friends, acquaintances
          • Any Spanish-speaking resident of TX
          • Conduct interviews in their home communities
    46. Interview protocol
       • Sampling of a large set of questions (~75)
          • from NPR StoryCorps (Historias)
          • biographical information
       • Average length: 30-45 min.
       • Language: Spanish and mixed
       • Consent form and talent release
       • Metadata on speaker and interviewer
          • Google Docs
    47. Interview Metadata
    48. Processing the Videos
       • Intake interview materials
          • create unique ID for video and forms
          • archive raw video and remove from camera
       • Video and transcript preparation
          • Edit and export videos using Final Cut Pro
          • Sound and image correction
          • Upload to Automatic Sync to be transcribed by bilingual transcriber
             • 3-5 day turnaround
             • Approx. $85 per hour of video
    49. Original Transcript from Automatic Sync
    50. Upload video and transcript to YouTube for syncing
    51. Download SRT file
    52. Prepare Transcript for TreeTagger
    53. Run through TreeTagger
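To make the TreeTagger step more concrete, here is a minimal sketch of reading its output, assuming the common three-column form (token, part-of-speech tag, lemma, separated by tabs, one token per line). The sample lines and tag names are illustrative only and were not taken from the project's data. The dictionary keys are names chosen for this sketch.

```python
def parse_treetagger(output):
    """Parse tab-separated TreeTagger output into a list of dicts.

    Assumes each useful line has exactly three columns:
    token, part-of-speech tag, lemma. Other lines are skipped.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank or malformed line
        token, pos, lemma = parts
        rows.append({"token": token, "pos": pos, "lemma": lemma})
    return rows


# Illustrative sample; the tag names are assumptions.
sample = "voy\tVLfin\tir\npara\tPREP\tpara\ncasa\tNC\tcasa\n"
rows = parse_treetagger(sample)
for row in rows:
    print(row["token"], row["pos"], row["lemma"])
```

Keeping each token as a small record makes it straightforward to attach further columns later, such as caption timings or pedagogical tags.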
    54. Combine Data from SRT File and TreeTagger File, and add additional Tags
    55. Manual clip selection and description
    56. Divide CSV Files and Videos into Clips and adjust Timings and Numberings
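The idea of dividing a captioned video into clips and adjusting timings and numberings can be sketched as follows. The slides describe doing this on CSV files; this sketch works directly on SRT captions instead, which is a simplification rather than the project's method. The function names and the sample captions are invented for illustration.

```python
import re
from datetime import timedelta

TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_td(stamp):
    """Convert an SRT timestamp such as 00:01:10,200 to a timedelta."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def to_stamp(td):
    """Convert a timedelta back to an SRT timestamp."""
    total_ms = td // timedelta(milliseconds=1)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(text):
    """Return a list of (start, end, caption_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue
        start, end = (to_td(part) for part in lines[1].split("-->"))
        cues.append((start, end, "\n".join(lines[2:])))
    return cues


def cut_clip(cues, clip_start, clip_end):
    """Keep cues fully inside the clip, shift them so the clip starts
    at zero, and renumber them from 1."""
    kept = [c for c in cues if c[0] >= clip_start and c[1] <= clip_end]
    blocks = [
        f"{i}\n{to_stamp(start - clip_start)} --> {to_stamp(end - clip_start)}\n{text}"
        for i, (start, end, text) in enumerate(kept, 1)
    ]
    return "\n\n".join(blocks) + "\n"


# Invented sample captions.
srt = """1
00:00:01,000 --> 00:00:03,500
Hola, buenos días.

2
00:01:10,200 --> 00:01:12,000
Nací en San Antonio.

3
00:01:12,500 --> 00:01:15,000
Mi familia es de Laredo.
"""
clip = cut_clip(parse_srt(srt), timedelta(minutes=1, seconds=10),
                timedelta(minutes=1, seconds=20))
print(clip)
```

Cues that straddle a clip boundary are dropped here for simplicity; trimming them to the boundary would be an equally reasonable choice.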
    57. Automatic Pedagogical Annotation of Clips
    58. SpinTX Clip Data Published on GitHub: http://www.github.com/coerll
    59. Questions?
    60. Links
       • SpinTX Video Archive: http://www.spintx.org
       • Spanish in Texas Corpus: http://www.spanishintexas.org
       • Slides from this presentation will be posted at: http://www.slideshare.net/spanish_in_texas
