A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project
Presentation at IALLT 2013

Published in: Education, Technology

  • Results: still in progress!
  • Will introduce corpora in general, our source corpus, and the pedagogical corpus
  • Discuss examples briefly, one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
  • Considering the pros and cons of these types of corpus interfaces, we took a two-pronged approach to developing a pedagogically friendly corpus. On the one hand, there is the Spanish in Texas project, which has been collecting sociolinguistic video interviews since 2010. On the other, we recently received a grant focused on developing a pedagogically friendly interface to this existing corpus.
  • Dual purpose: both for research and for education. Show that language is alive, and encourage viewing local varieties positively rather than negatively.
  • To give you a sense of the scope of the corpus we are working with.
  • We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  • Teachers of heritage learners can learn about local variation. Interviews collected by students can be contributed to the corpus.
  • 1. Anonymous user: Watch intro video. Show search criteria: topics, grammar, pragmatics, keywords, etc. Show video page: related items, transcripts with highlighting, sharing & downloading tabs. 2. Registered user: How to favorite and tag a video. Tagged video lists.
  • We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
  • Observe how teachers are using the system to develop OER
  • But that’s not all!
  • This will be an ongoing process that will hopefully eventually be taken over by the users.
  • Pull up favorited video. Hide target words. Project video and cloze text in front of class.
  • Discuss prescriptive rules for target as a class. Students pull up worksheet (example). Students complete worksheet by finding and recording examples, and then indicating whether they think it is a standard or non-standard use.
  • 5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.
  • Transcript

    • 1. June 14, 2013 | IALLT Conference. A Video Corpus for Language Learning: Open Source Tools & Materials from the Corpus-to-Classroom Project
    • 2. Who we are
        • Rachael Gilg: Project Manager / Web Developer
        • Arthur Wendorf: Educational Technologist / Developer / Spanish Instructor
        • Martí Quixal: Computational Linguist / Developer / Spanish Instructor
        • Almeida Jacqueline Toribio & Barbara E. Bullock: Project Co-Directors
        • Carl Blyth: Director of COERLL
    • 3. (no text on this slide)
    • 4. Agenda
        1. Introduction to the Corpus-to-Classroom Project
        2. Project results:
            • The SpinTX Video Archive: a pedagogically friendly interface to the Spanish in Texas Corpus.
            • Involving teachers in the development of open educational resources.
            • A model for open source corpus development.
    • 5. Introduction to the Corpus-to-Classroom Project
    • 6. Corpora in the Classroom: the promise
        • Corpus = a large, structured collection of language
        • Benefits for language learning:
            • Naturalistic language use
            • Motivation
            • "Real" language
            • Discovery learning
    • 7. Example: CORPUS DEL ESPAÑOL
    • 8. Example: CORPUS DEL ESPAÑOL
        Pros:
        • View examples of language in context.
        • Linguistic annotations enable searching by part-of-speech, etc.
    • 9. Example: CORPUS DEL ESPAÑOL
        Cons:
        • Designed for researchers, not educators.
        • Limited utility to untrained end users.
        • Content not openly licensed.
    • 10. Example: YouTube
    • 11. Example: YouTube
        Pros:
        • Engaging video content, many with captions.
        • Many videos are openly licensed (CC-BY).
    • 12. Example: YouTube
        Cons:
        • Searching is time-consuming.
        • Content can disappear without warning.
        • Sometimes blocked by K12 schools.
    • 13. Our two-pronged approach
        • SpinTX: Corpus-to-Classroom. Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
        • Spanish in Texas Video Corpus. A project of COERLL, a National Foreign Language Resource Center (2010-2014)
    • 14. Spanish in Texas Corpus
        A collection of sociolinguistic video interviews that provide rich content for language learning.
        • Goals:
            • make publicly available authentic data about variation in Spanish as spoken in Texas
                • for education
                • for research
            • encourage teachers/students/public to view local varieties as a resource
    • 15. Corpus-to-Classroom
        A searchable collection of pre-selected, corrected, annotated clips from the larger corpus.
        • Goals:
            • develop a pedagogically friendly interface for the Spanish in Texas Corpus
            • involve teachers and learners in the development of open educational resources based on the corpus
            • create a model for using open source tools and a pedagogical interface that can be adapted for any language corpus
    • 16. About the Corpus
        Spanish in Texas Corpus | SpinTX Video Archive
        92 sociolinguistic interview videos (avg. 30–45 min) | 327 video clips from 33 speakers (avg. 1–4 min)
        Transcribed (approx. 650,000 words) | Transcribed (approx. 80,000 words)
        Time-synced video caption files | Time-synced video caption files
        Tagged for linguistic features | Tagged for linguistic and pedagogical features
        Completely open (no registration required, open CC license) | Teacher-friendly interface
    • 17. (no text on this slide)
    • 18. The SpinTX Video Archive: a pedagogically friendly interface to the Spanish in Texas Corpus
    • 19. Needs assessment with educators
    • 20. Needs assessment with educators
        • How do you use authentic video in your teaching?
        • How do you find videos to use? What problems do you encounter?
        • How can you imagine using the Spanish in Texas videos in your classes?
    • 21. Primary goals of the interface
        • Enable educators to easily find and use videos that suit the curriculum.
            • Search by grammar point, theme, vocabulary, etc.
        • Enable accessibility and content openness.
            • Downloadable from open site with a license enabling remixing
        • Enable educators to curate sets of videos for comparison and study.
            • Favoriting and tagging videos
        • Provide access to supporting materials (lesson plans, activity templates, etc.).
            • Develop a community to share ready-made materials and templates
    • 22. Secondary goals of the interface
        • Employ in the development of materials for teacher training.
        • Engage students as co-researchers.
    • 23. (no text on this slide)
    • 24. Technical Overview of SpinTX Archive
        • Drupal 7
            • Taxonomy module integration
            • Community tags module
        • Apache Solr search engine
            • Keyword search
            • Faceted browsing
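As a rough illustration of the keyword search and faceted browsing listed on slide 24, the sketch below builds a Solr select URL using the standard `q`, `fq`, `facet`, and `facet.field` request parameters. The endpoint, core name, and field names (`topic`, `grammar`) are assumptions made for the example; the slides do not describe the archive's actual Solr schema or how Drupal issues its queries.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and core name, for illustration only.
SOLR_SELECT = "http://localhost:8983/solr/spintx/select"


def build_search_url(keyword, facet_fields=(), filters=None):
    """Build a Solr select URL for a keyword search with facet counts.

    facet_fields: field names to count values for (hypothetical names).
    filters: dict of field -> value used to narrow results via fq.
    """
    params = [("q", keyword), ("wt", "json"), ("facet", "true")]
    for field in facet_fields:
        # One facet.field parameter per field to be counted.
        params.append(("facet.field", field))
    for field, value in (filters or {}).items():
        # Each filter query restricts results without affecting scoring.
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("familia", ["topic", "grammar"], {"topic": "food"}))
```

Selecting a facet value in the interface would correspond to adding one more `fq` filter and re-running the same query.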
    • 25. Ideas for future development
        • Advanced search capability
            • support for wildcards
            • improved phrase searching
            • improved "keyword in context" result view
        • Data visualizations
            • word and/or tag clouds
            • language maps
        • Enhanced word-level annotations
            • hover over a word in a transcript and see all annotations
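The "keyword in context" result view mentioned on slide 25 can be sketched in a few lines. This is only an illustration of the general KWIC idea over a plain transcript string, not the project's planned implementation; the function name and the `width` parameter are invented for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, match, right) tuples for each whole-word hit of keyword.

    width is the number of context characters kept on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows


if __name__ == "__main__":
    sample = "Yo trabajo para mi familia y por eso vine para Texas."
    for left, hit, right in kwic(sample, "para", width=15):
        # Right-align the left context so the keyword column lines up.
        print(f"{left:>15} [{hit}] {right}")
```

Aligning the hits in a column is what lets a learner compare the contexts of a word at a glance.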
    • 26. Formative evaluation of Beta version
        Data collection methods:
        • Online user survey (http://goo.gl/4Lbbg)
        • Web analytics (navigation patterns, popular content)
        • Search analytics
        • User observation and feedback through ongoing workshops and focus groups
    • 27. Formative evaluation of Beta version
        Data collection methods:
        • Online user survey (http://goo.gl/4Lbbg)
        • Web analytics (navigation patterns, popular content)
        • Search analytics
        • User observation and feedback through ongoing workshops and focus groups
        Results of formative evaluation will drive future development of the interface.
    • 28. Involving Teachers in the Development of OER
    • 29. Workshops with Educators
        • Summer 2012 Workshop
            • ~100 secondary and college Spanish teachers
        • Fall 2012 Working Group
            • ~10 Univ. of Texas Spanish teachers
        • Spring 2013 Workshops
            • Multiple conferences & Univ. of Texas Spanish teachers
        • Summer 2013 Working Group
            • ~10 secondary and college Spanish teachers
    • 30. Sample materials from the community (1)
    • 31. (no text on this slide)
    • 32. Sample materials from the community (2)
        • Idea from teacher workshop: Use videos for grammar lessons to develop students' metalinguistic and critical thinking skills as they pertain to language.
        • Searched and selected clips for lesson on "por vs. para".
        • Lesson tested in heritage learners class.
        • Anecdotal evidence that video lessons were effective and motivating to students.
    • 33. Current Templates
        • Four templates:
            • Cloze
            • Data-Driven Learning (DDL)
            • Variation
            • Schema
    • 34. Cloze Template
    • 35. Cloze Template: Activity
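The cloze activity hides target words in a clip's transcript so the class can fill them in while watching. A minimal sketch of that step is shown below, assuming the transcript is available as plain text and the teacher supplies the target words; the function name and blank marker are invented for the example and are not taken from the SpinTX templates.

```python
import re


def make_cloze(transcript, targets, blank="_____"):
    """Replace whole-word occurrences of the target words with a blank.

    Returns the cloze text and the hidden words in order of appearance,
    which can serve as an answer key.
    """
    if not targets:
        return transcript, []
    alternatives = "|".join(re.escape(t) for t in targets)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    answers = []

    def hide(match):
        answers.append(match.group(0))
        return blank

    return pattern.sub(hide, transcript), answers


if __name__ == "__main__":
    text, key = make_cloze("Vine para Texas por trabajo.", ["por", "para"])
    print(text)
    print(key)
```

Matching on word boundaries keeps a target such as "por" from blanking out part of a longer word like "porque".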
    • 36. Data-Driven Learning (DDL) Template
    • 37. Data-Driven Learning (DDL) Template: Activity
    • 38. Variation Template: Pre-class Preparation
    • 39. Variation Template: Activity
    • 40. Schema Template: Pre-class Preparation
    • 41. Schema Template: Activity
    • 42. Publication of OER
        • Templates and community-developed lesson plans will be available on the SpinTX website by August 2013
        • We encourage the publication of videos on third-party platforms for remixing educational content.
    • 43. A Model for Open Source Corpus Development
    • 44. Sharing development practices and code
        • Use of open source software and open APIs
        • Custom code developed for the project
            • Public GitHub repository: http://github.com/coerll
        • Project documentation (research protocols, development processes and methodologies, etc.):
            • Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
            • "For Researchers" page on spanishintexas.org: http://spanishintexas.org/for-researchers/
    • 45. Recruit 'locally'
        • Recruit and train interns
            • Institutional Review Board (IRB) training
            • Video shooting and audio recording
            • Practice interviews on site
        • Recruit family, friends, acquaintances
            • Any Spanish-speaking resident of TX
            • Conduct interviews in their home communities
    • 46. Interview protocol
        • Sampling of a large set of questions (~75)
            • from NPR StoryCorps (Historias)
            • biographical information
        • Average length: 30-45 min.
        • Language: Spanish and mixed
        • Consent form and talent release
        • Metadata on speaker and interviewer
            • Google Docs
    • 47. Interview Metadata
    • 48. Processing the Videos
        • Intake interview materials
            • create unique ID for video and forms
            • archive raw video and remove from camera
        • Video and transcript preparation
            • Edit and export videos using Final Cut Pro
            • Sound and image correction
            • Upload to Automatic Sync to be transcribed by bilingual transcriber
                • 3-5 day turnaround
                • Approx. $85 per hour of video
    • 49. Original Transcript from Automatic Sync
    • 50. Upload video and transcript to YouTube for syncing
    • 51. Download SRT file
    • 52. Prepare Transcript for TreeTagger
    • 53. Run through TreeTagger
    • 54. Combine Data from SRT File and TreeTagger File, and add additional Tags
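Slides 51 to 54 describe downloading a time-synced SRT file, running the transcript through TreeTagger, and combining the two. The sketch below shows one way such a merge could work; it is not the project's actual script (that code is said to be in the GitHub repository). It assumes the tagger output has been saved as one token per line with tab-separated word, tag, and lemma columns, that a simple regex tokenizer reproduces the tagger's tokens, and that the tag names in the usage example are placeholders.

```python
import re

TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of cues with start, end, and text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty cues
        start, end = lines[1].split("-->")
        cues.append({"start": to_seconds(start), "end": to_seconds(end),
                     "text": " ".join(lines[2:])})
    return cues


def parse_tagged(tagged_text):
    """Parse tab-separated word/tag/lemma lines into tuples."""
    rows = []
    for line in tagged_text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def merge(cues, tagged):
    """Attach cue number and timing to each tagged token, in order."""
    merged, i = [], 0
    for number, cue in enumerate(cues, start=1):
        # Assumed tokenizer: runs of word characters, or single punctuation.
        for token in re.findall(r"\w+|[^\w\s]", cue["text"]):
            if i >= len(tagged) or tagged[i][0] != token:
                raise ValueError(f"token mismatch at cue {number}: {token!r}")
            word, pos, lemma = tagged[i]
            merged.append({"cue": number, "start": cue["start"],
                           "end": cue["end"], "word": word,
                           "pos": pos, "lemma": lemma})
            i += 1
    return merged


if __name__ == "__main__":
    srt = "1\n00:00:01,000 --> 00:00:02,500\nHola amigos\n"
    tags = "Hola\tTAG1\thola\namigos\tTAG2\tamigo\n"  # placeholder tags
    for row in merge(parse_srt(srt), parse_tagged(tags)):
        print(row)
```

Raising an error on the first mismatch is a deliberate choice here: a silent misalignment would attach the wrong timings to every later token. The merged rows can then be written out with the standard `csv` module.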
    • 55. Manual clip selection and description
    • 56. Divide CSV Files and Videos into Clips and adjust Timings and Numberings
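When a clip is cut out of a longer interview, its caption rows need timings relative to the start of the clip and cue numbers that restart at 1. A minimal sketch of that adjustment follows, assuming each row is a dict with `cue`, `start`, and `end` keys (times in seconds); these field names are hypothetical and the slides do not show the project's real CSV layout.

```python
def cut_clip(rows, clip_start, clip_end):
    """Select rows that fall inside a clip and rebase timings and numbering.

    rows: dicts with 'cue', 'start', 'end' keys (times in seconds).
    Returns new dicts whose times are relative to clip_start and whose
    cue numbers restart at 1; the input rows are not modified.
    """
    renumbered = {}  # original cue number -> new cue number
    clip = []
    for row in rows:
        if row["start"] >= clip_start and row["end"] <= clip_end:
            new_cue = renumbered.setdefault(row["cue"], len(renumbered) + 1)
            clip.append({**row,
                         "cue": new_cue,
                         "start": round(row["start"] - clip_start, 3),
                         "end": round(row["end"] - clip_start, 3)})
    return clip


if __name__ == "__main__":
    rows = [
        {"cue": 7, "start": 10.0, "end": 12.0, "word": "Hola"},
        {"cue": 8, "start": 12.0, "end": 15.5, "word": "amigos"},
        {"cue": 9, "start": 20.0, "end": 22.0, "word": "Texas"},
    ]
    for row in cut_clip(rows, 10.0, 16.0):
        print(row)
```

Cues that straddle a clip boundary are dropped in this sketch; a real workflow might instead trim them or move the boundary, which is presumably part of the manual adjustment the slide mentions.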
    • 57. Automatic Pedagogical Annotation of Clips
    • 58. SpinTX Clip Data Published on GitHub: http://www.github.com/coerll
    • 59. Questions?
    • 60. Links
        • SpinTX Video Archive: http://www.spintx.org
        • Spanish in Texas Corpus: http://www.spanishintexas.org
        • Slides from this presentation will be posted at: http://www.slideshare.net/spanish_in_texas