June 14, 2013 | IALLT Conference
A Video Corpus for Language Learning
Open Source Tools & Materials from the Corpus-to-Classroom Project
Who we are
• Rachael Gilg
• Project Manager / Web Developer
• Arthur Wendorf
• Educational Technologist / Developer / Spanish Instructor
• Martí Quixal
• Computational Linguist / Developer / Spanish Instructor
• Almeida Jacqueline Toribio & Barbara E. Bullock
• Project Co-Directors
• Carl Blyth
• Director of COERLL
2
3
Agenda
1. Introduction to the Corpus-to-Classroom Project
2. Project results:
• The SpinTX Video Archive: a pedagogically-friendly
interface to the Spanish in Texas Corpus.
• Involving teachers in the development of open
educational resources.
• A model for open source corpus development.
4
Introduction to the Corpus-to-Classroom Project
5
Corpora in the Classroom: the promise
• Corpus = a large, structured collection of language
• Benefits for language learning:
• Naturalistic language use
• Motivation
• 'Real' language
• Discovery learning
6
Example: CORPUS DEL ESPAÑOL
7
Example: CORPUS DEL ESPAÑOL
8
Pros:
• View examples of language in context.
• Linguistic annotations enable searching
by part-of-speech, etc.
Example: CORPUS DEL ESPAÑOL
9
Cons:
• Designed for researchers, not educators.
• Limited utility to untrained end users.
• Content not openly licensed.
Example: YouTube
10
Example: YouTube
11
Pros:
• Engaging video content, many with captions.
• Many videos are openly licensed (CC-BY).
Example: YouTube
12
Cons:
• Searching is time-consuming.
• Content can disappear without warning.
• Sometimes blocked by K12 schools.
Our two-pronged approach
• SpinTX: Corpus-to-Classroom
• Grant from the University of Texas Longhorn Innovation Fund for Technology (2012-2013)
• Spanish in Texas Video Corpus
• A project of COERLL, a National Foreign Language Resource Center (2010-2014)
13
Spanish in Texas Corpus
• Goals:
• make publicly available authentic data about variation in
Spanish as spoken in Texas
• for education
• for research
• encourage teachers/students/public to view local varieties
as a resource
14
A collection of sociolinguistic video interviews that
provide rich content for language learning.
Corpus-to-Classroom
• Goals:
• develop a pedagogically friendly interface for the Spanish in
Texas Corpus
• involve teachers and learners in the development of open
educational resources based on the corpus
• create a model for using open source tools and a pedagogical
interface that can be adapted for any language corpus
15
A searchable collection of pre-selected, corrected, annotated clips from the larger
corpus
About the Corpus
16
Spanish in Texas Corpus
• 92 sociolinguistic interview videos (avg. 30–45 min)
• Transcribed (approx. 650,000 words)
• Time-synced video caption files
• Tagged for linguistic features
SpinTX Video Archive
• 327 video clips from 33 speakers (avg. 1-4 min)
• Transcribed (approx. 80,000 words)
• Time-synced video caption files
• Tagged for linguistic and pedagogical features
• Completely open (no registration required, open CC license)
• Teacher-friendly interface
17
The SpinTX Video Archive: a
pedagogically-friendly interface
to the Spanish in Texas Corpus
18
Needs assessment with educators
19
Needs assessment with educators
20
• How do you use authentic video in your teaching?
• How do you find videos to use? What problems do
you encounter?
• How can you imagine using the Spanish in Texas
videos in your classes?
Primary goals of the interface
• Enable educators to easily find and use videos that suit
the curriculum.
• Search by grammar point, theme, vocabulary, etc.
• Enable accessibility and content openness.
• Downloadable from open site with a license enabling remixing
• Enable educators to curate sets of videos for comparison
and study.
• Favoriting and tagging videos
• Provide access to supporting materials (lesson
plans, activity templates, etc).
• Develop a community to share ready-made materials and
templates
21
Secondary goals of the interface
• Employ the interface in the development of materials for teacher
training.
• Engage students as co-researchers.
22
23
Technical Overview of SpinTX Archive
• Drupal 7
• Taxonomy module integration
• Community tags module
• Apache Solr search engine
• Keyword search
• Faceted browsing
24
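As a rough illustration of how keyword search and faceted browsing fit together in Solr, the sketch below builds a select URL. The parameters `q`, `wt`, `facet`, `facet.field`, and `fq` are standard Solr query parameters; the core URL and the field names (`grammar_tag`, `topic`) are hypothetical and are not taken from the SpinTX configuration.

```python
from urllib.parse import urlencode

def build_solr_query(base_url, keyword, facet_fields, filters=None):
    """Return a Solr select URL for a keyword search with facet counts."""
    params = [
        ("q", keyword),      # keyword search
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # turn faceting on
    ]
    # One facet.field parameter per field we want value counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each facet value the user has selected becomes a filter query (fq).
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"

url = build_solr_query(
    "http://localhost:8983/solr/spintx",  # hypothetical core URL
    "familia",
    ["grammar_tag", "topic"],             # hypothetical field names
    {"topic": "food"},
)
print(url)
```

Keeping the selected facets in `fq` rather than in `q` means the keyword stays unchanged while the user narrows the result list one facet at a time.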
Ideas for future development
• Advanced search capability
• support for wildcards
• improved phrase searching
• improved “keyword in context” result view
• Data visualizations
• word and/or tag clouds
• language maps
• Enhanced word-level annotations
• hover over a word in a transcript and see all annotations
25
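A "keyword in context" view can be prototyped in a few lines. The following is a minimal sketch, not the project's code: the function name `kwic`, the `width` parameter, and the sample sentence are all made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, match, right) context triples for each hit of keyword."""
    # Whole-word, case-insensitive match; re.escape keeps the keyword literal.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    rows = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Invented sample sentence, not taken from the corpus.
sample = "Lo hice por mi familia. Este regalo es para ti, no para ellos."
for left, hit, right in kwic(sample, "para", width=15):
    print(f"{left:>15} [{hit}] {right}")
```

Right-aligning the left context puts every hit in the same column, which is what makes a concordance easy to scan.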
Formative evaluation of Beta version
Data collection methods:
• Online user survey (http://goo.gl/4Lbbg)
• Web analytics (navigation patterns, popular content)
• Search analytics
• User observation and feedback through ongoing
workshops and focus groups
27
Results of formative evaluation will drive future
development of the interface.
Involving Teachers in the
Development of OER
28
Workshops with Educators
• Summer 2012 Workshop
• ~100 secondary and college Spanish teachers
• Fall 2012 Working Group
• ~10 Univ. of Texas Spanish teachers
• Spring 2013 Workshops
• Multiple conferences & Univ. of Texas Spanish teachers
• Summer 2013 Working Group
• ~10 secondary and college Spanish teachers
29
Sample materials from the community (1)
30
31
Sample materials from the community (2)
• Idea from teacher workshop: Use videos for grammar
lessons to develop the student's metalinguistic and critical
thinking skills as they pertain to language.
• Searched and selected clips for lesson on “por vs. para”.
• Lesson tested in heritage learners class.
• Anecdotal evidence that video lessons were effective and
motivating to students.
32
Current Templates
• Four templates:
• Cloze
• Data-Driven Learning (DDL)
• Variation
• Schema
33
Cloze Template
34
Cloze Template: Activity
35
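The core of a cloze activity, hiding target words in a transcript, can be sketched as follows. This is an illustration under assumptions: the function `make_cloze`, its parameters, and the sample sentence are hypothetical and do not come from the SpinTX templates.

```python
import re

def make_cloze(transcript, targets, blank="_____"):
    """Blank out whole-word occurrences of the target words.

    Returns the cloze text and the hidden words in order of appearance.
    """
    if not targets:
        return transcript, []
    alternatives = "|".join(re.escape(t) for t in targets)
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    answers = []

    def hide(match):
        answers.append(match.group(0))  # remember the word for the answer key
        return blank

    return pattern.sub(hide, transcript), answers

# Invented sample sentence for a "por vs. para" lesson.
text = "Trabajo por la mañana para ganar dinero."
cloze, key = make_cloze(text, ["por", "para"])
print(cloze)
print(key)
```

Returning the answer key alongside the cloze text lets a teacher project the gapped transcript and check answers from the same call.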
Data-Driven Learning (DDL) Template
36
Data-Driven Learning (DDL) Template:
Activity
37
Variation Template: Pre-class Preparation
38
Variation Template: Activity
39
Schema Template: Pre-class Preparation
40
Schema Template: Activity
41
Publication of OER
• Templates and community-developed lesson plans will be
available on the SpinTX website by August 2013
• We encourage the publication of videos on third-party
platforms for remixing educational content.
42
A Model for Open Source
Corpus Development
43
Sharing development practices and code
• Use of open source software and open APIs
• Custom code developed for the project
• Public GitHub repository: http://github.com/coerll
• Project documentation (research protocols, development
processes and methodologies, etc):
• Corpus-to-Classroom Blog: http://sites.la.utexas.edu/corpus-to-classroom/
• “For Researchers” page on spanishintexas.org: http://spanishintexas.org/for-researchers/
44
Recruit 'locally'
• Recruit and train interns
• Institutional Review Board training
• Video shooting and audio recording
• Practice interviews on site
• Recruit family, friends, acquaintances
• Any Spanish-speaking resident of TX
• Conduct interviews in their home communities
45
Interview protocol
• Sampling of a large set of questions (~75)
• from NPR StoryCorps (Historias)
• biographical information
• Average Length: 30-45 min.
• Language: Spanish and mixed
• Consent form and talent release
• Metadata on speaker and interviewer
• Google docs
46
Interview Metadata
Processing the Videos
• Intake interview materials
• create unique ID for video and forms
• archive raw video and remove from camera
• Video and transcript preparation
• Edit and export videos using Final Cut Pro
• Sound and image correction
• Upload to Automatic Sync to be transcribed by bilingual transcriber
• 3-5 day turnaround
• Approx $85 per hour of video
48
Original Transcript from Automatic Sync
Upload video and transcript to YouTube for syncing
Download SRT file
Prepare Transcript for TreeTagger
Run through TreeTagger
Combine Data from SRT File and
TreeTagger File, and add additional Tags
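One way this combining step could look is sketched below. It is an illustration, not the project's script: it assumes the tagger output is tab-separated token/POS/lemma lines, one token per line, and that the tokens concatenate back to the caption text once whitespace is removed. The function names, the sample captions, and the POS labels are made up.

```python
import csv
import io
import re

CUE_TIME = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")

def parse_srt(srt_text):
    """Return a list of (cue_number, start, end, text) from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip cues with no text
        times = CUE_TIME.match(lines[1])
        if times:
            cues.append((int(lines[0]), times.group(1), times.group(2),
                         " ".join(lines[2:])))
    return cues

def parse_tagged(tagged_text):
    """Parse tab-separated token/POS/lemma lines."""
    return [tuple(line.split("\t"))
            for line in tagged_text.splitlines() if line.strip()]

def combine(srt_text, tagged_text):
    """Attach each tagged token to the timing of the cue it came from."""
    tokens = iter(parse_tagged(tagged_text))
    rows = []
    for number, start, end, text in parse_srt(srt_text):
        remaining = "".join(text.split())  # cue text without whitespace
        while remaining:
            entry = next(tokens, None)
            if entry is None or not remaining.startswith(entry[0]):
                raise ValueError(f"tagged tokens do not match cue {number}")
            token, pos, lemma = entry
            remaining = remaining[len(token):]
            rows.append((number, start, end, token, pos, lemma))
    return rows

def to_csv(rows):
    """Serialize the combined rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["cue", "start", "end", "token", "pos", "lemma"])
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample data; the POS labels are illustrative only.
srt = """1
00:00:01,000 --> 00:00:03,000
Vivo en Texas.

2
00:00:03,500 --> 00:00:05,000
Hablo español.
"""
tagged = ("Vivo\tVLfin\tvivir\nen\tPREP\ten\nTexas\tNP\tTexas\n.\tFS\t.\n"
          "Hablo\tVLfin\thablar\nespañol\tNC\tespañol\n.\tFS\t.\n")
rows = combine(srt, tagged)
print(to_csv(rows))
```

Giving every token row its cue's start and end time is what later allows a search hit to jump to the right moment in the video.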
Manual clip selection and description
Divide CSV Files and Videos into Clips and
adjust Timings and Numberings
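Adjusting timings and numberings for a clip amounts to keeping the cues that fall inside the clip, shifting their timestamps so the clip starts at zero, and renumbering from 1. The sketch below shows that arithmetic under assumptions: the helper names and sample cues are hypothetical, and cutting the video file itself is not covered.

```python
def to_ms(stamp):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)

def to_stamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def cut_clip(cues, clip_start, clip_end):
    """Keep cues fully inside the clip, rebased to 0 and renumbered from 1."""
    first, last = to_ms(clip_start), to_ms(clip_end)
    clip = []
    for start, end, text in cues:
        if to_ms(start) >= first and to_ms(end) <= last:
            clip.append((len(clip) + 1,
                         to_stamp(to_ms(start) - first),
                         to_stamp(to_ms(end) - first),
                         text))
    return clip

# Invented cues from a longer interview.
cues = [
    ("00:10:00,000", "00:10:02,500", "Nací en San Antonio."),
    ("00:10:03,000", "00:10:06,250", "Mi familia es de Laredo."),
    ("00:12:00,000", "00:12:01,000", "Sí."),
]
clip = cut_clip(cues, "00:10:00,000", "00:11:00,000")
for row in clip:
    print(row)
```

Cues that straddle a clip boundary are dropped here; a real workflow might prefer to trim them instead.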
Automatic Pedagogical Annotation of Clips
57
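The slides do not describe how the automatic pedagogical annotation works. Purely as an illustration of the idea, the sketch below labels a clip with a teaching topic when enough of its tagged tokens match a simple rule. The rule set, the labels, the `min_hits` threshold, and the sample tokens are all hypothetical.

```python
from collections import Counter

# Hypothetical rules: pedagogical label -> test over (token, pos, lemma).
RULES = {
    "por_vs_para": lambda token, pos, lemma: lemma in {"por", "para"},
    "ser_vs_estar": lambda token, pos, lemma: lemma in {"ser", "estar"},
}

def annotate_clip(tagged_tokens, min_hits=2):
    """Return the sorted labels whose rule fires at least min_hits times."""
    counts = Counter()
    for token, pos, lemma in tagged_tokens:
        for label, rule in RULES.items():
            if rule(token, pos, lemma):
                counts[label] += 1
    return sorted(label for label, hits in counts.items() if hits >= min_hits)

# Invented tagged tokens; the POS labels are illustrative only.
tokens = [
    ("Lo", "PPO", "lo"), ("hice", "VLfin", "hacer"), ("por", "PREP", "por"),
    ("ti", "PPX", "tú"), ("y", "CC", "y"), ("para", "PREP", "para"),
    ("ella", "PPX", "él"),
]
print(annotate_clip(tokens))
```

A threshold keeps a clip from being labeled for a grammar point it only touches once, which matters when teachers search for clips rich in a given feature.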
SpinTX Clip Data Published on GitHub
http://www.github.com/coerll
58
Questions?
59
Links
• SpinTX Video Archive:
http://www.spintx.org
• Spanish in Texas Corpus:
http://www.spanishintexas.org
• Slides from this Presentation will be posted at:
http://www.slideshare.net/spanish_in_texas
60

Editor's Notes

  1. Results: still in progress!
  2. Will introduce corpora in general, our source corpus, and the pedagogical corpus
  3. Discuss examples briefly one at a time. How frequently do teachers use them? How easy are they to use? Emphasis on YouTube as probably the most popular in language classes, but hard to use.
  4. Considering the pros and cons of these types of corpus interfaces, we took a two-pronged approach to developing a pedagogically friendly corpus. On the one hand, there is the Spanish in Texas project, which has been collecting sociolinguistic video interviews since 2010. Recently we got a grant focused on developing a pedagogically friendly interface to this existing corpus.
  5. Dual purpose: both for research and for education. Show that language is alive, and encourage viewing local varieties positively rather than negatively.
  6. To give you a sense of the scope of the corpus we are working with.
  7. Will introduce corpora in general, our source corpus, and the pedagogical corpus
  8. We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  9. We asked teachers how they use videos and how they would like to use videos. (interviews and focus groups)
  10. Teachers of heritage learners can learn about local variation. Interviews collected by students can be contributed to the corpus.
  11. (1) Anonymous user: watch intro video; show search criteria (topics, grammar, pragmatics, keywords, etc.); show video page (related items, transcripts with highlighting, sharing & downloading tabs). (2) Registered user: how to favorite and tag a video; tagged video lists.
  12. We asked teachers how they use videos and how they would like to use videos. Here is how we have met their needs.
  13. Observe how teachers are using the system to develop OER
  14. Observe how teachers are using the system to develop OER
  15. But that’s not all!
  16. This will be an ongoing process that will hopefully eventually be taken over by the users.
  17. This will be an ongoing process that will hopefully eventually be taken over by the users.
  18. This will be an ongoing process that will hopefully eventually be taken over by the users.
  19. This will be an ongoing process that will hopefully eventually be taken over by the users.
  20. Pull up favorited video. Hide target words. Project video and cloze text in front of class.
  21. Discuss prescriptive rules for target as a class. Students pull up worksheet (example). Students complete worksheet by finding and recording examples, and then indicating whether they think each is a standard or non-standard use.
  22. This will be an ongoing process that will hopefully eventually be taken over by the users.
  23. 5 guidelines for developing open corpora. Will also illustrate how we have implemented each guideline.