Media Suite: Unlocking Archives for Mixed Media Scholarly Research

Presentation at the CLARIN 2018 Conference, October 2018, Pisa, Italy

  1. Media Suite: Unlocking Archives for Mixed Media Scholarly Research. Roeland Ordelman, Technical coordinator CLARIAH Media Suite, Netherlands Institute for Sound and Vision / University of Twente, The Netherlands
  2. Media Studies: focus on both “institutional” data collections and collections created by scholars
  3. What data is in the Media Suite V3? Radio & Television (1.88M items); Newspapers (60M pages); Film (1129 films); Oral History (2744 interviews). MULTIMEDIA
  4. What data is in the Media Suite V3? MIXED MEDIA
  5. RESEARCH PILOTS: Cross-Medial Analysis of WW2 Eyewitness Testimonies; Cross-media research of public debates on drugs and regulation; Me and Myself: Tracing first person in documentary history in AV-collections; Annotating EYE’s Jean Desmet Collection: Towards Mixed Media Analysis in Digital Media History; Narrativizing Disruption: How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives; Remediation in Sports News. clariah.nl/projecten/research-pilots
  6. Media Suite: enabling Mixed Media Scholarly Research with Multi-media Data in a Sustainable Infrastructure
  7. CLARIAH Centers. Common Lab Research Infrastructure for the Arts and Humanities. SUSTAINABLE: ✓ Available after the project ✓ Maintenance and support ✓ Updates and upgrades
  8. Architecture principles: 1. Centers are responsible for data quality and for facilitating access to data 2. Authorized access using a federated authentication mechanism 3. Data is connected to a shared “workspace” (VRE) for various forms of analysis … 4. … that provides exports of data in various formats for using tools outside the closed environment 5. The Media Suite provides the interface on the underlying architecture
  9. 1. Centers are responsible for data quality and facilitate access to data
  10. Steps: REGISTER COLLECTION, HARVEST COLLECTION METADATA, SEARCH COLLECTION. Actors: Collection Owner, Media Suite, Scholar. CKAN: web-based open source management system for the storage and distribution of open data. Open Archives Initiative (OAI). ISSUE: Persistent link to source file. ISSUE: IPR (e.g., no subtitles)
  11. Example: DANS registers set Oral History
  12. “METADATA ARCHEOLOGY”: manual effort to describe metadata fields. ISSUE: Resources (manual effort)
  13. Tools for inspection of metadata
  14. 2. Authorized access using a federated authentication mechanism
  15. Secure play-out and viewing. ISSUE: Not always available
  16. Federated login
  17. 3. Data is connected to a shared “workspace” (VRE) for analysis. ISSUE: Currently semi-shared
  18. WORKSPACE: ✓ Create virtual personal mixed media collections ✓ Create projects ✓ Store annotations ✓ Upload personal collections ✓ Advanced data analysis (Jupyter Notebooks) ✓ Advanced data processing ✓ Export annotations
  19. Data analysis: Jupyter Notebooks or NLP. ISSUE: Robust pipelines
  20. Write your own (Python) code to analyze the data in the Media Suite. ISSUE: Expertise
  21. Example output Jupyter Notebook
  22. Auto Metadata Extraction: large scale speech recognition. 350K hours processed so far
  23. Poster slam 11:00 – 11:30 tomorrow
  24. 4. Provide exports of data for tools outside
  25. Media Suite is just an interface on the underlying infrastructure… Speech Suite
  26. Media Suite: Unlocking Archives for Mixed Media Scholarly Research
  27. Co-development. Community building. User stories! Short iterations (sprints) of 2 weeks: development & testing. Liaisons part of development team: • Information Specialist • Experienced DH Researcher. Workshops, hack-a-thons, data-a-thons
  28. Discussing issues with Gitter
  29. Tracking issues with GitHub
  30. SCHOLARLY PRIMITIVES (Unsworth, 2000; Blanke and Hedges, 2013)
  31. “Unlock data”: Distant reading, Close reading
  32. 1. Discovery & Inspection of data sets hidden in archives 2. Discovery of items in large archival data sets 3. Accessing items (play, view) from restricted data sets 4. Discovery of segments in time-based media 5. Relating and comparing data on the segment level. (Axis: Distant reading to Close reading)
  33. Search Oral History in Media Suite
  34. Project Search Bookmark Save Bookmark Save Query
  35. Bookmark view. View Source
  36. Annotation view. View Source. Alignment. ISSUE: Complex interface
  37. Private collection. Apply enrichment or a “pipeline”
  38. To appear: Content-based Cross-media Recommendations
  39. Issues/investments: 1. Registered collections: persistent link (data management) 2. Registered collections: rights don’t permit (legal) 3. Metadata archeology: manual resources (funding) 4. Play-out/view: not always available (funding) 5. Shared workspace: semi-shared (infra development) 6. Advanced analysis: expertise scholars (training) 7. Advanced analysis: robust pipelines (benchmarking) 8. Workspace: complex interface (interaction design)
  40. Summary… Main contribution: enabling mixed media scholarly research for “institutional” multimedia collections. Bringing the Tools to the Data: in progress but already useful: ✓ Unlocking the data, enabling distant/close reading ✓ Supporting the scholarly primitives ✓ Providing a workspace for saving annotations, creating collections and options for (advanced) analysis
  41. Research coordination: Julia Noordegraaf @jjnoordegraaf. Technical coordination: Roeland Ordelman @roelandordelman. DEMO & QUESTIONS AT THE BAZAAR. mediasuite.clariah.nl
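
Slide 10 names the Open Archives Initiative (OAI) as the route by which the Media Suite harvests collection metadata. The slides do not show how this is done, so the following is only a minimal sketch, assuming the harvest uses the standard OAI-PMH `ListRecords` verb with Dublin Core (`oai_dc`) records. The endpoint `https://example.org/oai`, the set name `oral-history`, the sample response and both function names are hypothetical and are not taken from the Media Suite.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is an exclusive argument:
        # it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([{'identifier', 'title'}, ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Made-up response, used here instead of a network call.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:interview-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample interview</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", set_spec="oral-history"))
    records, token = parse_list_records(SAMPLE_RESPONSE)
    print(records, token)
```

A real harvester would fetch each URL, call `parse_list_records` on the response, and repeat with the returned token until none is left. The sketch keeps only identifier and title, whereas the "metadata archeology" of slide 12 would require inspecting every field a collection exposes.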