Karen Cariani
AAPB Project Director, WGBH
Senior Director, WGBH Media Library &
Archives
Using computational tools and
crowdsourcing games to increase
metadata and discoverability of digital
collections
Or…
“Can the Computer Do the Work?”
the situation
■ 72,000 digitized television and radio programs
■ incomplete, inaccurate metadata records
■ limited staff resources
■ we need to know what we have in the collection
■ we have a responsibility to users to provide access to the collection
■ continued growth of the collection (content
potential: transforming content
into data
Computational Tools
Speech-to-text
Audio analysis
Image Analysis
Visualization of Data
How can we use them?
a crowdsourcing game
http://fixit.americanarchive.org
once corrected…
• JSON transcripts will be stored on AAPB’s Amazon S3 account
• Transcripts will be indexed for keyword searching on the AAPB website
• Transcripts will be made available alongside the media on the record page
• Transcripts can play as captions within the player
• Transcripts can be harvested via an API and used as a dataset for research such as a digital humanities project
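As a rough illustration of two of the uses listed above (captions and keyword search), here is a minimal Python sketch. The JSON layout it reads (a `parts` list with `start_time`, `end_time` and `text` fields) and the record id are assumptions made for the example; the slides do not describe the actual AAPB transcript schema. The caption output follows the standard WebVTT cue format.

```python
import json
import re


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcript_to_webvtt(transcript: dict) -> str:
    """Turn a transcript (hypothetical schema) into WebVTT caption text."""
    lines = ["WEBVTT", ""]
    for part in transcript["parts"]:
        start = to_timestamp(part["start_time"])
        end = to_timestamp(part["end_time"])
        lines.append(f"{start} --> {end}")
        lines.append(part["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def build_keyword_index(transcript: dict) -> dict:
    """Map each lower-cased word to the set of part numbers containing it."""
    index = {}
    for number, part in enumerate(transcript["parts"]):
        for word in re.findall(r"[a-z0-9']+", part["text"].lower()):
            index.setdefault(word, set()).add(number)
    return index


# Hypothetical transcript, invented for illustration only.
sample = json.loads("""
{
  "id": "example-record",
  "parts": [
    {"start_time": 0.0, "end_time": 4.5, "text": "Good evening and welcome."},
    {"start_time": 4.5, "end_time": 9.25, "text": "Tonight we visit the archive."}
  ]
}
""")

vtt = transcript_to_webvtt(sample)
index = build_keyword_index(sample)
print(vtt)
print(sorted(index["archive"]))
```

Because each transcript part carries its own timings, the same file can feed both the player captions and a search index that points back to the moment a word was spoken.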
facebook.com/amarchivepub
@amarchivepub
americanarchive.org
http://fixit.americanarchive.or
#FixItAAPB

Using Computational Tools and Crowdsourcing Games to Increase Metadata and Discoverability of Digital Collections

Editor's Notes

  1. And we are talking about using computational tools and crowdsourcing games to increase metadata and discoverability of digital collections.
  2. We are WGBH, Pop-Up Archive and the University of Texas at Austin School of Information. I am going to let Anne and Tanya introduce themselves and their organizations. We are discussing a project generously funded by IMLS. I am Karen Cariani, Senior Director of the WGBH Media Library and Archives, and project director for the American Archive of Public Broadcasting.
  3. I am going to give a quick introduction on WGBH and the American Archive. WGBH, as many of you know, is the premier public broadcasting station in Boston, producer of many core PBS programs such as NOVA, Frontline, American Experience, Antiques Roadshow, and Masterpiece Theater. We are not only a TV producer and broadcaster; we also manage two radio stations in the Boston area and one serving the Cape and the Islands, and we oversee a TV station in western Massachusetts.
  4. The American Archive is a collaboration between the Library of Congress and WGBH with a goal to preserve and make accessible significant public radio and television programs before they are lost to posterity. The American Archive is a digital archive with a website, americanarchive.org, the homepage of which you see here. Users anywhere in the U.S. can access a wide range of historical public television and radio programs from the late 1940s to the present. Our primary objective is to preserve public media and assure discoverability and access through a coordinated national effort. In doing this, we support content creators and current stewards of the materials, and facilitate the use of historical public broadcasting by researchers, educators, students, and others.
  5. As an aggregator of content, AAPB hopes to provide a centralized web portal of discovery for public media materials. The collection is growing with new additions. Access is for research, educational, and informational purposes only. Due to rights restrictions, only a portion (about 20,000 items) is available through our On-line Reading Room anywhere in the US. These items are also soon to be harvested by Digital Commonwealth and eventually available through the DPLA. Inclusion in the ORR is determined by analysis of types of programs and examination of individual series and programs; more is added as we have time to assess the materials. However, the entire collection of over 72,000 items is available for viewing on location at the Library of Congress and WGBH.
  6. As part of the initial project funded by CPB, the AAPB has 72,000 digitized TV and radio programs from about 100 stations across the country. Along with these digital files we received incomplete metadata records with very little descriptive data about the content or the program. We have limited staff resources to fully catalog the 72,000 items. We figured it would take a full-time person about 32 years to watch everything, spending only 15 minutes per item cataloguing, to complete the collection, all while we are adding up to 25,000 items annually. So you can do the math and figure out that even if we could afford a team of 10 people to just catalogue full time (and that is over ½ of my current staff), it would still take a long time and we would barely catch up cataloguing the new acquisitions. However, we need to know what we have (it helps us determine rights and what we can make accessible), and we need to be able to make it findable for users. To do that, currently, we need to be able to expose text for search engines and indexers. So how do you transform large amounts of audio and video into something searchable for search engines and indexers? How can we transform it into a dataset?
  7. We thought this was a great opportunity for collaboration with computational tools and the computer science field, but we need to understand each other’s work and the capabilities of what exists. Here are some of the tools available that can help us with our dilemma. With this IMLS-funded project we are working with Pop Up Archive to create speech-to-text transcripts of the entire collection, and with UT Texas to analyze the audio to help further identify speakers and sounds. And we will use a crowdsourcing game to help correct or fix the computer-generated transcripts, which will hopefully help further train the tools to improve. We will not talk about image analysis.
  8. Experience has shown that most speech-to-text tools don’t output clean transcripts. Accurate transcripts are dependent on audio quality, speaker accents, background noise, etc. Given that our collection is from 100 different local TV and radio stations across the country, the audio and its quality vary widely. Some programs are in Spanish, some are musical performances, and nearly all video recordings begin with standard bars and tone. The speech-to-text tool tries to interpret these sounds as text, and it makes a number of other mistakes too. WGBH has created a web-based game to allow the public to help us fix and correct these transcripts. You are welcome to follow along with me if you have a computer as I walk through the game, and encouraged to play afterwards.
  9. The game has a terms of use that we need players to check off to make sure they understand that they cannot use the content for anything but helping us correct the transcripts. We’ve kept the clips to only 5 minutes in order to be able to take advantage of fair use.
  10. There are 3 games you can play – identify errors, suggest fixes, and validate fixes. You gain points for each action taken.
  11. You can set preferences on the type of content you would like to interact with. Or you can pick which station’s content you would like to work on. We are hoping to perhaps get stations to compete with each other by getting their station volunteers and community to play against each other for more points. But we need to do a bit more development for that.
  12. Each iteration of a game lasts 5 minutes. But you can play multiple times for any length of time. Three lines of the transcript are active at once. You listen to the audio, see the line highlighted, and click on it if there is a mistake. There are instructions and guides on what is considered an error and how to mark it. It takes a little bit to figure it out, but after a few times you can pick it up pretty quickly.
  13. In game 2, you correct things that have been tagged as errors, or mark them as not errors.
  14. And in game 3, you validate the corrections that have been made. You are given a choice of suggested fixes and pick the correct one.
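The three games form a small pipeline: flag a line, suggest a fix, then validate a fix. The sketch below is only an illustration of that flow, not the game's actual code. The class `TranscriptLine`, the function names, and the `min_votes` threshold are all hypothetical; the talk does not say how many validations a fix needs before it is accepted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TranscriptLine:
    """One line of a machine-generated transcript (hypothetical model)."""
    text: str                                         # original speech-to-text output
    flagged: bool = False                             # set in game 1
    suggestions: list = field(default_factory=list)   # collected in game 2
    votes: Counter = field(default_factory=Counter)   # collected in game 3


def flag_error(line):
    """Game 1: a player marks the line as containing a mistake."""
    line.flagged = True


def suggest_fix(line, fix):
    """Game 2: a player proposes a correction for a flagged line."""
    if line.flagged and fix not in line.suggestions:
        line.suggestions.append(fix)


def mark_not_error(line):
    """Game 2 alternative: a player says the flagged line is actually fine."""
    line.flagged = False


def validate_fix(line, fix):
    """Game 3: a player picks one of the suggested fixes as correct."""
    if fix in line.suggestions:
        line.votes[fix] += 1


def accepted_text(line, min_votes=2):
    """Return the top-voted fix once it has enough votes (assumed rule),
    otherwise fall back to the original machine text."""
    if line.votes:
        fix, count = line.votes.most_common(1)[0]
        if count >= min_votes:
            return fix
    return line.text


line = TranscriptLine("welcome to no va")
flag_error(line)
suggest_fix(line, "Welcome to NOVA.")
validate_fix(line, "Welcome to NOVA.")
validate_fix(line, "Welcome to NOVA.")
print(accepted_text(line))
```

Requiring more than one validation vote is a design assumption here; it reflects the idea that no single player's correction is trusted on its own.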
  15. The game board keeps track of points and players, and highlights top scorers. Studies have shown that people play these games for personal satisfaction, and competition doesn’t necessarily increase the desire to play. We hope people will be driven just by the personal satisfaction of getting points and helping us out, as opposed to competing against anyone in particular.
  16. Once the transcripts have been verified, the JSON transcripts will be stored in the AAPB’s Amazon S3 account and indexed for keyword searching on the AAPB website. The transcripts will be made available alongside the media on the record page. They can also be played like captions within the video player. And they will be able to be harvested via an API to be used as a data set for research. We are hoping that researchers will begin to look at the collection as a data set and start trying to see trends in programming over the last 60 years, particularly across news programs.
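To make the caption and keyword-search uses concrete, here is a minimal sketch of turning a timed JSON transcript into WebVTT captions and a simple word index. The JSON layout (`parts`, `start_time`, `end_time`, `text`) and the record id are assumptions made up for this example; the AAPB's actual transcript schema is not described in this talk.

```python
import json
import re
from collections import defaultdict

# Hypothetical transcript; field names are assumptions, not the AAPB schema.
SAMPLE = """{"id": "example-record", "parts": [
  {"start_time": 0.0, "end_time": 2.5, "text": "Good evening from Boston."},
  {"start_time": 2.5, "end_time": 6.0, "text": "Tonight, news from Boston and beyond."}
]}"""


def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(transcript):
    """Build a WebVTT caption file with one cue per transcript part."""
    lines = ["WEBVTT", ""]
    for part in transcript["parts"]:
        start = vtt_timestamp(part["start_time"])
        end = vtt_timestamp(part["end_time"])
        lines.append(f"{start} --> {end}")
        lines.append(part["text"])
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


def keyword_index(transcript):
    """Map each lowercased word to the start times of the parts containing it."""
    index = defaultdict(list)
    for part in transcript["parts"]:
        words = set(re.findall(r"[a-z0-9']+", part["text"].lower()))
        for word in words:
            index[word].append(part["start_time"])
    return dict(index)


transcript = json.loads(SAMPLE)
print(to_webvtt(transcript))
print(keyword_index(transcript)["boston"])
```

Keeping the start time with each indexed word is what would let a search result jump to the moment in the recording where the word is spoken; a production site would more likely hand the text to a search engine than build its own index.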
  17. Be sure to play and tell all your friends about it.