RSI Archive: our experience working with
Speech to Text and Semantic Analysis
Sarah-Haye Aziz and Lorenzo Vassallo
May 17, 2013
2
Come al solito, anche la recente
inaugurazione dell'ultima monumentale
opera di quell'eccezionale scultore che
Giacomo Manzù, vale a dire la nuova porta
del Duomo di Rotterdam avuto il sapore
di un avvenimento straordinario di
risonanza internazionale e per lavori in
corso Fabio Bonetti è riuscito ad avvicinare
l'insigne maestro bergamasco, a buon
diritto ritenuto ormai uno dei più alti
interpreti del nostro tempo, artista fra i più
grandi del secolo e non solo per la misura
del suo talento ma anche per il rigore
morale di cui è sempre stato esempio in
anni di sorta, tormentata, ispirata
attività.
[English translation: As usual, the recent inauguration of the latest monumental work by that exceptional sculptor Giacomo Manzù, namely the new door of Rotterdam Cathedral, had the flavour of an extraordinary event of international resonance, and for "Lavori in corso" Fabio Bonetti managed to approach the distinguished master from Bergamo, by now rightly regarded as one of the highest interpreters of our time, an artist among the greatest of the century, not only for the measure of his talent but also for the moral rigour of which he has always been an example in years of absorbed, tormented, inspired activity.]
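Credits: Giacomo Manzù, Fabio Bonetti
Geographic Terms: Rotterdam
Themes: arte, cultura, intrattenimento (art, culture, entertainment)
Errors marked in the transcription: è, as, ha
Slide labels: Audio Transcription (the text above), Categorization (the credits, geographic terms and themes)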
3
Outline
1. Why an automatic indexing system?
2. The project timeline
3. Two paths: system and archivist workflow overview
4. Does it work? We learned that...
5. Next steps
6. Some advice
4
Why an automatic indexing system?
RSI has a consolidated cataloguing system (CMM)
with a well-defined human workflow, in place since 2008.
RSI has plenty of undocumented historical material
and no capacity to document it.
Goal: increase the documented material by adding
automation (plus), not by replacing the archivist (vs).
Not vs but plus!
5
Project timeline
Archivists and Technicians Synergy
Phases: Analysis & Startup, Tuning, Deployment
Activities: Workflow Design, Language Model, TV & Radio Programmes Choice, Workflow Review, Transcription Test, System Test
6
Documenting material: two paths
Catalogue stages: Ingestion, Catalogue, Publishing
Input material: Audio and Video, Key frames
Path 1, automated (SIA) plus archivist:
  Transcription Engine: takes Audio + Key frames, performs Speech Transcription, outputs Text + Sequences
  Semantic Engine: performs Categorization, outputs Text + Sequences, Credits, Themes + Geographical terms
  Archivist: Documentation + Refinements
Path 2, human only:
  Human audio listening and transcription + Archivist documentation
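The automated path can be pictured as two stages feeding one catalogue record. The sketch below is only an illustration under stated assumptions: the class names, the `categorize` function, and the substring lookup are hypothetical stand-ins, not RSI's Transcription Engine or Semantic Engine, whose internals the slides do not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sequence:
    """A logical sequence: a time span of the programme with its transcript."""
    start_s: float
    end_s: float
    text: str


@dataclass
class CatalogueRecord:
    """Hypothetical record handed to the archivist for refinement."""
    sequences: List[Sequence]
    credits: List[str] = field(default_factory=list)
    themes: List[str] = field(default_factory=list)
    geographic_terms: List[str] = field(default_factory=list)


def categorize(sequences: List[Sequence],
               known_people: List[str],
               known_places: List[str],
               theme_keywords: Dict[str, List[str]]) -> CatalogueRecord:
    """Toy stand-in for a semantic engine: plain substring lookup (an assumption)."""
    text = " ".join(s.text for s in sequences).lower()
    credits = [p for p in known_people if p.lower() in text]
    places = [g for g in known_places if g.lower() in text]
    themes = [theme for theme, words in theme_keywords.items()
              if any(w in text for w in words)]
    return CatalogueRecord(sequences, credits, themes, places)


# Abridged excerpt of the sample transcription, treated as one sequence.
sample = Sequence(0.0, 30.0,
                  "opera di quell'eccezionale scultore che Giacomo Manzù, "
                  "la nuova porta del Duomo di Rotterdam; Fabio Bonetti "
                  "è riuscito ad avvicinare l'insigne maestro")

record = categorize(
    [sample],
    known_people=["Giacomo Manzù", "Fabio Bonetti", "Mario Rossi"],
    known_places=["Rotterdam", "Lugano"],
    theme_keywords={"arte": ["scultore", "artista"], "sport": ["partita"]},
)
print(record.credits, record.geographic_terms, record.themes)
```

The record keeps the sequences alongside the extracted metadata, so that the archivist can refine both in the catalogue rather than start from an empty form.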
7
The two paths for the archivist
Start
Invoke Indexing? (Yes / No)
No: Human Task on Catalogue, then Doc Level?
  Basic Human: Limited set of documented metadata
  High Human: Detailed documentation, Manual creation of logical sequences
Yes: Automated Transcription and Categorisation, then Human Task on Catalogue, then Doc Level?
  Basic Human with Automation: Limited set of documented metadata, Automatic creation of logical sequences
  High Human with Automation: Detailed documentation, Automatic creation of logical sequences
Publish
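The four documentation levels named on this slide can be read as the combination of two choices: whether indexing is invoked, and whether the archivist documents in detail. A minimal sketch of that reading follows; the function name and its two boolean parameters are assumptions made for illustration, not part of the RSI catalogue.

```python
def doc_level(invoke_indexing: bool, detailed: bool) -> str:
    """Map the two workflow choices to a documentation level name.

    invoke_indexing: True if automated transcription and categorisation ran.
    detailed: True for detailed documentation, False for a limited metadata set.
    """
    base = "High Human" if detailed else "Basic Human"
    return f"{base} with Automation" if invoke_indexing else base


for indexing in (False, True):
    for detailed in (False, True):
        print(indexing, detailed, "->", doc_level(indexing, detailed))
```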
8
The archivist – Francesco Veri
9
Does it work? Yes! But…
Differences between Radio and TV
Background music and noise hinder the transcription.
Based only on silences and without key frames, the system creates too many sequences.
Key frames help to locate a change of context.
Speech rhythm and pauses differ between Radio and TV.
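To make the over-segmentation point concrete, here is a small sketch, not the actual system: it cuts at every sufficiently long silence, then keeps only the cuts that fall near a key frame. The function names, the 0.5 s minimum silence, and the 2 s tolerance are all assumptions chosen for the example.

```python
from typing import List, Tuple


def silence_boundaries(silences: List[Tuple[float, float]],
                       min_silence_s: float = 0.5) -> List[float]:
    """Candidate cut points: the midpoint of each silence that is long enough."""
    return [(start + end) / 2
            for start, end in silences
            if end - start >= min_silence_s]


def keep_near_key_frames(boundaries: List[float],
                         key_frames: List[float],
                         tolerance_s: float = 2.0) -> List[float]:
    """Keep only the cut points within tolerance_s of some key frame."""
    return [b for b in boundaries
            if any(abs(b - k) <= tolerance_s for k in key_frames)]


# (start, end) of detected silences, in seconds (made-up values).
silences = [(4.0, 5.0), (9.5, 10.5), (15.0, 15.25), (21.0, 22.0), (30.0, 31.0)]
key_frames = [10.0, 30.0]  # times of visual changes, TV only

radio_cuts = silence_boundaries(silences)
tv_cuts = keep_near_key_frames(radio_cuts, key_frames)
print(radio_cuts)  # silences alone: four cut points
print(tv_cuts)     # with key frames: two cut points
```

With silences alone, every pause for breath becomes a sequence boundary; with key frames available, a pause counts only when the picture changes too. Radio has no key frames, which is one reason to tune the two media separately.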
10
Next steps (1) – Capitalize on Editorial Texts
Flow: Audio + Editorial Texts, then Semantic Engine (Categorization), then Catalogue
11
Next steps (2) – Capitalize on 24h Radio Logging
24h Radio Logging (0 to 24 h), with Automatic Cut
SIA (Transcription and Semantic Engine): Transcription & Categorization
Catalogue
12
If you... some advice
Involve the Archivists
Take a different approach for Radio and TV
Choose the right TV & Radio Programmes
13
sarah-haye.aziz@rsi.ch
lorenzo.vassallo@rsi.ch
