Tools & Technologies for enhancing access
to Audiovisual - the Singapore Journey
Dr Phang Lai Tee (phang_lai_tee@nlb.gov.sg)
National Archives of Singapore
AMIA Conference
20 Nov 2015
Curated Stream
Greetings from Singapore
- A “Little Red Dot” …
Who we are
- Island city-state
- Population: 5.5 million
- Multi-racial community: Chinese 74%, Malay 14%, Indian 9%, other ethnicities 3%
- Area: 710 sq km
- Government: Parliamentary Democracy
- We celebrated our Golden Jubilee
- We mourned the passing of our founding Prime Minister
Introduction to the National Archives of Singapore (NAS)
- 1968: Established by Act of Parliament
- Aug 1993: Came under National Heritage Board (NHB)
- 1996: AudioVisual Archives Division formally set up
- Nov 2012: Transferred to National Library Board (NLB)
Conveniently located in Singapore’s Civic District (1 Canning Rise)
Archives in the library
New opportunities, new challenges
- Content is king
- Increased digitisation funding
- Robust IT infrastructure for resource-hungry AV
- Experienced in improving searchability of content
- Dared to innovate & try new technologies
- How to be visible in a sea of books (enhanced discovery?)
- Pressure to widen access
- Branding of archives
- Archival principles…
Enhancing access step by step
Treasure Trove of AV Content
- Recommendation of Advisory Council on Culture and the Arts chaired by then 2nd Deputy Prime Minister Ong Teng Cheong in 1989
- Strengthen the national heritage collection in all media to cover sound-and-moving images
- Over 100,000 AV recordings covering 60 years of broadcasting history of Singapore
- AV recordings capturing defining moments and key government initiatives in Singapore’s 50 years of independence
- Sound recordings documenting recording history of Singapore and the region from 1903 to 1970s
New look (2013)
Expose the archives - Findable
- Make each record findable on Google with a permanent URL
- Curate easy-access pages on topics of interest
Radio Talks on ‘The Battle for Merger’, 13 Sep - 9 Oct 1961
Archivist Pick of the Week
Search beyond the Archives – Expandable
- OneSearch, Many Sources
- Data harmonization and linkages across different descriptive frameworks and systems for the benefit of users
Avoiding pitfalls
- Beware of the mapping
- ISAD(G), MARC, Dublin Core
- Creator/publisher, transferring agency/source of acquisition
- One date vs. many dates
- Know your collection well and the differences in descriptions and definitions
- Mapping alone may not be adequate
Anchored on ISAD(G)
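The mapping pitfalls above can be illustrated with a small crosswalk sketch. The field names, the mapping table and the sample record are all hypothetical and are not the NAS or OneSearch schema. The sketch only shows why a plain one-to-one mapping may not be adequate: a transferring agency has no true Dublin Core equivalent, and several archival dates collapse into a single `date` element unless the source field is remembered.

```python
# Hypothetical crosswalk from archival description fields to Dublin Core elements.
# Field names are illustrative assumptions, not the NAS or OneSearch schema.
CROSSWALK = {
    "title": "title",
    "creator": "creator",
    "transferring_agency": "contributor",  # a judgment call, not a true equivalent
    "date_of_creation": "date",
    "date_of_broadcast": "date",
    "date_of_transfer": "date",
}


def to_dublin_core(record):
    """Map a record, keeping every value and remembering which field it came from."""
    mapped = {}
    unmapped = []
    for source_field, value in record.items():
        target = CROSSWALK.get(source_field)
        if target is None:
            unmapped.append(source_field)  # surface gaps instead of dropping them silently
            continue
        mapped.setdefault(target, []).append({"value": value, "from": source_field})
    return mapped, unmapped


# Invented sample record, for illustration only.
sample = {
    "title": "Radio talk",
    "transferring_agency": "Example Broadcasting Agency",
    "date_of_broadcast": "1961-09-13",
    "date_of_transfer": "1995",
    "access_conditions": "Viewing permitted",
}

dc, gaps = to_dublin_core(sample)
print(dc["date"])  # two dates land in one element, each tagged with its source field
print(gaps)        # fields the crosswalk does not cover
```

Keeping the source field next to each value is one way to let a shared search index stay simple while the archival distinctions remain recoverable.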
Enhance findability of non-textual content - voice-to-text transcription
- 6,000 hours of broadcasts and speeches done
- Useful guide for writing synopses; minimises the need to make notes when listening to audio and reduces time taken by 25% (for transcripts with good accuracy)
- Problem with names and non-English words
- Sarong became sorrow, Blakang Mati became Locomotiv
- Saudara Joko Senyoto became John Paulson
- Dr Goh Keng Swee became…
- Accuracy highly dependent on clarity of recording and speaker’s accent; can be improved through training
- There are portions that can only be understood by listening to the audio repeatedly
- Not suitable for broadcasts with multiple languages, certain series
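One common way to put a number on transcription accuracy is word error rate (WER): the word-level edit distance between a human reference transcript and the machine output, divided by the length of the reference. The slides do not say how accuracy was measured, so the sketch below is an assumption, and the sample sentences are invented around the "sarong became sorrow" example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented sentences; one substituted word out of four reference words.
print(word_error_rate("he wore a sarong", "he wore a sorrow"))  # 0.25
```

A single figure like this hides where the errors fall. A transcript can score well overall and still get every personal name wrong, which is the problem the examples above describe.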
http://www.nas.gov.sg/archivesonline/oral_history_interviews/record-details/df04a824-115d-11e3-83d5-0050568939ad?keywords=nair&keywords-type=all
Using text analytics to automatically identify related content
[Diagram: the text of each of two records is tokenised, and the tokens are parsed and weighted (TF-IDF); the similarity of the weighted tokens is then computed. Example shown: Similarity = 0.295]
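The pipeline in the diagram (tokenise, weight with TF-IDF, compare) can be sketched in plain Python. This is a minimal illustration and not the NLB implementation: the tokeniser, the smoothed IDF formula, the use of cosine similarity as the comparison step, and the sample descriptions are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {token: weight} vector per document."""
    tokenised = [tokenise(doc) for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF (an assumption): rarer tokens get larger weights,
    # and tokens found in every document still keep a small weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    return [
        {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        for toks in tokenised
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented record descriptions, for illustration only.
docs = [
    "Radio talk on the battle for merger",
    "Speech on merger and the referendum",
    "Recipe for chicken rice",
]
vectors = tfidf_vectors(docs)
related = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
print(related > unrelated)  # the two merger records score closer to each other
```

A score such as the 0.295 shown on the slide only has meaning relative to a threshold or to the scores of other record pairs, so related-content lists are usually built by ranking rather than by reading a single value.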
Expandable - Mahout
Using clustering to handle large datasets
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
Mahout K-Means Clustering with Cosine Distance
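To show what K-Means with cosine distance does, here is a small single-machine sketch in plain Python. It is not Mahout code: Apache Mahout provides a distributed K-Means for large datasets, whereas this version only illustrates the assign-then-update loop. The toy vectors, the fixed iteration count and the random seed are assumptions made for the example.

```python
import math
import random


def normalise(vector):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine_distance(a, b):
    """1 minus cosine similarity; both inputs are assumed to be unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def kmeans_cosine(points, k, iterations=20, seed=0):
    """Return a cluster index for each point, using cosine distance."""
    rng = random.Random(seed)
    unit = [normalise(p) for p in points]
    centroids = rng.sample(unit, k)  # start from k of the points themselves
    assignments = [0] * len(unit)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: cosine_distance(p, centroids[c]))
            for p in unit
        ]
        # Update step: each centroid moves to the normalised mean of its members.
        for c in range(k):
            members = [p for p, a in zip(unit, assignments) if a == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return assignments


# Toy weighted-token vectors: three point mostly along the first axis,
# three mostly along the second.
points = [[1, 0.1], [2, 0.1], [0.9, 0], [0.1, 1], [0, 2], [0.1, 3]]
labels = kmeans_cosine(points, k=2)
print(labels)
```

Cosine distance ignores vector length, so a long description and a short one on the same topic can land in the same cluster. That property is a common reason for choosing it over Euclidean distance for text.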
Examples of results within the same database
Examples of results across different databases
I can’t put everything online!
Copyrights ($$$)
Restrictions by depositors & rights protection
Behind the scene
At the public front
AV holdings size: 140,000 recordings
[Chart: No. of Recordings, FY 11 to FY 14 (vertical axis 0 to 140,000). Series: No. of Recordings Digitised; No. of New Recordings (or Metadata) Uploaded Online; No. of PageViews on Recordings. Data labels in extraction order: 3,120; 2,443; 12,864; 19,200; 4,308; 8,316; 6,442; 5,186; 18,951; 26,637; 35,100; 125,931]
Huge rise in public interest for AV recordings
Total no. of recordings (or metadata) online: 96,209
In the pipeline…
Expandable – Project by NLB
- Use machine translation technology & KOS (Knowledge Organisation System) names database to translate non-English content/local personality names to English
- Apply text-mining & keyword classification to recommend related library & archives content across languages
In the pipeline
- Extend in-premises access to the libraries
- Image analytics
- Linked data (by NLB)
- Crowdsourcing for home movies?
http://www.jts2016.org/
Acknowledgements:
Technology & Innovation, NLB
Oral History Centre, NAS
email: phang_lai_tee@nlb.gov.sg

More Related Content

Similar to Tools & Technologies for Enhancing Access to Audiovisual - the Singapore Journey

Eudat 2nd conference - CLARIN B2SAFE demo
Eudat 2nd conference - CLARIN B2SAFE demoEudat 2nd conference - CLARIN B2SAFE demo
Eudat 2nd conference - CLARIN B2SAFE demoWillemElbers
 
Introduction about AAT-Taiwan Project
Introduction about AAT-Taiwan ProjectIntroduction about AAT-Taiwan Project
Introduction about AAT-Taiwan ProjectAAT Taiwan
 
Finding paradisec: how Paradisec has made their data findable
Finding paradisec: how Paradisec has made their data findable Finding paradisec: how Paradisec has made their data findable
Finding paradisec: how Paradisec has made their data findable ARDC
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPariadnenetwork
 
Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Guy De Pauw
 
IVACS Symposium 2010
IVACS Symposium 2010IVACS Symposium 2010
IVACS Symposium 2010nottyknight
 
ISI 5121 Trove Presentation
ISI 5121 Trove PresentationISI 5121 Trove Presentation
ISI 5121 Trove Presentationksirett
 
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...linshanleearchive
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana
 
Are New Digital Literacies Skills Neededrscd2018
Are New Digital Literacies Skills Neededrscd2018Are New Digital Literacies Skills Neededrscd2018
Are New Digital Literacies Skills Neededrscd2018SusanMRob
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageLushanthan Sivaneasharajah
 
TWLOM Application Profile
TWLOM Application ProfileTWLOM Application Profile
TWLOM Application Profileallenchen8888
 
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...Alannah Fitzgerald
 
Speaker specific feature based clustering and its applications in language in...
Speaker specific feature based clustering and its applications in language in...Speaker specific feature based clustering and its applications in language in...
Speaker specific feature based clustering and its applications in language in...IJECEIAES
 
Project_Phase1_-_Literature_Review-1[1].pptx
Project_Phase1_-_Literature_Review-1[1].pptxProject_Phase1_-_Literature_Review-1[1].pptx
Project_Phase1_-_Literature_Review-1[1].pptxASHWIN808488
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 

Similar to Tools & Technologies for Enhancing Access to Audiovisual - the Singapore Journey (20)

Eudat 2nd conference - CLARIN B2SAFE demo
Eudat 2nd conference - CLARIN B2SAFE demoEudat 2nd conference - CLARIN B2SAFE demo
Eudat 2nd conference - CLARIN B2SAFE demo
 
Introduction about AAT-Taiwan Project
Introduction about AAT-Taiwan ProjectIntroduction about AAT-Taiwan Project
Introduction about AAT-Taiwan Project
 
Finding paradisec: how Paradisec has made their data findable
Finding paradisec: how Paradisec has made their data findable Finding paradisec: how Paradisec has made their data findable
Finding paradisec: how Paradisec has made their data findable
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLP
 
Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
IVACS Symposium 2010
IVACS Symposium 2010IVACS Symposium 2010
IVACS Symposium 2010
 
ISI 5121 Trove Presentation
ISI 5121 Trove PresentationISI 5121 Trove Presentation
ISI 5121 Trove Presentation
 
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
Doing Something We Never Could with Spoken Language Technologies_109-10-29_In...
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
 
Are New Digital Literacies Skills Neededrscd2018
Are New Digital Literacies Skills Neededrscd2018Are New Digital Literacies Skills Neededrscd2018
Are New Digital Literacies Skills Neededrscd2018
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
Exploring the Evolution and Diversity of Speech Datasets
Exploring the Evolution and Diversity of Speech DatasetsExploring the Evolution and Diversity of Speech Datasets
Exploring the Evolution and Diversity of Speech Datasets
 
TWLOM Application Profile
TWLOM Application ProfileTWLOM Application Profile
TWLOM Application Profile
 
Maithili Text-to-Speech
Maithili Text-to-SpeechMaithili Text-to-Speech
Maithili Text-to-Speech
 
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...
FLAX Weaving with Oxford Open Educational Resources: Open Practices for Engli...
 
Corpus Linguistics
Corpus LinguisticsCorpus Linguistics
Corpus Linguistics
 
Speaker specific feature based clustering and its applications in language in...
Speaker specific feature based clustering and its applications in language in...Speaker specific feature based clustering and its applications in language in...
Speaker specific feature based clustering and its applications in language in...
 
Project_Phase1_-_Literature_Review-1[1].pptx
Project_Phase1_-_Literature_Review-1[1].pptxProject_Phase1_-_Literature_Review-1[1].pptx
Project_Phase1_-_Literature_Review-1[1].pptx
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 

More from Sound and Vision R&D

New life for old media - Investigations into Speech Synthesis and Deep Learni...
New life for old media - Investigations into Speech Synthesis and Deep Learni...New life for old media - Investigations into Speech Synthesis and Deep Learni...
New life for old media - Investigations into Speech Synthesis and Deep Learni...Sound and Vision R&D
 
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Sound and Vision R&D
 
(Im)possible Approaches to Preserving Interactive Media
(Im)possible Approaches to Preserving Interactive Media(Im)possible Approaches to Preserving Interactive Media
(Im)possible Approaches to Preserving Interactive MediaSound and Vision R&D
 
Beeld en Geluid Kenniscafé: GIFs en RE:VIVE
Beeld en Geluid Kenniscafé: GIFs en RE:VIVEBeeld en Geluid Kenniscafé: GIFs en RE:VIVE
Beeld en Geluid Kenniscafé: GIFs en RE:VIVESound and Vision R&D
 
Identification Authentication Authorization in CLARIAH
Identification Authentication Authorization in CLARIAHIdentification Authentication Authorization in CLARIAH
Identification Authentication Authorization in CLARIAHSound and Vision R&D
 
Archival Intelligence for AV Archives
Archival Intelligence for AV ArchivesArchival Intelligence for AV Archives
Archival Intelligence for AV ArchivesSound and Vision R&D
 
Access to Europe's Television Heritage via EUscreen
Access to Europe's Television Heritage via EUscreenAccess to Europe's Television Heritage via EUscreen
Access to Europe's Television Heritage via EUscreenSound and Vision R&D
 
Ho'okele: Navigating Copyright to Provide Access and Use
Ho'okele: Navigating Copyright to Provide Access and UseHo'okele: Navigating Copyright to Provide Access and Use
Ho'okele: Navigating Copyright to Provide Access and UseSound and Vision R&D
 
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...Methodologies for Assessment and Evaluation of Access to Moving Image Collect...
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...Sound and Vision R&D
 
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...Sound and Vision R&D
 
Art / Archives: A New England Archivists Research Project
Art / Archives: A New England Archivists Research ProjectArt / Archives: A New England Archivists Research Project
Art / Archives: A New England Archivists Research ProjectSound and Vision R&D
 
HTML 5: A Security Solution for EUXcreenXL
HTML 5: A Security Solution for EUXcreenXLHTML 5: A Security Solution for EUXcreenXL
HTML 5: A Security Solution for EUXcreenXLSound and Vision R&D
 
Culturejam kelly mostert_europeanatv_fonts
Culturejam kelly mostert_europeanatv_fontsCulturejam kelly mostert_europeanatv_fonts
Culturejam kelly mostert_europeanatv_fontsSound and Vision R&D
 
TVX2015: EuropeanaTV Open Up Culture, Enrich Television
TVX2015: EuropeanaTV Open Up Culture, Enrich TelevisionTVX2015: EuropeanaTV Open Up Culture, Enrich Television
TVX2015: EuropeanaTV Open Up Culture, Enrich TelevisionSound and Vision R&D
 
Europeana Space TV pilot elevator pitch
Europeana Space TV pilot elevator pitchEuropeana Space TV pilot elevator pitch
Europeana Space TV pilot elevator pitchSound and Vision R&D
 
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...Na de bevrijding XL: Expanding a Historical Television Series with Archival S...
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...Sound and Vision R&D
 

More from Sound and Vision R&D (20)

Journal Forms and Futures
Journal Forms and FuturesJournal Forms and Futures
Journal Forms and Futures
 
New life for old media - Investigations into Speech Synthesis and Deep Learni...
New life for old media - Investigations into Speech Synthesis and Deep Learni...New life for old media - Investigations into Speech Synthesis and Deep Learni...
New life for old media - Investigations into Speech Synthesis and Deep Learni...
 
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
 
ACM TVX2017 Review
ACM TVX2017 Review ACM TVX2017 Review
ACM TVX2017 Review
 
(Im)possible Approaches to Preserving Interactive Media
(Im)possible Approaches to Preserving Interactive Media(Im)possible Approaches to Preserving Interactive Media
(Im)possible Approaches to Preserving Interactive Media
 
Beeld en Geluid Kenniscafé: GIFs en RE:VIVE
Beeld en Geluid Kenniscafé: GIFs en RE:VIVEBeeld en Geluid Kenniscafé: GIFs en RE:VIVE
Beeld en Geluid Kenniscafé: GIFs en RE:VIVE
 
Identification Authentication Authorization in CLARIAH
Identification Authentication Authorization in CLARIAHIdentification Authentication Authorization in CLARIAH
Identification Authentication Authorization in CLARIAH
 
Copyright and Open Content
Copyright and Open ContentCopyright and Open Content
Copyright and Open Content
 
Archival Intelligence for AV Archives
Archival Intelligence for AV ArchivesArchival Intelligence for AV Archives
Archival Intelligence for AV Archives
 
Access to Europe's Television Heritage via EUscreen
Access to Europe's Television Heritage via EUscreenAccess to Europe's Television Heritage via EUscreen
Access to Europe's Television Heritage via EUscreen
 
Ho'okele: Navigating Copyright to Provide Access and Use
Ho'okele: Navigating Copyright to Provide Access and UseHo'okele: Navigating Copyright to Provide Access and Use
Ho'okele: Navigating Copyright to Provide Access and Use
 
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...Methodologies for Assessment and Evaluation of Access to Moving Image Collect...
Methodologies for Assessment and Evaluation of Access to Moving Image Collect...
 
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...
Moving Beyond Access: Unlocking the Potential of Moving Image Archive Collect...
 
Art / Archives: A New England Archivists Research Project
Art / Archives: A New England Archivists Research ProjectArt / Archives: A New England Archivists Research Project
Art / Archives: A New England Archivists Research Project
 
HTML 5: A Security Solution for EUXcreenXL
HTML 5: A Security Solution for EUXcreenXLHTML 5: A Security Solution for EUXcreenXL
HTML 5: A Security Solution for EUXcreenXL
 
Culturejam kelly mostert_europeanatv_fonts
Culturejam kelly mostert_europeanatv_fontsCulturejam kelly mostert_europeanatv_fonts
Culturejam kelly mostert_europeanatv_fonts
 
TVX2015: EuropeanaTV Open Up Culture, Enrich Television
TVX2015: EuropeanaTV Open Up Culture, Enrich TelevisionTVX2015: EuropeanaTV Open Up Culture, Enrich Television
TVX2015: EuropeanaTV Open Up Culture, Enrich Television
 
Europeana Space TV pilot elevator pitch
Europeana Space TV pilot elevator pitchEuropeana Space TV pilot elevator pitch
Europeana Space TV pilot elevator pitch
 
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...Na de bevrijding XL: Expanding a Historical Television Series with Archival S...
Na de bevrijding XL: Expanding a Historical Television Series with Archival S...
 
TROVe eindpresentatie
TROVe eindpresentatieTROVe eindpresentatie
TROVe eindpresentatie
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

Tools & Technologies for Enhancing Access to Audiovisual - the Singapore Journey

  • 1. Tools & Technologies for enhancing access to Audiovisual - the Singapore Journey. Dr Phang Lai Tee (phang_lai_tee@nlb.gov.sg), National Archives of Singapore. AMIA Conference, 20 Nov 2015, Curated Stream
  • 2. Greetings from Singapore
      • A “Little Red Dot” …
  • 3. Who we are
      • Island city-state
      • Population: 5.5 million
      • Multi-racial community – Chinese 74%, Malay 14%, Indian 9%, other ethnicities 3%
      • Area: 710 sq km
      • Government: Parliamentary Democracy
      • We celebrated our Golden Jubilee
      • We mourned the passing of our founding Prime Minister
  • 4. Introduction to the National Archives of Singapore (NAS)
      • 1968: Established by Act of Parliament
      • Aug 1993: Came under National Heritage Board (NHB)
      • 1996: AudioVisual Archives Division formally set up
      • Nov 2012: Transferred to National Library Board (NLB)
      • Conveniently located in Singapore’s Civic District (1 Canning Rise)
  • 5. Archives in the library
  • 6. New opportunities, new challenges
      • Content is king
      • Increased digitisation funding
      • Robust IT infrastructure for resource hungry AV
      • Experienced in improving search-ability of content
      • Dared to innovate & try new technologies
      • How to be visible in a sea of books (enhanced discovery?)
      • Pressure to widen access
      • Branding of archives
      • Archival principles…
  • 8. Treasure Trove of AV Content
      • Recommendation of Advisory Council on Culture and the Arts chaired by then 2nd Deputy Prime Minister Ong Teng Cheong in 1989
      • Strengthen the national heritage collection in all media to cover sound-and-moving images
      • Over 100,000 AV recordings covering 60 years of broadcasting history of Singapore
      • AV recordings capturing defining moments and key government initiatives in Singapore’s 50 years of independence
      • Sound recordings documenting recording history of Singapore and the region from 1903 to 1970s
  • 10. Expose the archives - Findable
      • Make each record Google findable with permanent URL
      • Curate easy access pages of topical interests
  • 11. Radio Talks on ‘The Battle for Merger’, 13 Sep - 9 Oct 1961
  • 12. Archivist Pick of the Week
  • 13. Search beyond the Archives – Expandable
      • OneSearch, Many Sources
      • Data harmonization and linkages across different descriptive frameworks and systems for the benefit of users
  • 14. Avoiding pitfalls
      • Beware of the mapping
      • ISAD-G, MARC, Dublin Core
      • Creator/publisher, transferring agency/source of acquisition
      • One date vs. many dates
      • Know your collection well and the differences in descriptions and definitions
      • Mapping alone may not be adequate
  • 17. Enhance findability of non-textual content - voice to text transcription
      • 6,000 hours of broadcasts and speeches done
      • Useful guide for writing synopses, minimises need to make notes when listening to audio, reduces time taken by 25% (for those with good accuracy)
      • Problem with names and non-English words
      • Sarong became sorrow, Blakang Mati became Locomotiv
      • Saudara Joko Senyoto became John Paulson
      • Dr Goh Keng Swee became…
      • Accuracy highly dependent on clarity of recording and speaker’s accent; can be improved through training
      • There are portions that can only be understood by listening to the audio repeatedly
      • Not suitable for broadcasts with multiple languages, certain series
  • 19. Using text analytics to automatically identify related content (Expandable - Mahout)
      • Each text tokenised; tokens parsed and weighted (TF/IDF)
      • Weighted tokens similarity computed
      • Similarity = 0.295
  • 20. Using clustering to handle large datasets
      • Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters)
      • Mahout K-Means Clustering with Cosine Distance
  • 21. Examples of results within the same database
  • 22. Examples of results across different databases
  • 28. I can’t put everything online!
      • Copyrights ($$$)
      • Restrictions by depositors & rights protection
  • 31. At the public front
  • 32. Huge rise in public interest for AV recordings
      • AV holdings size: 140,000 recordings
      • Total no. of recordings (or metadata) online: 96,209
      • [Chart, FY 11 to FY 14. Series: No. of Recordings Digitised; No. of New Recordings (or Metadata) Uploaded Online; No. of PageViews on Recordings. Data labels in extraction order: 3,120; 2,443; 12,864; 19,200; 4,308; 8,316; 6,442; 5,186; 18,951; 26,637; 35,100; 125,931]
  • 34. Expandable – Project by NLB
      • Use machine translation technology & KOS (Knowledge Organisation System) names database to translate non-English content/local personality names to English
      • Apply text-mining & keyword classification to recommend related library & archives content across languages
  • 35. In the pipeline
      • Extend in-premises access to the libraries
      • Image analytics
      • Linked data (by NLB)
      • Crowdsourcing for home movies?
  • 37. Acknowledgements: Technology & Innovation, NLB; Oral History Centre, NAS. Email: phang_lai_tee@nlb.gov.sg
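
Slides 19 and 20 describe weighting tokens with TF/IDF, computing a similarity between two texts, and grouping records with K-Means using cosine distance. The slides name Mahout as the tool; the sketch below is not Mahout and not the NAS pipeline. It is a minimal pure-Python illustration of the same ideas, and the sample record descriptions, the function names, the smoothed IDF formula, and the farthest-first seeding are all assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation and stop words.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse {token: weight} dict per document (TF * smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def kmeans_cosine(vectors, k, iterations=10):
    """K-means where 'nearest' means highest cosine similarity to a centroid."""
    # Assumed seeding: start from the first vector, then repeatedly add the
    # vector least similar to the centroids chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(vectors,
                  key=lambda v: max(cosine_similarity(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each vector.
        labels = [max(range(k), key=lambda j: cosine_similarity(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the old centroid if the cluster is empty
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            centroids[j] = {t: w / len(members) for t, w in total.items()}
    return labels

# Hypothetical record descriptions, invented for illustration only.
docs = [
    "radio talk on merger broadcast",
    "radio broadcast of merger talk",
    "oral history interview on kampong life",
    "interview about kampong life history",
]
vectors = tfidf_vectors(docs)
labels = kmeans_cosine(vectors, k=2)
print("similarity(doc 0, doc 1):", round(cosine_similarity(vectors[0], vectors[1]), 3))
print("cluster labels:", labels)
```

Cosine distance suits this task because it compares the direction of the weighted token vectors rather than their length, so a short synopsis and a long transcript on the same topic can still score as similar.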