Presentation giving an overview of the use of language and speech technology (Taal- en Spraaktechnologie, TST) in Customer Contact Centres (CCC), together with the latest insights and developments in this area.
Keynote from Online Tuesday on February 9th 2010, Amsterdam by Bart Schutz (@barts) and Ton Wesseling (@tonwesseling) from Online Dialogue (@onlinedialogue).
Studiedag CSZ - Taal en communicatie in kwaliteitsvolle dienstverlening aan a... (VIVO vzw)
Service providers today communicate with clients from all corners of the world. How can we work towards a language and communication policy with supporting instruments that actually work? What role do bridging functions play here, and what is their impact on the quality of service?
Kind en Gezin presents a tentative communication matrix that comprises a range of supporting instruments.
Pascal Rillof, policy coordinator for language and diversity, Kruispunt Migratie-Integratie vzw
More information on this topic at www.pigmentzorg.be.
2015 01 22 Data-driven e-commerce: De sleutel tot succes! (Holger Wandt)
Presentation given at the Webwinkel Vakdagen 2015. Without good data management, successful e-commerce is a utopia. This presentation shows how good customer data lead to a better customer experience in online business.
Talk for final-year students of the Faculty of Arts about the careers open to them in our digital world. It is a plea for humanities scholars to seize their opportunities.
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018 (CLARIAH)
This document describes ACAD, an automatic coherence analysis tool for Dutch texts. ACAD allows users to formulate sophisticated search queries across multiple Dutch corpora to analyze coherence relations and connectives. It aims to make analyses more reproducible and transparent. ACAD's search interface Cesar translates queries into XQuery and controls output. It can search corpora like SoNaR and formats like Folia. ACAD's goals are to build this search interface and extend available corpora like newspaper texts and WhatsApp data. Future work includes manuals, investigating other connectives, constructions, and languages. Resulting annotated corpora will be released.
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018 (CLARIAH)
DB:CCC - Diamonds in Borneo: Commodities as Concepts in Context
For comparative research on globalization, understood as the intensified circulation of people, commodities and ideas, historians currently study texts 'manually', deriving lists of concepts such as places, products, and labour types, as well as related characteristics that explain changes in these concepts.
To do this more efficiently we want to transpose our existing lists of concepts to vocabularies and use those to derive structured information from texts such as travelogues, newspapers, and trade papers. For this particular project we want to improve our concept lists on diamonds, using existing linked data and adding data from the Geillustreerde encyclopaedie der diamantnijverheid.
Based on a selection of texts from Delpher, and with the help of NLP tools such as entity recognition, classification & linking and the Ontotagger, we hope to detect the mining, manufacturing and trading places and people in Borneo, up until now a blind spot in our knowledge about the global diamond commodity chain.
This document describes QB'er, a tool for converting statistical datasets into linked open data on the semantic web. It aims to address problems with today's workflow for working with multiple disconnected datasets, including a lack of comparability and repeating cleaning efforts. QB'er allows researchers to standardize individual datasets according to community best practices, share code lists with colleagues, and publish standardized, interlinked datasets on a structured data hub. This grows a graph of interconnected datasets and makes the cleaning and mapping efforts reusable rather than disposable. A demonstration shows uploading a historical census dataset and mapping its variable values to codes while preserving the original values.
Collection registration for the CLARIAH Media Suite (CLARIAH)
CKAN is a data management software solution developed and maintained by the Open Knowledge Foundation, adopted internationally to “streamline publishing, sharing, finding and using data,” (1) making it accessible. This system was adopted by WP5 (CLARIAH media studies focus) to register collections relevant to media studies scholars. The search API of the media suite is connected to the index that contains all data imported from CKAN. In this presentation we give an overview of the modular approach and collection registration system adopted by the CLARIAH media suite.
(1) http://ckan.org/faq/
This document discusses using linked data technology to solve the problem of disconnected and isolated economic and social science datasets. It proposes hosting updated versions of important historical databases as linked data in a single location. This will allow users to easily query across datasets, upload and link their own datasets, and build a large graph of interconnected public datasets. The document demonstrates some triplestore and linked data browsing tools, including a SPARQL query editor and lightweight linked data browser. It also introduces the team working on the structured datahub and linked data solutions for economic and social historians.
This document summarizes several demonstrations presented at a linguistics conference. It describes projects integrating historical lexical databases through linked open data, allowing tracing of word meanings and concepts over time. It also summarizes demonstrations of tools for searching treebanks and researching morphosyntactic dialects in historical Dutch texts. Finally, it provides brief updates on the status of treebank search applications GrETEL and PaQU, and plans to upgrade the morphological research tool MIMORE.
This document summarizes the work of Clariah workpackage 2. They build common infrastructure for virtual research environments including agreements on APIs, basic applications, and components. A researcher can launch their VRE, pull in authoritative data from various sources, add their own data, analyze the data through query languages or GUIs, and customize the VRE with additional tools and user management functionality. This infrastructure helps researchers create virtual research environments to curate and share data as a community.
This document describes Multi Tier Annotation Search (MTAS), a system built on Apache Solr that allows searching across text and multiple layers of linguistic annotations. MTAS extends Solr's indexing and querying capabilities to handle annotated text by using prefixes to distinguish annotation types, payloads to encode additional information, and forward indexes to retrieve related tokens. A FoLiA tokenizer maps the annotated text to MTAS' extended index structure, and queries can be written in Corpus Query Language (CQL) through specialized query handlers.
Taalportaal is an online portal that will provide an exhaustive and searchable grammar of Dutch and Frisian phonology, morphology, and syntax. Authors write content in XML using customized editing tools. The content comes from existing sources and is rewritten in a topic-based structure and integrated into the portal. An automated process retrieves authored content from a repository and displays it on the website. The portal will organize and complete existing grammatical knowledge of the languages and make it accessible through an innovative digital design that links linguistic categories and the two languages.
WhiteLab is a web application that allows users to explore and search the large Dutch text collections SoNaR-500 and CGN. It provides access to the texts, audio, transcriptions, and linguistic annotations. Users can view collection composition and statistics, search by words, parts of speech, or lemmas using the CQP query language, and view concordance results and linked audio/context. OpenSoNaR-CGN was developed by several Dutch institutions to make these annotated resources openly available.
Diachronous conceptual lexicons, Marieke van Erp / Piek Vossen (CLARIAH)
The researchers are converting and publishing Dutch historical and contemporary lexicons and ontologies as Linked Open Data. They are using standards like Lemon to represent lexical concepts and their relationships over time. Exposing the lexicons as Linked Open Data allows them to connect the resources to external datasets like DBpedia through mappings. Examples show how entries from the Brouwers thesaurus and Embodied Emotions Lexicon are represented and linked to concepts and time periods.
This document describes CorpusStudio, a web application for corpus linguistics research that allows defining queries to analyze text corpora in various formats. The application allows users to create corpus research projects containing metadata, definitions, queries and result databases. It includes editors for defining queries and constructing output as well as viewers for results and corpora. The application execution is handled asynchronously with a queuing system. Future plans include expanding grouping and filtering of query results.
ATHENA is a research project that aims to create a historical database on flora and fauna species in cultural and natural contexts for the Netherlands. It develops a digital infrastructure combining text, images, and structured data from various sources to examine human-nature relationships over time. The portal will provide access to this consolidated data.
This document outlines Work Package 4 which aims to gather and curate important structured economic and social history datasets and make them available on the Clariah Structured Data Hub. It proposes using linked data technology to augment, harmonize, link, and query datasets to empower individual researchers to align codes and identifiers across datasets. The goals are to grow an interconnected graph of datasets and provide tools for users to explore, visualize, query and analyze the datasets. An initial prototype is available with intake, data description, harmonization and linking capabilities as well as a triplestore, data API and query API.
1. Arjan van Hessen
Research into human-machine interaction
(embodied agents) and making spoken
documents accessible using
language and speech technology
Self-service by telephone and making
spoken documents accessible using
language and speech technology
Standardising linguistic research data for the
humanities, covering both the data and the tools
for working with it.
2. HOW WILL CITIZENS GET WHAT THEY NEED, AND WHERE,
FROM THE GOVERNMENT IN 2018?
According to the coalition agreement, citizens and businesses must be able to
communicate digitally with the lower tiers of government by 2017. But what will
that communication look like five years from now? The chance that it will happen
the way we currently imagine is relatively small.
After all, five years ago few people could picture much of the information and
communication technology that is in use today.
3. • Introduction
– What about human language and speech?
– What is language and speech technology (TST)?
• Working applications
– Classic speech recognition
– Advanced speech recognition
• (Near) future
– Technical/scientific developments
4. • The development of human
language (or speech) probably
began 100,000 years ago.
• Before that, the human jaw,
mouth and larynx had the wrong
shape to form words,
something we still see in apes
today.
5. • Pictographic script
(3300 BC, Sumer, Mesopotamia) is,
as far as is known, the oldest
written language.
Timeline: speech (-100,000), agriculture (-10,000), writing (-3300), NOW
11. What are you saying? What do you mean?
/A/ /p/ /@/ /l/: appel ("apple")
/A/ /p/ /@/ /l/ /t/ /j/ /@/: appeltje ("little apple")
12. I eat an "appel" (spoken Dutch does not distinguish the fruit "appel" from the brand Apple)
I use an "appel"
The building that houses "appel"
I buy a real "appel"
An "appel" is too modern for me
13. Speech recognition (in humans too) works through
predictability. The higher it is, the better the recognition.
Predictability depends on
several things:
Knowledge of the language
Knowledge of the topic of conversation
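A minimal sketch of how "knowledge of the language" can raise predictability: a bigram model counts which word tends to follow which, so that a recogniser can prefer the candidate the context predicts best. The tiny corpus and all function names are assumptions made for illustration; a real recogniser would train on a far larger corpus and combine this score with the acoustic evidence.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; not taken from the presentation.
CORPUS = [
    "ik eet een appel",
    "ik koop een appel",
    "ik eet een peer",
]


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def next_word_probability(counts, prev, word):
    """Relative frequency of `word` after `prev` (0.0 if `prev` was never seen)."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def pick_candidate(counts, prev, candidates):
    """Choose the candidate word that the preceding word predicts best."""
    return max(candidates, key=lambda w: next_word_probability(counts, prev, w))


counts = train_bigrams(CORPUS)
print(next_word_probability(counts, "een", "appel"))
print(pick_candidate(counts, "een", ["appel", "peer"]))
```

Knowledge of the conversation topic could be modelled the same way, for example by training the counts on transcripts from one domain only.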
24. TST in organisations/businesses
Self Service
Assisted Service
Management Service
Simple: slot filling
Postcode and house number
Arrival/departure
Call transfer
Advanced: OSH
How May I Help You?
Human finalisation
Human interaction
Recognising everything
that was said
Topic spotting
Topic clustering
Searching
Emotion detection
Etc.
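Slot filling as in the postcode and house number example can be sketched as follows: the dialogue keeps asking until every slot has a value. This is an illustrative sketch that works on already recognised text; the patterns, prompts and function names are assumptions, and the requirement that the caller says "nummer" or "number" before the house number is a simplification.

```python
import re

# Dutch postcodes consist of four digits followed by two letters, e.g. "7511 JM".
POSTCODE = re.compile(r"\b([1-9]\d{3})\s?([A-Za-z]{2})\b")
# Simplifying assumption: the house number is introduced by "nummer"/"number".
HOUSE_NUMBER = re.compile(r"\b(?:nummer|number)\s+(\d{1,5})\b", re.IGNORECASE)

# Hypothetical prompts, asked in this order while a slot is still empty.
PROMPTS = {
    "postcode": "What is your postcode?",
    "house_number": "What is your house number?",
}


def fill_slots(utterance, slots=None):
    """Return a copy of `slots` extended with any values found in the utterance."""
    slots = dict(slots or {})
    match = POSTCODE.search(utterance)
    if match and "postcode" not in slots:
        slots["postcode"] = match.group(1) + match.group(2).upper()
    match = HOUSE_NUMBER.search(utterance)
    if match and "house_number" not in slots:
        slots["house_number"] = int(match.group(1))
    return slots


def next_prompt(slots):
    """Return the prompt for the first empty slot, or None when all are filled."""
    for name in ("postcode", "house_number"):
        if name not in slots:
            return PROMPTS[name]
    return None


slots = fill_slots("mijn postcode is 7511 jm")
print(slots, next_prompt(slots))
slots = fill_slots("nummer 12", slots)
print(slots, next_prompt(slots))
```

Because each slot accepts only a narrow set of values, the recogniser's predictability is high, which is why this is the "simple" end of the spectrum.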
27. The service level on
0900-8844 determines
the final verdict
The police on speech recognition at 0900-8844:
•A concept that shines in its simplicity
•A surprisingly fast implementation
•A result that was a great success right away
•Savings of up to €800,000 a year
•A welcome success at a time when we can put the money to good use (according to
the police chief, Intake en Noodhulp)
•The pressure on the KLPD switchboard has been considerably reduced as a result; I would
almost say drastically.
29. Open-question speech recognition
• How can I help you?
• I would like to know whether my pension money will be
transferred to my account before the end of the
year?
Recognised keywords: end of the year, pension money,
transferred, account
Return the group of questions
that most resembles this
Return the answer that belongs
to this group
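The two steps above (find the most similar group of questions, then return that group's answer) can be sketched with a simple keyword overlap. The question groups, answers, stop-word list and the use of Jaccard similarity are all assumptions for illustration; the presentation does not say which similarity measure the real system uses.

```python
import re

# Hypothetical stop words that carry no topic information.
STOPWORDS = {
    "i", "would", "like", "to", "know", "whether", "my", "will",
    "be", "before", "the", "of", "a", "an", "is",
}

# Hypothetical question groups, each with typical keywords and a canned answer.
QUESTION_GROUPS = {
    "pension_payment": {
        "keywords": {"pension", "payment", "transferred", "account", "year"},
        "answer": "Pension payments are transferred on the dates in your overview.",
    },
    "change_account": {
        "keywords": {"change", "account", "number"},
        "answer": "You can change your account number with the change form.",
    },
}


def extract_keywords(text):
    """Lower-case the text, split it into words and drop the stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def best_group(text):
    """Return (group name, similarity) for the most similar question group."""
    keywords = extract_keywords(text)
    name = max(QUESTION_GROUPS,
               key=lambda g: jaccard(keywords, QUESTION_GROUPS[g]["keywords"]))
    return name, jaccard(keywords, QUESTION_GROUPS[name]["keywords"])


def answer(text):
    """Return the answer that belongs to the best matching group."""
    return QUESTION_GROUPS[best_group(text)[0]]["answer"]


QUESTION = ("I would like to know whether my pension money will be "
            "transferred to my account before the end of the year?")
print(best_group(QUESTION))
print(answer(QUESTION))
```

A production system would also need a threshold below which the call goes to a human agent instead of receiving a poorly matching answer.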
31. 053 850 80 35
WELCOME TO THE MUNICIPALITY
please go ahead
In the open-question speech recognition DEMO, every question for which
citizens call the municipality is recognised and the call is transferred to the right
department. Think of topics such as:
• Address and contact details
• Waste
• Zoning plan
• Municipal taxes
• Dog tax
• Waste collection levy
• Sewerage charge
• Passports
• Driving licences
• Identity cards
• Reporting problems in public spaces
• Opening hours
• Civil registry
• Building permits
• Parking permits
• Tree-felling permits
• Sunday shopping days
• WMO
• WOZ
• Income and social assistance
• Pest control
• Debt assistance
• Elections
• Housing
33. Self service <-> Assistance
Question recognition
Dialogue
Analysis
Self
Service
Send information to a
suitable employee
Transfer the call
to an employee
What now?
35. I have a question
about my student
finance?
Telephony
network
Contact center
Central ACD
Voice Recorder
You are speaking with OCW
DUO, how can
I help you?
Date and
time
CLI number
of the caller
DDI number
dialled
IVR choices
WAITING time
Agent time
Content TEXT
'Emotion'
Speech
analysis
DB
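The fields that the slide lists for each recorded call could be stored along the following lines. This is a sketch only: the table layout, field names, sample values and the in-memory SQLite database are assumptions, not a description of the system shown.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class CallRecord:
    """One recorded call with the metadata and analysis results from the slide."""
    started_at: str      # date and time of the call
    cli_number: str      # number of the caller (CLI)
    ddi_number: str      # number that was dialled (DDI)
    ivr_choices: str     # menu choices made in the IVR, comma separated
    wait_seconds: int    # time spent waiting in the queue
    agent_seconds: int   # time spent talking to the agent
    transcript: str      # recognised content of the call as text
    emotion: str         # label assigned by the emotion detection


def open_database():
    """Create an in-memory database with a single table for call records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (started_at TEXT, cli_number TEXT, ddi_number TEXT, "
        "ivr_choices TEXT, wait_seconds INTEGER, agent_seconds INTEGER, "
        "transcript TEXT, emotion TEXT)"
    )
    return conn


def store(conn, record):
    """Insert one call record; the field order matches the column order."""
    conn.execute("INSERT INTO calls VALUES (?, ?, ?, ?, ?, ?, ?, ?)", astuple(record))


def average_wait_by_emotion(conn):
    """Average waiting time per emotion label, as a dictionary."""
    rows = conn.execute(
        "SELECT emotion, AVG(wait_seconds) FROM calls GROUP BY emotion"
    )
    return dict(rows.fetchall())


conn = open_database()
# Made-up sample calls with placeholder phone numbers.
store(conn, CallRecord("2014-01-06 09:00", "0000000001", "0000000000", "1,2",
                       30, 240, "question about student finance", "neutral"))
store(conn, CallRecord("2014-01-06 09:05", "0000000002", "0000000000", "1",
                       90, 180, "question about repayment", "neutral"))
store(conn, CallRecord("2014-01-06 09:10", "0000000003", "0000000000", "2",
                       300, 420, "complaint about waiting", "irritated"))
print(average_wait_by_emotion(conn))
```

Once the text and emotion labels sit next to the call metadata, questions such as "do long waits go together with irritated callers?" become simple queries.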
37. • Hold a natural conversation with a
human-like "agent"
• Do so affectively and with social intelligence
– Recognise the emotional state
– Adapt the behaviour accordingly
Conversational agents/robots
38. Detection: via low-level acoustic features
F0
intensity
MFCC
phoneme
syllable
word
sentence
LEVELS
Emotion X
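Two of the low-level acoustic features named on the slide, F0 (pitch) and intensity, can be estimated from raw samples as sketched below. The autocorrelation pitch estimator, the search range of 75 to 400 Hz and the synthetic test tone are assumptions chosen for illustration; MFCCs are left out because they need a filter bank and a cosine transform that would not fit a short example.

```python
import math


def rms_intensity(samples):
    """Root-mean-square amplitude, a simple measure of intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency with plain autocorrelation.

    The lag at which the signal correlates best with a shifted copy of
    itself is taken as the pitch period. Assumes the frame is longer than
    the longest lag (sample_rate / f_min samples).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voiced speech frame: a 200 Hz tone, 0.1 s at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

print(estimate_f0(tone, SAMPLE_RATE))
print(rms_intensity(tone))
```

In an emotion detector such values would be computed per short frame and then summarised over phonemes, syllables, words or sentences, matching the levels listed on the slide.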
44. It has to
be easy
Omnichannel
Webchat will become
bigger than SM
Smartphones are
changing citizens'
behaviour
The expected level of CC
is rising
CC is moving into the cloud
CC agents are no
longer office-bound
Central citizen
repository with all
contact information
Service in an app
CCs become
transparent: you see your
information, you choose your agent
Avatars will play a
bigger role
Everyone becomes an
expert and can be
involved
Speech analysis will
help the agents
Biometrics helps with
identification/
verification
45. • It’s not speech
Recognition
• It’s not searching
• It’s Artificial
Intelligence
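46. Understanding language
"De slang sproeit water op de dure das
van mijn vader die over de ezel hangt
met de mooie tekening van een paard"
(The hose sprays water on my father's expensive tie, which hangs over
the easel with the beautiful drawing of a horse)
>77,000
meanings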
47. Recognising emotion
The computer is getting
better and better
at "reading"
people's emotions.