1. The document describes a model that represents language by jointly embedding speech and images in a shared vector space.
2. The model is trained on datasets that pair images with audio captions, either naturally recorded or synthetically spoken.
3. The model projects speech features and images into the joint space, enabling tasks such as retrieving images from spoken queries or disambiguating homonyms using visual context.
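A minimal sketch of the retrieval step described above: speech and image features are projected into a shared space and compared by cosine similarity. The projection matrices here are random stand-ins for the trained encoders (the feature dimensions, matrix names, and `retrieve` helper are all illustrative assumptions, not the document's actual model).

```python
import numpy as np

# Hypothetical linear projections standing in for trained speech and
# image encoders; the real model would use neural networks.
rng = np.random.default_rng(0)
D_SPEECH, D_IMAGE, D_JOINT = 40, 512, 128
W_speech = rng.normal(size=(D_SPEECH, D_JOINT))
W_image = rng.normal(size=(D_IMAGE, D_JOINT))

def embed(features, W):
    """Project features into the joint space and L2-normalize,
    so dot products equal cosine similarities."""
    z = features @ W
    return z / np.linalg.norm(z)

def retrieve(speech_features, image_features_list):
    """Return the index of the image whose joint-space embedding is
    closest (by cosine similarity) to the spoken query."""
    query = embed(speech_features, W_speech)
    sims = [query @ embed(img, W_image) for img in image_features_list]
    return int(np.argmax(sims))
```

With trained encoders in place of the random projections, the same nearest-neighbor search performs speech-to-image retrieval; ranking in the opposite direction gives image-to-speech retrieval.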