This document presents a summary of work on automatic language identification (LiD) from speech signals. It discusses how LiD could benefit various industries and outlines the challenges the problem poses. Features explored for LiD include MFCCs, pitch contours, and rhythmic patterns; classification is performed with WEKA using these acoustic features. Results show over 80% accuracy between related languages when data are averaged across files, and over 70% on 5-second segments when comparing all 12 languages. Extensions and improvements to robustness against noisy signals are discussed.
3. The LiD Problem
• The identification of a given spoken language from a speech signal
• 94% of the global population speaks only 6% of the world's languages
4. Real World Situations
• Automation of LiD is desirable
• Offers many benefits to international service industries
• Hotels, airports, global call centres
5. Language Differences
• Languages contain information that makes one discernible from another:
– Phonemes
– Prosody
– Phonotactics
– Syntax
6. Human Abilities
• Bias towards the native language arises in infancy
• Prosodic features are among the first cues to be recognised
• Humans can make a reasonable estimate of the language heard within 3–4 seconds of audition
• Even unfamiliar languages may be plausibly judged
7. Previous Attempts
• Attempts as early as 1974 – USAF work, therefore classified
• Methodological investigations as early as 1977 (House & Neuburg)
• Studies for the most part centre on phonotactic constraints and phoneme modeling
• Raw acoustic waveforms have also been explored (Kwasny, 1993)
8. A Simpler Approach?
• Phonotactic approaches require expert linguistic knowledge
• Phoneme and phonotactic modeling are time-consuming
• Given the speed of human LiD abilities, discrimination is most likely based on acoustic features
10. Feature Extraction
• Spectral information – MFCC vectors
• Pitch contour information
• Handled by the SCMIR library (Collins, 2010)
• Speech rhythm – Normalised Pairwise Variability Index (nPVI); extraction is sketched below

Feature Type      Implemented Measure
Spectral content  MFCC UGen
Pitch contour     Tartini UGen
Speech rhythm     nPVI function
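A minimal sketch of how these features can be read in SuperCollider; this is not the SCMIR pipeline itself, and the file path, FFT size, and polling rate are illustrative assumptions. MFCC is a core machine-listening UGen, Tartini a pitch tracker from sc3-plugins, and the nPVI function follows the standard formulation usually credited to Grabe & Low (2002).

```supercollider
// Minimal sketch (not the authors' SCMIR code): extracting MFCC and
// Tartini pitch features from a mono speech file.
(
s.waitForBoot {
	var buf = Buffer.read(s, "/path/to/speech.wav"); // hypothetical file
	SynthDef(\lidFeatures, { |bufnum|
		var sig = PlayBuf.ar(1, bufnum, BufRateScale.kr(bufnum), doneAction: 2);
		var chain = FFT(LocalBuf(1024), sig);
		var mfcc = MFCC.kr(chain, numcoeff: 13); // 13 cepstral coefficients
		var freq, hasFreq;
		#freq, hasFreq = Tartini.kr(sig);        // pitch contour (sc3-plugins)
		mfcc[0].poll(10, "mfcc0");               // a real system would log
		freq.poll(10, "f0");                     // frames to a file for WEKA
	}).add;
	s.sync;
	Synth(\lidFeatures, [\bufnum, buf]);
};

// nPVI over successive interval durations d[1..m]:
// nPVI = 100/(m-1) * sum_k |d[k] - d[k+1]| / ((d[k] + d[k+1]) / 2)
~npvi = { |durs|
	var terms = durs.slide(2, 1).clump(2).collect { |pair|
		absdif(pair[0], pair[1]) / ((pair[0] + pair[1]) / 2)
	};
	100 * terms.sum / terms.size;
};
~npvi.([0.12, 0.08, 0.15, 0.10]); // example syllable durations in seconds
)
```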
11. Classification
• Handled by the WEKA toolkit
• Built-in Multilayer Perceptron
• Called from the command line through a SuperCollider system command (sketched below)
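As a rough illustration, assuming the features have already been written out in WEKA's ARFF format, the call might look as follows. The jar location and file names are assumptions; -t and -T are WEKA's real training- and test-file flags, and unixCmd runs the command asynchronously, posting WEKA's evaluation summary to the post window.

```supercollider
// Hypothetical command-line call to WEKA's built-in MultilayerPerceptron
// from SuperCollider. The jar location and ARFF file names are assumptions.
(
var wekaJar = "/path/to/weka.jar";
var train = "lid_train.arff", test = "lid_test.arff";
("java -cp % weka.classifiers.functions.MultilayerPerceptron -t % -T %"
	.format(wekaJar, train, test)).unixCmd;
)
```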
12. Comparisons Made
• 66 language pairs from 12 languages (every pairwise combination: C(12,2) = 66)
• A comparison within language families
• A comparison of all 12 languages
• Averaged & segmented data (see the sketch below)
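One plausible reading of "averaged & segmented" (an assumption about the pipeline, not something the slides confirm): per-frame feature vectors are collapsed either to a single mean vector per file, or to one mean vector per 5-second segment. The helper names and frame rate below are hypothetical.

```supercollider
// Illustrative post-processing of per-frame feature vectors; frameRate
// (frames per second) and the helper names are hypothetical.
(
var frames;
~averageFile = { |feats| feats.mean }; // elementwise mean -> one vector per file
~segmentMeans = { |feats, frameRate, segDur = 5|
	feats.clump((frameRate * segDur).asInteger).collect(_.mean)
};
// Example: 100 random 13-dimensional frames at 20 frames per second
frames = Array.fill(100, { Array.rand(13, 0.0, 1.0) });
~averageFile.(frames).size.postln;      // -> 13: one mean vector
~segmentMeans.(frames, 20).size.postln; // -> 1: a 5-second file is one segment
)
```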
13. Results Within Families
[Chart: classification accuracy (%) within each language family – Germanic, Romance, Slavic, Sino-Altaic – plus the mean, for feature sets 4 MFCC; 13 MFCC + Tartini; 41 MFCC + Tartini; all features. Y-axis spans 20–100%]
*Data averaged across files
14. Results Within Families
[Chart: classification accuracy (%) within each language family – Germanic, Romance, Slavic, Sino-Altaic – plus the mean, for feature sets 4 MFCC; 13 MFCC + Tartini; 41 MFCC + Tartini. Y-axis spans 20–100%]
*Data in 5-second segments
15. Results From All Languages
[Chart: mean classification accuracy (%) over all 12 languages, for averaged data and 5-second segments, across feature sets: 4, 13 and 41 MFCC, each alone, with Tartini, and with Tartini + nPVI. Y-axis spans 10–100%]
16. Extensions
• nPVI function to use vowel onsets
• Phonemic segmentation
• A larger dataset
• Better efficiency
• Real-time operation
17. Robustness
• Real-world signals are very different from processed 'clean' data
• 'Ideal' LiD systems – independent & robust
• A need to analyse only the part of the signal that matters
18. CASA
• A computational modeling of human 'Auditory Scene Analysis' (Bregman, 1990)
• Separation of the signal into component parts and reconstitution into meaningful 'streams'