SPEECH RECOGNITION
FOR LANGUAGE LEARNING
In the age of globalization, English fluency has become a pivotal skill for any professional or student to develop, with a particular focus on the ability to communicate confidently. In recent years, language education has been a constant focus of development: novel methodologies are extensively researched and applied to deliver the best learning experience, with advances in automation leading the way. With COVID-19 disrupting traditional schooling in many ways, technology-aided language education is growing at an unprecedented pace.
One of the most popular technologies applied in English teaching is Automatic Speech Recognition (ASR). ASR converts spoken audio into written text by searching large language corpora for matching patterns (Carrier, 2017). This technology supports the development of listening and speaking, the two foundational pillars of English communication. Applied to language pedagogy, ASR offers all the benefits of interactive learning in a regular classroom setting, along with several distinctive characteristics that make it an ideal solution for the digitalization of education (Education 4.0).
The ELSA Speak App is equipped with our own spoken language assessment technology. This paper discusses that technology in detail, together with the benefits of applying it to English learning, illustrated by real success stories from our partners.
ELSA’s ASR technology is developed entirely in-house: a dedicated research department creates and evolves all of our speech assessment algorithms. Our technology draws heavily on statistical signal processing techniques traditionally used in speech recognition. We use deep neural networks, trained on thousands of hours of native American English speech, to learn what correct pronunciation sounds like, as well as other machine learning algorithms to classify intonation, rhythm, and similar features. Our pipeline also includes novel signal processing that extracts information from the speech signal and converts it into features, which are then classified as correct, almost correct, or incorrect.
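The paper does not publish its scoring model. To illustrate how frame-level outputs of a neural acoustic model can be turned into correct / almost correct / incorrect labels, the sketch below uses a frame-averaged variant of the Goodness of Pronunciation (GOP) measure common in pronunciation-assessment research. This is a swapped-in, well-known technique, not ELSA’s in-house algorithm. It assumes the posteriors come from a model trained on native speech; the thresholds and the function names `gop_score` and `classify` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical thresholds on the GOP scale (scores are <= 0); not ELSA's values.
CORRECT_MIN = -0.5
ALMOST_CORRECT_MIN = -2.0
_EPS = 1e-12  # floor that avoids log(0) for zero posteriors


def gop_score(frame_posteriors: Sequence[Sequence[float]], target: int) -> float:
    """Frame-averaged Goodness of Pronunciation for one phoneme segment.

    frame_posteriors[t][q] is an acoustic model's posterior P(q | frame t)
    for phoneme index q. Each frame contributes the log ratio between the
    expected (target) phoneme and the most likely phoneme, so the score is 0
    when the target wins every frame and becomes more negative as other
    phonemes dominate.
    """
    if not frame_posteriors:
        raise ValueError("segment must contain at least one frame")
    total = 0.0
    for probs in frame_posteriors:
        total += math.log(max(probs[target], _EPS)) - math.log(max(max(probs), _EPS))
    return total / len(frame_posteriors)


def classify(score: float) -> str:
    """Map a GOP score onto the three feedback classes."""
    if score >= CORRECT_MIN:
        return "correct"
    if score >= ALMOST_CORRECT_MIN:
        return "almost correct"
    return "incorrect"
```

For example, with posteriors `[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]` over three phonemes, phoneme 0 wins every frame and scores 0 ("correct"), while phoneme 1 scores about -0.97 ("almost correct"). A real system would calibrate thresholds per phoneme rather than use one global pair.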
What differentiates our technology from speech recognition research (and especially from the speech APIs of major cloud computing services) is what it is designed to identify. Speech recognition research aims to identify which words were spoken regardless of the speaker’s accent, whereas we focus on distinguishing correct from incorrect pronunciations and benchmarking speech against actual native-speaker standards. Traditionally, as speech recognition systems are trained on more data from non-native speakers, they lose the ability to tell users that their accent is not correct, let alone identify why. In our case, as we gather more data from users and evolve the system, we continue to improve our pronunciation detection algorithms and the quality of our feedback.
1. TECHNICAL OVERVIEW
To train communication effectively, we analyze the following dimensions of the student’s speaking proficiency independently:
Pronunciation: This skill evaluates the correct production of American English sounds, identifying when the user speaks with a non-native accent and offering concrete, actionable feedback to reduce it.
Word stress: This skill measures how well the student stresses the right syllable in multi-syllable words so that the listener understands the word correctly, especially for homographs whose meaning depends on stress, such as REcord (noun) and reCORD (verb).
Fluency: This skill measures the student’s ability to speak with the right rhythm and to pause where necessary. It is a suprasegmental skill that is fundamental to reaching a good conversational level with native English speakers.
Sentence intonation: This skill evaluates two aspects of intonation. On the one hand, it checks whether there is enough pitch variation to make the speech pleasant rather than monotonous. On the other hand, it checks whether the student places enough emphasis on the prominent words in the sentence, conveying the intended message to the listener.
Listening: This skill refers to the student’s ability to distinguish between English sounds, which often convey different meanings. It is currently assessed with minimal pairs (i.e., pairs of words that differ only in the tested phoneme, such as "ship" and "sheep").
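The paper does not say how fluency and intonation are measured. As a minimal sketch, assuming a pitch tracker and per-frame energy values are already available, two simple proxies can be computed: the spread of the pitch contour in semitones (low values suggest monotonous speech) and energy-based pause detection (where and how long the speaker pauses). These are illustrative stand-ins, not ELSA’s method; the function names and default thresholds (`silence_db`, `min_pause_ms`) are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def pitch_variation_semitones(f0_hz: Sequence[Optional[float]]) -> float:
    """Spread of the pitch contour in semitones (population std dev).

    Unvoiced frames are None (or <= 0) and are skipped. Pitch is expressed
    relative to the speaker's median so the value is comparable across
    voices; values near 0 indicate monotonous speech.
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if len(voiced) < 2:
        return 0.0
    ref = statistics.median(voiced)
    semitones = [12 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


def find_pauses(
    energies_db: Sequence[float],
    frame_ms: int = 10,
    silence_db: float = -40.0,
    min_pause_ms: int = 200,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every silent run lasting >= min_pause_ms."""
    pauses: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing +inf frame counts as speech, so a pause at the very end is closed.
    for i, energy in enumerate(list(energies_db) + [math.inf]):
        if energy < silence_db:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_ms, i * frame_ms))
            start = None
    return pauses
```

For instance, a contour alternating between 100 Hz and 200 Hz (one octave) has a spread of 6 semitones, while a flat 200 Hz contour scores 0. The pause list can then be compared with expected phrase boundaries to judge whether pauses fall "where necessary".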
In the ELSA app, we combine our proprietary speech analysis technology with a unique scoring system and a gamified learning experience to evaluate the user’s input, give accurate and actionable feedback, and deliver engaging, interactive lessons.
When students start using the application, they are invited to take a placement test that evaluates their overall English proficiency. The assessment highlights the English mistakes they are making and creates a learning path to improve their proficiency. The ELSA proficiency score shown to the user combines the scores obtained in each of the skills outlined above.
The ELSA interface is also designed to support the learning journey. The home screen, a core part of the user experience, displays planets that represent the phonemes of English, helping users build their pronunciation at the atomic level. For example, the assessment test evaluates the user’s pronunciation of all American English sounds and gives a score for each planet at the end. Once the result is generated, the planets are reordered on the home screen with the highest score at the top and the lowest at the bottom, so users can easily access the sounds they want to practice first.
Finally, ELSA offers multiple types of lessons in the form of games that train the skills above. Some games focus on a single skill (e.g., listening and word stress games), while others train multiple skills, such as the conversation games, which evaluate pronunciation, intonation, and fluency. According to a 2020 study published on ScienceDirect, gamified approaches like this can boost learners’ performance by up to 90%.
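The paper states that the overall proficiency score combines the per-skill scores and that planets are sorted from highest to lowest score, but it does not give a formula. A minimal sketch, assuming 0-100 scores and equal skill weights (both hypothetical, as are the names `SKILL_WEIGHTS`, `proficiency_score`, and `order_planets`):

```python
from typing import Dict, List

# Hypothetical equal weights: the paper says only that the overall score
# combines the per-skill scores, not how they are weighted.
SKILL_WEIGHTS: Dict[str, float] = {
    "pronunciation": 1.0,
    "word_stress": 1.0,
    "fluency": 1.0,
    "intonation": 1.0,
    "listening": 1.0,
}


def proficiency_score(skill_scores: Dict[str, float],
                      weights: Dict[str, float] = SKILL_WEIGHTS) -> float:
    """Weighted mean of the per-skill scores (0-100) the user actually has."""
    used = {skill: w for skill, w in weights.items() if skill in skill_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no scored skills to combine")
    return sum(skill_scores[s] * w for s, w in used.items()) / total_weight


def order_planets(phoneme_scores: Dict[str, float]) -> List[str]:
    """Home-screen order: highest-scoring sound first, lowest last.

    Ties are broken alphabetically so the order is deterministic.
    """
    return sorted(phoneme_scores, key=lambda p: (-phoneme_scores[p], p))
```

A weighted mean over only the skills that have been scored keeps the overall score meaningful before a user has tried every game type; changing the weights shifts the emphasis between skills without changing the code.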
2. THE BENEFITS OF ASR
2.1 Interactive learning
One-on-one interaction is known to be the most effective way for language learners to improve their conversational skills (Strik, Neri and Cucchiarini, 2011). It gives learners instant corrective feedback, which not only keeps knowledge fresh but also develops the natural flow and vocabulary needed for real-life interaction.
However, organizing one-on-one tutoring with trained instructors is difficult for students and institutions alike. The cost and logistics become substantial over time, not to mention the need to customize the curriculum to each student, which may even compromise teaching quality.
The ELSA Speak App may offer a new approach to personal tutoring. Our ASR provides immediate feedback on speech input, and a built-in library of more than 120 everyday conversation topics helps students develop vocabulary naturally. The ELSA AI coach tailors every learning curriculum to the individual’s needs.
2.2 Pronunciation improvement
Recent studies have shown that pronunciation, although neglected in the past, plays an important role in effective communication and can improve learners’ cognitive level (Strik, 2009). Our pronunciation scoring is benchmarked against the IELTS speaking band scale, and the app provides a native-accent version of each sample sentence. Students can thus learn with a quality comparable to a teacher’s lesson or, better still, a conversation with native speakers.
Pronunciation is not only about producing each phoneme correctly; it also covers listening comprehension. Many students report that understanding foreign accents is a common challenge (Fathi Sidig Sidgi and Jelani Shaari, 2017). As described in section 1, ELSA technology can detect and identify accents and design training accordingly.
An independent study conducted at the University of Yogyakarta (Indonesia) among students of the English Department found a 30% increase in average pronunciation scores after three rounds of practice with the ELSA Speak App. The study also recorded positive feedback from users on the learning experience (Kholis, 2021).
2.3 Confidence boost & Experience enhancement
No longer bound to the classroom setting and schedule, students can now learn anytime, anywhere, at their own pace, with continual access to additional learning materials such as visualizations and recordings. According to Golonka et al. (2012), learners often report favorable experiences with AI-assisted language training and greater motivation to use English in real life. Bodnar et al. (2017) even demonstrated that receiving corrections from an ASR-based system had no negative impact on learners’ enjoyment, willingness to practice, or self-efficacy. All in all, the technology offers a private, stress-free learning environment that ultimately improves learners’ confidence and overall learning experience.
The study by Kholis (2021) mentioned above and a similar study by Darsih et al. (2021) both reported more than 85% positive feedback on the ELSA app.
A two-month pilot program with our EdTech partner saw a 4x increase in attendance rate and 8x higher total engagement compared with the partner’s previous figures. The organization saved a total of 359 hours of teachers’ time.
Assuming a teacher salary of $20-40 per hour, ELSA is projected to help save $400,000-$800,000 per year per 1,000 students.
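Working backwards from the projection makes its underlying assumption explicit: both ends of the $400,000-$800,000 range correspond to the same number of teacher hours. A worked check using only the figures above (the function name `implied_hours_saved` is ours):

```python
def implied_hours_saved(savings_usd: float, hourly_rate_usd: float) -> float:
    """Teacher hours whose cost equals the given saving at the given hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return savings_usd / hourly_rate_usd


STUDENTS = 1_000
low_end = implied_hours_saved(400_000, 20)   # hours at $20/hour
high_end = implied_hours_saved(800_000, 40)  # hours at $40/hour
hours_per_student = low_end / STUDENTS       # hours per student per year
print(low_end, high_end, hours_per_student)  # prints: 20000.0 20000.0 20.0
```

So the projection assumes roughly 20,000 teacher hours saved per 1,000 students per year, or about 20 hours per student. The 359 hours from the pilot cannot be compared directly, because the number of students in the pilot is not stated.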
Improved confidence.
Rating maintained on the app stores.
REQUEST A DEMO:
https://elsaspeak.com/en/english-for-companies/demo
A 2021 report from J’son & Partners Consulting estimates that language learning accounts for one quarter of the EdTech market, and that English makes up 80% of that share. According to the same report, the global EdTech market was worth 268 billion USD in 2021, implying roughly 54 billion USD for the English segment. On the supply side, nearly 20 billion USD of private funding was invested in learning technology companies during the first half of 2021, according to Metaari’s Chief Researcher. In addition, Stanford University’s AI Index Report 2021 reports 4.1 billion USD invested in artificial intelligence development for the sector.
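The English segment size follows from multiplying the market size by the two shares the report gives. A quick check using only the figures quoted in this paragraph (variable names are ours):

```python
GLOBAL_EDTECH_2021_BN_USD = 268.0  # global EdTech market size, 2021
LANGUAGE_SHARE = 0.25              # language learning: 1/4 of EdTech
ENGLISH_SHARE = 0.80               # English: 80% of language learning

language_bn = GLOBAL_EDTECH_2021_BN_USD * LANGUAGE_SHARE
english_bn = language_bn * ENGLISH_SHARE
english_share_of_total = english_bn / GLOBAL_EDTECH_2021_BN_USD

print(f"language learning: {language_bn:.1f} bn USD")            # 67.0
print(f"English segment:   {english_bn:.1f} bn USD")             # 53.6
print(f"English share of EdTech: {english_share_of_total:.0%}")  # 20%
```

With these inputs, the English segment comes to about 53.6 billion USD, or 20% of the total EdTech market.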
In other words, EdTech for English has been, and will continue to be, a major target of innovation and investment. This paper has discussed how automated spoken language technology is becoming key to the development of communicative language training. Educational institutions and EdTech companies alike are looking for ways to integrate spoken language technology into their existing programs to enhance training effectiveness and the learner experience.
CONCLUSION
Proudly equipped with our AI-powered language assessment technology, ELSA can take your solution to the next level. Book a consultation today to learn how we can help you.

More Related Content

Similar to ELSA's Speech Recognition Overview

Second draft exploring the effectiveness and perceptions of computer game bas...
Second draft exploring the effectiveness and perceptions of computer game bas...Second draft exploring the effectiveness and perceptions of computer game bas...
Second draft exploring the effectiveness and perceptions of computer game bas...Ayuni Abdullah
 
3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga
3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga
3. 7 article june edition vol 9 no 1 2016 register journal iain salatigaFaisal Pak
 
Text to-voice applications-to_promote_speaking_skill[1]
Text to-voice applications-to_promote_speaking_skill[1]Text to-voice applications-to_promote_speaking_skill[1]
Text to-voice applications-to_promote_speaking_skill[1]Amany AlKhayat
 
Rosetta stone app.pptx
Rosetta stone app.pptxRosetta stone app.pptx
Rosetta stone app.pptxAREEJ ALDAEJ
 
Integration of Technology into Sheltered Instruction
Integration of Technology into Sheltered InstructionIntegration of Technology into Sheltered Instruction
Integration of Technology into Sheltered Instructiontara0517
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESmlaij
 
Example of journal
Example of journalExample of journal
Example of journalamirahjuned
 
The effect of films with and without subtitles on listening comprehension of ...
The effect of films with and without subtitles on listening comprehension of ...The effect of films with and without subtitles on listening comprehension of ...
The effect of films with and without subtitles on listening comprehension of ...amirahjuned
 
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptx
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptxThe Reasons Why Language Labs Are Important in Schools and Colleges!.pptx
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptxDigital Lab
 
Group E
Group EGroup E
Group EWIU
 
Master eLearning Translation in 7 Simple Steps
Master eLearning Translation in 7 Simple StepsMaster eLearning Translation in 7 Simple Steps
Master eLearning Translation in 7 Simple Stepssaikumarmba2023
 
Technology and teaching presentaton
Technology and teaching presentatonTechnology and teaching presentaton
Technology and teaching presentatonJupiter Solar
 
Teaching Listening Skills to English as a Foreign Language Students through E...
Teaching Listening Skills to English as a Foreign Language Students through E...Teaching Listening Skills to English as a Foreign Language Students through E...
Teaching Listening Skills to English as a Foreign Language Students through E...ijtsrd
 

Similar to ELSA's Speech Recognition Overview (20)

Second draft exploring the effectiveness and perceptions of computer game bas...
Second draft exploring the effectiveness and perceptions of computer game bas...Second draft exploring the effectiveness and perceptions of computer game bas...
Second draft exploring the effectiveness and perceptions of computer game bas...
 
Vocab and mobile
Vocab and mobileVocab and mobile
Vocab and mobile
 
3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga
3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga
3. 7 article june edition vol 9 no 1 2016 register journal iain salatiga
 
Text to-voice applications-to_promote_speaking_skill[1]
Text to-voice applications-to_promote_speaking_skill[1]Text to-voice applications-to_promote_speaking_skill[1]
Text to-voice applications-to_promote_speaking_skill[1]
 
Ict final
Ict finalIct final
Ict final
 
Rosetta stone app.pptx
Rosetta stone app.pptxRosetta stone app.pptx
Rosetta stone app.pptx
 
Integration of Technology into Sheltered Instruction
Integration of Technology into Sheltered InstructionIntegration of Technology into Sheltered Instruction
Integration of Technology into Sheltered Instruction
 
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURESMULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
MULTILINGUAL SPEECH TO TEXT USING DEEP LEARNING BASED ON MFCC FEATURES
 
What we do
What we doWhat we do
What we do
 
Example of journal
Example of journalExample of journal
Example of journal
 
The effect of films with and without subtitles on listening comprehension of ...
The effect of films with and without subtitles on listening comprehension of ...The effect of films with and without subtitles on listening comprehension of ...
The effect of films with and without subtitles on listening comprehension of ...
 
Reference 2
Reference 2Reference 2
Reference 2
 
Listen to learn
Listen to learnListen to learn
Listen to learn
 
Ict cd analysis
Ict cd analysisIct cd analysis
Ict cd analysis
 
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptx
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptxThe Reasons Why Language Labs Are Important in Schools and Colleges!.pptx
The Reasons Why Language Labs Are Important in Schools and Colleges!.pptx
 
Group E
Group EGroup E
Group E
 
Master eLearning Translation in 7 Simple Steps
Master eLearning Translation in 7 Simple StepsMaster eLearning Translation in 7 Simple Steps
Master eLearning Translation in 7 Simple Steps
 
Technology and teaching presentaton
Technology and teaching presentatonTechnology and teaching presentaton
Technology and teaching presentaton
 
Teaching Listening Skills to English as a Foreign Language Students through E...
Teaching Listening Skills to English as a Foreign Language Students through E...Teaching Listening Skills to English as a Foreign Language Students through E...
Teaching Listening Skills to English as a Foreign Language Students through E...
 
The Benefits of English Language Software.pdf
The Benefits of English Language Software.pdfThe Benefits of English Language Software.pdf
The Benefits of English Language Software.pdf
 

Recently uploaded

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

ELSA's Speech Recognition Overview

  • 2. In the age of globalization, English fluency has become the pivotal skill for any professional or student to develop,with special focus on the ability to confidently communicate.In recent years, language education has constantly been in the spotlight for development.Novel methodologies are extensively researched and applied in order to deliver the best learning experience, with advancement in automation navigating the wave. In the current context of COVID-19 disabling traditional schooling in many ways, technology-aided language education is growing at an unprecedented speed. One of the most popular technologies applied in English teaching is Automatic Speech Recognition (ASR). ASR converts the audio speech input into written text, based on big data-searching language corpora and finding matching patterns (Carrier, 2017). This technology will support the development of listening and speaking – the two foundational pillars of English communication. The application of ASR in language pedagogy offers all the benefits of interactive learning in a regular classroom setup, as well as several prevailing characteristics making it the ideal solution for the 4.0 digitalization of education. The ELSA Speak App is equipped with our own spoken language assessment technology, which will be discussed in detail in this paper along with the benefits of its application in English learning through real success stories with our partners.
  • 3. The ELSA ASR technology is developed entirely in-house - we have a dedicated research department responsible for creating and evolving all of our speech assessment algorithms. Our technology inherits a lot from statistical signal processing techniques traditionally used in speech recognition. In this respect, we use deep neural networks to learn what the correct pronunciation is,trained from thousands of hours of native speech in American English as well as other machine learning algorithms to classify intonation, rhythm, etc. Our pipeline also includes novel signal processing to extract information from the speech signal that is then converted to features to be classified into correct, almost correct, and incorrect classes. The main factor differentiating our technology from speech recognition research (and especially from APIs from major cloud computing services) is what we target to identify. While speech recognition research aims at identifying what words were spoken regardless of the speaker’s accent, we focus on distinguishing between correct and incorrect pronunciations and benchmark the speech against actual native speaker standards. Traditionally, as speech recognition systems get more training data from non-native speakers,they also lose the ability to tell a user that his accent is not correct, and are even more unlikely to be able to identify why. In our case, as we gather more data from users and evolve the system, we will improve our pronunciation detection algorithms and the quality of our feedback 1. TECHNICAL OVERVIEW:
  • 4. To perform effective communication training, in our analysis of the student’s speaking proficiency, we independently analyze the following dimensions: Pronunciation: This skill evaluates the correct generation of American English sounds, identifying when the user speaks with a non-native accent and offering concrete actionable feedback to reduce it. Word stress: This skill measures how well the student stresses the right syllable in multi-syllable words so that the word is correctly understood by the listener, namely in the case of homonyms. Fluency: This skill measures the ability of the student to speak with the right rhythm and apply pauses where necessary.This is a suprasegmental skill that is fundamental to bringing students to a good conversational level with English natives. Sentence intonation: This skill evaluates 2 different aspects of intonation. On one hand,it checks whether there is adequate pitch variation to make the speech pleasant and not monotonous. On the other hand, it inspects whether the student is adding enough emphasis to the prominent words in the sentence, conveying the expected message to the listener. Listening: This skill refers to the ability of a student to differentiate among the different sounds in English, which many times convey different meanings. This is currently done via the use of minimal pairs (i.e. words whose only difference is the tested phoneme).
  • 5. In the ELSA app, we combine our proprietary speech analysis technology with a unique scoring system and a gamified learning experience. Together, these evaluate the user's input and give accurate, actionable feedback within engaging, interactive lessons. When students start using the application, they are invited to take a placement test that evaluates their overall English proficiency. The test shows students which mistakes they are making and creates a learning path to improve their proficiency. The ELSA proficiency score shown to the user combines the scores obtained in each of the skills outlined above. The ELSA interface is also designed to support the learning journey. The home screen features planets, which are a core part of the user experience. These planets represent the phonemes of English, helping users build their pronunciation at the atomic level. For example, the assessment test evaluates users' pronunciation of all American English sounds and gives a score for each planet at the end. Once the results are generated, the planets are reordered on the home screen, with the highest score at the top and the lowest at the bottom. This lets users easily find the sounds they want to practice first. Finally, ELSA offers multiple types of lessons in the form of games that train the skills above. Some games focus on a single skill (e.g. listening and word stress games). Others train several skills at once, like the conversation games, which evaluate pronunciation, intonation, and fluency. According to a 2020 study published on ScienceDirect, this gamified approach can boost learners' performance by up to 90%.
  • 6. 2. THE BENEFITS OF ASR 2.1 Interactive learning One-on-one interaction is known to be the best way for language learners to improve their conversational skills (Strik, Neri and Cucchiarini, 2011). It gives them instant corrective feedback, keeping knowledge fresh while developing a natural flow and vocabulary suited to real-life interaction. However, organizing one-on-one tutoring with trained instructors is complicated for students and institutions alike. The cost and logistics become substantial over time. In addition, the need to customize the curriculum to each student's needs strains resources and may even compromise teaching quality. The ELSA Speak App may offer a new approach to personal tutoring. Our ASR provides immediate feedback on speech input, and the built-in library of 120+ everyday conversation topics helps students build vocabulary naturally. The ELSA A.I. coach also tailors the entire curriculum to individual needs. 2.2 Pronunciation improvement Recent studies have shown that pronunciation, long neglected in language teaching, plays an important role in effective communication and can improve learners' cognitive level (Strik, 2009). Our pronunciation scoring is benchmarked against the standardized IELTS speaking bands, and the app produces a native-accent version of each sample sentence. Students can thus learn at a quality comparable to a teacher's lesson, or even to a conversation with native speakers. Pronunciation is not simply the ability to produce each phoneme correctly; it also covers listening comprehension. Many students report that understanding foreign accents is a common challenge (Fathi Sidig Sidgi and Jelani Shaari, 2017). As described in Section 1, ELSA technology can detect and identify accents and design training accordingly.
An independent study among English Department students at the University of Yogyakarta (Indonesia) found a 30% increase in average pronunciation scores after three rounds of practice with the ELSA Speak App. The research also reported positive user feedback on the learning experience (Kholis, 2021).
  • 7. 2.3 Confidence boost & Experience enhancement No longer bound to the classroom setting and schedule, students can now learn anytime, anywhere, at their own pace. They also have continual access to supplementary learning materials such as visualizations and recordings. According to Golonka et al. (2012), learners often report favorable experiences with A.I.-assisted language training and more motivation to use English in real life. Bodnar et al. (2017) even demonstrated that receiving corrections from an ASR-based system had no negative impact on learners' enjoyment, willingness to practice, or self-efficacy. All in all, the technology offers a private, stress-free learning environment that ultimately improves learners' confidence and overall learning experience. The Kholis study mentioned above and a similar study by Darsih and colleagues (2021) both reported over 85% positive feedback on the ELSA app. In a 2-month pilot program with our EdTech partner, the attendance rate increased fourfold and total engagement eightfold compared with prior figures. The organization also saved a total of 359 hours of teachers' time. Assuming teacher pay of $20-40 per hour, ELSA is projected to save $400,000-$800,000 per year per 1,000 students.
  • 8. CONCLUSION A 2021 report from J'son & Partners Consulting estimates that language learning constitutes ¼ of the EdTech market and that English accounts for 80% of that share. According to the same report, the global EdTech market was worth 268 billion USD in 2021, with the English segment put at 44 billion USD. On the supply side, nearly 20 billion USD of private funding was poured into learning technology companies during the first half of 2021, according to Metaari's Chief Researcher. This is consistent with the 4.1 billion USD reportedly invested in artificial intelligence development for the sector, per Stanford University's A.I. Index Report 2021. In short, EdTech for English has been, and will continue to be, a target of innovation and investment. This paper has discussed how automated spoken language technology will be key to the development of communicative language training. Educational institutions and EdTech companies alike are looking for ways to integrate spoken language technology into their existing programs to enhance training effectiveness and the learner experience. Proudly equipped with our AI-powered language assessment technology, ELSA is the next step in taking your solution to the next level. Book a consultation today to learn how we can help you. REQUEST A DEMO: https://elsaspeak.com/en/english-for-companies/demo
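The Technical Overview describes a pipeline whose final step classifies speech features as correct, almost correct, or incorrect. The paper does not disclose the model or its decision rules, so the following is only a minimal sketch under stated assumptions. It assumes each expected phoneme already has a score in [0, 1], for example a model's confidence that the phoneme was produced correctly. The two cut-offs, `CORRECT_THRESHOLD` and `ALMOST_THRESHOLD`, are hypothetical values chosen for illustration. The `PhonemeScore` type and the `feedback` helper are likewise hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    ALMOST_CORRECT = "almost correct"
    INCORRECT = "incorrect"


# Hypothetical cut-offs; the real system's thresholds are not published.
CORRECT_THRESHOLD = 0.8
ALMOST_THRESHOLD = 0.5


@dataclass
class PhonemeScore:
    phoneme: str   # expected phoneme label, e.g. ARPAbet "TH"
    score: float   # assumed confidence in [0, 1] that it was produced correctly


def classify(p: PhonemeScore) -> Verdict:
    """Map a per-phoneme score to one of three feedback classes."""
    if not 0.0 <= p.score <= 1.0:
        raise ValueError(f"score out of range: {p.score}")
    if p.score >= CORRECT_THRESHOLD:
        return Verdict.CORRECT
    if p.score >= ALMOST_THRESHOLD:
        return Verdict.ALMOST_CORRECT
    return Verdict.INCORRECT


def feedback(word: str, scores: list[PhonemeScore]) -> list[str]:
    """Return one feedback line per phoneme that was not fully correct."""
    lines = []
    for p in scores:
        verdict = classify(p)
        if verdict is not Verdict.CORRECT:
            lines.append(f"{word}: /{p.phoneme}/ is {verdict.value}")
    return lines


if __name__ == "__main__":
    # Illustrative scores for the word "think" (values invented for the demo).
    think = [PhonemeScore("TH", 0.35), PhonemeScore("IH", 0.9),
             PhonemeScore("NG", 0.62), PhonemeScore("K", 0.95)]
    for line in feedback("think", think):
        print(line)
```

Keeping an explicit middle class, rather than a binary pass/fail, is what makes feedback such as "almost correct" possible. Where the boundaries sit is a design parameter the paper does not specify.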
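The sentence intonation skill first checks whether there is enough pitch variation to avoid monotonous speech. The sketch below illustrates one common way to quantify that, and it is an assumption rather than ELSA's published method. It takes an already-extracted fundamental frequency (F0) contour in Hz, with unvoiced frames marked as 0, and converts voiced frames to semitones relative to their median. It then flags the speech as monotonous when the standard deviation falls below `MONOTONE_THRESHOLD_ST`, a hypothetical 2-semitone value. Pitch tracking itself and the second intonation check (emphasis on prominent words) are out of scope here.

```python
import math
import statistics

# Hypothetical threshold in semitones; not taken from the ELSA paper.
MONOTONE_THRESHOLD_ST = 2.0


def semitone_spread(f0_hz: list[float]) -> float:
    """Population std. dev. of pitch in semitones, ignoring unvoiced (0 Hz) frames."""
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    ref = statistics.median(voiced)
    # 12 semitones per doubling of frequency, measured from the median pitch.
    semitones = [12 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


def is_monotonous(f0_hz: list[float]) -> bool:
    """True when pitch varies less than the assumed monotone threshold."""
    return semitone_spread(f0_hz) < MONOTONE_THRESHOLD_ST


if __name__ == "__main__":
    flat = [200, 202, 0, 199, 201, 200]          # barely moving pitch
    lively = [180, 220, 0, 260, 200, 150, 240]   # wide pitch movement
    print(is_monotonous(flat), is_monotonous(lively))
```

Measuring in semitones rather than raw Hz makes the same threshold usable for lower- and higher-pitched voices, because semitones express relative rather than absolute frequency change.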
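The app section states only that the ELSA proficiency score combines the per-skill scores and that planets are reordered from highest to lowest score after the assessment. The following minimal sketch makes two assumptions not found in the paper. First, it assumes scores on a 0-100 scale combined by a weighted average. Second, it uses the hypothetical equal weights in `SKILL_WEIGHTS`. The `Planet` type and both function names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical equal weights; the paper does not say how skills are combined.
SKILL_WEIGHTS = {
    "pronunciation": 1.0,
    "word_stress": 1.0,
    "fluency": 1.0,
    "intonation": 1.0,
    "listening": 1.0,
}


def proficiency_score(skill_scores: dict[str, float]) -> float:
    """Weighted average of per-skill scores (each assumed to be 0-100)."""
    missing = SKILL_WEIGHTS.keys() - skill_scores.keys()
    if missing:
        raise KeyError(f"missing skill scores: {sorted(missing)}")
    total_weight = sum(SKILL_WEIGHTS.values())
    weighted = sum(SKILL_WEIGHTS[s] * skill_scores[s] for s in SKILL_WEIGHTS)
    return weighted / total_weight


@dataclass
class Planet:
    phoneme: str   # the English sound this planet represents
    score: float   # assessment score for that sound


def order_planets(planets: list[Planet]) -> list[Planet]:
    """Highest-scoring sound first, lowest last, as the paper describes."""
    return sorted(planets, key=lambda p: p.score, reverse=True)


if __name__ == "__main__":
    scores = {"pronunciation": 70, "word_stress": 80, "fluency": 60,
              "intonation": 75, "listening": 90}
    print(proficiency_score(scores))
    planets = [Planet("TH", 42), Planet("R", 65), Planet("IY", 88)]
    print([p.phoneme for p in order_planets(planets)])
```

Changing `SKILL_WEIGHTS` is the single place where one skill could be made to count more than another. Raising an error on a missing skill avoids silently averaging over an incomplete assessment.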
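Section 2.3 projects savings of $400,000-$800,000 per year per 1,000 students at $20-40 per teacher-hour, and simple arithmetic can check this projection. Dividing each end of the range by its hourly rate gives the same figure: 20,000 teacher-hours saved per year per 1,000 students, or 20 hours per student. That per-student figure is derived here from the paper's numbers and is not stated in the paper. The function names are illustrative.

```python
def implied_hours(savings_usd: float, hourly_rate_usd: float) -> float:
    """Teacher-hours that a given dollar saving corresponds to."""
    return savings_usd / hourly_rate_usd


def projected_savings(hours_per_student: float, students: int,
                      hourly_rate_usd: float) -> float:
    """Annual savings = hours saved per student x students x hourly rate."""
    return hours_per_student * students * hourly_rate_usd


low = implied_hours(400_000, 20)    # lower end of the range at $20/hour
high = implied_hours(800_000, 40)   # upper end of the range at $40/hour
print(low, high)                    # 20000.0 20000.0

per_student = low / 1_000           # derived: teacher-hours saved per student per year
print(per_student)                  # 20.0
print(projected_savings(per_student, 1_000, 20))  # 400000.0
print(projected_savings(per_student, 1_000, 40))  # 800000.0
```

Both ends of the range therefore rest on the same time-saving assumption and differ only in the hourly rate applied.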