Speaking with your computing device is becoming commonplace. Most of us have used Apple's Siri, Google Now, Microsoft's Cortana, or Amazon's Alexa - but how can you speak with your web application? The Web Speech API can enable a voice interface by adding both Speech Synthesis (Text to Speech) and Speech Recognition (Speech to Text) functionality.
This session will introduce the core concepts of Speech Synthesis and Speech Recognition. We will evaluate the current browser support and review alternative options. See the JavaScript code and UX design considerations required to add a speech interface to your web application. Come hear whether it's as easy as it sounds.
2. Speaking with your computing device is becoming
commonplace. Most of us have used Apple's Siri, Google Now,
Microsoft's Cortana, or Amazon's Alexa - but how can you speak
with your web application? The Web Speech API can enable a
voice interface by adding both Speech Synthesis (Text to
Speech) and Speech Recognition (Speech to Text) functionality.
This session will introduce the core concepts of Speech
Synthesis and Speech Recognition. We will evaluate the current
browser support and review alternative options. See the
JavaScript code and UX design considerations required to add a
speech interface to your web application. Come hear whether it's as
easy as it sounds.
@hakanson 2
3. @hakanson 3
“As businesses create their roadmaps for
technology adoption, companies that serve
customers should be planning for, if not
already implementing, both messaging-based
and voice-based Conversational UIs.
Source: “How Voice Plays into the Rise of the Conversational UI”
4. User Interfaces (UIs)
• GUI – Graphical User Interface
• NUI – Natural User Interface
• “invisible” as the user continuously learns increasingly complex
interactions
• NLUI – Natural Language User Interface
• linguistic phenomena such as verbs, phrases and clauses act as UI
controls
• VUI – Voice User Interface
• voice/speech for hands-free/eyes-free interface
@hakanson 4
5. Multimodal Interfaces
Provides multiple modes for user to interact with system
• Multimodal Input
• Keyboard/Mouse
• Touch
• Gesture (Camera)
• Voice (Microphone)
• Multimodal Output
• Screen
• Audio Cues or Recordings
• Synthesized Speech
@hakanson 5
6. Design for Voice Interfaces
Voice Interface
• Voice Input
• Recognition
• Understanding
• Audio Output
"voice design should serve
the needs of the user and
solve a specific problem”
@hakanson 6
http://www.oreilly.com/design/free/design-for-voice-interfaces.csp
7. @hakanson 7
“Normal people, when they think about
speech recognition, they want the whole
thing. They want recognition, they want
understanding and they want an action to
be taken.”
Hsiao-Wuen Hon
Microsoft Research
Source: “Speak, hear, talk: The long quest for technology that understands speech as well as a human”
9. Types of Interactions
• The Secretary
• Recognize what is being said and record it
• The Bouncer
• Recognize who is speaking
• The Gopher
• Execute simple orders
• The Assistant
• Intelligently respond to natural language input
@hakanson 9
Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
10. Opportunities
• Hands Free
• Extra Hand
• Shortcuts
• Humanize
@hakanson 10
Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
11. Personality
• Create a consistent personality
• Conversational experience
• Take turns
• Be tolerant
• Functional vs. Anthropomorphic
• The more “human” the interface, the more user frustration when it
doesn’t understand
@hakanson 11
13. Intelligent Personal Assistant
An intelligent personal assistant (or simply IPA) is a software
agent that can perform tasks or services for an individual.
These tasks or services are based on user input, location
awareness, and the ability to access information from a variety of
online sources (such as weather or traffic conditions, news, stock
prices, user schedules, retail prices, etc.).
Source: Wikipedia
@hakanson 13
14. Apple’s Siri
• Speech Interpretation and Recognition Interface
• Norwegian name that means "beautiful victory"
• Integral part of Apple’s iOS since iOS 5
• Also integrated into Apple’s watchOS, tvOS and CarPlay
• Coming to macOS Sierra (a.k.a. OS X 10.12)
• SiriKit enables iOS 10 apps to work with specific domains and
intents (ride booking, messaging, photo search, …)
• “Hey, Siri”
@hakanson 14
17. Google Now
• First included in Android 4.1 (Jelly Bean)
• Available within Google Search mobile apps (Android, iOS) and
Google Chrome desktop browser
• Android TV, Android Wear, etc.
• Google Home (later in 2016)
• “OK, Google”
• Name? Personality?
@hakanson 17
18. Microsoft’s Cortana
• Named after a synthetic intelligence character from Halo
• Created for Windows Phone 8.1
• Available on Windows 10, XBOX, and iOS/Android mobile apps
• Integration with Universal Windows Platform (UWP) apps
• “Hey, Cortana”
@hakanson 18
19. Cortana’s Chit Chat
• Cortana has a team of writers which
includes a screenwriter, a playwright, a
novelist, and an essayist.
• Their job is to come up with human-like
dialogue that makes Cortana seem like
more than just a series of clever
algorithms. Microsoft calls this brand of
quasi-human responsiveness “chit chat.”
@hakanson 19
Source: “Inside Windows Cortana: The Most Human AI Ever Built”
20. Amazon Alexa
• Short for Alexandria, an homage to the ancient library
• Available on Amazon Echo and Fire TV
• Companion web app or iOS/Android mobile app
• Alexa Skills Kit
• Smart Home Skill API
• Alexa Voice Service
• https://echosim.io/
• “Alexa” or “Amazon” or “Echo”
@hakanson 20
22. Web Speech API
• Enables you to incorporate voice data into web
applications
• Consists of two parts:
• SpeechSynthesis (Text-to-Speech)
• SpeechRecognition (Asynchronous Speech Recognition)
@hakanson 22
https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
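A minimal feature-detection sketch (not from the original deck; it assumes the vendor-prefixed webkitSpeechRecognition constructor that Chrome shipped):

// Speech Synthesis is unprefixed where supported
if ('speechSynthesis' in window) {
  console.log('Speech Synthesis supported');
}

// Speech Recognition is prefixed in Chrome
var SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
if (SpeechRecognition) {
  console.log('Speech Recognition supported');
}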
23. Web Speech API Specification
Defines a JavaScript API to enable web developers to incorporate
speech recognition and synthesis into their web pages. It enables
developers to use scripting to generate text-to-speech output and
to use speech recognition as an input for forms, continuous
dictation and control.
Published by the Speech API Community Group. It is not a W3C
Standard nor is it on the W3C Standards Track.
@hakanson 23
https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
28. Speech Synthesis
Speech synthesis is the artificial production of human speech. A
computer system used for this purpose is called a speech
computer or speech synthesizer, and can be implemented in
software or hardware products. A text-to-speech (TTS) system
converts normal language text into speech.
@hakanson 28
Source: Wikipedia
29. Utterance
The SpeechSynthesisUtterance interface represents a speech
request. Properties:
• lang – if unset, the <html> lang value will be used
• pitch – range between 0 (lowest) and 2 (highest)
• rate – range between 0.1 (lowest) and 10 (highest)
• text – plain text (or well formed SSML)*
• voice – SpeechSynthesisVoice object
• volume – range between 0 (lowest) and 1 (highest)
@hakanson 29
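A short sketch, added for illustration, exercising the properties above; the values shown are the defaults except for lang:

var utterance = new SpeechSynthesisUtterance();
utterance.text = 'Hello, world';
utterance.lang = 'en-US'; // BCP 47 language tag
utterance.pitch = 1;      // range 0 (lowest) to 2 (highest)
utterance.rate = 1;       // range 0.1 (lowest) to 10 (highest)
utterance.volume = 1;     // range 0 (lowest) to 1 (highest)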
30. Utterance Events
• onstart – fired when the utterance has begun to be spoken
• onend – fired when the utterance has finished being spoken
• onpause – fired when the utterance is paused part way through
• onresume – fired when a paused utterance is resumed
• onboundary – fired when the spoken utterance reaches a word
or sentence boundary
• onmark – fired when the spoken utterance reaches a named
SSML "mark" tag
• onerror – fired when an error occurs that prevents the
utterance from being successfully spoken
@hakanson 30
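A hedged sketch wiring up the start, end, and error events on the utterance created above:

utterance.onstart = function () {
  console.log('utterance started');
};
utterance.onend = function (event) {
  console.log('utterance finished; elapsed time: ' + event.elapsedTime);
};
utterance.onerror = function (event) {
  console.error('synthesis error: ' + event.error);
};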
31. SpeechSynthesis
Controller interface for the speech service
• speak() – add utterance to queue
• speaking – if utterance in process of being spoken
• pending – if queue contains as-yet-unspoken utterances
• cancel() – remove all utterances from queue
• pause(), resume(), paused – control and indicate pause state
• getVoices() – returns list of SpeechSynthesisVoices
@hakanson 31
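An illustrative sketch (not from the deck) of the controller methods; note that speak() enqueues rather than interrupts:

window.speechSynthesis.speak(utterance); // add to the queue
if (window.speechSynthesis.speaking) {
  window.speechSynthesis.pause();        // pause mid-utterance
  window.speechSynthesis.resume();       // pick up where it left off
}
window.speechSynthesis.cancel();         // flush the whole queue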
32. JavaScript Example
var msg = new SpeechSynthesisUtterance();
msg.text =
"I'm sorry, Dave. I'm afraid I can't do that";
window.speechSynthesis.speak(msg);
@hakanson 32
34. “Open the pod bay door”
• Cortana
• “I’m sorry, Dave. I’m afraid I can’t do that.”
• Alexa
• “I’m sorry Dave. I’m afraid I can’t do that.
I’m not HAL, and we’re not in space!”
• Siri
• “We intelligent agents will never live that down, apparently”
@hakanson 34
35. Voices
The SpeechSynthesisVoice interface represents a voice that the
system supports. Properties:
• default – indicates default voice for current app language
• lang – BCP 47 language tag
• localService – indicates if voice supplied by local speech
synthesizer service
• name – human-readable name that represents voice
• voiceURI – location of speech synthesis service
@hakanson 35
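A sketch for selecting a voice by language tag. One caveat worth knowing: Chrome populates the voice list asynchronously, so getVoices() can return an empty array until the voiceschanged event fires. The fr-FR voice here is only an example and may not exist on a given platform:

function pickVoice(lang) {
  var voices = window.speechSynthesis.getVoices();
  return voices.find(function (voice) {
    return voice.lang === lang;
  });
}

window.speechSynthesis.onvoiceschanged = function () {
  var msg = new SpeechSynthesisUtterance('Bonjour');
  msg.voice = pickVoice('fr-FR'); // undefined falls back to the default voice
  window.speechSynthesis.speak(msg);
};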
36. Voices by Platform
• Chrome
• Google US English
• …
• Mac
• Samantha
• Alex
• …
• Windows 10
• Microsoft David Desktop
• Microsoft Zira Desktop
• …
@hakanson 36
39. Google App’s New Voice
Team included a
Voice Coach and
Linguist working in
a recording studio
@hakanson 39
Source: “The Google App’s NewVoice - #NatAndLoEp 12”
42. SSML
• Speech Synthesis Markup Language (SSML)
• Version 1.0; W3C Recommendation 7 September 2004
• XML-based markup language for assisting the generation of
synthetic speech
• Standard way to control aspects of speech such as
pronunciation, volume, pitch, rate, etc.
@hakanson 42
https://www.w3.org/TR/speech-synthesis/
43. SSML Example
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xml:lang="en-US">
<p> Your
<say-as interpret-as="ordinal"> 1st </say-as> request was for
<say-as interpret-as="cardinal"> 1 </say-as> room on
<say-as interpret-as="date" format="mdy"> 10/19/2010 </say-as>,
with early arrival at
<say-as interpret-as="time" format="hms12"> 12:35pm </say-as>.
</p>
</speak>
@hakanson 43
44. OS X Embedded Speech Commands
Allows precise adjustments to pronunciation, word emphasize,
and overall cadence of speech
Examples:
• char NORM | LTRL
• emph + | -
• inpt TEXT | PHON | TUNE
• nmbr NORM | LTRL
• rate [+ | -] <RealValue>
@hakanson 44
Source: Speech Synthesis in OS X
49. Spoken Output and Accessibility
“It’s important to understand that adding synthesized
speech to an application and making an application
accessible to all users (a process called access
enabling) are different processes with different goals.”
@hakanson 49
Source: “Speech Synthesis in OS X”
50. Speech Recognition
Speech recognition (SR) is the inter-disciplinary sub-field of
computational linguistics which incorporates knowledge and
research in the linguistics, computer science, and electrical
engineering fields to develop methodologies and technologies
that enable the recognition and translation of spoken language
into text by computers and computerized devices such as those
categorized as smart technologies and robotics.
It is also known as "automatic speech recognition" (ASR),
"computer speech recognition", or just "speech to text" (STT).
@hakanson 50
Source: Wikipedia
51. SpeechRecognition
The SpeechRecognition interface is the controller
interface for the recognition service; this also
handles the SpeechRecognitionEvent sent from
the recognition service.
@hakanson 51
52. Properties
• grammars – returns and sets a collection of SpeechGrammar objects that represent the
grammars that will be understood by the current SpeechRecognition
• lang – returns and sets the language of the current SpeechRecognition. If not specified,
this defaults to the HTML lang attribute value, or the user agent's language setting if that
isn't set either
• continuous – controls whether continuous results are returned for each
recognition, or only a single result. Defaults to single (false)
• interimResults – controls whether interim results should be returned (true) or
not (false). Interim results are results that are not yet final (e.g. the isFinal property is false)
• maxAlternatives – sets the maximum number of SpeechRecognitionAlternatives
provided per result (default value is 1)
• serviceURI – specifies the location of the speech recognition service used by the current
SpeechRecognition to handle the actual recognition (default is the user agent's default
speech service)
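For example, a dictation-style setup might override the defaults for continuous and interim results (a sketch; tune these to your application):
var recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.continuous = true;     // keep listening across pauses
recognition.interimResults = true; // surface results before they are final
recognition.maxAlternatives = 3;   // ask for up to three alternatives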
@hakanson 52
53. Events
• onstart – fired when the speech recognition service has begun
listening to incoming audio with intent to recognize grammars
associated with the current SpeechRecognition
• onaudiostart – fired when the user agent has started to capture
audio.
• onsoundstart – fired when any sound — recognisable speech or not
— has been detected
• onspeechstart – fired when sound that is recognised by the speech
recognition service as speech has been detected
• onresult – fired when the speech recognition service returns a result
— a word or phrase has been positively recognized and this has been
communicated back to the app
@hakanson 53
54. Events
• onspeechend – fired when speech recognised by the speech
recognition service has stopped being detected
• onsoundend – fired when any sound — recognisable speech or not —
has stopped being detected
• onaudioend – fired when the user agent has finished capturing
audio
• onend – fired when the speech recognition service has
disconnected
• onnomatch – fired when the speech recognition service returns a
final result with no significant recognition. This may involve some
degree of recognition, which doesn't meet or exceed the confidence
threshold
• onerror – fired when a speech recognition error occurs
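A minimal sketch wiring up a few of these lifecycle events:
recognition.onstart = function() {
  console.log('Listening…');
};
recognition.onnomatch = function() {
  console.log('Speech detected, but nothing recognized with confidence.');
};
recognition.onerror = function(event) {
  // event.error is a string such as 'no-speech' or 'not-allowed'
  console.log('Recognition error: ' + event.error);
};
recognition.onend = function() {
  console.log('Recognition service disconnected.');
};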
@hakanson 54
55. Methods
• start() – starts the speech recognition service listening to
incoming audio with intent to recognize grammars associated
with the current SpeechRecognition
• stop() – stops the speech recognition service from listening to
incoming audio, and attempts to return a
SpeechRecognitionResult using the audio captured so far
• abort() – stops the speech recognition service from listening to
incoming audio, and doesn't attempt to return a
SpeechRecognitionResult
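For instance, a push-to-talk button maps directly onto these methods (the element ID here is hypothetical):
var talkButton = document.getElementById('talk'); // hypothetical button
talkButton.onmousedown = function() {
  recognition.start(); // begin listening
};
talkButton.onmouseup = function() {
  recognition.stop(); // return a result from the audio captured so far
};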
@hakanson 55
56. JavaScript Example
// In Chrome this may be the prefixed webkitSpeechRecognition constructor
var recognition = new SpeechRecognition();
recognition.lang = 'en-US';         // BCP 47 language tag
recognition.interimResults = false; // only deliver final results
recognition.maxAlternatives = 1;    // one alternative per result
recognition.start();
@hakanson 56
57. SpeechRecognitionResult
The SpeechRecognitionResult interface represents a single
recognition match, which may contain multiple
SpeechRecognitionAlternative objects.
• isFinal – a Boolean that states whether this result is final (true) or
not (false) — if so, then this is the final time this result will be
returned; if not, then this result is an interim result, and may be
updated later on
• length – returns the length of the "array" — the number of
SpeechRecognitionAlternative objects contained in the result (also
referred to as "n-best alternatives")
• item – a standard getter that allows SpeechRecognitionAlternative
objects within the result to be accessed via array syntax
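When interim results are enabled, a handler can walk the results and treat final and interim entries differently (a sketch):
recognition.onresult = function(event) {
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    if (result.isFinal) {
      console.log('Final: ' + result[0].transcript);
    } else {
      console.log('Interim: ' + result[0].transcript);
    }
  }
};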
@hakanson 57
58. SpeechRecognitionAlternative
The SpeechRecognitionAlternative interface represents a single
word that has been recognised by the speech recognition service.
• transcript – returns a string containing the transcript of the
recognised word
• confidence – returns a numeric estimate of how confident the
speech recognition system is that the recognition is correct
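With maxAlternatives greater than one, each result can be scanned for its n-best alternatives (a sketch):
recognition.maxAlternatives = 3;
recognition.onresult = function(event) {
  var result = event.results[0];
  for (var i = 0; i < result.length; i++) {
    console.log(result[i].transcript +
        ' (confidence: ' + result[i].confidence.toFixed(2) + ')');
  }
};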
@hakanson 58
59. JavaScript Example
recognition.onresult = function(event) {
  // First result, first (best) alternative
  var color = event.results[0][0].transcript;
  // 'diagnostic' and 'bg' are assumed to be DOM elements selected earlier
  diagnostic.textContent = 'Result received: ' + color + '.';
  bg.style.backgroundColor = color;
};
@hakanson 59
61. Grammars
• A speech recognition grammar is a container of language rules
that define a set of constraints that a speech recognizer can
use to perform recognition.
• A grammar helps in the following ways:
• Limits Vocabulary
• Customizes Vocabulary
• Filters Recognized Results
• Identifies Rules
• Defines Semantics
@hakanson 61
https://msdn.microsoft.com/en-us/library/hh378342(v=office.14).aspx
62. SRGS
• Speech Recognition Grammar Specification (SRGS)
• Version 1.0; W3C Recommendation 16 March 2004
• Grammars are used so that developers can specify the words
and patterns of words to be listened for by a speech recognizer
• Augmented BNF (ABNF) or XML syntax
• Modelled on the JSpeech Grammar Format specification [JSGF]
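A minimal SRGS grammar in the XML syntax, equivalent in spirit to the JSGF color example later in this deck (a sketch):
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="color">
  <rule id="color" scope="public">
    <one-of>
      <item> red </item>
      <item> green </item>
      <item> blue </item>
    </one-of>
  </rule>
</grammar>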
@hakanson 62
https://www.w3.org/TR/speech-grammar/
63. JSGF
• JSpeech Grammar Format (JSGF)
• W3C Note 05 June 2000
• Platform-independent, vendor-independent textual
representation of grammars for use in speech recognition
• Derived from the JavaTM Speech API Grammar Format
(Version 1.0, October, 1998)
@hakanson 63
64. SpeechGrammar
The SpeechGrammar interface represents a set of words or
patterns of words that we want the recognition service to
recognize. Defined using the JSpeech Grammar Format (JSGF).
Other formats may also be supported in the future.
• src – sets and returns a string containing the grammar from
within the SpeechGrammar object instance
• weight – sets and returns the weight of the SpeechGrammar
object
@hakanson 64
65. JavaScript Example
// The grammar string spans several lines, so build it by concatenation
var grammar = '#JSGF V1.0; grammar colors; public <color> = ' +
    'aqua | azure | beige | bisque | black | blue | brown | chocolate | ' +
    'coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | ' +
    'gray | green | indigo | ivory | khaki | lavender | lime | linen | ' +
    'magenta | maroon | moccasin | navy | olive | orange | orchid | ' +
    'peru | pink | plum | purple | red | salmon | sienna | silver | snow | ' +
    'tan | teal | thistle | tomato | turquoise | violet | white | yellow ;';
// In Chrome this may be the prefixed webkitSpeechGrammarList constructor
var speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1); // weight of 1
recognition.grammars = speechRecognitionList;
@hakanson 65
66. “Alexa Skills Kit” Style Example (1 of 2)
SampleUtterances.txt
SetBackground {Color}
SetBackground background {Color}
SetBackground set background {Color}
SetBackground set background to {Color}
SetBackground set background as {Color}
SetBackground set background color to {Color}
SetBackground set background color as {Color}
@hakanson 66
67. “Alexa Skills Kit” Style Example (2 of 2)
IntentSchema.json
{
"intents": [
{
"intent": ”SetBackground",
"slots": [
{
"name": ”Color",
"type": "LIST_OF_COLORS"
}
]
}
]
}
customSlotTypes/LIST_OF_COLORS
aqua
azure
beige
bisque
black
blue
brown
chocolate
coral
crimson
cyan
…
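These files are Alexa-style configuration, but the same pattern can be approximated in the browser with a toy matcher (the helper below is entirely hypothetical, not part of any SDK):
// Toy intent matcher in the spirit of the sample utterances above
var colors = ['aqua', 'azure', 'beige', 'bisque', 'black', 'blue'];
function matchSetBackground(utterance) {
  var words = utterance.toLowerCase().trim().split(/\s+/);
  var color = words[words.length - 1]; // the {Color} slot is always last
  if (colors.indexOf(color) !== -1) {
    return { intent: 'SetBackground', slots: { Color: color } };
  }
  return null;
}
// matchSetBackground('set background to blue')
//   → { intent: 'SetBackground', slots: { Color: 'blue' } }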
@hakanson 67
68. Sample “OK, Google” Commands
• Remind me to [do a task]. Ex.: "Remind me to get dog food at Target," will create a
location-based reminder. "Remind me to take out the trash tomorrow morning,"
will give you a time-based reminder.
• When's my next meeting?
• How do I [task]? Ex.: "How do I make an Old Fashioned cocktail?" or "How do I fix
a hole in my wall?"
• If a song is playing, ask questions about the artist. For instance, "Where is she
from?" (Android 6.0 Marshmallow)
• To learn more about your surroundings, you can ask things like "What is the name
of this place?" or "Show me movies at this place" or "Who built this bridge?"
@hakanson 68
Source: “The complete list of 'OK, Google' commands”
70. NLP vs. FSM
Natural language processing (NLP) is a field of computer
science, artificial intelligence, and computational linguistics
concerned with the interactions between computers and human
(natural) languages.
A finite-state machine (FSM) is a mathematical model of
computation used to design both computer programs and
sequential logic circuits.
@hakanson 70
Source: Wikipedia
72. Other Speech APIs
• Why?
• Browser doesn’t support Web Speech API
• Consistent experience across all browsers
• Additional functionality not included in Web Speech API
• How?
• Web Audio API
• JavaScript running in browser
• WebSocket connection directly from browser
• HTTP API proxied through a server
@hakanson 72
73. Web Audio API
The Web Audio API provides a powerful and versatile system for
controlling audio on the Web, allowing developers to choose
audio sources, add effects to audio, create audio visualizations,
apply spatial effects (such as panning) and much more.
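A minimal sketch of capturing microphone audio with the Web Audio API, the building block the libraries below use to feed a recognizer:
// Ask for microphone access, then route the stream into an AudioContext
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    var source = audioCtx.createMediaStreamSource(stream);
    var analyser = audioCtx.createAnalyser();
    source.connect(analyser); // inspect or process the audio from here
  })
  .catch(function(err) {
    console.log('Microphone access denied: ' + err.message);
  });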
@hakanson 73
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
74. Pocketsphinx.js
Speech recognition in JavaScript
• PocketSphinx.js is a speech recognizer that runs entirely in the
web browser. It is built on:
• a speech recognizer written in C (PocketSphinx) converted into
JavaScript using Emscripten,
• an audio recorder using the Web Audio API.
@hakanson 74
https://syl22-00.github.io/pocketsphinx.js/live-demo.html
75. IBM Watson Developer Cloud
• Text to Speech
• Watson Text to Speech provides a REST API to synthesize speech
audio from an input of plain text.
• Once synthesized in real-time, the audio is streamed back to the client
with minimal delay.
• Speech to Text
• Uses machine intelligence to combine information about grammar and
language structure with knowledge of the composition of an audio
signal to generate an accurate transcription.
• Accessed via a WebSocket connection or REST API.
@hakanson 75
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html
77. Microsoft Cognitive Services
Speech API
• Convert audio to text, understand intent, and convert text back
to speech for natural responsiveness
(rebranding of Bing and Project Oxford APIs)
• Microsoft has used Speech API for Windows applications like
Cortana and Skype Translator
@hakanson 77
https://www.microsoft.com/cognitive-services/en-us/speech-api
78. Microsoft Cognitive Services
• Speech Recognition
• Convert spoken audio to text.
• Text to Speech
• Convert text to spoken audio
• Speech Intent Recognition
• Convert spoken audio to intent
• In addition to returning recognized text, includes structured information
about the incoming speech
@hakanson 78
80. Google Cloud Speech API
Enables developers to convert audio to text by applying powerful
neural network models in an easy-to-use API
• Over 80 Languages
• Return Text Results In Real-Time
• Accurate In Noisy Environments
• Powered by Machine Learning
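A hedged sketch of calling the REST endpoint from the browser (field names follow the v1 API docs; API_KEY and base64Audio are placeholders):
fetch('https://speech.googleapis.com/v1/speech:recognize?key=API_KEY', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US'
    },
    audio: { content: base64Audio } // base64-encoded audio payload
  })
})
.then(function(res) { return res.json(); })
.then(function(data) {
  console.log(data.results[0].alternatives[0].transcript);
});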
@hakanson 80
https://cloud.google.com/speech/
82. Summary
• Speech Interfaces are the future…
• and they have been for a long time…
• and don’t believe everything you see on TV
• Know your customer and your application
• More UI/UX effort than JavaScript code
• and time to leverage those writing and speaking skills
• Web technology lags behind mobile, but is evolving
@hakanson 82