Speech Devices SDK
1.
2.
3. Microsoft Speech Services
• Speech to Text – convert speech to text and back again, and understand its intent
• Custom Speech to Text – fine-tune speech recognition for anyone, anywhere
• TTS/Custom Voice – text-to-speech with standard or custom voices
• Speaker Recognition – give your app the ability to know who's talking
• Translator – speech translation
4. The Speech Devices SDK is pre-packaged software, fine-tuned to specific hardware (dev kits), that makes it easy to integrate with the full range of cloud-based Microsoft Speech services, creating rich user experiences for customers.
The Speech Devices SDK allows you to choose your device's custom "wake word" – the cue that initiates a user interaction. Working with Microsoft Speech services and other APIs, the SDK enables tailored Voice AI experiences.
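The wake-word flow described above can be sketched in plain Java. This is illustrative only: the `WakeWordGate` class and its method names are hypothetical and not part of the Speech Devices SDK; the real SDK does this on audio, not on word tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of wake-word gating (hypothetical class, not the SDK API).
// Words heard before the wake word are ignored; words after it form the command.
public class WakeWordGate {
    private final String wakeWord;

    public WakeWordGate(String wakeWord) {
        this.wakeWord = wakeWord;
    }

    // Returns the words spoken after the wake word, or an empty list if the
    // wake word never occurred (the device stays idle).
    public List<String> extractCommand(List<String> heardWords) {
        List<String> command = new ArrayList<>();
        boolean awake = false;
        for (String word : heardWords) {
            if (awake) {
                command.add(word);
            } else if (word.equalsIgnoreCase(wakeWord)) {
                awake = true; // the cue that initiates the interaction
            }
        }
        return command;
    }
}
```

The same gating idea is what the on-device keyword spotter provides: nothing is streamed to the cloud until the cue is detected.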
7. Software and Services:
• Speech Devices SDK (from Microsoft)
• Premium audio processing solution
• Wake word recognition
• Communication with the Microsoft Speech services
• Wake word customization (from Microsoft)
• (Microsoft Speech services and other Azure services for additional cost)
• Device logic and tools (from the dev kit manufacturer)
Dev Kit (from the 3rd-party provider):
• CPU: Qualcomm APQ8009, 4-core A7, 1.1 GHz
• Memory: LPDDR3 + eMMC, 1 GB + 8 GB
• Mic array: I2S, 6+1 or 4 microphones
• Network: 802.11 b/g/n
• Charging: 2.5 mm DC jack, 12 V 1.5 A
Documentation and Support:
• Hardware tech docs (from the dev kit provider)
• Device hardware specs (from the dev kit provider)
• Sample app and sample code (from Microsoft)
• Documentation (from Microsoft: http://aka.ms/sdsdk-info)
• Online support
8. [Architecture diagram: multi-channel raw audio input feeds the microphone array audio stack and the keyword spotter inside the Speech Devices SDK; speech audio flows through the client API to the Azure Speech Services, which return text transcription and intent to your application.]
9.
10. Evaluation journey:
1. Evaluate building an ambient device – visit the Azure Speech Service site; learn more at https://aka.ms/sdsdk-info and http://ddk.roobo.com
2. Decide to try – order the dev kit through a third party's website
3. Wait for the dev kit to arrive – you can opt to try out the Speech Services on a PC while waiting for the hardware
4. Receive the dev kit – use the sample code and the default keyword spotter (KWS) to test everything end-to-end
5. Customize the KWS – through the Custom Speech portal; deploy the model for the custom keyword
6. Run everything end-to-end – build the sample app or your own application and get everything working end-to-end
7. Complete the evaluation
8. Move to the commercialization phase – satisfied with the evaluation and ready for production:
• Contact the dev kit provider: customization, production, pricing, certification/testing, shipping, etc.
• Contact Microsoft: pricing/package discussion for the Speech service; customization of the service, if applicable; pricing discussion for other Azure services, if applicable
9. Move to production
Getting started with the SDK:
• Get a Speech subscription key
• Sign up for the SDK
• Download the SDK
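The code slides that follow all use one async pattern: a recognizer call returns a `Task`, and a completion listener reads the result without blocking. That shape can be sketched with plain Java's `CompletableFuture`; the `setOnTaskCompletedListener` helper below is a hypothetical stand-in for the sample app's helper of the same name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class TaskPatternSketch {
    // Hypothetical stand-in for the sample app's helper: run the listener
    // once the task completes, without blocking the calling thread.
    static <T> void setOnTaskCompletedListener(CompletableFuture<T> task, Consumer<T> listener) {
        task.thenAccept(listener);
    }

    public static void main(String[] args) {
        // Pretend this future is the recognizer's recognizeAsync() result.
        CompletableFuture<String> task =
                CompletableFuture.supplyAsync(() -> "turn on the lights");
        setOnTaskCompletedListener(task, result ->
                System.out.println("Recognized: " + result));
        task.join(); // wait only so the demo prints before exiting
    }
}
```

Keeping recognition off the caller's thread matters on a device: the UI or device loop stays responsive while audio is being processed in the cloud.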
14. // Single-shot recognition: recognize one utterance and read the final result.
final SpeechRecognizer reco = factory.createSpeechRecognizer();
final Task<SpeechRecognitionResult> task = reco.recognizeAsync();
setOnTaskCompletedListener(task, result -> {
    final String s = result.getRecognizedText();
});

// Single-shot recognition with intermediate (partial) results.
final SpeechRecognizer reco = factory.createSpeechRecognizer();
reco.IntermediateResultReceived.addEventListener((o, speechRecognitionResultEventArgs) -> {
    final String s = speechRecognitionResultEventArgs.getResult().getRecognizedText();
    Log.i(logTag, "Intermediate result received: " + s);
    setRecognizedText(s);
});
final Task<SpeechRecognitionResult> task = reco.recognizeAsync();
setOnTaskCompletedListener(task, result -> {
    final String s = result.getRecognizedText();
});

// Continuous recognition: subscribe to intermediate and final results, then start.
final SpeechRecognizer reco = factory.createSpeechRecognizer();
reco.IntermediateResultReceived.addEventListener((o, speechRecognitionResultEventArgs) -> {
    final String s = speechRecognitionResultEventArgs.getResult().getRecognizedText();
});
reco.FinalResultReceived.addEventListener((o, speechRecognitionResultEventArgs) -> {
    final String s = speechRecognitionResultEventArgs.getResult().getRecognizedText();
});
final Task<?> task = reco.startContinuousRecognitionAsync();
15. // Keyword (wake word) recognition: handle session events, intermediate and
// final results, then start listening with the custom keyword model.
reco = factory.createSpeechRecognizer();
reco.SessionEvent.addEventListener((o, sessionEventArgs) -> {
    if (sessionEventArgs.getEventType() == SessionEventType.SessionStartedEvent) {
        // do some customized stuff
    }
});
reco.IntermediateResultReceived.addEventListener((o, intermediateResultEventArgs) -> {
    final String s = intermediateResultEventArgs.getResult().getRecognizedText();
});
reco.FinalResultReceived.addEventListener((o, finalResultEventArgs) -> {
    final String s = finalResultEventArgs.getResult().getRecognizedText();
});
final Task<?> task = reco.startKeywordRecognitionAsync(KeywordRecognitionModel.fromFile(KeywordModel));
setOnTaskCompletedListener(task, result -> {
    content.set(0, "say `" + Keyword + "`...");
    setRecognizedText(TextUtils.join(delimiter, content));
    continuousListeningStarted = true;
});
16. // Intent recognition with LUIS: map intent IDs to friendly names, register
// the intents, then run a single recognition and look up the returned intent ID.
final HashMap<String, String> intentIdMap = new HashMap<>();
intentIdMap.put("1", "play music");
intentIdMap.put("2", "stop");
final IntentRecognizer reco = factory.createIntentRecognizer();
LanguageUnderstandingModel intentModel = LanguageUnderstandingModel.fromSubscription(LuisRegion, LuisSubscriptionKey, LuisAppId);
for (Map.Entry<String, String> entry : intentIdMap.entrySet()) {
    reco.addIntent(entry.getKey(), intentModel, entry.getValue());
}
reco.IntermediateResultReceived.addEventListener((o, intentRecognitionResultEventArgs) -> {
    final String s = intentRecognitionResultEventArgs.getResult().getRecognizedText();
});
final Task<IntentRecognitionResult> task = reco.recognizeAsync();
setOnTaskCompletedListener(task, result -> {
    String s = result.getRecognizedText();
    String intentId = result.getIntentId();
    String intent = "";
    if (intentIdMap.containsKey(intentId)) {
        intent = intentIdMap.get(intentId);
    }
});
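Once the intent ID is resolved, the usual next step is dispatching it to device logic. A minimal stdlib-only sketch of that step; the `IntentDispatcher` class and its action names are hypothetical, not part of the Speech SDK:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher: maps LUIS intent IDs to device actions, with a
// fallback when the recognizer returns an unknown or empty intent ID.
public class IntentDispatcher {
    private final Map<String, Runnable> actions = new HashMap<>();
    private String lastAction = "none";

    public IntentDispatcher() {
        actions.put("1", () -> lastAction = "play music");
        actions.put("2", () -> lastAction = "stop");
    }

    // Runs the action registered for the intent ID (or the fallback)
    // and returns the name of the action that was executed.
    public String dispatch(String intentId) {
        actions.getOrDefault(intentId, () -> lastAction = "unrecognized").run();
        return lastAction;
    }
}
```

Keeping the ID-to-action mapping in one table mirrors the `intentIdMap` on the slide above, so adding a new voice command touches only one place.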