A Journey with Microsoft Cognitive Service I
This slide deck is about Microsoft Cognitive Services. By going through it, you will understand what Microsoft Cognitive Services are and how they work.
Marvin Heng
Medium: @hmheng
Twitter: @hmheng
Github: hmheng
5. Azure Cognitive Services
• Vision: From faces to feelings, allow your apps to understand images and video.
• Speech: Hear and speak to your users by filtering noise, identifying speakers, and understanding intent.
• Language: Process text and learn how to recognize what users want.
• Knowledge: Tap into rich knowledge amassed from the web, academia, or your own data.
• Search: Access billions of web pages, images, videos, and news with the power of Bing APIs.
9. Bing Search
• Allows developers to integrate search into their apps so that users can find webpages, images, news, locations, and more, without advertisements.
• Useful for knowledge mining.
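Below is a minimal sketch of the request an app might send to Bing search. It assumes the Bing Web Search API v7 endpoint (https://api.bing.microsoft.com/v7.0/search), the `q`, `count`, and `mkt` query parameters, and key-based authentication through the `Ocp-Apim-Subscription-Key` header. Check these against the current Bing documentation. The helper `build_bing_search_request` is hypothetical, and the request is only built, never sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for the Bing Web Search API v7; verify before use.
BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_search_request(
    query: str, api_key: str, count: int = 10, market: str = "en-US"
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a web search."""
    params = urllib.parse.urlencode({"q": query, "count": count, "mkt": market})
    return urllib.request.Request(
        f"{BING_SEARCH_URL}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},  # key-based auth
        method="GET",
    )


req = build_bing_search_request("azure cognitive services", "<your-key>")
# urllib.request.urlopen(req) would send it and return JSON search results.
```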
12. Computer Vision
• Computer vision is an area of artificial intelligence (AI) in which software systems are designed to perceive the world visually, through cameras, images, and video.
• Computer vision is one of the core areas of AI. It focuses on creating solutions that enable applications to "see" the world and make sense of it.
13. Use Cases of Computer Vision
• Analyze an image and suggest an appropriate caption.
• Suggest relevant tags that could be used to index an image.
• Categorize an image.
• Identify objects in an image.
• Detect faces and people in an image.
• Recognize celebrities and landmarks in an image.
• Read text in an image.
14. What can CV tell us?
• A black and white photo of a city
• A black and white photo of a large city
• A large white building in a city
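The captions above are alternative descriptions of the same photo. As a sketch, assuming the Analyze Image JSON response carries them under `description.captions`, each with `text` and `confidence`, an app could keep the most confident one. The helper `best_caption` is hypothetical, and the confidence values below are invented for illustration.

```python
from __future__ import annotations

from typing import Any


def best_caption(analysis: dict[str, Any]) -> tuple[str, float] | None:
    """Return (text, confidence) of the most confident caption, or None."""
    captions = analysis.get("description", {}).get("captions", [])
    if not captions:
        return None
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"], top["confidence"]


# Response fragment using the slide's captions; confidences are invented.
sample = {
    "description": {
        "captions": [
            {"text": "A black and white photo of a city", "confidence": 0.95},
            {"text": "A black and white photo of a large city", "confidence": 0.94},
            {"text": "A large white building in a city", "confidence": 0.93},
        ]
    }
}
print(best_caption(sample))
```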
15. Not only that! It tags too!
• Tagging: the type of the identified object
• Bounding box: a set of coordinates (top, left, width, and height)
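A bounding box described by top, left, width, and height is easy to turn into corner coordinates, which is what most drawing and overlap calculations expect. The `BoundingBox` class below is a hypothetical helper that mirrors the four fields on the slide, assuming pixel units. It also computes intersection over union (IoU), a common way to compare two boxes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    top: int
    left: int
    width: int
    height: int

    def corners(self) -> tuple[int, int, int, int]:
        """Return (x_min, y_min, x_max, y_max) in pixel coordinates."""
        return (self.left, self.top, self.left + self.width, self.top + self.height)

    def area(self) -> int:
        return self.width * self.height

    def iou(self, other: BoundingBox) -> float:
        """Intersection over union of two boxes (0.0 = disjoint, 1.0 = identical)."""
        ax1, ay1, ax2, ay2 = self.corners()
        bx1, by1, bx2, by2 = other.corners()
        overlap_w = max(0, min(ax2, bx2) - max(ax1, bx1))
        overlap_h = max(0, min(ay2, by2) - max(ay1, by1))
        intersection = overlap_w * overlap_h
        union = self.area() + other.area() - intersection
        return intersection / union if union else 0.0


box = BoundingBox(top=10, left=20, width=100, height=50)
shifted = BoundingBox(top=10, left=70, width=100, height=50)
```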
22. Custom Vision
• Azure Custom Vision is an image recognition service that lets you
build, deploy, and improve your own image identifiers.
• An image identifier applies labels (which represent classes or
objects) to images, according to their visual characteristics.
• The Custom Vision service uses a machine learning algorithm to
analyze images.
23. What can Custom Vision do?
• Classification
• Object Detection
• Export as a standalone offline model for use in your app.
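Classification and object detection return different kinds of answers, and the sketch below shows how an app might consume each. It assumes a prediction response containing a list of `predictions`, each with `tagName`, `probability`, and, for object detection, a `boundingBox` whose `left`, `top`, `width`, and `height` are normalized to the range 0..1. The function names, the 0.5 threshold, and the sample tags are hypothetical.

```python
from __future__ import annotations

from typing import Any


def top_classification(predictions: list[dict[str, Any]]) -> str | None:
    """Classification: return the tag with the highest probability."""
    if not predictions:
        return None
    return max(predictions, key=lambda p: p["probability"])["tagName"]


def detections_in_pixels(
    predictions: list[dict[str, Any]],
    image_width: int,
    image_height: int,
    threshold: float = 0.5,
) -> list[tuple[str, int, int, int, int]]:
    """Object detection: keep confident boxes, scaled from 0..1 to pixels."""
    results = []
    for p in predictions:
        if p["probability"] < threshold or "boundingBox" not in p:
            continue
        b = p["boundingBox"]
        results.append((
            p["tagName"],
            round(b["left"] * image_width),
            round(b["top"] * image_height),
            round(b["width"] * image_width),
            round(b["height"] * image_height),
        ))
    return results


preds = [
    {"tagName": "cat", "probability": 0.9,
     "boundingBox": {"left": 0.1, "top": 0.2, "width": 0.5, "height": 0.25}},
    {"tagName": "dog", "probability": 0.3,
     "boundingBox": {"left": 0.6, "top": 0.1, "width": 0.3, "height": 0.3}},
]
```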
29. Video Indexer
• Video Indexer provides the ability to extract deep insights (with no need for data analysis or coding skills) using machine learning models based on multiple channels (voice, vocals, visual).
• The service enables deep search, reduces operational
costs, enables new monetization opportunities, and
creates new user experiences on large archives of
videos (with low entry barriers).
30. Video Indexer
• Keywords extraction
• Named entities extraction
• Topic inference
• Artifacts
• Sentiment analysis: Identifies positive, negative, and neutral sentiments from speech and visual text.
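Sentiment results are tied to time ranges in the video, so a natural first step is to total how long each sentiment lasts. This sketch assumes the insights expose `sentiments` entries with a `sentimentType` and a list of `instances` holding `start`/`end` timestamps in H:MM:SS(.fff) form. Treat that shape as an assumption rather than the exact Video Indexer schema. The helper names are hypothetical.

```python
from collections import defaultdict


def to_seconds(timestamp: str) -> float:
    """Convert an 'H:MM:SS' or 'H:MM:SS.fff' timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def sentiment_durations(sentiments: list) -> dict:
    """Sum the on-screen duration (seconds) of each sentiment type."""
    totals = defaultdict(float)
    for sentiment in sentiments:
        for instance in sentiment.get("instances", []):
            span = to_seconds(instance["end"]) - to_seconds(instance["start"])
            totals[sentiment["sentimentType"]] += span
    return dict(totals)


sample = [
    {"sentimentType": "Positive",
     "instances": [{"start": "0:00:00", "end": "0:00:12.5"},
                   {"start": "0:01:00", "end": "0:01:10"}]},
    {"sentimentType": "Negative",
     "instances": [{"start": "0:00:20", "end": "0:00:25"}]},
]
print(sentiment_durations(sample))
```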
33. Use Cases of Video Indexer
• Deep search
• Content creation
• Accessibility
• Monetization
• Content moderation
• Recommendations
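Deep search comes down to making extracted insights searchable by time, so a viewer can jump straight to the moment something is said. The sketch below builds a small inverted index over transcript lines. The transcript structure (a list of lines with `start` and `text`) and the function names are hypothetical, not the exact Video Indexer output.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9']+")


def build_index(transcript: list[dict[str, str]]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of transcript line numbers containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, line in enumerate(transcript):
        for word in TOKEN.findall(line["text"].lower()):
            index[word].add(i)
    return index


def search(transcript: list[dict[str, str]], index: dict[str, set[int]], query: str) -> list[str]:
    """Return start timestamps of lines that contain every word in the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return [transcript[i]["start"] for i in sorted(matches)]


transcript = [
    {"start": "0:00:01", "text": "Welcome to the Azure demo"},
    {"start": "0:00:07", "text": "Azure Cognitive Services overview"},
    {"start": "0:00:15", "text": "Thanks for watching"},
]
idx = build_index(transcript)
```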
34. Video Indexer
Video insights:
• Face detection
• Celebrity identification
• Account-based face identification
• Visual text recognition
• Visual content moderation
• Labels identification
• Scene segmentation
• Shot detection
• Black frame detection
• Keyframe extraction
• Rolling credits
• Animated characters detection
• Editorial shot type detection
Audio insights:
• Audio transcription
• Automatic language detection
• Multi-language speech identification and transcription
• Two-channel processing
• Closed captioning
• Noise reduction
• Transcript customization (CRIS)
• Speaker enumeration
• Speaker statistics
• Textual content moderation
• Audio effects
• Emotion detection
• Translation
35. Form Recognizer
• Extracts text and data from business forms and documents.
• Extracts text and structure through a simple REST API.
• Pre-trained models:
  • Receipt
  • Business card
  • Layout
• Custom-trained models
• Supports printed and handwritten forms, PDFs, and images.
• Container support
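Document analysis through the REST API is asynchronous: you submit a document, then poll for the result. This sketch assumes the v2.x pattern in which the analyze POST returns an `Operation-Location` URL, and GET requests to that URL return JSON whose `status` is `notStarted`, `running`, `succeeded`, or `failed`. The HTTP call is injected as a hypothetical `get_json` callable so the loop can be tried without network access. The interval and attempt limit are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def poll_analyze_result(
    operation_url: str,
    get_json: Callable[[str], Dict[str, Any]],
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll an analyze operation until it succeeds, fails, or runs out of attempts."""
    for _ in range(max_attempts):
        result = get_json(operation_url)
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError(f"analysis failed: {result}")
        sleep(interval_s)  # still notStarted or running; wait and retry
    raise TimeoutError(f"analysis not finished after {max_attempts} attempts")


# Simulated service responses, standing in for real HTTP calls.
responses = iter([{"status": "notStarted"}, {"status": "running"},
                  {"status": "succeeded", "analyzeResult": {}}])
waits = []
result = poll_analyze_result("https://example.invalid/op", lambda url: next(responses),
                             sleep=waits.append)
```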
36. What can you do with Form Recognizer?
• Automate the conversion of written text into digital text
• Automate capturing receipt data
• Automate converting business cards into digital contacts
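Once the fields of a business card have been extracted, turning them into a digital contact can be as simple as emitting a vCard. This sketch takes a flat dict with hypothetical keys (`first_name`, `last_name`, `company`, `phone`, `email`). The actual Form Recognizer business-card field names differ. It writes vCard 3.0 text (RFC 2426) with CRLF line endings.

```python
def escape_text(value: str) -> str:
    """Escape characters that have special meaning in vCard text values."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def to_vcard(card: dict) -> str:
    """Render extracted business-card fields as a vCard 3.0 string."""
    first, last = card.get("first_name", ""), card.get("last_name", "")
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{escape_text(last)};{escape_text(first)};;;",  # family;given;additional;prefix;suffix
        f"FN:{escape_text((first + ' ' + last).strip())}",
    ]
    if "company" in card:
        lines.append(f"ORG:{escape_text(card['company'])}")
    if "phone" in card:
        lines.append(f"TEL;TYPE=CELL:{escape_text(card['phone'])}")
    if "email" in card:
        lines.append(f"EMAIL:{escape_text(card['email'])}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


card = {"first_name": "Ada", "last_name": "Lovelace",
        "company": "Analytical Engines, Ltd", "email": "ada@example.com"}
vcard = to_vcard(card)
```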
39. Speech-to-Text
• Speech-to-text service
• Improves meeting efficiency by transcribing conversations in real time
• Helps safeguard data with industry-leading security and compliance certifications
• Integrates with a variety of meeting and conferencing solutions, including Microsoft Teams and third-party meeting software
• An SDK is available.
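Real-time meeting transcription normally goes through the Speech SDK. For a single short clip, a REST call is the smallest illustration. This sketch assumes the short-audio REST endpoint `https://<region>.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1`, the `language` and `format` query parameters, and 16 kHz mono PCM WAV input. The region, key, and helper name are placeholders. The request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_short_audio_request(
    region: str, api_key: str, wav_bytes: bytes, language: str = "en-US"
) -> urllib.request.Request:
    """Build (but do not send) a POST that transcribes one short WAV clip."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1?{query}")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # raw WAV file contents as the request body
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
            "Accept": "application/json",
        },
    )


req = build_short_audio_request("westus", "<your-key>", b"RIFF....WAVE")
```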
42. Speaker Verification
• Text-dependent verification means speakers must use the same passphrase during both the enrollment and verification phases.
• Text-independent verification means speakers can speak in everyday language during the enrollment and verification phases.
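The difference between the two modes can be sketched locally. This is a toy model, not how the Speaker Recognition service scores audio. The `voiceprint` vectors stand in for the service's voice model, cosine similarity is a swapped-in stand-in for its scoring, and the 0.8 threshold and the passphrase are arbitrary, hypothetical values. The only point is that text-dependent checks also require the words to match, while text-independent checks look only at the voice.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Enrollment:
    voiceprint: list[float]
    passphrase: str | None = None  # None means a text-independent profile


def normalize(text: str) -> str:
    return " ".join(text.lower().split())


def verify(enrollment: Enrollment, voiceprint: list[float], spoken_text: str,
           threshold: float = 0.8) -> bool:
    # Text-dependent: the spoken words must match the enrolled passphrase.
    if enrollment.passphrase is not None and \
            normalize(spoken_text) != normalize(enrollment.passphrase):
        return False
    # Both modes: the voice must be similar enough to the enrolled voiceprint.
    return cosine_similarity(enrollment.voiceprint, voiceprint) >= threshold


text_dependent = Enrollment([1.0, 0.0, 0.0], passphrase="My voice is my passport")
text_independent = Enrollment([1.0, 0.0, 0.0])
```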
43. Text-to-Speech
• Converts text into human-like synthesized speech.
• Offers 75+ standard voices in more than 45 languages and locales, plus 5 neural voices.
• Tune voice output by adjusting rate, pitch, pronunciation, pauses, and more.
• Speech synthesis
• Asynchronous synthesis of long audio
• Speech Synthesis Markup Language (SSML)
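SSML is where rate, pitch, and pauses are tuned. This sketch builds an SSML document string with a `prosody` element and an optional `break`. The voice name `en-US-JennyNeural` is an assumption, since the available voices change over time. Rate and pitch are given as relative percentages. The helper name `build_ssml` is hypothetical. The resulting string would be passed to the text-to-speech service in place of plain text.

```python
from __future__ import annotations

from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US",
               rate: str = "0%", pitch: str = "0%", pause_ms: int | None = None) -> str:
    """Wrap text in SSML with prosody settings and an optional trailing pause."""
    body = escape(text)  # escape &, <, > so the text is valid XML content
    if pause_ms is not None:
        body += f'<break time="{int(pause_ms)}ms"/>'
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody>'
        '</voice></speak>'
    )


ssml = build_ssml("Tom & Jerry", rate="-10%", pitch="+5%", pause_ms=500)
```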