A Journey with Microsoft Cognitive Service I

Azure Weekend 2020 –
A Journey with Microsoft
Azure Cognitive Service I
Marvin Heng | @hmheng
www.techconnect.io

Azure Cognitive Services
• From faces to feelings, allow your apps to understand images and video
• Hear and speak to your users by filtering noise, identifying speakers, and understanding intent
• Process text and learn how to recognize what users want
• Tap into rich knowledge amassed from the web, academia, or your own data
• Access billions of web pages, images, videos, and news with the power of Bing APIs

Why Azure Cognitive Services?

Cognitive Services
Emotion
Computer Vision
Face
Video Indexer
Form Recognizer
Speech To Text
Text To Speech
Speech Translation
Speaker Recognition
Immersive Reader
Language Understanding
QnA Maker
Text Analytics
Translator
Anomaly Detector
Content Moderator
Metrics Advisor
Personalizer
Bing Autosuggest
Bing Custom Search
Bing Entity Search
Bing Image Search
Bing News Search
Bing Spell Check
Bing Video Search
Bing Visual Search
Bing Web Search
Custom Vision

Bing Search
• Lets developers add ad-free search to their apps, so that users can
find webpages, images, news, locations, and more
• For knowledge mining

Bing Search
Autosuggest
Entity Search
Custom Search
Image Search
News Search
Video Search
Visual Search
Spell Check
Local Business

Computer Vision
• Computer vision is an area of artificial intelligence (AI) in which
software systems are designed to perceive the world visually,
through cameras, images, and video.
• Computer vision is one of the core areas of artificial intelligence
(AI), and focuses on creating solutions that enable AI-enabled
applications to "see" the world and make sense of it.

Use Cases of Computer Vision
• Analyze an image and suggest an appropriate caption.
• Suggest relevant tags that could be used to index an image.
• Categorize an image.
• Identify objects in an image.
• Detect faces and people in an image.
• Recognize celebrities and landmarks in an image.
• Read text in an image.
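As a rough illustration of how an app might ask the service for captions, tags and objects, here is a minimal sketch that builds a request for the Analyze Image REST operation. It assumes the v3.2 route and the `Ocp-Apim-Subscription-Key` header; the environment variable names `VISION_ENDPOINT` and `VISION_KEY` and the image URL are placeholders, not something from the slides.

```python
import json
import os
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url,
                          features=("Description", "Tags", "Objects")):
    """Build (but do not send) a POST request for the Analyze Image operation."""
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    # Assumption: API version v3.2; adjust to the version your resource supports.
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical variable names; the call is skipped when they are not set.
    endpoint = os.environ.get("VISION_ENDPOINT")
    key = os.environ.get("VISION_KEY")
    if endpoint and key:
        request = build_analyze_request(endpoint, key, "https://example.com/city.jpg")
        with urllib.request.urlopen(request) as response:
            print(json.load(response)["description"]["captions"])
```

Keeping the request builder separate from the network call makes it easy to check the URL and payload without a subscription key.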

What can CV tell us?
• A black and white photo of a city
• A black and white photo of a large city
• A large white building in a city

Not only that! It tags too!
• Tagging
• Type of identified object
• Bounding Box
• Set of coordinates (Top, left, width and height)
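The tags and bounding boxes described above come back as JSON. The sketch below reads them from an Analyze Image style response, assuming the `tags` and `objects` fields and a `rectangle` with `x`, `y`, `w`, `h`; the sample payload is hand-written for illustration, not real service output.

```python
def summarize_analysis(result):
    """Return (tags, objects) from an Analyze Image style response dict."""
    tags = [(tag["name"], tag["confidence"]) for tag in result.get("tags", [])]
    objects = []
    for obj in result.get("objects", []):
        rect = obj["rectangle"]
        objects.append({
            "type": obj["object"],
            "confidence": obj["confidence"],
            # Map the service's x/y/w/h onto the slide's left/top/width/height.
            "left": rect["x"],
            "top": rect["y"],
            "width": rect["w"],
            "height": rect["h"],
        })
    return tags, objects


# Invented sample data in the assumed response shape.
sample = {
    "tags": [
        {"name": "building", "confidence": 0.99},
        {"name": "city", "confidence": 0.95},
    ],
    "objects": [
        {"rectangle": {"x": 25, "y": 43, "w": 172, "h": 140},
         "object": "building", "confidence": 0.8},
    ],
}

tags, objects = summarize_analysis(sample)
print(tags)
print(objects)
```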

Categorization in 86-category taxonomy
abstract_ animal_horse building_street food_grilled others_ outdoor_road people_hand plant_tree text_menu
abstract_net animal_panda dark_ food_pizza outdoor_ outdoor_sportsfield people_many object_screen text_sign
abstract_nonphoto building_ drink_ indoor_ outdoor_city outdoor_stonerock people_portrait object_sculpture trans_bicycle
abstract_rect building_arch drink_can indoor_churchwindow outdoor_field outdoor_street people_show sky_cloud trans_bus
abstract_shape building_brickwall dark_fire indoor_court outdoor_grass outdoor_water people_tattoo sky_sun trans_car
abstract_texture building_church dark_fireworks indoor_doorwindows outdoor_house outdoor_waterside people_young people_swimming trans_trainstation
animal_ building_corner sky_object indoor_marketstore outdoor_mountain people_ plant_ outdoor_pool
animal_bird building_doorwindows food_ indoor_room outdoor_oceanbeach people_baby plant_branch text_
animal_cat building_pillar food_bread indoor_venue outdoor_playground people_crowd plant_flower text_mag
animal_dog building_stair food_fastfood dark_light outdoor_railway people_group plant_leaves text_map
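Category names follow a parent_child pattern, with a bare trailing underscore for the top-level category. A small sketch of how an app might split those names and pick the best-scoring category, assuming the response lists categories as `name` and `score` pairs (the sample scores are invented):

```python
def split_category(name):
    """Split 'outdoor_city' into ('outdoor', 'city'); 'abstract_' gives ('abstract', None)."""
    parent, _, child = name.partition("_")
    return parent, (child or None)


def best_category(categories):
    """Return the category dict with the highest score, or None if the list is empty."""
    return max(categories, key=lambda category: category["score"], default=None)


# Invented sample in the assumed {"name", "score"} shape.
categories = [
    {"name": "outdoor_city", "score": 0.72},
    {"name": "building_", "score": 0.21},
]

top = best_category(categories)
print(split_category(top["name"]))
```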

Optical character recognition
Faith
CAN MOVE
MOUNTAINS
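Recognized text is returned line by line. The sketch below collects those lines from a Read API style result, assuming the `analyzeResult.readResults[].lines[].text` layout; the sample mirrors the slide's image text and is hand-written, so the actual line breaks returned by the service may differ.

```python
def extract_lines(read_result):
    """Collect the recognized text lines from a Read-style result, page by page."""
    lines = []
    for page in read_result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            lines.append(line["text"])
    return lines


# Hand-written sample in the assumed response shape.
sample_read_result = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {"page": 1, "lines": [
                {"text": "Faith"},
                {"text": "CAN MOVE"},
                {"text": "MOUNTAINS"},
            ]},
        ],
    },
}

print(" ".join(extract_lines(sample_read_result)))
```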

Some Additional Capabilities
• Detect image types
• Detect image color schemes
• Generate thumbnails
• Moderate content

Custom Vision
• Azure Custom Vision is an image recognition service that lets you
build, deploy, and improve your own image identifiers.
• An image identifier applies labels (which represent classes or
objects) to images, according to their visual characteristics.
• The Custom Vision service uses a machine learning algorithm to
analyze images.

What can Custom Vision do?
• Classification
• Object Detection
• Export as a standalone offline model for your app development.
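For classification, a published Custom Vision model returns a list of predictions, each with a tag and a probability. A minimal sketch of picking the most likely label, assuming the `predictions`, `tagName` and `probability` fields; the tag names and the 0.5 threshold are illustrative choices only.

```python
def top_prediction(result, threshold=0.5):
    """Return the most probable prediction at or above the threshold, else None."""
    candidates = [
        prediction for prediction in result.get("predictions", [])
        if prediction["probability"] >= threshold
    ]
    return max(candidates, key=lambda prediction: prediction["probability"], default=None)


# Invented sample in the assumed response shape.
sample_prediction = {
    "predictions": [
        {"tagName": "apple", "probability": 0.91},
        {"tagName": "banana", "probability": 0.07},
    ],
}

best = top_prediction(sample_prediction)
print(best["tagName"] if best else "no confident match")
```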

Face Verification
Verification result: The two faces belong to the same
person. Confidence is 0.93468.
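The message above can be derived from a verify response. This sketch assumes the Face verify operation takes two previously detected face IDs and returns `isIdentical` and `confidence`; the face ID values are placeholders.

```python
import json


def build_verify_body(face_id_1, face_id_2):
    """JSON body for a face-to-face verification request."""
    return json.dumps({"faceId1": face_id_1, "faceId2": face_id_2})


def describe_verification(result):
    """Turn a verify response dict into a message like the one on the slide."""
    if result["isIdentical"]:
        verdict = "The two faces belong to the same person."
    else:
        verdict = "The two faces belong to different people."
    return f"Verification result: {verdict} Confidence is {result['confidence']}."


# Placeholder IDs and a sample response using the slide's confidence value.
print(build_verify_body("face-id-1", "face-id-2"))
print(describe_verification({"isIdentical": True, "confidence": 0.93468}))
```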

Video Indexer
• Video Indexer provides the ability to extract deep insights
(with no need for data analysis or coding skills) using
machine learning models based on multiple channels
(voice, vocals, visual).
• The service enables deep search, reduces operational
costs, enables new monetization opportunities, and
creates new user experiences on large archives of
videos (with low entry barriers).

Video Indexer
• Keywords extraction
• Named entities extraction
• Topic inference
• Artifacts
• Sentiment analysis: Identifies positive, negative, and
neutral sentiments from speech and visual text.

Use Cases of Video Indexer
• Deep search
• Content creation
• Accessibility
• Monetization
• Content moderation
• Recommendations

Video Indexer
Face detection
Celebrity identification
Account-based face identification
Visual text recognition
Visual content moderation
Labels identification
Scene segmentation
Shot detection
Black frame detection
Keyframe extraction
Rolling credits
Animated characters detection
Editorial shot type detection
Audio transcription
Automatic language detection
Multi-language speech identification and transcription
Two channel processing
Closed captioning
Noise reduction
Transcript customization (CRIS)
Speaker enumeration
Speaker statistics
Textual content moderation
Audio effects
Emotion detection
Translation

Form Recognizer
• Extract text and data from business forms and documents.
• Easily extract text and structure with a simple REST API.
• Pre-trained models: Receipt, Business Card, Layout
• Custom-trained models
• Supports printed and handwritten forms, PDFs and images.
• Container support

What can you do with Form Recognizer?
• Automate conversion of written text to digital text
• Automate capturing receipt data
• Automate converting business cards into digital contacts
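To show what capturing receipt data can look like in code, this sketch flattens the fields of a prebuilt receipt result into a plain dictionary. It assumes the v2.1 style layout (`analyzeResult.documentResults[0].fields`, with typed values such as `valueString` and `valueNumber`); the merchant and total are invented sample data.

```python
def flatten_fields(analyze_result):
    """Map each extracted field name to its typed value, falling back to raw text."""
    document = analyze_result["analyzeResult"]["documentResults"][0]
    flat = {}
    for name, field in document["fields"].items():
        field_type = field["type"]
        # e.g. type "string" -> key "valueString", "number" -> "valueNumber"
        value_key = "value" + field_type[0].upper() + field_type[1:]
        flat[name] = field.get(value_key, field.get("text"))
    return flat


# Invented sample in the assumed response shape.
sample_receipt = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [{
            "docType": "prebuilt:receipt",
            "fields": {
                "MerchantName": {"type": "string", "valueString": "Contoso",
                                 "text": "Contoso", "confidence": 0.97},
                "Total": {"type": "number", "valueNumber": 14.5,
                          "text": "$14.50", "confidence": 0.98},
            },
        }],
    },
}

print(flatten_fields(sample_receipt))
```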

Speech-to-Text
• Speech-to-text service
• Improves meeting efficiency by transcribing conversations in real time
• Helps safeguard data with industry-leading security and compliance
certifications.
• Integrates with a variety of meeting conference solutions including
Microsoft Teams and other third-party meeting software.
• SDK is available.
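Besides the SDK, short clips can be sent to a REST endpoint. The sketch below builds that URL and reads the transcript from a response, assuming the speech-to-text REST API for short audio and its `RecognitionStatus` and `DisplayText` fields; the region and the sample response are placeholders. Real-time meeting transcription would normally go through the SDK instead.

```python
import urllib.parse


def build_stt_url(region, language="en-US"):
    """URL of the short-audio recognition endpoint for a given region."""
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    return (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )


def transcript_from(result):
    """Return the recognized text, or None when recognition did not succeed."""
    if result.get("RecognitionStatus") == "Success":
        return result["DisplayText"]
    return None


# Placeholder region and an invented sample response.
print(build_stt_url("westus"))
print(transcript_from({"RecognitionStatus": "Success",
                       "DisplayText": "Let's start the meeting."}))
```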

Speaker Recognition
“Who is speaking?”

Speaker Verification
• Text-dependent verification means speakers need to use the same
passphrase during both the enrollment and verification phases.
• Text-independent verification means speakers can speak in everyday
language during the enrollment and verification phases.

Text-to-Speech
• Convert text into human-like synthesized speech.
• Offers 75+ standard voices in more than 45 languages and locales, and 5
neural voices
• Tune voice output by easily adjusting rate, pitch, pronunciation,
pauses, and more.
• Speech synthesis
• Asynchronous synthesis of long audio
• Speech Synthesis Markup Language (SSML)
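Rate, pitch and pauses are adjusted through SSML elements. A minimal sketch that assembles an SSML document with `prosody` and `break`, assuming the voice name `en-US-AriaNeural` and the illustrative values shown; it only builds the markup and does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-AriaNeural", rate="+10.00%", pitch="high", pause_ms=500):
    """Build an SSML string that tunes rate and pitch and adds a trailing pause."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < in the spoken text.
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{escape(text)}</prosody>"
        f'<break time="{int(pause_ms)}ms"/>'
        "</voice></speak>"
    )


print(build_ssml("Welcome to Azure Weekend 2020."))
```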

Speech Translation
Microsoft’s Translation Engine
• Statistical machine translation (SMT)
• Neural machine translation (NMT)

Speech Translation
• Speech-to-text translation with recognition results.
• Speech-to-speech translation.
• Support for translation to multiple target languages.
• Interim recognition and translation results.
