Microsoft Confidential
Understanding the latent value in all content
Text
(1) Validate enrichment pipeline
Tags
“throwing”, “ball”, “girl”, “grass”, “basketball”
Caption
“A girl throwing a ball”
Entities
Persons
“Anita Christiansen”,
“Conrad Nuber”,
Locations
“Bothell”, “Woodinville”
Organization
“Litware Insurance Corp.”
Computer Vision
Face
Emotion
Content Moderator
Video Indexer
Custom Vision
Service
Custom Decision
Q-n-A Maker
Language
Understanding (LUIS)
Text Analytics
Bing Spell Check
Translator Text
Speaker
Recognition
Bing Speech
Custom Speech
Translator Speech
Unified Speech
Service
Bing Autosuggest
Bing Search
Bing Entity Search
Bing Statistics add-in
Bing Visual Search
Bing Custom Search
Management Free
Keyword search
Faceting
Geospatial support
Multi-Language Support
Suggestions/auto-complete
Customizable scoring models
Proximity Search
Synonyms
etc.
INGEST
Data in any
format, any
Azure store
ENRICH EXPLORE
Annotations
Cognitive skills
Search
Annotated
Documents
Customer
Data
Built-in Cognitive Skills
OCR,
Key Phrase Extraction,
People Names,
Company Names,
Sentiment Analyzer,
Computer Vision,
etc.
Search
Index
.pdf
.doc
.jpeg
…
Third Party Enrichers
Custom classification models,
Custom entity extraction,
etc.
Azure Machine
Learning
Key Phrase Extraction
Sentiment Analysis
Organization Entity Extraction
Location Entity Extraction
Persons Entity Extraction
Language Detection
Face Detection
Tag Extraction
Celebrity Recognition
Landmark Detection
Handwriting Recognition (Preview)
Printed Text Recognition
…,
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myskill.azurewebsites.net/api/OrgId",
"httpHeaders": { "Api-Key": "mySecret" },
"context": "/document/organizations/*",
"inputs":
[
{ "name": "organizationName", "source": "/document/organizations/*" }
],
"outputs":
[
{ "name": "organizationId", "targetName": "organizationId" }
]
},
{
"values": [
{
"recordId": "7cad2",
"data":
{
"myOutput1": "animals"
}
},
{
"recordId": "7cad3",
"data":
{
"myOutput1": "colors"
}
},
…
]
}
{
"values": [
{
"recordId": "7cad2",
"data":
{
"myInput1": "fox",
"myInput2": "cat"
}
},
{
"recordId": "7cad3",
"data":
{
"myInput1": "blue",
"myInput2": "red"
}
},
…
]
}
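The two payloads above share one contract: every record in "values" carries a "recordId", and the reply must return a record with the same "recordId". The following is a minimal Python sketch of a handler that honors that contract. The lookup table CATEGORY_BY_WORD and the function name run_skill are hypothetical and do not appear in the deck, and the HTTP hosting layer is left out.

```python
import json

# Hypothetical lookup table, for illustration only (not from the deck).
CATEGORY_BY_WORD = {
    "fox": "animals",
    "cat": "animals",
    "blue": "colors",
    "red": "colors",
}


def run_skill(request_body: str) -> str:
    """Return one output record per input record, keeping each recordId."""
    request = json.loads(request_body)
    results = []
    for record in request["values"]:
        data = record.get("data", {})
        # Classify the record by its first input; unknown words map to None.
        category = CATEGORY_BY_WORD.get(data.get("myInput1"))
        results.append(
            {"recordId": record["recordId"], "data": {"myOutput1": category}}
        )
    return json.dumps({"values": results})


if __name__ == "__main__":
    sample = {
        "values": [
            {"recordId": "7cad2", "data": {"myInput1": "fox", "myInput2": "cat"}},
            {"recordId": "7cad3", "data": {"myInput1": "blue", "myInput2": "red"}},
        ]
    }
    print(run_skill(json.dumps(sample)))
```

Keeping the "recordId" unchanged is what lets the caller match each result back to the record that produced it, even when many records travel in one request.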
Azure Machine
Learning
content
keyPhrases
organizations
docClass
content
normalized
images
language
tags
orgs
content
content
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
"inputs":
[
{ "name": "text", "source": "/document/content" }
],
"outputs":
[
{ "name": "languageCode", "targetName": "myLanguageCode" },
{ "name": "languageName", "targetName": "myLanguageName" }
]
},
…,
{
"@odata.type": "#Microsoft.Skills.Text.NamedEntityRecognitionSkill",
"categories": [ "Organization" ],
"defaultLanguageCode": "en",
"inputs":
[
{ "name": "text", "source": "/document/content" },
{ "name": "languageCode", "source": "/document/myLanguageCode" }
],
"outputs":
[
{ "name": "organizations", "targetName": "organizations" }
]
},
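In the skillset above, the second skill reads "/document/myLanguageCode", which exists only because the first skill wrote it there. The following Python sketch checks that kind of chaining offline. It assumes that outputs land under the skill's context (taken to be "/document" when none is given), that skills are listed in dependency order, and that paths are compared as exact strings with no "/*" handling. The function names are hypothetical.

```python
def output_path(skill: dict, output: dict) -> str:
    """Path where an output lands: the skill's context plus the targetName."""
    context = skill.get("context", "/document")  # assumed default context
    return f"{context}/{output['targetName']}"


def unresolved_inputs(skills: list, source_fields=("/document/content",)) -> list:
    """Return input sources that no earlier skill or source field provides."""
    available = set(source_fields)
    missing = []
    for skill in skills:
        for item in skill["inputs"]:
            if item["source"] not in available:
                missing.append(item["source"])
        # Outputs of this skill become available to the skills after it.
        for item in skill["outputs"]:
            available.add(output_path(skill, item))
    return missing


if __name__ == "__main__":
    skills = [
        {
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [
                {"name": "languageCode", "targetName": "myLanguageCode"},
                {"name": "languageName", "targetName": "myLanguageName"},
            ],
        },
        {
            "inputs": [
                {"name": "text", "source": "/document/content"},
                {"name": "languageCode", "source": "/document/myLanguageCode"},
            ],
            "outputs": [{"name": "organizations", "targetName": "organizations"}],
        },
    ]
    print(unresolved_inputs(skills))
```

An empty result means every input is fed by the source document or by an earlier skill. A misspelled source path shows up in the returned list.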
…,
{
"@odata.type": "#Microsoft.Skills.Text.NamedEntityRecognitionSkill",
"categories": [ "Organization" ],
"defaultLanguageCode": "en",
"inputs":
[
{ "name": "text", "source": "/document/content" },
{ "name": "languageCode", "source": "/document/myLanguageCode" }
],
"outputs":
[
{ "name": "organizations", "targetName": "organizations" }
]
},
…,
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"uri": "https://myskill.azurewebsites.net/api/OrgId",
"context": "/document/organizations/*",
"httpHeaders": { "Api-Key": "mySecret" },
"inputs":
[
{ "name": "organizationName", "source": "/document/organizations/*" }
],
"outputs":
[
{ "name": "organizationId", "targetName": "organizationId" }
]
},
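The context "/document/organizations/*" means the custom skill runs once per extracted organization, and each "organizationId" is attached to the organization it was computed for. The following Python sketch shows that fan-out and merge. The record numbering (the list index as a string) and both function names are assumptions made for illustration, not the service's actual behavior.

```python
def build_request(organizations: list) -> dict:
    """Create one record per element matched by /document/organizations/*."""
    return {
        "values": [
            {"recordId": str(index), "data": {"organizationName": name}}
            for index, name in enumerate(organizations)
        ]
    }


def attach_ids(organizations: list, response: dict) -> list:
    """Attach each returned organizationId to the organization it belongs to."""
    data_by_record = {rec["recordId"]: rec["data"] for rec in response["values"]}
    return [
        {"name": name, "organizationId": data_by_record[str(index)]["organizationId"]}
        for index, name in enumerate(organizations)
    ]


if __name__ == "__main__":
    orgs = ["Litware Insurance Corp."]
    print(build_request(orgs))
    # Hypothetical reply from the skill endpoint; "org-001" is a made-up ID.
    reply = {"values": [{"recordId": "0", "data": {"organizationId": "org-001"}}]}
    print(attach_ids(orgs, reply))
```

Matching on "recordId" rather than on position keeps the merge correct even if the endpoint returns records in a different order.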
document.pdf
/document
    /languageCode
    /keyPhrases
        /1, /2, /…, /n
    /organizations
        /1, /2, /…, /n (each with organizationId)
    /images
        /1, /2, /…, /n (each with tags)
/document
    /keyPhrases
        /0, /1, /…, /n
    /organizations
        /0, /1, /…, /n (each with organizationId)
    /images
        /0, /1, /…, /n (each with tags)
New Indexer Property
{
…
"outputFieldMappings":
[
{
"sourceFieldName":
"/document/organizations/*/organizationId",
"targetFieldName":
"orgIds"
} ,
…
]
}
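The mapping above copies every "organizationId" found under "/document/organizations/*" into the index field "orgIds". The following Python sketch shows how such a path could be resolved against an enriched document. The nested dict layout of the document and the function names are assumptions made for illustration; the service's internal representation is not described in the deck.

```python
def resolve(document: dict, path: str) -> list:
    """Collect every value reached by an enrichment path.

    A '*' segment fans out over the elements of a list.
    """
    parts = path.strip("/").split("/")
    if parts[0] != "document":
        raise ValueError("path must start with /document")
    current = [document]
    for part in parts[1:]:
        next_nodes = []
        for node in current:
            if part == "*":
                if isinstance(node, list):
                    next_nodes.extend(node)
            elif isinstance(node, dict) and part in node:
                next_nodes.append(node[part])
        current = next_nodes
    return current


def apply_mappings(document: dict, mappings: list) -> dict:
    """Build index fields from outputFieldMappings-style entries."""
    return {
        m["targetFieldName"]: resolve(document, m["sourceFieldName"])
        for m in mappings
    }


if __name__ == "__main__":
    # Hypothetical enriched document; the IDs are made up.
    enriched = {
        "organizations": [
            {"organizationId": "org-001"},
            {"organizationId": "org-002"},
        ]
    }
    mappings = [
        {
            "sourceFieldName": "/document/organizations/*/organizationId",
            "targetFieldName": "orgIds",
        }
    ]
    print(apply_mappings(enriched, mappings))
```

In this sketch a path without "*" resolves to a one-element list, so a real mapping layer would still need to decide when to unwrap single values.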
“Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed
do eiusmod tempor incididunt ut
labore et dolore magna aliqua. Ut
enim ad minim veniam, quis
nostrud exercitation ullamco
laboris nisi…”
Class A
Class B
Class C
“Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed
do eiusmod tempor incididunt ut
labore et dolore magna aliqua. Ut
enim ad minim veniam, quis
nostrud exercitation ullamco
laboris nisi…”
“Lorem ipsum dolor sit amet,
consectetur adipiscing elit, sed
do eiusmod tempor incididunt ut
labore et dolore magna aliqua. Ut
enim ad minim veniam, quis
nostrud exercitation ullamco
laboris nisi…”
Entity type A
Entity type B
Labeled
Data
Custom
Entity
Extraction
Template
Azure ML
Annotated
Documents
Customer
Data
Search
Index
Cognitive Search
Documentation | Sign up for Azure Search
Azure Machine Learning Package for Text Analytics
Documentation | Create a Data Science Virtual Machine
Cognitive Services
Documentation | Sign up
Knowledge Mining with Azure Search Technical Deck

Editor's Notes

  • #3 Understanding latent value in all content
  • #8 I verified accuracy of this slide with Giampaolo. Notes : Voice Font is part of Unified Speech Service. Custom Decision is not out for //build.
  • #10 INGEST (Understanding documents in a variety of formats) AUGMENT (Extract “information”; create structure out of the unstructured.) EXPLORE (Search)
  • #13 MongoDB?
  • #18 TODO: Change properties (foo  bar)
  • #20 We use the term skillset to refer to all the skills that should be run as part of the enrichment process. In a basic example…
  • #21 Sometimes you need to do something more complex. For instance, you may want to use the language you detected to improve the accuracy of the key-phrase extractor. Or you may want to get metadata of metadata.
  • #24 Sometimes you need to do something more complex. For instance, you may want to use the language you detected to improve the accuracy of the key-phrase extractor. Or you may want to get metadata of metadata.
  • #28 At each step of enrichment more structure is added to the document. Before-a-skill and after-a-skill diagram. (SHOW RESTFUL CALL)
  • #30 At each step of enrichment more structure is added to the document. Before-a-skill and after-a-skill diagram. (SHOW RESTFUL CALL)
  • #35 http://medicalentitydetector.azurewebsites.net/
  • #39 TODO: Add link to Cognitive Services.