Andreas Blumauer
CEO & Managing Partner
Semantic Web Company /
PoolParty Semantic Suite
Taxonomy Boot Camp 2017
Washington...

Leveraging Taxonomy Management with Machine Learning

Machines learn better with Semantics!

See how taxonomy management and the maintenance of knowledge graphs benefit from machine learning and corpus analysis, and how, in return, machine learning gets improved when using semantic knowledge models for further enrichment.

Published in: Data & Analytics

  1. Leveraging Taxonomy Management With Machine Learning. Andreas Blumauer, CEO & Managing Partner, Semantic Web Company / PoolParty Semantic Suite. Taxonomy Boot Camp 2017, Washington, DC.
  2. Introduction. [Diagram: a small knowledge graph about the speaker and his company. Node labels: Andreas Blumauer, Semantic Web Company, 2004, 6.0, Vienna, Enterprise Knowledge Graphs, >200 customers, Taxonomies, Ontologies, Text Mining. Edge labels: founder & CEO of, developer and vendor of, founded, current version, active at, based on, located, part of, manages, standard for, enriches, serves, editor of, is about, graduates, used for.]
  3. Agenda ▸ Cognitive Computing: Semantic Technologies & Machine Learning ▸ Terms, Concepts, Shadow Concepts ▸ Corpus Analysis & (Shadow) Concept Extraction with PoolParty ▸ A comparison with LSA and Word2Vec ▸ Use Cases ▹ Document Annotation & Indexing ▹ Text Classification (incl. Benchmarks) ▹ Recommender Systems (incl. Use Case)
  4. Cognitive Computing: Combining Semantic Technologies With Machine Learning
  5. A key assumption of this talk: People do not search for documents only; they seek facts about things and smaller chunks of information. Machines shall help to create links across data silos to give answers to questions. [Diagram label: Converging A.I. Technologies]
  6. A quick question at the beginning: Will Artificial Intelligence make Subject Matter Experts obsolete? Imagine you want to build an application that helps to identify patient and treatment pairings. Which would you prefer: applications based solely on machine learning, applications based only on doctors' knowledge, or a combination of both?
  7. How Semantic Computing and Machine Learning complement each other. [Diagram layers: Structured Data, Machine Learning, Cognitive Applications]
  8. How Semantic Computing and Machine Learning complement each other. [Diagram layers: Unstructured Data, Structured Data, Machine Learning, Cognitive Applications]
  9. How Semantic Computing and Machine Learning complement each other. [Diagram layers: Unstructured Data, Structured Data, Knowledge Graphs, Machine Learning, Cognitive Applications]
  10. Towards a Digital Twin: Proposal for a Cognitive Computing Platform Architecture. [Diagram layers: Unstructured Data, Structured Data, Knowledge Graphs, Machine Learning, Semantic Layer, IoT & Cognitive Applications]
  11. Terms, Concepts, Shadow Concepts: How to make sense of text and data
  12. Terms and co-occurrence models. Document corpus: websites; PDF, Word, …; abstracts from DBpedia; RSS feeds. Derived from the corpus: relevant terms and phrases; relevancy of terms; co-occurrence between terms. [Diagram: a network of nodes labelled Term 1 to Term 10]
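The co-occurrence model on this slide can be illustrated with a minimal sketch. It assumes the simplest possible definition (two terms co-occur when they appear in the same document) and uses a made-up three-document corpus; the function name and data are hypothetical and say nothing about how PoolParty computes co-occurrence.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each pair of vocabulary terms, the documents containing both."""
    counts = Counter()
    for text in documents:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(vocabulary & tokens)
        # Every unordered pair of terms found in this document co-occurs once.
        counts.update(combinations(present, 2))
    return counts


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "The stove is on. The stove is hot!",
    "A hot stove is dangerous.",
    "The retina is examined with an ophthalmoscope.",
]
terms = {"stove", "hot", "retina", "ophthalmoscope"}
pairs = cooccurrence_counts(corpus, terms)
print(pairs.most_common())
```

Pairs are stored in alphabetical order, so `pairs[("hot", "stove")]` is the lookup key for that pair. A production system would more likely weight pairs by distance or significance instead of raw document counts.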
  13. ‘Things’ but not Strings: Using a ‘Semantic Knowledge Graph’. [Diagram: http://www.my.com/taxonomy/62346723 (prefLabel: Retina; image: http://www.my.com/images/90546089); http://www.my.com/taxonomy/97345854 (prefLabel: Funduscope; altLabel: Ophthalmoscope); http://www.mycom.com/taxonomy/4543567 (prefLabel: Diagnostic Equipment); edge label: has broader]
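A minimal sketch of the "things, not strings" idea: every concept is identified by a URI, and any of its labels resolves to that same URI. The URIs and labels are copied from the slide (with line-wrap spaces joined); the dictionary layout and the helper names `build_label_index` and `resolve` are assumptions made for illustration, not a PoolParty or SKOS API.

```python
# Concepts keyed by URI; labels follow the SKOS prefLabel/altLabel distinction.
concepts = {
    "http://www.my.com/taxonomy/62346723": {
        "prefLabel": "Retina",
        "altLabels": [],
    },
    "http://www.my.com/taxonomy/97345854": {
        "prefLabel": "Funduscope",
        "altLabels": ["Ophthalmoscope"],
    },
    "http://www.mycom.com/taxonomy/4543567": {
        "prefLabel": "Diagnostic Equipment",
        "altLabels": [],
    },
}


def build_label_index(concepts):
    """Map every label (case-insensitive) to the URI of its concept."""
    index = {}
    for uri, labels in concepts.items():
        for label in [labels["prefLabel"], *labels["altLabels"]]:
            index[label.lower()] = uri
    return index


def resolve(label, index):
    """Return the concept URI for a label, or None if the label is unknown."""
    return index.get(label.lower())


index = build_label_index(concepts)
print(resolve("Ophthalmoscope", index))
print(resolve("funduscope", index))
```

Both lookups return the same URI, which is the point of the slide: synonyms are different strings for one thing.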
  14. Shadow Concepts: Use co-occurrences between concepts and terms to extract ‘shadow concepts’. Example: This site is a 15th-century Inca site located 2,430 metres above sea level. It is located in Cusco, Peru. It is situated on a mountain ridge above the Sacred Valley through which the Urubamba River flows. Most archaeologists believe that it was built as an estate for the Inca emperor Pachacuti. Often mistakenly referred to as the "Lost City of the Incas", it is the most familiar icon of Inca civilization. The Incas built the estate around 1450, but abandoned it a century later at the time of the Spanish Conquest. [Diagram labels: Inca site, Machu Picchu, Cusco, Inca empire, Inca emperor, Peru, Spanish Conquest, Sacred Valley, Chankas, Lost City, Pachacuti] In addition to explicitly used concepts and terms, Machu Picchu is extracted from the article as a Shadow Concept. As a prerequisite, one has to provide and analyze a representative text corpus first.
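One way to picture shadow concept extraction is the sketch below. It assumes a previously learned table of concept-to-term co-occurrence weights (here invented by hand) and flags a concept as a shadow concept when the concept itself is absent from the text but enough of its associated terms are present. The scoring rule, the threshold, and all weights are assumptions for illustration; the slides do not describe PoolParty's actual algorithm.

```python
def shadow_concepts(text, cooccurrence, threshold=0.5):
    """Return concepts not named in `text` whose co-occurring terms are present.

    `cooccurrence` maps concept -> {term: weight}. The score is the share of a
    concept's total weight carried by terms found in the text (substring match).
    """
    lowered = text.lower()
    found = {}
    for concept, weighted_terms in cooccurrence.items():
        if concept.lower() in lowered:
            continue  # explicitly mentioned, so not a shadow concept
        total = sum(weighted_terms.values())
        hit = sum(w for term, w in weighted_terms.items() if term.lower() in lowered)
        score = hit / total if total else 0.0
        if score >= threshold:
            found[concept] = score
    return found


# Hypothetical weights, standing in for the result of a corpus analysis.
model = {
    "Machu Picchu": {"Inca": 3, "Cusco": 2, "Sacred Valley": 2, "Pachacuti": 2, "Urubamba": 1},
    "Retina": {"ophthalmoscope": 3, "fundus": 2},
    "Peru": {"Cusco": 2, "Lima": 3},
}
text = (
    "This site is a 15th-century Inca site located in Cusco, Peru, above the "
    "Sacred Valley. Most archaeologists believe it was built for the Inca "
    "emperor Pachacuti."
)
result = shadow_concepts(text, model)
print(result)
```

"Peru" is skipped because the text names it explicitly, and "Retina" scores zero, so only "Machu Picchu" is surfaced.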
  15. Corpus Analysis: Use PoolParty for Deep Text Analysis
  16. Bionics: How do we learn from a lot of text? Example input, surrounded by filler text ("Bla bla bla bla"): "The stove is on. The stove is hot!" ▸ Statistical model/co-occurrences → is related: "Bla stove bla bla. Bla bla bla hot" ▸ Taxonomical model → is-a abstractions: "Switched on devices are dangerous devices." ▸ Ontological model → reasoning: "Switched on devices are dangerous, only if the operating temperature is above 100 degrees and the automatic shutdown mechanism is broken."
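The three kinds of model on this slide can be contrasted in a short sketch that uses the stove sentences as input. The class, field, and function names are hypothetical, and the temperature unit is left open because the slide only says "100 degrees".

```python
import re
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    switched_on: bool
    operating_temperature: float  # unit is not stated on the slide
    shutdown_broken: bool


def cooccurring_words(sentences, term):
    """Statistical view: words that share a sentence with `term` are 'related'."""
    related = set()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if term in words:
            related |= words - {term}
    return related


def dangerous_taxonomical(device):
    """Taxonomical view: every switched-on device is a dangerous device."""
    return device.switched_on


def dangerous_ontological(device):
    """Ontological view: the conditional rule quoted on the slide."""
    return (
        device.switched_on
        and device.operating_temperature > 100
        and device.shutdown_broken
    )


sentences = ["The stove is on.", "The stove is hot!"]
stove = Device("stove", switched_on=True, operating_temperature=180.0, shutdown_broken=False)
print(cooccurring_words(sentences, "stove"))
print(dangerous_taxonomical(stove), dangerous_ontological(stove))
```

The statistical view only learns that "stove" and "hot" are related, the taxonomical view over-generalizes, and the rule-based view can tell that a stove with a working shutdown mechanism is not dangerous.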
  17. Graphs + Machine Learning: PoolParty as a supervised learning system. [Diagram nodes: Content Manager, Integrator, Taxonomist/Ontologist, Thesaurus Server, Extractor, PowerTagging, Index, Corpus Learning/Semantic Analysis, CMS. Edge labels: uses API, is user of, is basis of, annotates, enriches, extends, analyzes, proposes extensions]
  18. Knowledge graphs as a result of human-machine cooperation. [Diagram labels: manually created parts of graph; supervised learning; automatically created parts of graph (corpus analysis, RDF transformation, machine learning, …)]
  19. PoolParty Corpus Analysis: How taxonomists can extend taxonomies with some help from machine learning algorithms. Candidate Concepts derived from sample documents can be easily integrated into the taxonomy. A list of possible Candidate Concepts is shown per document or as a list of the most relevant candidates per corpus. The context of a given taxonomy concept can be visualised with a few mouse clicks. Terms, concepts and shadow concepts can be highlighted per document.
  20. Network-based Knowledge Graph Assessment: Thesaurus Harmonizer ▸ Find missing relationships between concepts, which are of high semantic relevance ▸ Point out structural flaws in existing thesauri ▸ Identify corpora that only reflect a fraction of a thesaurus ▹ Or, vice versa: identify thesauri that are far too big for their domain applications, and possibly missing details
  21. Use Cases: Benefit from Semantic Knowledge Graphs and Machine Learning
  22. PoolParty Extractor: Extract concepts from text even if not used explicitly. Some domains use text that doesn’t always call a spade a spade. With ‘shadow concept extraction’ those ‘masked’ concepts can still be surfaced. Example text: Since these technologies would have become conventional technologies that are made into products and introduced into market at the time of their introduction, it would be difficult to differentiate them as innovative environmental and energy technologies from other global warming prevention technologies that have already been put to practical use in the industrial, commercial, residential, and energy conversion sectors. - The Innovative Global Warming Prevention Technology Working Group under the Research and Development Subcommittee - Council assessed that innovative global warming prevention technologies would bring about a reduction effect of 7.49 million t-CO2 case of average emissions factor for all power sources of carbon dioxide in 2010. In view of the difficulty in putting innovative carbon dioxide sequestration technology into practical use by 2010, the Working Group reassigned it as an issue of global warming prevention technology to be tackled by 2030. The Central Environment Council, however, has not had the opportunity to examine the contents of these technologies in detail. (Promotion of climate change prevention activities by every social actor) - The Programme encourages every social actor to take actions to prevent global warming. The actions include measures undertaken by the public sector. Extracted concept: Climate Change
  23. PoolParty Semantic Classifier: Text Classification based on Machine Learning and Semantic Knowledge Models. PoolParty Semantic Classifier combines machine learning algorithms (SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
  24. Benchmarking the PoolParty Semantic Classifier: Improvement of 5.2% compared to traditional (term-based) SVM
      Features used | Classifier | F1 (5 folds) | Variance
      Terms | LinearSVC | 0.83175 | 0.0008
      Concepts from REEGLE + Shadow Concepts | LinearSVC | 0.84451 | 0.0011
      Concepts from REEGLE | LinearSVC | 0.84647 | 0.0009
      Terms + Concepts from REEGLE + Shadow Concepts | LinearSVC | 0.87474 | 0.0009
      Reegle thesaurus: a comprehensive SKOS taxonomy for the clean energy sector (http://data.reeep.org/thesaurus/guide) ● 3,420 concepts ● 7,280 labels (English version) ● 9,183 relations (broader/narrower + related)
      Document training set: 1,800 documents in 7 classes (Renewable Energy, District Heating Systems, Cogeneration, Energy Efficiency, Energy (general), Climate Protection, Rural Electrification)
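The 5.2% headline figure appears to be the relative F1 gain of the combined feature set over the term-only baseline. The short check below reproduces it from the four F1 scores in the table; the short dictionary keys are labels chosen for this sketch.

```python
# F1 scores (5 folds) copied from the benchmark table on this slide.
f1_scores = {
    "terms": 0.83175,
    "concepts + shadow concepts": 0.84451,
    "concepts": 0.84647,
    "terms + concepts + shadow concepts": 0.87474,
}

baseline = f1_scores["terms"]

# Relative improvement over the term-based baseline, in percent.
relative_gain = {
    name: round((score - baseline) / baseline * 100, 1)
    for name, score in f1_scores.items()
}

for name, gain in relative_gain.items():
    print(f"{name}: {gain:+.1f}%")
```

In absolute terms the best configuration gains about 4.3 F1 points (0.87474 minus 0.83175), so the 5.2% should be read as a relative, not an absolute, improvement.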
  25. Sample Calculation: Based on an improvement of 5.2%. [Diagram labels: Inbound Documents, PoolParty Semantic Classifier, Experienced Agent] ● 100,000 documents (emails, tickets, etc.) per month ● 5 Euros extra costs per document when misrouted ● Cost savings per year: 1,200,000 x €5.00 x 0.052 = €312,000 per annum
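The arithmetic on this slide can be written out as a small function. Note the assumption the slide makes implicitly: the 5.2% classifier improvement is treated as 5.2% of all documents no longer being misrouted. The function and parameter names are chosen for this sketch.

```python
def annual_savings(documents_per_month, cost_per_misrouted, improvement):
    """Yearly savings if `improvement` (a fraction) fewer documents are misrouted."""
    documents_per_year = documents_per_month * 12
    return documents_per_year * cost_per_misrouted * improvement


# Figures from the slide: 100,000 documents per month, EUR 5 per misrouted document.
savings = annual_savings(100_000, 5.0, 0.052)
print(f"EUR {savings:,.0f} per annum")
```

Because the result is linear in every input, halving the document volume or the cost per misrouted document halves the savings.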
  26. Use Shadow Concepts to improve Recommender Systems. Example text (Mini Countryman): And it’s probably more of a crossover than ever, with the design to match. Being a Mini, the Countryman is clearly meant to be the driver’s car among small crossovers. The suspension is sophisticated, and there are lots of chassis options (a stiffer sports setup, variable damping, the electronically controlled ALL4 all-wheel-drive). But it’s also the crossover for people who’ve bags of cash to blow on personalisation and luxury. There’s been a lot of effort on ramping up the cabin quality, but then the outgoing Countryman was a sad let-down in that department. On the outside, plastic wheel-arch extensions, with eyebrow creases in the metalwork above, as well as roof bars and sill protectors all add to the visual crossover-ness. This remains the only Mini with angular rather than oval headlamps, and there’s a load of visual posturing going on in the lower face. There are eight versions at launch, and they’re exactly what you’d expect. It’s Cooper or Cooper S, each fuelled by petrol or diesel, each of them with front drive or ALL4. Oh and an eight-speed auto, too, if you count that as a separate choice. The Cooper petrol is a three-cylinder, the rest fours. You get extra kit as standard versus the old car, including navigation, Bluetooth, emergency call and park sensors. Upgrades include a bigger touch-screen nav with high-definition traffic, various posher seats, a HUD, and driver aids. Oh and a cushion thingy that folds out from the boot so you can sit on the rear bumper without getting your clothes mucky. In June 2017 a Cooper E will launch, which has the Cooper three-cylinder petrol driving the front wheels, and an electric motor for the rears, with a capacity to do a claimed 25 miles of gentle all-electric running. So it has the performance of a Cooper S ALL4 with the tax-busting advantages of a plug-in hybrid. And you wouldn’t use any fuel if you commuted a short distance. 
The platform is BMW’s contemporary transverse-engined hardware, in the bigger of its two sizes. That means it shares a lot with the BMW X1. The 4WD system is more sophisticated than the previous Countryman’s. The proportion of drive to the rear is computed by a controller that takes into account parameters including grip, steering angle and throttle position, as well as whether you’ve got the sports mode and sports traction systems selected.
  27. Use a Knowledge Graph + Co-occurrences for precise Content Recommendation. Example: Find similar episodes. [Diagram: the episodes ‘Raving’ and ‘De-Void’ are flagged as similar episodes; connecting labels: Scott, attack, Stilinski, friend, shame, O’Brien, woman, married, girl, attractive, love]
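A minimal sketch of a "find similar episodes" lookup, assuming each episode has already been annotated with a set of concepts and terms. Similarity here is plain Jaccard overlap, which is a stand-in chosen for simplicity and not necessarily what the demo used. The assignment of labels to episodes is invented, since the slide does not show which label belongs to which episode, and "Some other episode" is a made-up third item.

```python
def jaccard(a, b):
    """Share of annotations two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target, annotations):
    """Return the best-matching other item and all similarity scores."""
    scores = {
        other: jaccard(annotations[target], tags)
        for other, tags in annotations.items()
        if other != target
    }
    return max(scores, key=scores.get), scores


# Hypothetical annotations built from labels that appear on the slide.
annotations = {
    "Raving": {"Scott", "Stilinski", "attack", "friend"},
    "De-Void": {"Scott", "Stilinski", "attack", "shame"},
    "Some other episode": {"woman", "married", "love"},
}
best, scores = most_similar("Raving", annotations)
print(best, scores)
```

A knowledge graph could refine this by also counting broader or related concepts as partial matches, which plain set overlap ignores.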
  28. Rules-based Recommender Systems. Example: Wine-to-Cheese Harmonizer (Live Demo). [Diagram: ‘Weingut Weinrieder Grüner Veltliner Alte Reben’ is characterized by Dry, Medium-bodied, High acidity; a second group of characteristics: Nutmeg, Full-bodied, Warm finish, Tobacco; ‘Nagelkaas’ is characterized by Cumin, Clove, Hard cheese, Higher fat. Edge labels: is characterized by, matches, matches, does not match, ?]
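A rules-based harmonizer of this kind can be sketched as a lookup over pairs of characteristics. The characteristics are taken from the slide, but the three pairing rules below are invented placeholders (the slide does not show which characteristics match), and the function names are hypothetical.

```python
# Hypothetical rules: (wine characteristic, cheese characteristic) -> verdict.
RULES = {
    ("High acidity", "Higher fat"): "matches",
    ("Medium-bodied", "Hard cheese"): "matches",
    ("Dry", "Clove"): "does not match",
}


def fired_rules(wine_traits, cheese_traits):
    """Return every rule whose wine and cheese characteristics are both present."""
    return [
        (wine, cheese, verdict)
        for (wine, cheese), verdict in RULES.items()
        if wine in wine_traits and cheese in cheese_traits
    ]


def overall_verdict(fired):
    """A single clash vetoes the pairing; no applicable rule yields '?'."""
    if not fired:
        return "?"
    if any(verdict == "does not match" for _, _, verdict in fired):
        return "does not match"
    return "matches"


gruener_veltliner = {"Dry", "Medium-bodied", "High acidity"}
nagelkaas = {"Cumin", "Clove", "Hard cheese", "Higher fat"}

fired = fired_rules(gruener_veltliner, nagelkaas)
print(fired)
print(overall_verdict(fired))
```

The veto policy in `overall_verdict` is one design choice among several; weighting rules or requiring a majority would be equally plausible.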
  29. Why ‘The Knot’ uses Machine Learning and Semantic Models ▹ XO Group has run ‘The Knot’ since 1996 ▹ NYSE: XOXO (S&P 600 Component) ▹ 1.5 million active members ▹ The Knot has helped marry 25 million couples ▹ Partnering with 300,000 wedding vendors ▹ Millions of vendor reviews
  30. Thank you for your interest! Andreas Blumauer, CEO, Semantic Web Company ▸ Mail andreas.blumauer@semantic-web.com ▸ Company https://www.semantic-web.com ▸ LinkedIn https://www.linkedin.com/in/andreasblumauer ▸ Twitter https://twitter.com/semwebcompany ▸ Blog https://www.linkedin.com/today/author/andreasblumauer © Semantic Web Company - http://www.semantic-web.com and http://www.poolparty.biz/
