The PoolParty Semantic Classifier is a component of the PoolParty Semantic Suite that combines machine learning with knowledge graphs. We discuss the potential of fusing machine learning, neural networks, and knowledge graphs, based on use cases and this concrete technology offering. We introduce the term 'Semantic AI' to refer to the combined use of these AI methods.
My talk at BrightonSEO in September 2017. I cover examples of how to use NLP APIs such as IBM Watson Natural Language Understanding and Microsoft Azure's Text Analytics API, how to get familiar with them, and where SEO agencies and practitioners should focus their efforts.
TransferWise is hiring in SEO: http://grnh.se/7fdiup1
Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO - Koray Tugberk GUBUR
Query processing covers query term weight calculation, query augmentation, query context definition, and more. Query understanding and query clustering are information retrieval tasks within search engines. To improve search engine optimization efforts and project results, organic search performance optimizers need to apply query processing methodologies; digital marketing and SEO are closely connected. Understanding a query involves query parsing, query rewriting, question generation, and answer pairing. Multi-stage query processing, candidate answer passages, and answer term weighting are some of the concepts Google's search engine uses to parse queries.
"The Secret Life of Queries, Parsing, Rewriting & SEO" was presented at the BrightonSEO event in April 2022. The talk combined theoretical SEO with practical SEO examples.
Query processing methodologies go beyond synonym matching or synonym finding. They involve multiple aspects of words and their meanings: the theme of a word, its centrality, attention windows, context windows, word co-occurrence matrices, GloVe, Word2Vec, word embeddings, character embeddings, and more.
Themes of words incorporate word probabilities, as in the Continuous Bag of Words (CBOW) model.
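As a rough illustration of the word co-occurrence matrices that embedding methods such as GloVe and Word2Vec build on, here is a minimal Python sketch that counts word pairs inside a sliding context window. It is a toy example with an invented sentence, not any search engine's implementation:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count how often each word pair co-occurs within a context window.

    A toy version of the co-occurrence statistics that methods like
    GloVe start from.
    """
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "semantic search engines process queries".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("search", "engines")])  # 1: "engines" is within 2 words of "search"
```

Widening the window makes more distant words count as context, which is exactly the "context window" trade-off the text mentions.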
The search engine optimization community focuses on keyword research by matching queries. Query processing involves changing query word order, word type, and word combinations, substituting phrase synonyms, generating questions from queries, and clustering queries. Query processing and document processing are correlated: query processing aims to understand a query, while document processing aims to understand a web document. Both feed ranking algorithms. A better ranking algorithm requires better query understanding, and better rankings for SEOs require better search engine understanding. Thus, understanding query processing methods is necessary.
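The rewriting operations listed above (word order changes, synonym substitution, question generation) can be sketched with a toy query expander. The synonym map and the question template here are invented for illustration; real systems derive rewrites from query logs and embeddings:

```python
from itertools import permutations

# Hypothetical hand-written synonym map, purely for illustration.
SYNONYMS = {"cheap": ["affordable", "budget"], "laptop": ["notebook"]}

def query_variants(query):
    """Generate naive query rewrites: word reorderings, synonym swaps,
    and a question form. A sketch of the kinds of rewrites described,
    not Google's algorithm."""
    words = query.split()
    variants = set()
    # query word order change
    for perm in permutations(words):
        variants.add(" ".join(perm))
    # query phrase synonym usage
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            variants.add(" ".join(words[:i] + [syn] + words[i + 1:]))
    # query question generation (one invented template)
    variants.add("what is the best " + " ".join(words))
    variants.discard(query)  # keep only the rewrites
    return sorted(variants)

for v in query_variants("cheap laptop"):
    print(v)
```

Clustering these variants by shared intent would be the next step the text describes.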
Search query processing is the implementation of query processing for search engines. A search query is the phrase a search engine user types to search, and it reflects organic search behavior. Search intent understanding and search intent grouping are two different things, but query templates, question templates, and document templates work together. A web search engine answers millions of queries every day, so search query processing is a fundamental task for search engine optimization and search engine result page optimization.
The "Semantic Search Engine: Query Processing" slides from Koray Tuğberk GÜBÜR supported the presentation "Search Query Processing: The Secret Life of Queries, Parsing, Rewriting & SEO". The presentation was created with the help of Rebecca Berbel.
Many thanks to the Google engineers, including Larry Page, who created the semantic search engine patents.
As search evolves, so does optimization. Search results are less about phrases (combinations of words and letters) and more about topics (semantic meanings and entities). So a smart content marketer optimizes for “things, not strings.”
But what exactly does this mean for the writer? This presentation covers five specific actions we take as content marketers to make sure that your marketing is aligned with the future of SEO.
Learn how to:
Find clues into what topics are semantically linked to each other (Research)
Target topics, not just phrases, through writing (Semantic Search)
Incorporate natural language into your content (Voice Search)
Make visitors happy in ways that make Google happy (User Interaction Signals)
You're about to learn the step-by-step process for each of the specific actions that will future-proof your search engine rankings.
How to approach SEO in a world where Google has moved from strings and keywords to things, topics, and entities. Dixon Jones is the CEO of InLinks, which has built a proprietary NLP algorithm and knowledge graph designed for the SEO industry.
New Approaches for Structured Data: Evolution of Question Answering - Bill Slawski
Google has moved from search to knowledge. Focusing on answering questions with knowledge graph entity information has led to answering queries with knowledge graphs, using confidence scores between entities and other entities or entity attributes, based on freshness, reliability, popularity, and the proximity between an entity and another entity or attribute.
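The idea of scored entity relations can be sketched as a tiny graph of edges with confidence values. The entities, relations, and scores below are invented for illustration; they are not Google's actual knowledge graph model:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    source: str
    relation: str
    target: str
    confidence: float  # hypothetical blend of freshness, reliability, popularity

# Toy knowledge graph fragment with invented confidence scores.
graph = [
    Edge("Paris", "capital_of", "France", 0.98),
    Edge("Paris", "located_in", "Texas", 0.35),
]

def best_answer(graph, source, relation):
    """Answer an entity-attribute question by picking the highest-confidence edge."""
    candidates = [e for e in graph if e.source == source and e.relation == relation]
    return max(candidates, key=lambda e: e.confidence) if candidates else None

ans = best_answer(graph, "Paris", "capital_of")
print(ans.target)  # France
```

A real system would compute those confidence values from corroborating sources rather than store them by hand.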
Document Management in SharePoint without Folders - Introduction to Metadata - Gregory Zelfond
Step-by-Step Guide to Document Management in SharePoint. Part I – Introduction to Metadata
What’s wrong with Folders?
Intro to Metadata
Step-by-step guide on how to set up SharePoint metadata
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...) - Denodo
This is the first in a series of five webinars that look 'under the covers' of Denodo's industry leading Data Virtualization Platform. The webinar will provide an overview of the architecture and key modules of the Denodo Platform - subsequent webinars in the series will take a deeper look at some of the key modules and capabilities of the platform, including performance, scalability, security, and so on.
More information and FREE registrations to this webinar: http://goo.gl/fLi2bC
To learn more, visit: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
The Denodo Platform
Platform Architecture
Key Modules
Connectors
Data Services and APIs
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and... - Koray Tugberk GUBUR
How Do Search Engines Leverage Opinion-based Articles for Ranking?
Search engines use opinions and factoids to understand consensus. News search engines surface different reports and opinions in their results to satisfy newsreaders' urgent information needs, and they differentiate disinformation from information to protect readers. Google, Microsoft Bing, Yandex, and DuckDuckGo have different algorithms and priorities for classifying news sources and for prioritizing news and newsworthy topics.
"Corroboration of Web Answers from the Open Web" is a research paper by Amélie Marian and Minji Wu explaining how a search engine can rank information according to its accuracy.
Google has explained that Expertise-Authoritativeness-Trustworthiness (E-A-T) is among the most important groups of signals for ensuring that a result won't embarrass the search engine. Embarrassment factors for search engines include wrong information in a news headline or a wrong featured snippet; a search engine can be shamed by a bad result ranking on the SERP.
Related concepts include dense retrieval, context scoring, named entity recognition, semantic role labeling, truth ranges, fix points, confidence scores, query processing, and parsing.
Context understanding requires processing the text and tokenizing the words while recognizing word sense. Processing the text of news articles takes time, and most of the time news search engines do not have enough time to do so. Thus, PageRank provides a sustainable ranking timeline for news sources.
PageRank is a quick signal search engines use to gauge the authenticity of a news source. Highly cited sources rank higher, and stay longer, in top stories. Google usually protects high-PageRank sources by trusting those sites' judgment. But fact-finding algorithms mostly avoid PageRank, unless they cannot decide from other factors or lack the resources to process the text across hundreds of sources.
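The PageRank signal mentioned above can be illustrated with textbook power iteration over a tiny link graph. This is the generic algorithm from the original PageRank formulation, not a news-ranking system; the link graph is invented:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            # sum the rank flowing in from every page that links to p
            incoming = sum(
                rank[q] / len(links[q]) for q in pages if p in links[q]
            )
            new[p] = (1 - damping) / n + damping * incoming
        rank = new
    return rank

# Toy web: "a" is the most-cited page (linked from both "b" and "c").
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # a
```

The highly cited page ends up on top, which is the "quick signal" behavior the text describes.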
News ranking algorithms differentiate opinions, reports, and breaking news from each other. News-related entities, their co-occurrence, and their contextual relations change over time. Google's inventors suggest differentiating these entities from each other for proper news categorization.
News categorization is important for matching users' topics of interest in queryless news feeds such as Google Discover, which serves news stories according to users' interest areas.
An opinion piece might be misleading, and some news titles might be too harsh or strict. Search engines use such headlines to separate non-trustworthy news sources from trustworthy ones. The opinions of journalists, or their different interpretations of events, might change a document's rankings according to fact-finding algorithms.
This is my presentation for the Azure Advent Calendar initiative by Azure MVPs, in which I explain how Azure Cognitive Search works and how it can perform optimal information finding from an existing data source (a website, in this case).
Lake Database, Database Templates and Map Data in Azure Synapse Analytics - Erwin de Kreuk
Database templates in Synapse Analytics are blueprints that organizations can use to plan, architect, and design solutions.
How can we use these database templates in day-to-day business to speed up and automate this process?
The Map Data tool can help us with that.
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En... - Koray Tugberk GUBUR
Semantic search engines understand human language to analyze the need behind a query. Instead of focusing on string or word matching, a semantic search engine focuses on concepts, intents, and the relations of named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, and entity extraction, recognition, and resolution can all be used by semantic search engines. This PDF traces the evolution of semantic search engines based on Google's research papers, patents, and official announcements. From 1998 to 2021, the evolution of search and search engines, from strings to things and from phrases to entities, is told alongside changes in query processing and parsing methodology.
As opposed to lexical search, semantic search looks for meaning, not literal matches of the query words. It attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content blends natural language, focuses on the user's intent, and considers other topics the user may be interested in.
According to some authors, ontologies, XML, and other structured data sources can be used to retrieve knowledge via semantic search. Such technologies provide a mechanism for creating formal, highly expressive representations of domain knowledge, and may allow the user to express more detailed intent during query processing.
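Structured-knowledge retrieval of the kind described above can be sketched as a toy triple store queried with a wildcard pattern, which is the same matching idea a SPARQL basic graph pattern expresses over RDF. The entities and relations below are invented:

```python
# A minimal in-memory triple store; a real system would use RDF tooling
# (e.g. rdflib and SPARQL), but the pattern-matching idea is the same.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "drug"),
    ("ibuprofen", "treats", "headache"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# "What treats a headache?" expressed as a structured pattern:
print([t[0] for t in match(triples, p="treats", o="headache")])
```

The query form here carries more explicit intent than the bare string "headache", which is the point the paragraph makes about detailed intent in query processing.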
Ed will be reviewing the continued importance of displaying EAT throughout your website, whilst also discussing how the wider SEO community has looked at the acronym backwards – with Trust being the most important element.
Lexical Semantics, Semantic Similarity and Relevance for SEO - Koray Tugberk GUBUR
Lexical semantics covers relations between words: superiority, inferiority, part, whole, opposition, and sameness between word meanings. The same word can be a meronym, hyponym, or antonym of another word, depending on the word before or after it. The lexical relation value of the first word can affect the structure of the next word, affecting the context of the sentence and the Information Retrieval (IR) Score. The IR Score determines how related content is to a query, how close the different variants of the related query are, and how the search engine's query processor maps the query to the relevant document. A higher IR score represents better relevance and likely click satisfaction.
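As a stand-in for the "Information Retrieval Score" described above, here is a toy TF-IDF relevance score in Python. Real engines use far richer scoring (BM25 plus many other signals), so this is only a sketch of the idea, with invented documents:

```python
import math
from collections import Counter

def tfidf_score(query, doc, docs):
    """Toy TF-IDF relevance score: term frequency in the document
    weighted by how rare each query term is across the collection."""
    words = doc.lower().split()
    tf = Counter(words)
    score = 0.0
    for term in query.lower().split():
        df = sum(1 for d in docs if term in d.lower().split())
        if df:
            idf = math.log(len(docs) / df)  # rare terms weigh more
            score += (tf[term] / len(words)) * idf
    return score

docs = [
    "semantic search uses entities and intent",
    "cats are popular pets",
]
scores = [tfidf_score("semantic search", d, docs) for d in docs]
print(scores[0] > scores[1])  # True: the on-topic document scores higher
```

The "dilution" effect discussed next falls out of this kind of formula: mixing a second topic into a document lowers each term's relative frequency and thus the score.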
The problem with a semi-structured, distracting context for the Information Retrieval Score is that, if a document is not configured around a single topic, the IR score can be diluted by the two different contexts, resulting in a relative rank loss to another textual document.
IR score dilution involves badly structured lexical relations along with poor word proximity. Relevant words that complete each other within the meaning map should be used close together, within a paragraph or section of the document, to signal the context more clearly and increase the IR score. A search engine can check whether the document contains hyponyms of the query words. A possible query prediction can be generated from the hypernyms of the query. A search engine can also check only the anchor texts for a word within the "hyponym distance", which represents the hyponym depth between two different words.
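The "hyponym distance" idea can be sketched with a tiny hand-built taxonomy. A real system would use a lexical database such as WordNet; the taxonomy and words below are invented examples, and distance is simply counted as is-a steps:

```python
# Toy child -> parent taxonomy (hyponym -> hypernym), invented for illustration.
PARENT = {"poodle": "dog", "dog": "canine", "canine": "animal", "cat": "animal"}

def ancestors(word):
    """Walk up the is-a chain from a word to the taxonomy root."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def hyponym_distance(specific, general):
    """Number of is-a steps from a hyponym up to its hypernym,
    or None if the second word is not an ancestor of the first."""
    chain = ancestors(specific)
    return chain.index(general) if general in chain else None

print(hyponym_distance("poodle", "animal"))  # 3: poodle -> dog -> canine -> animal
```

Under this reading, an anchor text containing "poodle" sits within hyponym distance 1 of a query about "dog", which is the kind of check the paragraph describes.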
Lexical Relations can represent the semantic annotations for a document. A semantic annotation is a word that describes the document overall in terms of category and main context that carries the purpose of the document. A semantic annotation can contain the main entity of the document or a general concept for covering a broader meaning area (knowledge domain). Semantic Annotations can be generated with the lexical relations between words. A semantic annotation can be used to match the document to the query. Semantic annotations are factors for a better IR Score.
A search engine can generate phrase patterns from the lexical relationships between words within queries or documents. A phrase pattern contains sections that define a concept with qualifiers. Phrase patterns can contain a hyponym just after an adjective, or a hypernym with the antonym of the same adjective. Most of these connections and patterns are used within Recurrent Neural Networks (RNNs) for next-word prediction. A phrase pattern helps a search engine increase its confidence score for relating a document to a specific query, or to the meaning of the query.
SharePoint as an Intranet Portal for Business - RashminPopat2
SharePoint has been one of Microsoft's frontrunners in aiding digital business growth. Companies looking to install this platform as their intranet portal may want to know more about it before investing.
Crafting Expertise, Authority and Trust with Entity-Based Content Strategy - ... - Jamie Indigo
At SMXL, I presented a talk about crafting effective, authoritative content by understanding entities. People, places, objects, and ideas have facets. Human users have unique perspectives, and their language changes as their relationship to an entity changes. It's time we stop chasing keywords (a byproduct of search intent) in favor of a strategic, entity-based approach.
This deck includes insights into how to access the data behind Google's knowledge graph, use external links to boost the search engine's understanding, and ways to become an authoritative and trusted source.
A Guide to Properly Migrating a CMS: The Rainbow Edition - Kristina Azarenko
In this talk, I will concentrate on a particular migration type, switching content management systems — as this specific type has its own challenges. I will show you what you need to know before moving to another CMS, what issues you will face, and the exact steps to overcome these issues.
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
Migration Best Practices - Search Y 2019, Paris - Bastian Grimm
My talk from SEARCHY 2019 in Paris covering best practices on how to successfully navigate through the various types of migrations (protocol migrations, frontend migrations, website migration, cms migration, etc.) from an SEO perspective - mainly focussing on all things technical SEO.
This talk was given at SEMANTiCS 2014 in Leipzig. It gives an overview of how to develop an enterprise linked data strategy around controlled vocabularies based on SKOS, and discusses how knowledge graphs based on SKOS can be extended step by step according to the needs of the organization.
Linked Data has become a broadly adopted approach for information management and data management not only by government organisations but also more and more by various industries.
Enterprise linked data tackles several challenges like the improvement of information retrieval tools or the integration of distributed data silos. Enterprises understand better and better why their information management should not be limited by organisational boundaries but should rather consider to integrate and link information from different spheres like the public internet, government organisations, professional information providers, customers and even suppliers.
On the other hand, enterprise IT architects still tend to pull down the shutters wherever possible. The continued success of the Semantic Web no longer seems limited by technical barriers, but rather by the mindset that intranets must be strictly cut off from other information sources.
In this talk I will throw new light on the reasons why metadata is key for professional information management, and why W3C's semantic web standards are so important to reduce costs of data management through economies of scale. I will discuss from a multi-stakeholder perspective several use cases for the industrialization of semantic technologies and linked data.
Slawski New Approaches for Structured Data:Evolution of Question Answering Bill Slawski
Google has moved from Search to Knowledge, and Focusing on Answering questions with knowledge graph entity information provides has led to answering queries with Knowledge graphs for those questions, with confidence scores between entities and other entities or attributes of entities, based upon freshness, reliabilillity, popularity, and proximity between an entity and another entity or an attribute.
Document Management in SharePoint without folders - Introduction to MetadataGregory Zelfond
Step-by-Step Guide to Document Management
in SharePoint. Part I – Introduction to Metadata
What’s wrong with Folders?
Intro to Metadata
Step-by-Step on how to setup SharePoint Metadata
Denodo Data Virtualization Platform: Overview (session 1 from Architect to Ar...Denodo
This is the first in a series of five webinars that look 'under the covers' of Denodo's industry leading Data Virtualization Platform. The webinar will provide an overview of the architecture and key modules of the Denodo Platform - subsequent webinars in the series will take a deeper look at some of the key modules and capabilities of the platform, including performance, scalability, security, and so on.
More information and FREE registrations to this webinar: http://goo.gl/fLi2bC
To learn more click to this link: http://go.denodo.com/a2a
Join the conversation at #Architect2Architect
Agenda:
The Denodo Platform
Platform Architecture
Key Modules
Connectors
Data Services and APIs
Opinion-based Article Ranking for Information Retrieval Systems: Factoids and...Koray Tugberk GUBUR
How Search Engines Leverage Opinion-based Articles for Ranking?
Search engines use opinions, and factoids to understand the consensus. News search engines use different reports, and opinions in their search results to satisfy the urgent news information needed by the newsreaders. The news search engines differentiate disinformation from information to protect the newsreaders. Google, Microsoft Bing, Yandex, and DuckDuckGo have different algorithms and prioritization for classifications of the news sources, or prioritization of the news, and newsworthy topics.
Corroboration of the Web Answers from the Open Web is a research paper from Amelia Marian and Minji Wu explaining how a search engine can rank information according to its accuracy.
Google started to explain that the Expertise-Authoriteveness-Trustworthiness is the most important group of signals to be sure that a result won't shame the search engine. Embarrassment factors for the search engines involve wrong information on a news title on the news story, or a wrong featured snippet. A search engine might be shame due to the bad result that is ranking on the SERP.
Dense-retrieval, context scoring, named entity recognition, semantic role labeling, truth ranges, fix points, confidence score, query processing, and parsing.
Context understanding requires processing the text, and tokenizing the words by recognizing the word sense. Processing the text of the news articles requires time. And, most of the time, news search engines do not have enough time for processing the text. Thus, PageRank provides a sustainable timeline for the news sources for rankings.
PageRank is a quick signal for search engines to show the authenticity of the news web source. The highly cited sources are ranked higher, and longer on the top stories. Usually, Google protects the high PageRank sources by trusting the judgment of the websites. But, fact-finding algorithms do not use PageRank mostly, unless they couldn't decide by looking at other factors, or they do not have enough resources to process the text among the hundreds of sources.
News ranking algorithms differentiate opinions, reports, and breaking news from each other. News-related entities, their co-existence, and contextual relations change. Google inventors suggest differentiation of these entities from each other for a proper news categorization.
News categorization is important to match the interested topics of the users in queryless news feeds such as Google Discover. Google Discover is a queryless news feed that serves news stories according to the users' interest areas.
An opinion for news might be misleading. Some news titles might be too harsh, or strict. Search engines use these headlines to differentiate the non-trustworthy news sources from the trustworthy ones. And, opinions of journalists or their different interpretations of the events might change the rankings of a document according to the fact-finding algorithms.
This is my presentation for the Azure Advent Calendar initiative by Azure MVPs in which I explain how Azure Cognitive Search works and can perform optimal information findings from an existing data source (a website, in this case).
Lake Database Database Template Map Data in Azure Synapse AnalyticsErwin de Kreuk
Database templates in Synapse Analytics are blueprints which can be used by organizations to plan, architect and design solutions.
How can we use these Database Templates in a day-to-day business, in order to speed up to automate this process?
Map data tool can help us with that
Semantic Search Engine: Semantic Search and Query Parsing with Phrases and En...Koray Tugberk GUBUR
Semantic Search Engines can understand human language to analyze the need behind a query. Instead of focusing, string, or word matching, a semantic search engine focuses on concepts, intents, and relations of named entities. Taxonomy, ontology, onomastics, semantic role labeling, relation detection, lexical semantics, entity extraction, recognition, resolution can be used by semantic search engines. In this PDF file, semantic search engines' evolution will be processed based on Google Search Engine's research papers, patents, and official announcements. From 1998 to 20021, search's and search engines' evolution, from strings to things, from phrases to entities will be told along with query processing, and parsing methodology changes.
As opposed to lexical search, semantic searching searches for meaning, not meaningless matches of the query words. Semantic search attempts to increase the relevancy of results by understanding searchers' intents and the context of terms in the searchable dataspace, whether online or within a closed system. The right semantic search content is a blend of natural language, focuses on the intent of the user, and considers other topics the user may be interested in.
Ontologies, XML, and other structured data sources can be used to retrieve knowledge using semantic search according to some authors. The use of such technologies provides a mechanism for creating formal expressions of domain knowledge that are highly expressive and may allow the user to express more detailed intent during query processing.
Ed will be reviewing the continued importance of displaying EAT throughout your website, whilst also discussing how the wider SEO community has looked at the acronym backwards – with Trust being the most important element.
Lexical Semantics, Semantic Similarity and Relevance for SEOKoray Tugberk GUBUR
Lexical semantics and relations between words include relations of superiority, inferiority, part, whole, opposition, and sameness between the meanings of words. The same word can be a meronymy, hyponym, or antonym of another word, depending on the word before or after it. The lexical relation value of the first word can affect the structure of the next word, affecting the context of the sentence and the Information Retrieval Score. Information Retrieval Score is the score that determines how much content is related to a query, how close the different variants of the related query are, and the structure processed by the search engine’s query processor to the relevant document. A higher information retrieval score represents better relevance and possible click satisfaction.
The problem with a semi-structured and distracting context for Information Retrieval Score is that, if a document is not configured for a single topic, the IR Score can be diluted by the two different contexts resulting in a relative rank lost to another textual document.
IR Score Dilution involves badly structured lexical relations, along with bad word proximity. The relevant words that complete each other within the meaning map should be used closely, within a paragraph or section of the document, to signal the context in a more clear way to increase the IR Score. A search engine can check whether the document contains the hyponym of the words within the query or not. A possible query prediction can be generated from the hypernyms of the query. A search engine can check only the anchor texts to see whether there is a word within the “hyponym distance” which represents the hyponym depth between two different words.
Lexical relations can supply the semantic annotations for a document. A semantic annotation is a word that describes the document overall in terms of category and the main context that carries the document's purpose. A semantic annotation can contain the main entity of the document or a general concept covering a broader meaning area (knowledge domain). Semantic annotations can be generated from the lexical relations between words, can be used to match the document to the query, and are factors in a better IR Score.
A search engine can generate phrase patterns from the lexical relations between words within queries or documents. A phrase pattern contains sections that define a concept with qualifiers; it can place a hyponym just after an adjective, or a hypernym with the antonym of the same adjective. Many of these connections and patterns are used within a Recurrent Neural Network (RNN) for next-word prediction. A phrase pattern helps a search engine increase its confidence score for relating the document to a specific query, or to the meaning of the query.
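Next-word prediction can be illustrated with a minimal bigram frequency model, a far simpler stand-in for the RNN mentioned above; the toy query corpus is invented for illustration.

```python
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """Count word -> next-word frequencies over a corpus of queries."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            model[a][b] += 1
    return model

def predict_next(model, word):
    """Return the most frequent continuation seen in training, if any."""
    followers = model.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "cheap flight tickets to london",
    "cheap flight deals today",
    "cheap flight tickets online",
]
model = train_bigrams(corpus)
```

Here `predict_next(model, "flight")` yields `"tickets"`, the continuation seen most often; an RNN generalizes this idea with learned representations instead of raw counts.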
SharePoint as an Intranet Portal for Business - RashminPopat2
SharePoint has been one of Microsoft's frontrunners in aiding digital business growth. Companies looking to install this platform as their intranet portal may want to know more about it before investing.
Crafting Expertise, Authority and Trust with Entity-Based Content Strategy... - Jamie Indigo
At SMXL, I presented a talk about crafting effective, authoritative content by understanding entities. People, places, objects, and ideas have facets. Human users have unique perspectives, and their language changes as their relationship to an entity changes. It's time we stop chasing keywords, a byproduct of search intent, in favor of a strategic entity-based approach.
This deck includes insights into how to access the data behind Google's knowledge graph, use external links to boost the search engine's understanding, and ways to become an authoritative and trusted source.
A Guide to Properly Migrating a CMS: The Rainbow Edition - Kristina Azarenko
In this talk, I will concentrate on a particular migration type, switching content management systems, as this specific type has its own challenges. I will show you what you need to know before moving to another CMS, what issues you will face, and the exact steps to overcome these issues.
Whilst passage indexing may seem like a small tweak to search ranking, it is potentially much more symptomatic of the beginning of a fundamental shift in the way that search engines understand unstructured content, determine relevance in natural language, and rank efficiently and effectively.
It could also be a means of assessing overall quality of content and a means of dynamic index pruning. We will look at the landscape, and also provide some takeaways for brands and business owners looking to improve quality in unstructured content overall in this fast changing landscape.
Migration Best Practices - Search Y 2019, Paris - Bastian Grimm
My talk from SEARCHY 2019 in Paris covering best practices on how to successfully navigate through the various types of migrations (protocol migrations, frontend migrations, website migration, cms migration, etc.) from an SEO perspective - mainly focussing on all things technical SEO.
This talk was given at SEMANTiCS 2014 in Leipzig. It gives an overview of how to develop an enterprise linked data strategy around controlled vocabularies based on SKOS, and discusses how knowledge graphs based on SKOS can be extended step by step according to the needs of the organization.
Linked Data has become a broadly adopted approach for information management and data management not only by government organisations but also more and more by various industries.
Enterprise linked data tackles several challenges, such as improving information retrieval tools or integrating distributed data silos. Enterprises increasingly understand why their information management should not be limited by organisational boundaries but should instead integrate and link information from different spheres such as the public internet, government organisations, professional information providers, customers, and even suppliers.
On the other hand, enterprise IT architects still tend to pull down the shutters wherever possible. The continued success of the Semantic Web no longer seems limited by technical barriers, but rather by the mindset that intranets must be strictly cut off from other information sources.
In this talk I will shed new light on why metadata is key for professional information management, and why the W3C's Semantic Web standards are so important for reducing the costs of data management through economies of scale. I will discuss, from a multi-stakeholder perspective, several use cases for the industrialization of semantic technologies and linked data.
Semantic Artificial Intelligence is the fusion of various types of AI, including symbolic AI, reasoning, and machine learning techniques like deep learning. At the same time, Semantic AI has a strong focus on data management and data governance. This 'wedding' of AI techniques brings new promises, but it also puts fundamental approaches such as Explainable AI (XAI), knowledge graphs, and Linked Data into sharper focus.
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System - Semantic Web Company
Knowledge graphs and graph-based data in general are becoming increasingly important for addressing various data management challenges in industries such as financial services, life sciences, healthcare or energy.
At the core of this challenge is the comprehensive management of graph-based data, ranging from taxonomy to ontology management to the administration of comprehensive data graphs along with a defined governance framework. Various data sources are integrated and linked (semi) automatically using NLP and machine learning algorithms. Tools for securing high data quality and consistency are an integral part of such a platform.
PoolParty 7.0 can now handle a full range of enterprise data management tasks. Based on agile data integration, machine learning and text mining, or ontology-based data analysis, applications are developed that allow knowledge workers, marketers, analysts or researchers a comprehensive and in-depth view of previously unlinked data assets.
At the heart of the new release is the PoolParty GraphEditor, which complements the Taxonomy, Thesaurus, and Ontology Manager components that have been available for some time. All in all, data engineers and subject matter experts can now administer and analyze enterprise-wide, heterogeneous data stocks with convenient tools, or link them with the help of artificial intelligence.
Sara Nash and Urmi Majumder, Principal Consultants at Enterprise Knowledge, presented on April 19, 2023 at KM World in Washington D.C. on the topic of Scaling Knowledge Graph Architectures with AI.
In this presentation, Sara and Urmi defined a Knowledge Graph architecture and reviewed how AI can support the creation and growth of Knowledge Graphs. Drawing from their experience in designing enterprise Knowledge Graphs based on knowledge embedded in unstructured content, Sara and Urmi defined approaches for entity and relationship extraction depending on Enterprise AI maturity and highlighted other key considerations to incorporate AI capabilities into the development of a Knowledge Graph.
View the presentation below to learn how to:
- Assess entity and relationship extraction readiness according to EK’s Extraction Maturity Spectrum and Relationship Extraction Maturity Spectrum.
- Utilize knowledge extraction from content to gather important insights into organizational data.
- Extract knowledge with three approaches: RegEx rules, auto-classification rules, and custom ML models.
- Examine key factors such as how to leverage SMEs, iterate AI processes, define use cases, and invest in establishing robust AI models.
Benefiting from Semantic AI along the data life cycle - Martin Kaltenböck
Slides from a one-hour session by Martin Kaltenböck (CFO and Managing Partner of Semantic Web Company / PoolParty Software Ltd) on 19 March 2019 in Boston, US at Enterprise Data World 2019, titled: Benefiting from Semantic AI along the data life cycle.
Sara Mae O’Brien Scott and Tatiana Baquero Cakici, Senior Consultants at Enterprise Knowledge (EK), presented “AI Fast Track to Search-Focused AI Solutions” at the Information Architecture Conference (IAC24) that took place on April 11, 2024 in Seattle, WA.
In their presentation, O’Brien-Scott and Cakici focused on what Enterprise AI is, why it is important, and what it takes to empower organizations to get started on a search-based AI journey and stay on track. The presentation explored the complexities of enterprise search challenges and how IA principles can be leveraged to provide AI solutions through the use of a semantic layer. O’Brien-Scott and Cakici showcased a case study where a taxonomy, an ontology, and a knowledge graph were used to structure content at a healthcare workforce solutions organization, providing personalized content recommendations and increasing content findability.
In this session, participants gained insights about the following:
Most common types of AI categories and use cases;
Recommended steps to design and implement taxonomies and ontologies, ensuring they evolve effectively and support the organization’s search objectives;
Taxonomy and ontology design considerations and best practices;
Real-world AI applications that illustrated the value of taxonomies, ontologies, and knowledge graphs; and
Tools, roles, and skills to design and implement AI-powered search solutions.
How to build your own Delve: combining machine learning, big data and SharePoint - Joris Poelmans
You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, and more. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and then explore how to combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
How Enterprise Architecture & Knowledge Graph Technologies Can Scale Business... - Semantic Web Company
Organising data, for most of us, means Excel spreadsheets and folders upon folders. Knowledge graph technology, however, organises data in ways similar to the brain – through context and relations. By connecting your data, you (and also machines) are able to gain context within your knowledge, helping you to make informed decisions based on all of the information you already have.
So, how can enterprises benefit from this and scale?
PwC Sr. Research Fellow for Emerging Tech, Alan Morrison, and Sebastian Gabler, Head of Sales of Semantic Web Company tackle the importance of Enterprise Knowledge Graphs and how these technologies scale business efficiency.
Learn about:
• Moving from application-centric development to data-centric approaches
• How enterprise architects can benefit from knowledge graphs: use cases
• Which use cases fit which type of graph, and which technologies are involved
• How RDF helps with data integration
• What AI-assisted entity linking is
• Data virtualisation vs. materialisation
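The RDF idea behind data integration can be sketched with plain triples: graphs from two sources merge naturally when they share subject identifiers. The URIs and data below are invented for illustration, and the pattern-matching helper is a toy stand-in for a SPARQL basic graph pattern.

```python
# Two "silos" expressed as (subject, predicate, object) triples.
hr_data = [
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:alice", "ex:hasSkill", "ex:sparql"),
]
crm_data = [
    ("ex:acme", "ex:locatedIn", "ex:vienna"),
]

# Merging RDF graphs is just set union; shared URIs link the data.
graph = set(hr_data) | set(crm_data)

def objects(graph, subject, predicate):
    """Simple triple-pattern match, the core operation behind SPARQL queries."""
    return {o for s, p, o in graph if s == subject and p == predicate}

# Follow a link across the former silos: where is Alice's employer located?
employers = objects(graph, "ex:alice", "ex:worksAt")
cities = {c for e in employers for c in objects(graph, e, "ex:locatedIn")}
```

The two-hop lookup crosses the silo boundary without any schema mapping, which is the essential integration benefit of a shared graph model.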
- Learn to understand what knowledge graphs are for
- Understand the structure of knowledge graphs (and how it relates to taxonomies and ontologies)
- Understand how knowledge graphs can be created using manual, semi-automatic, and fully automatic methods.
- Understand knowledge graphs as a basis for data integration in companies
- Understand knowledge graphs as tools for data governance and data quality management
- Implement and further develop knowledge graphs in companies
- Query and visualize knowledge graphs (including a SPARQL and SHACL crash course)
- Use knowledge graphs and machine learning to enable information retrieval, text mining and document classification with the highest precision
- Develop digital assistants and question and answer systems based on semantic knowledge graphs
- Understand how knowledge graphs can be combined with text mining and machine learning techniques
- Apply knowledge graphs in practice: Case studies and demo applications
Deep Text Analytics - How to extract hidden information and aboutness from text - Semantic Web Company
- Deep Text Analytics (DTA) is an application of Semantic AI
- DTA fuses methods and algorithms from language modeling, corpus linguistics, machine learning, knowledge representation, and the Semantic Web into Deep Text Analytics methods
- The main use case areas for DTA are information retrieval, NLU, question answering, and recommender systems
Unified views of business-critical information across all customer-facing processes and HR-related tasks are most relevant for decision makers.
In this talk we present a SharePoint extension that supports the automatic linking of unstructured content like Word documents with structured information from other databases, such as statistical data. As a result, decision makers have knowledge portals based on linked data at their fingertips.
While the importance of managed metadata and Term Store is clear to most SharePoint architects, the significance of a semantic layer outside of the content silos has not yet been explored systematically.
We will present a four-layered content architecture and take a close look at some aspects of the semantic layer and its integration with SharePoint:
- Keeping Term Store and the semantic layer in sync
- Automatic tagging of SharePoint content
- Use of graph databases to store tags
- Entity-centric search & analytics applications
Metadata is most often stored per data source, and therefore it is meaningless outside of the silo. In this presentation, we will give a live demo of a SharePoint extension that makes use of an explicit semantic layer based on standards. This approach builds the basis to start linking data across the silos in a most agile way.
The resulting knowledge graph can start on a small scale and then develop continuously, growing with the requirements. In this presentation we will give an example to illustrate how initially disconnected HR-related data (CVs in SharePoint; statistical data from the labour market; skills and competencies taxonomies; salary spreadsheets) gets linked automatically and is then made available through an extensive search & analytics application.
Slides based on a workshop held at SEMANTiCS 2018 in Vienna. Introduces a methodology for knowledge graph management based on Semantic Web standards, ranging from taxonomies and ontologies to mappings, graphs, and entity linking. Further topics covered: Semantic AI and machine learning, text mining, and semantic search.
Bringing Machine Learning and Knowledge Graphs Together
Six Core Aspects of Semantic AI:
- Hybrid Approach
- Data Quality
- Data as a Service
- Structured Data Meets Text
- No Black-box
- Towards Self-optimizing Machines
Machines learn better with Semantics!
See how taxonomy management and the maintenance of knowledge graphs benefit from machine learning and corpus analysis, and how, in return, machine learning gets improved when using semantic knowledge models for further enrichment.
A quick introduction to taxonomies and how they relate to ontologies and knowledge graphs. See how they can serve as part of a semantic layer in your information architecture, and learn which use cases can be developed on this basis.
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics - Semantic Web Company
See how Cognitive Search works when based on Semantic Knowledge Graphs.
We showcase the latest developments and new features of PoolParty GraphSearch:
- Navigate a semantic knowledge graph
- Ontology-based data access (OBDA)
- Search over various search spaces: Ontology-driven facets including hierarchies
- Sophisticated autocomplete including context information
- Custom views on entity-centric and document-centric search results
- Linked data: put various tagging services such as TRIT or PoolParty Extractor in series and benefit from comprehensive semantic enrichment
- Statistical charts to explain results from unified data repositories quickly
- Plug-in system for various recommendation and matchmaking algorithms
This talk discusses how companies can apply semantic technologies to build cognitive applications. It examines the role of semantic technologies within the larger Artificial Intelligence (AI) technology ecosystem, with the aim of raising awareness of different solution approaches.
To succeed in a digital and increasingly self-service-oriented business environment, companies can no longer rely solely on IT professionals. Solutions like the PoolParty Semantic Suite utilize domain experts and business users to shape the cognitive intelligence of knowledge-driven applications.
Cognitive solutions essentially mimic how the human brain works. The search for cognitive solutions has challenged computer scientists for more than six decades. The research has matured to the extent that it has moved out of the laboratory and is now being applied in a range of knowledge-intensive industries.
There is no such thing as a single, all-encompassing “AI technology.” Rather, the large global professional technology community and software vendors are continuously developing a broad set of methods and tools for natural language processing and advanced data analytics. They are creating a growing library of machine learning algorithms to enhance the automated learning capabilities of computer systems. These emerging technologies need to be customized or combined with complementary solutions such as semantic knowledge graphs, depending on the use case.
A hybrid approach to cognitive computing, employing both statistical and knowledge-based models, will have a critical influence on the development of applications. Highly automated data processing based on sophisticated machine-learning algorithms must give end users the option to independently modify the functioning of smart applications in order to overcome the disadvantages associated with ‘black-box’ approaches.
This talk will give an overview of state-of-the-art smart applications, which are becoming a fusion of search, recommendation, and question-answer machines. We will cover specific use cases in focused knowledge domains, and we will discuss how this approach allows for AI-enabled use cases and application scenarios that are currently highly prioritized by corporate and digital business players.
In this engaging, 1-hour webinar (hosted by http://www.poolparty.biz and http://www.mekon.com), you will learn how to tailor information chunks to readers’ unique needs. We will talk about:
- Benefits and principles of granular structured content, and how to start preparing your own content for this new architecture.
- Best practices for linking structured content to standards-based taxonomies, and some pitfalls to avoid
- The underlying semantic architecture that you can work toward for a truly mature and scalable approach to linking content and data
- Key use cases that you can apply to your own organization
See how you can configure your linked data ecosystem based on PoolParty's semantic middleware configurator. Benefit from Shadow Concept Extraction by making implicit knowledge visible. Combine knowledge graphs with machine learning and integrate semantics into your enterprise information systems.
Technical Deep Dive: Learn more about the most complete Semantic Middleware on the market. See how to integrate semantic services into your Enterprise Information Systems.
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling - Semantic Web Company
See how ontologies and taxonomies can play together to reach the ultimate goal, which is the cost-efficient creation and maintenance of an enterprise knowledge graph. The knowledge modelling methodology is supported by approaches taken from NLP, data science, and machine learning.
This talk addresses two questions: “How can the quality of taxonomies be defined?” and “How can it be measured?” See how quality criteria vary depending on how a taxonomy is applied, such as automatic content classification in ecommerce or a knowledge graph for data integration in enterprises. Distinguish between formal quality, structural properties, content coverage, and network topology. Investigate the advantages of standards-based and machine-processable SKOS taxonomies to be able to measure the quality of taxonomies automatically, as well as several tools and techniques for quality assessment.
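One structural property of the kind mentioned above can be checked mechanically. A minimal sketch follows; the taxonomy data and the orphan-concept check are invented for illustration and are not PoolParty's actual quality metrics.

```python
def find_orphans(concepts, broader):
    """Flag concepts with no broader and no narrower relation at all,
    a common structural-quality check for SKOS taxonomies."""
    has_broader = set(broader)
    has_narrower = {b for bs in broader.values() for b in bs}
    return {c for c in concepts if c not in has_broader and c not in has_narrower}

concepts = {"animals", "mammals", "dogs", "colour"}
broader = {"mammals": ["animals"], "dogs": ["mammals"]}  # child -> broader concepts
```

Because SKOS taxonomies are machine-processable, checks like this one can run automatically over an entire vocabulary, which is precisely the advantage the talk highlights.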
Consistency is crucial to a good user experience. Designers go to great lengths to create and test consistent visual designs. The structural design of an information environment, which is of equal importance to a good user experience, is too often ignored. Blumauer presents a “four-layered content architecture” for making sense of any information environment by clearly distinguishing between the content, metadata, and semantic layers and the navigation logic. He discusses several use cases for a taxonomy-driven user experience such as personalization or dynamically created topic pages.
PoolParty Semantic Suite 5.5 was released in August 2016. Further integrations, such as with Elasticsearch and Stardog, strengthen PoolParty’s position as the leading semantic middleware in the cognitive computing market. Knowledge engineers and users benefit from an even more sophisticated combination of semantic computing and machine learning. The new features support context-aware knowledge modelling and include an extended data quality management module.
Knowledge extraction: Extract terms, phrases and named entities from SharePoint and O365 documents with high accuracy
Auto classification: Streamline your workflows with PoolParty’s reliable auto classifier
Consistent tagging: Semi-automatic tagging based on your taxonomies provides consistent metadata
Enterprise-wide tagging: Benefit from linked data and connect your SharePoint to other repositories
Concept-based search: autocomplete from the taxonomy
Automatic use of synonyms: get more precise results
Configurable search refiners: faceted search based on the taxonomy hierarchy
Fact box for search terms in the search results: benefit from additional context information
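The synonym feature above amounts to query expansion against a taxonomy's alternative labels. A minimal sketch, with invented labels; real SKOS taxonomies store these as `skos:prefLabel` and `skos:altLabel`.

```python
# prefLabel -> altLabels, as maintained in a taxonomy.
alt_labels = {
    "car": ["automobile", "vehicle"],
    "laptop": ["notebook"],
}

def expand_query(query):
    """Expand each query term with its taxonomy synonyms."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(alt_labels.get(term, []))
    return expanded
```

A search for "cheap car" then also matches documents that only say "automobile", which is how taxonomy-backed search improves recall without the user changing their query.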
This slidedeck is about the PoolParty Semantic Suite (http://www.poolparty.biz/), especially the features included in releases 5.2 and 5.3.
See how taxonomy management based on SKOS can be extended with SKOS-XL, all based on W3C's Semantic Web standards. See how SKOS-XL can be combined with ontologies like FIBO.
PoolParty's built-in reference corpus analysis, based on powerful text mining, helps to continuously extend taxonomies. Its built-in co-occurrence analysis supports taxonomists in identifying candidate concepts.
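Co-occurrence analysis can be sketched as counting which terms appear together in the same sentence; terms that frequently accompany an existing concept become candidate concepts. This is a toy illustration with an invented corpus, not PoolParty's implementation.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(sentences):
    """Count unordered term pairs appearing in the same sentence."""
    counts = Counter()
    for s in sentences:
        terms = sorted(set(s.lower().split()))
        counts.update(combinations(terms, 2))
    return counts

corpus = [
    "semantic search needs taxonomies",
    "semantic search improves recall",
    "taxonomies support semantic search",
]
pairs = cooccurrences(corpus)
```

The highest-count pairs surface the vocabulary a taxonomist might want to add; production systems refine raw counts with association measures and part-of-speech filtering.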
PoolParty Semantic Integrator can be used for deep data analytics tasks and semantic search. See how this can be integrated with various graph databases and search engines.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
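Under the hood, JMeter's Backend Listener ships metrics to InfluxDB using the InfluxDB line protocol. A simplified sketch of formatting one such data point; the measurement, tag, and field names are invented for illustration, and real line protocol adds type suffixes and escaping rules omitted here.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one data point in simplified InfluxDB line protocol:
    measurement,tag=val field=val,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

point = to_line_protocol(
    "jmeter", {"transaction": "login"}, {"avg": 250, "count": 12}, 1700000000000000000
)
```

Grafana then queries these points back out of InfluxDB to draw the real-time dashboards demonstrated in the webinar.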
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed in releasing software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
PoolParty Semantic Classifier
1. PoolParty Semantic Classifier: Bringing Machine Learning, NLP and Knowledge Graphs together

Andreas Blumauer
CEO & Managing Partner
Semantic Web Company / PoolParty Semantic Suite
2. Introduction

Semantic Web Company (SWC)
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ 40+ FTE
▸ Experts in NLP, Semantics and Machine Learning
▸ ~30% growth/year
▸ EUR 2.5 million funding for R&D
▸ SWC named to KMWorld's '100 Companies That Matter in Knowledge Management' in 2016 and 2017
▸ www.semantic-web.com

PoolParty Semantic Suite
▸ First release in 2009
▸ Current version 6.2
▸ W3C standards compliant
▸ Over 200 installations worldwide
▸ 50% of revenue is reinvested into PoolParty development
▸ PoolParty available on-premises or as a cloud service
▸ KMWorld listed PoolParty as a Trend-Setting Product in 2015, 2016 and 2017
▸ www.poolparty.biz
3. Agenda

▸ Introduction to Semantic AI
▹ Machine Learning, NLP and Knowledge Graphs
▹ Current status of Artificial Intelligence
▹ What is Semantic AI?
▸ PoolParty Semantic Classifier
▹ How does it work?
▹ Benchmarks
▹ Integration Scenarios
▸ Use Cases
▹ Overview
▹ Business Case
▹ Example: Issue Classifier
5. Artificial Intelligence - An overview

The AI landscape (diagram) groups techniques under three overlapping branches:
▸ Symbolic AI (GOFAI, 'good old-fashioned AI'): knowledge graphs & reasoning, Natural Language Processing (NLP)
▸ Sub-Symbolic AI: Artificial Neural Networks (ANN), Deep Learning (DNN), Word Embeddings (Word2Vec)
▸ Statistical AI: Machine Learning

Where the branches overlap sit Natural Language Understanding, Entity Recognition & Linking, Knowledge Extraction, and semantically enhanced Text Classification.
6. What makes someone an intelligent being? Assessment of the current status of Artificial Intelligence

Bloom's Taxonomy: classify cognitive processes

(6) Create
Example: Convert an "unhealthy" recipe for apple pie to a "healthy" recipe by replacing your choice of ingredients.
Typical problems: create a new product or point of view; combine elements in a new pattern; propose alternative solutions.
Questions: How would you improve …? Can you formulate a theory for …? Can you predict the outcome if …?

(5) Evaluate
Example: Which kinds of knowledge models are best for machine learning, and why?
Typical problems: judge the value of material (statement, poem, research report) for a given purpose; defend opinions.
Questions: What is your opinion of …? How would you prioritize …? What would you use to support the view …?

(4) Analyse
Example: How do a graph database and a semantic knowledge model work together?
Typical problems: recognise organizational principles; identify parts; understand relationships between parts.
Questions: How is … related to …? What is the function of …? What conclusions can you draw …?

(3) Apply
Example: How can taxonomies be used to enhance machine learning?
Typical problems: apply facts, rules, principles, and theories; use learned material in new and concrete situations.
Questions: Why is … significant? How is … an example of …? What elements would you use to change …?

(2) Understand
Example: What is the difference between an ontology and a taxonomy?
Typical problems: understand facts and ideas; classify objects and summarise text; grasp the meaning of material.
Questions: What is the difference between …? What is the main idea of …? Which statements support …?

(1) Remember
Example: Who is the inventor of the World Wide Web?
Typical problems: recognise facts, terms, and basic concepts; recall a wide range of material, from specific facts to complete theories.
Questions: Who is …? Where is …? Why did …?
7. Remember: Knowledge Graphs & Knowledge Extraction

Example graph (diagram): Perth is a City; Australia is a Country; Perth is located in Australia; Australia is part of the Commonwealth of Nations, which is an International Organisation.

The underlying text snippets:
▸ "Perth is one of the most isolated major cities in the world, with a population of 2,022,044 living in Greater Perth."
▸ "Australia is a member of the OECD, United Nations, G20, ANZUS, and the World Trade Organisation."

A knowledge graph helps to avoid illogical answers (e.g. the "distance between Perth and Australia") and to support complex Q&A such as: "Which cities located in the Commonwealth of Nations have a population of more than 2 mio. people?"
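The complex Q&A example above can be sketched in code. Below is a minimal, self-contained illustration in plain Python; a real deployment would use an RDF triple store and SPARQL, and the property names here are made up for the example.

```python
# A miniature knowledge graph as a set of (subject, predicate, object) triples,
# mirroring the Perth/Australia example from the slide.
triples = {
    ("Perth", "is_a", "City"),
    ("Australia", "is_a", "Country"),
    ("Perth", "is_located_in", "Australia"),
    ("Australia", "is_part_of", "Commonwealth of Nations"),
    ("Commonwealth of Nations", "is_a", "International Organisation"),
}
population = {"Perth": 2_022_044}

def cities_in(org, min_population):
    """Which cities located in a member of `org` have more than `min_population` people?"""
    # First hop: countries that are part of the organisation
    countries = {s for (s, p, o) in triples if p == "is_part_of" and o == org}
    # Second hop: cities located in one of those countries, filtered by population
    return [
        s for (s, p, o) in triples
        if p == "is_located_in" and o in countries
        and population.get(s, 0) > min_population
    ]

print(cities_in("Commonwealth of Nations", 2_000_000))  # → ['Perth']
```

The two-hop traversal (organisation → country → city) is exactly what a graph makes easy and a flat term index cannot express.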
8. Remember: Knowledge Graphs & Knowledge Extraction

Knowledge Graphs (KGs) can cover general knowledge (often also called cross-domain or encyclopedic knowledge), or provide knowledge about special domains such as biomedicine. In most cases KGs are based on Semantic Web standards and have been generated by a mixture of automatic extraction from text or structured data and manual curation work.

Examples:
▸ DBpedia
▸ Google Knowledge Graph
▸ YAGO
▸ OpenCyc
▸ Wikidata

Who is the inventor of the World Wide Web?
9. The Semantic Web: a standards-based graph of knowledge graphs

Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
10. "Understand": Google Featured Snippets based on sentence compression algorithms (Deep Learning)

To train Google's artificial Q&A brain, the company uses old news stories, where machines start to see how headlines serve as short summaries of the longer articles that follow. But for now, the company still needs its team of PhD linguists.

Spanning about 100 PhD linguists across the globe, the Pygmalion team produces "the gold data," while the news stories are the "silver." The silver data is still useful, because there's so much of it. But the gold data is essential. (WIRED article)

What is the difference between an ontology and a taxonomy?
11. "Create": an example of DL-based "creativity"

After having listened to a large amount of music and learned its own models of music theory, Aiva composes its very own sheet music. These partitions are then played by professional artists on real instruments in a recording studio, achieving the best sound quality possible.

Aiva's compositions still require human input with regards to orchestration and musical production. In fact, Aiva's creators envisage a future where man and machine will collaborate to fulfill their creative potential, rather than replace one another.

http://www.aiva.ai/
12. What is Semantic AI?

(The AI landscape diagram from slide 5 again: Symbolic AI, Sub-Symbolic AI and Statistical AI with their respective techniques.)

In Semantic AI, various methods from Symbolic AI are combined with machine learning methods and/or neural networks.

Examples:
● Semantic enrichment of text corpora to enhance word embeddings
● Extraction of semantic features from text to improve ML-based classification tasks
● Combining ML-based with graph-based entity extraction
● Knowledge Graphs as a data model for Machine Learning
● …
13. Knowledge Graphs as a Data Model for Machine Learning

Traditionally, when faced with heterogeneous knowledge in a machine learning context, data scientists preprocess the data and engineer feature vectors so they can be used as input for learning algorithms (e.g., for classification). These transformations can result in loss of information and introduce bias. To solve this problem, we require machine learning methods to consume knowledge in a data model more suited to represent this heterogeneous knowledge. We argue that knowledge graphs are that data model.

Three examples of the benefits of using knowledge graphs:
▸ they allow for true end-to-end learning,
▸ they simplify the integration of heterogeneous data sources and data harmonization,
▸ they provide a natural way to seamlessly integrate different forms of background knowledge.

Wilcke X., Bloem P., De Boer V. The Knowledge Graph as the Default Data Model for Machine Learning. Data Science. 2017 Oct 17;1-19. DOI: 10.3233/DS-170007
16. PoolParty Semantic Classifier: feature highlights

▸ Classification
▹ Select from various machine learning algorithms such as SVM, Deep Learning and Naive Bayes for content classification.
▹ Benefit from a rich feature set (terms, concepts, shadow concepts) which gives you more flexibility when training classifiers.
▸ User Experience
▹ A user-friendly interface enables non-technical experts to perform classification tasks and benefit from machine learning.
▸ Scalability
▹ Large content repositories can be classified on top of a Spark cluster.
▸ Easy integration
▹ New resources can be classified via the PoolParty API.
▹ With the GraphSearch plugin system, the Classifier can be easily integrated into semantic applications.
▸ Transparency
▹ The Classifier works on the principles of 'Explainable AI'.
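As a sketch of the "Easy integration" point, the snippet below builds an authenticated HTTP request for a classifier endpoint using only the Python standard library. Note: the endpoint path, request body shape and authentication scheme here are assumptions for illustration only; consult the official PoolParty API reference for the actual classifier endpoint.

```python
import base64
import json
import urllib.request

def build_classify_request(server_url, classifier_id, text, user, password):
    # Hypothetical endpoint path -- check the PoolParty API docs for the real one.
    url = f"{server_url}/classifier/{classifier_id}/classify"
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_classify_request(
    "https://poolparty.example.com/api",            # placeholder server URL
    "my-classifier",                                # placeholder classifier id
    "Wind turbines feed the district heating system.",
    "user", "secret",
)
# urllib.request.urlopen(req) would then return the predicted classes as JSON.
```

The point of the sketch is that classification is a single authenticated POST per resource, which makes it easy to call from any ingest pipeline.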
17. Extraction, Categorization, Classification: what is the difference?

▸ Classification
▹ Implemented on the basis of a training corpus, using the labels of the training corpus as classes
▸ Extraction
▹ Finding terms and concepts in text, using scoring mechanisms to give an indication of their importance
▸ Categorization
▹ Uses only concepts and categorizes text based on a thesaurus
18. How it works

1. Determine classes (labels)
2. Identify training documents per class
3. Create classifier
   a. Pick machine learning method
   b. Choose correct parameters
   c. Determine used features
   d. Train model
4. Evaluate results (cross-validation / F1)
5. Go to step 2 and try to find a better classification method
6. Make use of the Classifier API
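The workflow above can be sketched with a toy classifier. The following minimal multinomial Naive Bayes (one of the algorithms named on the feature-highlights slide) is written from scratch in plain Python; the classes and training documents are invented toy data, not PoolParty's.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text) pairs -- steps 1-3 of the workflow."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log prior + log likelihood with add-one (Laplace) smoothing
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Steps 1-2: determine classes and training documents per class (toy data)
training = [
    ("energy", "wind turbines generate renewable energy"),
    ("energy", "solar panels lower energy costs"),
    ("heating", "district heating networks distribute heat"),
    ("heating", "cogeneration plants supply heat and power"),
]
model = train(training)                          # step 3: create and train
print(predict(model, "renewable wind energy"))   # → energy
```

Step 4 (evaluation) would hold out part of the corpus and compute F1 scores over several folds, as in the benchmark slide below; step 5 then iterates with a different algorithm or feature set.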
19. Explainable AI

Classifiers based on ML algorithms such as Deep Learning perform better when training data is semantically enhanced. Additional features are derived from a controlled vocabulary, which also makes the used features more transparent to the data scientist.
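A minimal sketch of what "semantically enhanced" features can look like: in addition to plain term features, each document gets concept features looked up in a controlled vocabulary, so a data scientist can read off exactly which vocabulary concepts fired. The tiny thesaurus below is invented for illustration; PoolParty would use a full SKOS vocabulary such as Reegle.

```python
# Concept -> associated labels, loosely in the spirit of a SKOS thesaurus.
# Both concepts and their label sets here are made up for the example.
thesaurus = {
    "concept:RenewableEnergy": {"wind", "solar", "hydro", "renewable"},
    "concept:DistrictHeating": {"district", "heating", "heat"},
}

def features(text):
    """Return term features plus concept features for one document."""
    terms = text.lower().split()
    concepts = [c for c, labels in thesaurus.items()
                if any(t in labels for t in terms)]
    # The concept features are human-readable vocabulary entries, which is
    # what makes the resulting classifier easier to explain.
    return terms + concepts

print(features("wind and solar power for district heating"))
```

In the benchmark that follows, exactly this kind of feature combination (terms + concepts) gives the best F1 score.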
20. Benchmarking the PoolParty Semantic Classifier

Improvement of 5.2% compared to a traditional (term-based) SVM:

Features used                                  | Classifier | F1 (5 folds) | Variance
Terms                                          | LinearSVC  | 0.83175      | 0.0008
Concepts from REEGLE + Shadow Concepts         | LinearSVC  | 0.84451      | 0.0011
Concepts from REEGLE                           | LinearSVC  | 0.84647      | 0.0009
Terms + Concepts from REEGLE + Shadow Concepts | LinearSVC  | 0.87474      | 0.0009

Reegle thesaurus: a comprehensive SKOS taxonomy for the clean energy sector (http://data.reeep.org/thesaurus/guide)
● 3,420 concepts
● 7,280 labels (English version)
● 9,183 relations (broader/narrower + related)

Document training set: 1,800 documents in 7 classes (Renewable Energy, District Heating Systems, Cogeneration, Energy Efficiency, Energy (general), Climate Protection, Rural Electrification)
21. The Classifier as a component of PoolParty Semantic Suite

The most complete semantic middleware on the global market.

Example input text: "Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney."

Components and data flow (architecture diagram):
▸ Taxonomy & Ontology Server: controlled vocabularies as a basis for highly precise knowledge extraction and text classification
▸ Entity Extractor & Semantic Classifier: the Entity Extractor informs all incoming data streams (unstructured, semi-structured and structured data) about their semantics and links them; newly identified candidate concepts can be included in a controlled vocabulary
▸ Data Integration & Data Linking (UnifiedViews): schema mapping based on ontologies
▸ RDF Graph Database
▸ PoolParty GraphSearch
24. Examples of use cases based on the PoolParty Semantic Classifier

▸ News classification
▹ Reduce the manual effort of classifying inbound documents or news
▸ Recommender services
▹ Identify appropriate agents in help desk systems
▹ Matchmaking between users (or user groups) and products/content
▸ Sentiment analysis
▹ Improve customer retention management by precise sentiment analysis
▸ Enhanced domain-specific text mining
▹ Complement rule-based systems for fraud detection
▹ Analyze judicial decisions
25. Why Data Scientists need Semantic Models

▸ To understand
▹ Content aboutness in a defined framework
▹ Data relationships and context within a unified organizational model
▹ Connections across disparate datasets
▸ To increase precision
▹ Hierarchical or other mapped relationships allow for recommending similar content when exact matches are not found
▹ Granularity allows for more specific recommendations
▹ Consistency across the structure results in more precise analysis and predictions

Source: Suzanne Carroll, Data Science Product Director at XO Group
26. Business Case

Based on an improvement of 5.2%.

Workflow: inbound documents → PoolParty Semantic Classifier → experienced agent

● 100,000 documents (emails, tickets, etc.) per month
● €5 extra cost per document when misrouted
● Cost savings per year:
○ 1,200,000 × €5.00 × 0.052 = €312,000 per annum
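The slide's arithmetic, reproduced as a quick check (the figures come from the slide; only the variable names are mine):

```python
# Annual savings = yearly document volume x misrouting cost x improvement rate
docs_per_month = 100_000
extra_cost_misrouted = 5.0   # euros per misrouted document
improvement = 0.052          # the 5.2% improvement from the benchmark slide

docs_per_year = docs_per_month * 12                 # 1,200,000 documents
savings = docs_per_year * extra_cost_misrouted * improvement
print(f"{savings:,.0f} euros per annum")            # → 312,000 euros per annum
```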
28. One question at the end

Will Artificial Intelligence make Subject Matter Experts obsolete?

Imagine you want to build an application that helps to identify patient and treatment pairings. Which would you prefer: applications solely based on machine learning, applications based on doctors' knowledge only, or a combination of both?
29. Learn more and ask for a demo based on your own data!

https://www.poolparty.biz/semantic-classifier/