Slides from the AIMS (http://aims.fao.org/) webinar of 21 September 2017 by Martin Kaltenböck and Timea Turdean (Semantic Web Company): Text Mining in PoolParty Semantic Suite (https://www.poolparty.biz)
Open core summit: Observability for data pipelines with OpenLineage (Julien Le Dem)
This document discusses Open Lineage and the Marquez project for collecting metadata and data lineage information from data pipelines. It describes how Open Lineage defines a standard model and protocol for instrumentation to collect metadata on jobs, datasets, and runs in a consistent way. This metadata can then provide context on the data source, schema, owners, usage, and changes. The document outlines how Marquez implements the Open Lineage standard by defining entities, relationships, and facets to store this metadata and enable use cases like data governance, discovery, and debugging. It also positions Marquez as a centralized but modular framework to integrate various data platforms and extensions like Datakin's lineage analysis tools.
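To make the model concrete, here is a minimal sketch of an OpenLineage run event built as a Python dictionary and POSTed to a locally running Marquez instance. The job and dataset names are invented, and the endpoint assumes Marquez's default local development port; this is an illustration of the event shape, not a complete integration.

```python
import json
import uuid
from datetime import datetime, timezone

import requests  # pip install requests

# One OpenLineage run event: a job run starts, reads one dataset,
# writes another. Job/dataset names below are invented examples.
event = {
    "eventType": "START",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "producer": "https://example.com/my-pipeline",       # identifies the instrumentation
    "run": {"runId": str(uuid.uuid4())},                  # unique per run
    "job": {"namespace": "my_team", "name": "daily_orders_etl"},
    "inputs": [{"namespace": "postgres://prod", "name": "public.orders"}],
    "outputs": [{
        "namespace": "s3://warehouse",
        "name": "orders_summary",
        # facets attach extensible metadata, e.g. the output schema
        "facets": {"schema": {
            "_producer": "https://example.com/my-pipeline",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/SchemaDatasetFacet.json",
            "fields": [{"name": "order_id", "type": "BIGINT"}],
        }},
    }],
}

# Assumes a Marquez instance on its default local port.
resp = requests.post("http://localhost:5000/api/v1/lineage",
                     data=json.dumps(event),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
```

Emitting a matching COMPLETE event with the same runId is what lets the backend stitch individual runs into a lineage graph.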
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS... (Amazon Web Services)
AWS hosts a variety of public data sets that anyone can access for free. Previously, large data sets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without downloading or storing it themselves. In this session, the AWS Open Data Team shares tips and tricks, patterns and anti-patterns, and tools to help you effectively stage your data for analysis in the cloud.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop (Databricks)
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Apache Calcite (a tutorial given at BOSS '21) (Julian Hyde)
The document provides instructions for setting up the environment and code for the BOSS '21 Copenhagen tutorial on Apache Calcite.
Setup involves two steps:
1. Clone the GitHub repository containing the sample code and dependencies.
2. Compile the project.
It also outlines the draft schedule for the tutorial, which covers a Calcite introduction, a demonstration of SQL queries on CSV files, setting up the coding environment, using Lucene for indexing, and coding exercises that build parts of the logical and physical query plans in Calcite.
The tutorial is led by Stamatis Zampetakis of Cloudera and Julian Hyde of Google, both committers to the Apache Calcite project.
Learn to Use Databricks for Data Science (Databricks)
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Apache Spark is an in-memory data processing solution that can work with existing data sources like HDFS and can make use of your existing computation infrastructure, such as YARN or Mesos. This talk covers a basic introduction to Apache Spark and its various components, such as MLlib, Shark, and GraphX, with a few examples.
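As a taste of the API described above, here is a minimal PySpark sketch that reads from an existing source and computes an aggregate in memory. The HDFS path and column names are placeholders; on a YARN or Mesos cluster you would submit the same script via spark-submit with the appropriate --master setting.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build (or reuse) a Spark session for this application.
spark = (SparkSession.builder
         .appName("spark-intro")
         .getOrCreate())

# Hypothetical dataset on HDFS; any supported source works here.
events = spark.read.json("hdfs:///data/events")

# Filter and aggregate in memory across the cluster.
daily_counts = (events
                .filter(F.col("status") == "ok")
                .groupBy("event_date")
                .count())

daily_counts.show()
spark.stop()
```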
This document discusses data product architectures and provides examples of different architectures for data products, including the lambda architecture, analyst architecture, recommender architecture, and partisan discourse architecture. It also discusses common design principles for data product architectures, such as using microservices with stateful backend services and database-backed APIs. Key aspects of data product architectures include handling training data and models, making predictions via APIs, updating models and annotations, and designing flexible systems that can incorporate new models and data.
Achieving Lakehouse Models with Spark 3.0 (Databricks)
It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star Schemas & Kimball is one of those things that isn’t going anywhere, but as we move towards the “Data Lakehouse” paradigm – how appropriate is this modelling technique, and how can we harness the Delta Engine & Spark 3.0 to maximise its performance?
Massive Data Processing in Adobe Using Delta Lake (Databricks)
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels, such as email and advertisements. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake, share our experiences, and cover the topics below (a sketch of the staging-table pattern follows the list).
- What are we storing?
- Multi Source – Multi Channel Problem
- Data Representation and Nested Schema Evolution
- Performance Trade-Offs with Various Formats
- Anti-patterns used (String FTW)
- Data Manipulation using UDFs
- Writer Worries and How to Wipe Them Away (Staging Tables FTW)
- Datalake Replication Lag Tracking
- Performance Time!
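Here is a minimal PySpark sketch of the staging-table upsert pattern the topic list alludes to: new arrivals land in a staging Delta table first and are then merged into the main table, so concurrent readers never see partial writes. The paths and the profile_id key are invented for illustration; this is not Adobe's actual pipeline.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-upsert").getOrCreate()

# Hypothetical staging location where a batch of new records landed.
staging = spark.read.format("delta").load("/mnt/staging/profiles")

# The main Delta table that downstream consumers query.
target = DeltaTable.forPath(spark, "/mnt/lake/profiles")

# Atomic upsert: update matching rows, insert the rest.
(target.alias("t")
 .merge(staging.alias("s"), "t.profile_id = s.profile_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```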
Accelerate Your ML Pipeline with AutoML and MLflow (Databricks)
Building ML models is a time-consuming endeavor that requires a thorough understanding of feature engineering, selecting useful features, choosing an appropriate algorithm, and performing hyperparameter tuning. Extensive experimentation is required to arrive at a robust and performant model. Additionally, keeping track of the models that have been developed and deployed may be complex. Solving these challenges is key to successfully implementing end-to-end ML pipelines at scale.
In this talk, we will present a seamless integration of automated machine learning within a Databricks notebook, thus providing a truly unified analytics lifecycle for data scientists and business users with improved speed and efficiency. Specifically, we will show an app that generates and executes a Databricks notebook to train an ML model with H2O’s Driverless AI automatically. The resulting model will be automatically tracked and managed with MLflow. Furthermore, we will show several deployment options to score new data on a Databricks cluster or with an external REST server, all within the app.
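To illustrate the tracking half of this workflow, here is a small sketch that logs a parameter, a metric, and a model with MLflow. It swaps in scikit-learn for brevity rather than the H2O Driverless AI integration the talk demonstrates; the run name and parameter values are arbitrary.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

with mlflow.start_run(run_name="rf-baseline"):
    n_estimators = 200
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X, y)

    # Everything logged here is queryable later in the MLflow UI/API.
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("cv_accuracy", cross_val_score(model, X, y, cv=5).mean())
    mlflow.sklearn.log_model(model, "model")  # stores the model artifact
```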
Agile Data Engineering - Intro to Data Vault Modeling (2016) (Kent Graziano)
The document provides an introduction to Data Vault data modeling and discusses how it enables agile data warehousing. It describes the core structures of a Data Vault model including hubs, links, and satellites. It explains how the Data Vault approach provides benefits such as model agility, productivity, and extensibility. The document also summarizes the key changes in the Data Vault 2.0 methodology.
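The three core structures are easiest to see as DDL. Below is a minimal sketch executed through Python's sqlite3 module: a hub holds a business key, a link relates hubs, and a satellite holds descriptive attributes over time. The table and column names are invented, and production Data Vaults typically use hash keys and additional metadata columns.

```python
import sqlite3

# Hub = business key; Link = relationship between hubs;
# Satellite = descriptive attributes tracked over time.
ddl = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash of the business key
    customer_id   TEXT NOT NULL,      -- the business key itself
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (
    link_hk       TEXT PRIMARY KEY,
    customer_hk   TEXT REFERENCES hub_customer(customer_hk),
    order_hk      TEXT REFERENCES hub_order(order_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    email         TEXT,
    PRIMARY KEY (customer_hk, load_date)  -- full history kept per load
);
"""

with sqlite3.connect(":memory:") as conn:
    conn.executescript(ddl)
```

Extensibility follows directly from this shape: adding a new source or attribute means adding a satellite or link, never altering an existing hub.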
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D... (Databricks)
Many have dubbed the 2020s the decade of data. This is indeed an era of data zeitgeist.
From code-centric software development 1.0, we are entering software development 2.0, a data-centric and data-driven approach, where data plays a central theme in our everyday lives.
As the volume and variety of data garnered from myriad data sources continue to grow at an astronomical scale, and as cloud computing offers cheap computing and data storage resources at scale, data platforms have to match in their ability to process, analyze, and visualize data at scale, at speed, and with ease — this involves paradigm shifts in processing and storage, and in the programming frameworks that give developers access to these platforms.
In this talk, we will survey some emerging technologies that address the challenges of data at scale, how these tools help data scientists and machine learning developers with their data tasks, why they scale, and how they facilitate the future data scientists to start quickly.
In particular, we will examine in detail two open-source tools MLflow (for machine learning life cycle development) and Delta Lake (for reliable storage for structured and unstructured data).
Other emerging tools, such as Koalas, help data scientists do exploratory data analysis at scale in a language and framework they are familiar with; we will also touch on emerging data + AI trends in 2021.
You will understand the challenges of machine learning model development at scale, why you need reliable and scalable storage, and what other open source tools are at your disposal to do data science and machine learning at scale.
Data all over the place! How SQL and Apache Calcite bring sanity to streaming... (Julian Hyde)
The revolution has happened. We are living in the age of the deconstructed database. Modern enterprises are powered by data, and that data lives in many formats and locations, in-flight and at rest; somewhat surprisingly, the lingua franca for data remains SQL.
In this talk, Julian describes Apache Calcite, a toolkit for relational algebra that powers many systems including Apache Beam, Flink and Hive. He discusses some areas of development in Calcite: streaming SQL, materialized views, enabling spatial query on vanilla databases, and what a mash-up of all three might look like.
He also describes how SQL is being extended to handle streaming, and the challenges that will need to be solved if it is to become standard.
A talk given by Julian Hyde at Lyft, San Francisco, on 2018/06/27.
Data engineers build massive data storage systems and develop architectures like databases and data processing systems. They install continuous pipelines to move data between these large data "pools" and allow data scientists to access relevant data sets. Data engineers require technical skills in databases, SQL, data modeling, ETL, programming languages, data warehousing and newer technologies like NoSQL, Hadoop and machine learning. They are responsible for designing, implementing, testing and maintaining scalable data systems, ensuring business requirements are met, researching new data sources, cleaning and analyzing data, and collaborating with other teams. The role continues to evolve with new database and development technologies.
How to Rebuild an End-to-End ML Pipeline with Databricks and Upwork with Than... (Databricks)
Upwork has the biggest closed-loop online dataset of jobs and job seekers in labor history (>10M profiles, >100M job posts, job proposals and hiring decisions, >10B messages, plus transaction and feedback data). Besides sheer quantity, our data is also contextually very rich. We have client and contractor data for the entire job funnel – from finding jobs to getting the job done.
For various machine learning applications including search and recommendations and labor marketplace optimization (rate, supply and demand), we heavily relied on a Greenplum-based data warehouse solution for data processing and ad-hoc ML pipelines (weka, scikit-learn, R) for offline model development and online model scoring.
In this talk, we present our modernization efforts in moving towards 1) a holistic data processing infrastructure for batch and stream processing using S3, Kinesis, Spark and Spark Structured Streaming, 2) model development using Spark MLlib and other ML libraries for Spark, 3) model serving using Databricks Model Scoring, scoring over structured streams, and microservices, and 4) orchestrating and streamlining all these processes using Apache Airflow and a CI/CD workflow customized to our Data Science product engineering needs. The focus of this talk is on how we were able to leverage the Databricks service offering to reduce DevOps overhead and costs, complete the entire modernization with moderate effort, and adopt a collaborative notebook-based solution for all our data scientists to develop models, reuse features, and share results. We will share the core lessons learned and pitfalls we encountered during this journey.
Lake Database, Database Templates and Map Data in Azure Synapse Analytics (Erwin de Kreuk)
Database templates in Synapse Analytics are blueprints that organizations can use to plan, architect, and design solutions.
How can we use these database templates in day-to-day business to speed up and automate this process?
The Map Data tool can help us with that.
Introduction to Data Engineer and Data Pipeline at Credit OK (Kriangkrai Chaonithi)
The document discusses the role of data engineers and data pipelines. It begins with an introduction to big data and why data volumes are increasing. It then covers what data engineers do, including building data architectures, working with cloud infrastructure, and programming for data ingestion, transformation, and loading. The document also explains data pipelines, describing extract, transform, load (ETL) processes and batch versus streaming data. It provides an example of Credit OK's data pipeline architecture on Google Cloud Platform that extracts raw data from various sources, cleanses and loads it into BigQuery, then distributes processed data to various applications. It emphasizes the importance of data engineers in processing and managing large, complex data sets.
Graph databases are a type of NoSQL database that is optimized for storing and querying connected data and relationships. A graph database represents data in graphs consisting of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. Graph databases are well-suited for applications that involve complex relationships and connected data, such as social networks, knowledge graphs, and recommendation systems. They allow for flexible querying of relationships and connections via graph traversal operations.
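Here is a small sketch of such a traversal using Neo4j's official Python driver against a hypothetical social graph; the connection details, credentials, and schema (Person nodes, FRIEND relationships) are assumptions for illustration. Variable-length relationship patterns like this are the kind of query that is awkward to express in SQL but natural in a graph database.

```python
from neo4j import GraphDatabase  # pip install neo4j

# Connection details are placeholders for a local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "secret"))

with driver.session() as session:
    # Friends-of-friends who are not already direct friends:
    # a classic recommendation traversal.
    result = session.run(
        """
        MATCH (me:Person {name: $name})-[:FRIEND]-()-[:FRIEND]-(fof:Person)
        WHERE NOT (me)-[:FRIEND]-(fof) AND me <> fof
        RETURN DISTINCT fof.name AS suggestion
        """,
        name="Alice",
    )
    for record in result:
        print(record["suggestion"])

driver.close()
```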
Summary introduction to data engineering (Novita Sari)
Data engineering involves designing, building, and maintaining data warehouses to transform raw data into queryable forms that enable analytics. A core task of data engineers is Extract, Transform, and Load (ETL) processes - extracting data from sources, transforming it through processes like filtering and aggregation, and loading it into destinations. Data engineers help divide systems into transactional (OLTP) and analytical (OLAP) databases, with OLTP providing source data to data warehouses analyzed through OLAP systems. While similar, data engineers focus more on infrastructure and ETL processes, while data scientists focus more on analysis, modeling, and insights.
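The ETL flow from an OLTP source to an analytical store can be shown end to end in a few lines. Below is a toy sketch using only the standard library, with two in-memory SQLite databases standing in for the transactional system and the warehouse; the table and column names are invented.

```python
import sqlite3

# --- OLTP source: row-oriented, transaction-friendly ---
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'EU', 120.0, 'paid'), (2, 'EU', 80.0, 'refunded'),
        (3, 'US', 200.0, 'paid');
""")

# Extract + transform: filter to paid orders, aggregate per region.
rows = oltp.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE status = 'paid' GROUP BY region"
).fetchall()

# --- Load into the analytical (OLAP-style) destination ---
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE revenue_by_region (region TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO revenue_by_region VALUES (?, ?)", rows)

print(warehouse.execute("SELECT * FROM revenue_by_region").fetchall())
```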
"Structured Streaming was a new streaming API introduced to Spark over 2 years ago in Spark 2.0, and was announced GA as of Spark 2.2. Databricks customers have processed over a hundred trillion rows in production using Structured Streaming. We received dozens of questions on how to best develop, monitor, test, deploy and upgrade these jobs. In this talk, we aim to share best practices around what has worked and what hasn't across our customer base.
We will tackle questions around how to plan ahead, what kind of code changes are safe for structured streaming jobs, how to architect streaming pipelines which can give you the most flexibility without sacrificing performance by using tools like Databricks Delta, how to best monitor your streaming jobs and alert if your streams are falling behind or are actually failing, as well as how to best test your code."
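One concrete monitoring hook is the query's progress report. The sketch below assumes that polling lastProgress is enough for the demonstration; real deployments would typically use a streaming query listener or external alerting, and the built-in rate source stands in for Kafka or files.

```python
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-monitor").getOrCreate()

# "rate" generates rows on a schedule; a stand-in for a real source.
stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

query = (stream.writeStream
         .format("memory")          # sink choice is illustrative only
         .queryName("rates")
         .outputMode("append")
         .start())

time.sleep(10)  # let a few micro-batches complete

# A stream falling behind shows up as processed < input rates,
# or as growing batch durations.
progress = query.lastProgress
if progress is not None:
    print("input rows/s:", progress["inputRowsPerSecond"])
    print("processed rows/s:", progress["processedRowsPerSecond"])

query.stop()
spark.stop()
```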
This document discusses how graphs and Neo4j can be used for various use cases in banking. It provides an agenda for the discussion including introductions to graphs and Neo4j, banking data overviews, and specific use cases like fraud detection, risk analysis, knowledge graphs, and customer 360 views. Examples are given for how graph databases could help with each use case, with a fraud detection demo. Additional potential uses include identity and access management and regulatory compliance.
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2... (Neo4j)
Volvo Cars has developed a representation of map attributes as a graph in Neo4j. By including real-time car data, they are able to collect insights about possible accident causes based on road infrastructure.
The document introduces Visual DataVault, a modeling language for visually expressing Data Vault models. It aims to generate DDL from models and support Microsoft Office. The language defines basic entities like hubs, links, satellites and reference tables. It also covers query assistant tables, computed structures, exploration links and business vault tables to enhance the raw data vault. Some remarks note it focuses on logical not physical modeling and more features are planned.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017 (Caserta)
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
Timeseries - data visualization in Grafana (OCoderFest)
This document discusses using Grafana to visualize time series data stored in InfluxDB. It begins with an introduction to the speaker and agenda. It then discusses why Grafana is useful for quality assurance, anomaly detection, and monitoring analytics. It provides an overview of the monitoring process involving collecting metrics via StatsD and storing them in InfluxDB. Details are given about InfluxDB's purpose, structure, querying, downsampling and retention policies. Telegraf is described as an agent for collecting and processing metrics to send to InfluxDB. StatsD is explained as a protocol for incrementally reporting counters and gauges. Finally, Grafana's purpose, structure, data sources and dashboard creation are outlined, with examples shown in a demonstration.
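The StatsD side of this pipeline is tiny in application code. Below is a minimal sketch of emitting the three basic metric types from Python using the statsd PyPI package, assuming a StatsD-compatible listener such as Telegraf on the default UDP port; Telegraf would then forward aggregated metrics to InfluxDB for Grafana to chart.

```python
import statsd  # pip install statsd

# Telegraf (or any statsd daemon) listens on UDP 8125 by default.
client = statsd.StatsClient("localhost", 8125)

client.incr("api.requests")           # counter: one more request served
client.timing("api.latency_ms", 42)   # timer: one observed latency sample
client.gauge("queue.depth", 17)       # gauge: current absolute value
```

Because the transport is fire-and-forget UDP, instrumentation like this adds effectively no latency or failure modes to the application being measured.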
Simple Knowledge Organisation System (SKOS) as the core of Enterprise Knowled... (Andreas Blumauer)
Enterprises use knowledge graphs for more agile information management. Taxonomies form an essential part of knowledge graphs. When based on Semantic Web standards, parts of graphs can be reused more efficiently. SKOS, as a standard for taxonomies, plays a crucial role in this information architecture.
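To show how small the SKOS core is, here is a sketch that builds a two-concept taxonomy with rdflib; the base URI and concept labels are invented. Concepts, preferred/alternative labels, and broader/narrower links are essentially all a basic SKOS taxonomy needs.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/taxonomy/")  # hypothetical base URI

g = Graph()
g.bind("skos", SKOS)

energy = EX["energy"]
solar = EX["solar-energy"]

g.add((energy, RDF.type, SKOS.Concept))
g.add((energy, SKOS.prefLabel, Literal("Energy", lang="en")))

g.add((solar, RDF.type, SKOS.Concept))
g.add((solar, SKOS.prefLabel, Literal("Solar energy", lang="en")))
g.add((solar, SKOS.altLabel, Literal("Photovoltaics", lang="en")))
g.add((solar, SKOS.broader, energy))   # hierarchy: solar is narrower

print(g.serialize(format="turtle"))
```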
The document summarizes plans for the LockSchuppenAg company to renovate a rail roundhouse in Dresden, Germany into a future lab and coworking space called NooPolis. Key aspects include investing 5-7 million euros in the project, creating an open organization with open venture capital and a peer academy. NooPolis will use virtual currency called KayGroschen and include a stock exchange, wiki-based constitution and governance, and serve as a testbed for new technologies and economic models. The goal is to build NooPolis and host a SingularSummitEurope conference in September/October 2009 to discuss issues around the future and technological singularity.
This document summarizes an Open GLAM workshop held at the Creative Commons Global Summit on September 15, 2011. The workshop discussed creating a global network to open up content and data from galleries, libraries, archives, and museums. Key topics included developing principles for open content and data, identifying incentives and barriers, and brainstorming next steps like creating an Open GLAM mailing list and wiki. The overall goal is to enable cultural heritage institutions to share digital public domain works and metadata.
Museums in Flanders are contributing their artwork data to Wikidata to make it available to a broader audience. The data, including persistent identifiers and links to external authorities, was uploaded under a CC0 license. This provides benefits like low costs, increased reach through Wikipedia, and placing the works in a wider context. Museums can now get back an RDF export of the data and see their works integrated in the linked open data cloud. Next steps include adding more detail to artist biographies and correcting any errors or duplicates in the data.
Linked Open Data Publications through Wikidata & Persistent Identification... (PACKED vzw)
In order for museums to truly reap the benefits of publishing their collections online in a sustainable way, PACKED vzw presents the results of its Linked open data project as a best practice guide for the Flemish heritage sector.
The document discusses the Datahub project, which aims to create a shared datahub architecture for museums in Flanders to store and provide access to their collection data. The goals of the project are to lower barriers for museums to connect their data to modern technologies, make data more flexible and reusable, and improve accessibility of museum data. The project will develop an open source datahub framework and deploy a reference implementation in three Flemish art museums in Phase 1 from 2016-2017. Phase 2 will expand the community and integrate four contemporary art museums. The datahub will store collection metadata and make it available through APIs and other technologies to enable new uses of the data.
Why we share more than ever. The potential of open and reusable collection data (Antje Schmidt)
The document discusses the Museum für Kunst und Gewerbe's approach to making its collection data openly available and reusable. Some key points:
- The museum launched a website in 2015 providing open access to around 3,000 digitized objects using open licenses like CC0 to waive all copyright restrictions.
- Making the data openly available online has led to increased usage, with over 20,000 downloads and shares of collection content.
- Opening the collection has shifted the museum's perception of itself from sole owner and gatekeeper of the collection to a facilitator and collaborator, with online audiences now seen as co-authors.
- New applications and reuses of the open collection data have emerged.
The document discusses making web content machine readable through linked open data and APIs in order to increase discoverability. It provides examples of how metadata from documents and databases can be extracted and linked together in semantic graphs to allow for complex queries across multiple sources. By making content and metadata accessible via APIs, cultural institutions like libraries, archives and museums are able to publish their collections as linked open data and have their resources incorporated and linked to by other semantic web applications and databases. This improves discovery of materials while also providing opportunities for new types of applications to be built by developers using the data.
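Here is a short sketch of the kind of cross-collection query this enables, run against the public Wikidata SPARQL endpoint with the SPARQLWrapper library; an application would query a museum's own endpoint the same way. The user-agent string is an invented courtesy identifier.

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Public linked-data endpoint; polite clients identify themselves.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="lod-demo/0.1 (example)")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?painting ?paintingLabel WHERE {
      ?painting wdt:P31 wd:Q3305213 ;     # instance of: painting
                wdt:P170 wd:Q5598 .       # creator: Rembrandt
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["paintingLabel"]["value"])
```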
Online Collections: interpretation vs automation - Part 1 (Brian Gomez)
The document discusses balancing automation and interpretation when presenting online collections. It defines automation as mass publishing of collection data and interpretation as telling artifact stories and contexts. It presents the Virtual Exhibit and PastPerfect-Online tools for these purposes. It also provides ideas for using a combination of tools to tell stories through an Artifact of the Day blog linked from the website and social media to recruit help creating interpretive content.
Maps of The World's Most Important Museum Clusters (Stipo)
Stipo's research on the world's most important museum clusters. The maps are at the same scale to enable comparison of size. Also included: a list of museum cluster websites and logos.
museumplein, amsterdam, museumsquartier, wien, museum mile, new york, national mall, washington, millennium park, chicago, exhibition road, london, louvre, paris, hermitage, st petersburg, federation square, melbourne, balboa park, san diego, skeppsholmen, stockholm, kunstberg, brussels, varosliget, városliget, budapest, kitanomaru park, tokyo, museumsufer, frankfurt, kunstareal, muenchen, munchen, munic, münchen, museum, museum clusters, museum cluster, cultural clusters, cultural cluster
This document summarizes an interview with Merete Sanderhoff, a project researcher at the Danish National Gallery about openness and sharing of cultural works. The interview discusses how providing open access to works and removing restrictions on use can increase awareness and engagement with cultural works. It notes that this approach allows others to build upon and spread knowledge of the works. While it may mean losing some control and potential revenue, the benefits of a larger audience and community of supporters who help promote the works are seen as outweighing these concerns.
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages (Michael Nelson)
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
The document discusses the Datahub project, which aims to create an open source framework for cultural institutions to manage and share their collection data. The project involves designing a universal datahub architecture, developing an open source software package, and deploying a reference implementation across seven Flemish museums. The goals are to make data more accessible and reusable, enrich it through links to external authorities, and lower the barriers for museums to connect their data to modern technologies. When complete, the datahub will aggregate collection information from multiple institutions into rich, standardized formats.
The document discusses a digital project at the Frick Art Reference Library to document art collections from the Gilded Age in New York. It received grant funding to digitize 70 catalogs from private art collections between the late 19th and early 20th centuries. The project coordinator discusses selecting materials, coordinating across departments, and creating an online exhibition using Google Open Gallery to showcase themes, subjects, and history from the materials.
This is a very basic workshop to introduce novice users to Omeka with an eye towards providing hands-on experience to decide whether it can serve their own research needs.
The document outlines the Museum of New Media's plans to implement a collection management system to catalog and provide access to their growing net art collection. They will use the open source software xDams and the metadata standard LIDO to catalog over 50 net artworks and related materials. The project will establish numbering schemes, file naming conventions, and internal vocabularies. It details staff roles and a six month timeline to have the collection cataloged in xDams before a planned net art retrospective exhibition.
1. The document discusses Open GLAM, which aims to make digital copies and metadata from galleries, libraries, archives, and museums (GLAM institutions) openly available through Creative Commons licenses.
2. It proposes creating an Open GLAM coalition to encourage GLAM institutions to openly share public domain works and metadata.
3. Short term plans include growing an Open GLAM mailing list and wiki, holding meetings and events, and hiring an Open GLAM evangelist to build relationships between activists and GLAM institutions.
"Long Tail" Web strategy paper for Smithsonian American Art MuseumMichael Edson
This document outlines a web strategy for the Smithsonian American Art Museum. It proposes adopting a "Long Tail" strategy to build audiences by publishing as much online content as possible, making that content highly findable, and involving customers in the process. The strategy emphasizes publishing content in smaller "microcontent" pieces across a wide range of topics to reach niche interests. It also stresses improving findability through design and search optimization. Finally, it recommends learning more about current customers and involving them in contributing content to strengthen relationships and grow audiences.
Similar to Text Mining in PoolParty Semantic Suite
Benefiting from Semantic AI along the data life cycle (Martin Kaltenböck)
Slides of a one-hour session by Martin Kaltenböck (CFO and Managing Partner of Semantic Web Company / PoolParty Software Ltd) on 19 March 2019 in Boston, US, at Enterprise Data World 2019, titled: Benefiting from Semantic AI along the data life cycle.
Knowledge Graph Implementation into Drupal Content Management System (CMS) fo... (Martin Kaltenböck)
Slides of the presentation by Martin Kaltenböck (Managing Partner, Semantic Web Company, SWC, https://www.semantic-web.com) at Taxonomy Boot Camp London 2017 on 17 October 2017, titled: Knowledge Graph Implementation into Drupal Content Management System (CMS) for the UN Climate Technology Centre and Network (CTCN)
The Climate Tagger - a tagging and recommender service for climate informatio... (Martin Kaltenböck)
The Climate Tagger - a tagging and recommender service for climate information based on PoolParty Semantic Suite - slides of the talk by Sukaina Bharwani (Stockholm Environment Institute, SEI Oxford) and Martin Kaltenböck (Semantic Web Company, SWC Vienna) at the Taxonomy Boot Camp London 2016 (TBC London) on 19 October 2016.
ODI Node Vienna: Best-practice examples of open innovation by means of open data (Martin Kaltenböck)
Talk given at the Data Pioneers workshop on 10 October 2016 at the BMVIT on open innovation and open data (open innovation by means of open data), by Elmar Kiesling (TU Wien) and Martin Kaltenböck (SWC) for the ODI (Open Data Institute) Node Vienna.
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement... (Martin Kaltenböck)
The document outlines a workshop on quality assessment and improvements for open data portals, including results from requirements elicitation, proposed data quality metrics, and plans for the ADEQUATe project, which aims to improve data quality through assessment, algorithms, linked data principles, and community involvement. The workshop agenda covers user requirements, best practices, and an open discussion on data quality issues.
PoolParty Semantic Suite - LT-Innovate Industry Summit 2016 - Brussels (Martin Kaltenböck)
This document provides an overview of Semantic Web Company (SWC) and their PoolParty Semantic Suite product. It discusses SWC's background, customers, and partners. It then describes the key components and functionalities of PoolParty, including maintaining vocabularies, entity extraction, linked data integration, and advanced features like custom ontologies and corpus analysis. The document explains how PoolParty can integrate with databases like MarkLogic and Virtuoso, as well as content management systems like Drupal. Overall, the document aims to introduce SWC and PoolParty and demonstrate how their semantic technologies can provide benefits for tasks like data integration, search, and knowledge management.
Presentation of the Big Data Europe project at the EIP Water Conference 2016 ... (Martin Kaltenböck)
Presentation of the Big Data Europe project (http://www.big-data-europe.eu) at the EIP Water Conference 2016 in Leeuwarden, The Netherlands. Taking place on 09/02/2016 at the Wetsus Campus in Leeuwarden, the Netherlands in the course of an ICT4Water workshop.
The European Innovation Partnership on Water Online Marketplace (Martin Kaltenböck)
Presentation about 'The European Innovation Partnership (EIP) on Water Online Marketplace (http://www.eip-water.eu)' given on 09.02.2016 in the course of the EIP Water annual conference in Leeuwarden, The Netherlands.
PoolParty Semantic Suite: Management Briefing and Functional Overview (Martin Kaltenböck)
Slides for the presentation of PoolParty Semantic Suite on 12 November 2015 at KNVI Congres 2015 in Utrecht, the Netherlands (see: http://congres.knvi.info/), by Martin Kaltenböck in the Big Data & Linked Data session.
PoolParty Semantic Suite - Solutions for Sustainable Development (Martin Kaltenböck)
Presentation of the webinar: PoolParty for Sustainable Development - the Climate Tagger - taking place on 5 November 2015. More information and other presentations to be found here: http://bit.ly/1NpTcGT.
Recording of the webinar: https://www.youtube.com/watch?v=3GxtFfLL1ps.
Climate Technology Transfer supported through Linked Data: A Proof of Concept ... (Martin Kaltenböck)
Presentation: Climate Technology Transfer supported through Linked Data: A Proof of Concept for The Climate Technology Centre and Network (CTCN) - by Eelco Kruizinga (DNV GL) and Martin Kaltenböck (SWC) at the Linked Data Netherlands Conference on 29 September 2015 in Hilversum, NL.
Introduction to the Big Data Europe project at the CMG-AE event 'Big Data: Strategies, Technologies and Benefits'
19th of May 2015, Expat Center der Wirtschaftsagentur, Vienna, Austria
See: http://www.big-data-europe.eu
Einführung Linked Open Data (LOD) - Introduction to Linked Open Data (LOD) (Martin Kaltenböck)
Presentation by Martin Kaltenböck (SWC) at the Science Days of the Academy of Sciences on 3 December 2014 on the introduction, basics, and benefits of Linked Open Data (LOD), including the best practice: Linked Open Data Pilot Austria (LOD Pilot AT - http://linkeddata.gv.at).
Talk at the Semantic Web MeetUp Vienna on 16 October 2014, Top 24 in the Arkadenhof of Vienna City Hall, on the beta launch of the Linked Open Data Pilot Austria (LOD Pilot AT).
Open Data Portal (ODP) Austria - presentation at opendata.ch 2014 in ... (Martin Kaltenböck)
Slides for the talk by Martin Kaltenböck on 18 September 2014 at the annual Open Data CH conference in Zurich, Switzerland, on the Open Data Portal (ODP) Austria (http://www.opendataportal.at) and the Linked Open Data (LOD) Pilot Austria.
Linked Open Data Pilot Project Austria - LOD Pilot AT (Martin Kaltenböck)
Slide set from the Open Data Support training for the Austrian administration on 15 September 2014, organized by the Austrian Federal Chancellery. The LOD Pilot Austria implements a digital base-data infrastructure as Linked Open Data for Austria, based on the open data of data.gv.at (the national open data portal) and open.wien.gv.at (the data portal of the City of Vienna). In the project, 30-50 base datasets (industry sectors, economic branches, municipality codes, etc.) are published as Linked Open Data at linked.data.gv.at and made available for reuse. The project was financially supported by netidee, the Internet Foundation Austria.
Easy SPARQLing for the Building Performance Professional (Martin Kaltenböck)
Slides of Martin Kaltenböck's (SWC) presentation at the SEMANTiCS2014 conference in Leipzig on 5 September 2014 about the 'Tool for Building Energy Performance Scenarios' of GBPN (Global Buildings Performance Network, http://gbpn.org), which provides a prediction tool for building performance worldwide by making use of Linked Open Data (LOD).
Slides for the presentation of PoolParty Semantic Suite (http://www.poolparty.biz) at the PiLOD conference (http://www.pilod.nl) in Hilversum, the Netherlands, on 25 June 2014 by Martin Kaltenböck - as part of the presentation of Linked Open Data at Wolters Kluwer, together with Christian Dirschl of WKD.
Presentation by M. Kaltenböck of Semantic Web Company given at the Linked Open Data MeetUp Mannheim on 23 February 2014 on 'Semantic Information Management using PoolParty 4', explaining PoolParty Semantic Suite, its features, applications, and real-world use cases.
Using DBpedia for Thesaurus Management and Linked Open Data IntegrationMartin Kaltenböck
Using DBpedia for Thesaurus Creation and Management as well as Linked Open Data (LOD) Integration with PoolParty Semantic Suite (http://www.poolparty.biz) at Semantic Web Company (SWC, http://www.semantic-web.at).
Codeless Generative AI Pipelines (GenAI with Milvus)
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
1. Martin Kaltenböck, CFO, Semantic Web Company
Timea Turdean, Technical Consultant, SWC
POOLPARTY SEMANTIC SUITE
AIMS Webinar, 21 September 2017
2. PoolParty Drupal Integration
Agenda
▸ Introduction: Semantic Web Company (SWC)
▸ Introduction: PoolParty Semantic Suite
▸ Using PoolParty for Text & Data Mining
▹ Text Mining for continuous knowledge graph modelling
▹ Entity linking and data integration
▹ Classification and semantic annotation / tagging
▸ DEMO(s) of text mining capability of PoolParty
▸ Customer Success Stories
▹ REEEP ClimateTagger
▹ healthdirect Australia
▹ CTCN Semantic Search
▹ EIP Water Matchmaking
▸ Q&A Session
4. INTRODUCING SEMANTIC WEB COMPANY
Semantic Web Company (SWC)
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ 40+ employees, experts in text mining & linked data
▸ ~15-20% revenue growth / year
▸ 2.5 million euro in funding for R&D
▸ SWC named to KMWorld’s 2017 ‘100 Companies That Matter in Knowledge Management’
▸ Organising the SEMANTiCS conference series for 13 years
▸ https://www.semantic-web.com
5. INTRODUCING POOLPARTY
PoolParty Semantic Suite
▸ First release in 2009
▸ Current version 6.0
▸ W3C standards compliant
▸ Over 200 installations worldwide
▸ 50% of revenue is reinvested into PoolParty development
▸ PoolParty available on-premises or as a cloud service
▸ KMWorld listed PoolParty as a Trend-Setting Product in 2015, 2016 and 2017
▸ https://www.poolparty.biz/
6. SELECTED CUSTOMER REFERENCES AND PARTNERS
[World map: SWC headquarters]
Customer References
● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation
● Harvard Business School
● Wolters Kluwer
● Talend
● HealthStream
● TC Media
● Techtarget
● Seek
● Alliander N.V.
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank
● Renewable Energy Partnership
● Wood MacKenzie
● Oxford University Press
● International Atomic Energy Agency
● Norwegian Directorate of Immigration
● Ministry of Finance (AT)
● Council of the E.U.
● Australian National Data Service
Partners
● Accenture
● EPAM Systems
● Enterprise Knowledge
● Mekon Intelligent Content Solutions
● B-S-S Business Software Solutions
● MarkLogic
● Wolters Kluwer
● Digirati
● Quark
[Map regions: US East, US West, AUS/NZL, UK]
8. TECHNICAL CORE COMPONENTS
[Architecture diagram: unstructured, semi-structured, and structured data feed into three core components on top of an RDF graph database - the Taxonomy & Ontology Server, the Entity Extractor & Text Mining, and Data Integration & Data Linking (UnifiedViews) - with PoolParty GraphSearch on top. Annotations: text mining identifies new candidate concepts to be included in a controlled vocabulary; controlled vocabularies serve as the basis for highly precise entity extraction; the Entity Extractor informs all incoming data streams about their semantics and links them; schema mapping is based on ontologies.
Sample text shown for extraction: 'Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney.']
11. ‘Elevator Pitch’
▸ Built as a ‘Semantic Middleware’
▸ Outstanding user-friendliness
▸ Fully standards-compliant
▸ Highly precise entity extraction
▸ Comprehensive API
▸ Excellent maintainability of extraction models
▸ Integrated with leading search engines & graph databases
▸ Integrated with leading content management platforms
▸ Product configuration options for growing requirements
▸ Highly experienced partner and service teams
12. Product Overview
All products are available as cloud services or for on-premise installation.
> PoolParty Feature & Price Matrix
Editions: PoolParty Basic Server | PoolParty Advanced Server | PoolParty Enterprise Server | PoolParty Semantic Integrator
Features (the matrix assigns each to one or more editions):
- SKOS Taxonomy Management
- Multiple Projects
- Taxonomy REST API
- Import/Export (incl. Excel)
- Rollback and History
- Ontologies and Custom Schemes
- Quality Management & Reports
- Advanced Corpus Management
- Vocabulary Mapping, Linked Data Mapping
- Linked Data Enrichment, Frontend, and SPARQL Endpoint
- Entity Extractor, Extractor API
- Auto-populate project from DBpedia
- Export to Remote Repository
- Workflow Management
- SKOS-XL (optional)
- Integration with graph databases
- Integration with search engines
- Data linking & mapping
- Data transformation pipelines with UnifiedViews
- Graph Search Server
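To make the Extractor API line in the matrix concrete, here is a minimal sketch of how a client could call an entity-extraction service over HTTP. The endpoint URL, parameter names, and response shape are invented for illustration and are not PoolParty's documented API.

```python
import requests

# Hypothetical endpoint and parameters -- illustrative only,
# not PoolParty's documented Extractor API.
EXTRACTOR_URL = "https://example.org/extractor/api/extract"

def extract_entities(text, project_id, language="en"):
    """Send raw text to an entity-extraction service and return matched concepts."""
    response = requests.post(
        EXTRACTOR_URL,
        data={"text": text, "projectId": project_id, "language": language},
        timeout=30,
    )
    response.raise_for_status()
    # Assume the service returns JSON like:
    # {"concepts": [{"uri": ..., "prefLabel": ..., "score": ...}]}
    return response.json().get("concepts", [])

if __name__ == "__main__":
    sample = ("Bain Capital is a venture capital company based in Boston, MA. "
              "The company was co-founded by Mitt Romney.")
    for concept in extract_entities(sample, project_id="my-thesaurus"):
        print(concept["uri"], concept["prefLabel"], concept["score"])
```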
16. Metadata and semantic data
The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.
17. Metadata and semantic data
(The same Peggy Guggenheim text as on slide 16, now with entities annotated.)
[Annotations: the strings 'Peggy Guggenheim', 'Peggy Guggenheim Collection', 'Venice', and 'Canale Grande' are linked via skos:prefLabel to resources such as http://my.com/resource/328832, http://my.com/docs/45367, and http://my.com/docs/52345.]
18. Metadata and semantic data
(The same text as on slide 16.)
[Annotations now form a small graph: the strings 'Peggy Guggenheim', 'Peggy Guggenheim Collection', 'Venice', 'museum', and 'Canale Grande' carry skos:prefLabel links to resources (http://my.com/docs/45367, http://my.com/docs/52345, http://my.com/resource/62545, http://my.com/resource/328832) and an image (http://www.mycom.com/images/90546089), connected by relations such as 'has landmark', 'named after', 'hosted in', and 'has'.]
19. Metadata and semantic data
(The same text as on slide 16.)
[Full annotation graph: http://my.com/docs/328832 has rdf:type schema:Article, dct:title 'Peggy Guggenheim Collection', dct:creator http://my.com/people/32 (skos:prefLabel 'Mike Miller', skos:altLabel 'Michael Miller'), schema:image http://my.com/img/99.jpg, and skos:subject links to SKOS concepts labelled 'Peggy Guggenheim Collection Venice', 'museum' (with skos:broader), and 'Canale Grande', each carrying skos:prefLabel / skos:altLabel and schema:image attributes.]
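As a rough sketch of slide 19's annotation graph expressed as actual triples, the following uses Python's rdflib. The my.com URIs and labels are taken from the slide; the concept URI is invented, and dcterms:subject stands in for the slide's skos:subject (a property that is not part of the final SKOS recommendation).

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS, DCTERMS

SCHEMA = Namespace("http://schema.org/")

g = Graph()
doc = URIRef("http://my.com/docs/328832")
creator = URIRef("http://my.com/people/32")

# The document as a 'thing': type, title, creator, image
g.add((doc, RDF.type, SCHEMA.Article))
g.add((doc, DCTERMS.title, Literal("Peggy Guggenheim Collection")))
g.add((doc, DCTERMS.creator, creator))
g.add((doc, SCHEMA.image, URIRef("http://my.com/img/99.jpg")))

# The creator with preferred and alternative names
g.add((creator, SKOS.prefLabel, Literal("Mike Miller")))
g.add((creator, SKOS.altLabel, Literal("Michael Miller")))

# A subject concept links the document into the knowledge graph
# (concept URI invented for the sketch)
concept = URIRef("http://my.com/resource/pgc")
g.add((doc, DCTERMS.subject, concept))
g.add((concept, SKOS.prefLabel, Literal("Peggy Guggenheim Collection")))
g.add((concept, SKOS.broader, URIRef("http://my.com/resource/museum")))

print(g.serialize(format="turtle"))
```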
20. Resolving Language Problems
“While most people can deal with linguistic features as synonyms, homographs, polyhierarchies, and even with far more peculiar characteristics of natural languages, machines often struggle with automatic sense-making because of the lack of a semantic knowledge model that can be used programmatically.”
22. PoolParty Extractor
Uses several components of a knowledge model:
▸ Taxonomies based on the SKOS standard
▸ Ontologies based on RDF Schema or OWL
▸ Word form dictionaries
▸ Blacklists and stop word lists
▸ Disambiguation settings
▸ Domain-specific reference document corpus
▸ Statistical language model
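A minimal sketch of the first component, a SKOS taxonomy fragment, built with rdflib; the example.org URIs and the medical-devices scheme are invented for the example:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/concepts/")

g = Graph()
scheme = EX["medical-devices"]

# A tiny SKOS taxonomy of the kind the Extractor consumes: a broader
# concept and a narrower one, with preferred and alternative labels.
g.add((scheme, RDF.type, SKOS.ConceptScheme))

equipment = EX["diagnostic-equipment"]
g.add((equipment, RDF.type, SKOS.Concept))
g.add((equipment, SKOS.prefLabel, Literal("Diagnostic Equipment", lang="en")))
g.add((equipment, SKOS.inScheme, scheme))

scope = EX["ophthalmoscope"]
g.add((scope, RDF.type, SKOS.Concept))
g.add((scope, SKOS.prefLabel, Literal("Ophthalmoscope", lang="en")))
g.add((scope, SKOS.altLabel, Literal("funduscope", lang="en")))  # word form / synonym
g.add((scope, SKOS.broader, equipment))
g.add((scope, SKOS.inScheme, scheme))

print(g.serialize(format="turtle"))
```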
23. PoolParty’s SKOS editor
[Screenshot of the SKOS editor showing concepts such as 'A series' and 'A5 platform'. Sample text: 'The Audi Q3 is a compact crossover SUV made by Audi. It is based on the PQ35 platform of Volkswagen.']
25. ‘Setting the rules’ for text mining & entity extraction via thesaurus
Example: 'Proper use of a funduscope requires a bit of practice and familiarity with the functions of your device.' Extracted concepts: Diagnostic Equipment, Ophthalmoscope.
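A toy illustration of how such thesaurus 'rules' can drive extraction: matching against preferred and alternative labels, filtered by stop word and blacklist entries. The URIs, labels, and matching logic are invented simplifications, far simpler than PoolParty's actual extractor.

```python
import re

# Toy, dictionary-based concept extraction driven by thesaurus labels.
CONCEPTS = {
    "http://example.org/concepts/ophthalmoscope": {
        "prefLabel": "Ophthalmoscope",
        "altLabels": ["funduscope", "fundoscope"],   # word forms / synonyms
        "broader": "http://example.org/concepts/diagnostic-equipment",
    },
}
STOP_WORDS = {"a", "an", "the", "of", "with"}   # never match stop words
BLACKLIST = {"device"}                           # terms never to annotate

def extract(text):
    """Return (surface form, concept URI) pairs found in the text."""
    hits = []
    for uri, model in CONCEPTS.items():
        for label in [model["prefLabel"], *model["altLabels"]]:
            low = label.lower()
            if low in STOP_WORDS or low in BLACKLIST:
                continue
            if re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE):
                hits.append((label, uri))
                # a real extractor would also report broader concepts,
                # e.g. Diagnostic Equipment for Ophthalmoscope
    return hits

print(extract("Proper use of a funduscope requires a bit of practice."))
# -> [('funduscope', 'http://example.org/concepts/ophthalmoscope')]
```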
28. Corpus analysis results in a network of concepts and terms
'I need support to continuously extend our taxonomy / controlled vocabulary!'
[Diagram: a reference corpus (websites; PDF, Word, ...; abstracts from DBpedia; RSS feeds) is analysed against skos:Concept entries, yielding a network of concepts and candidate terms (Term 1 through Term 8).]
The analysis delivers:
- Relevant terms and phrases
- Relevancy of concepts
- Co-occurrence between concepts and terms
- Co-occurrence between terms and terms
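To illustrate the co-occurrence statistics listed above, here is a toy corpus analysis; the documents and concept labels are invented, and a real corpus analysis would add relevancy scoring and phrase detection:

```python
from collections import Counter

# Toy corpus analysis: count concept-term co-occurrence within documents.
documents = [
    "solar energy storage improves grid stability",
    "wind energy and solar power reduce emissions",
    "battery storage supports wind power on the grid",
]
concept_labels = {"solar", "wind"}   # labels from the controlled vocabulary

concept_term = Counter()
for doc in documents:
    tokens = set(doc.split())
    for c in tokens & concept_labels:
        for t in tokens - concept_labels:
            concept_term[(c, t)] += 1

# Terms that frequently co-occur with a concept are candidates for
# synonyms, related concepts, or new narrower concepts in the taxonomy.
for (concept, term), count in concept_term.most_common(5):
    print(f"{concept!r} co-occurs with {term!r} {count}x")
```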
31. PoolParty as a supervised learning system
[Diagram of roles and components: roles (Content Manager, Integrator, Taxonomist/Ontologist) and components (Thesaurus Server, Extractor, PowerTagging, Index, Reference Corpus, CMS) connected by relations such as 'uses API', 'is user of', 'is basis of', 'annotates', 'enriches', 'extends', and 'analyzes'.]
33. PoolParty Semantic Integrator - at a glance
https://youtu.be/l_LppfS3wxk
[Diagram: unstructured and structured data flow through the Semantic Integrator (ETL / monitoring / scheduling) into Deep Data Analytics and Semantic Search.]
39. Use Cases: Text Mining & Linked Data
▸ Climate Tagger (PDF)
Streamline and catalogue data and information resources
▸ healthdirect Australia (PDF)
Semantic Search based on the Australian Health Thesaurus
▸ CTCN Semantic Search
Integrating thousands of documents from several sources on climate technology
▸ European Innovation Partnership (EIP) on Water
Online Marketplace including semantic Matchmaking
40. Climate Tagger
Helps organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources. Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus.
http://www.climatetagger.net
42. EIP Water Matchmaking
Controlled vocabularies enable accurate matchmaking between supply and demand for water innovation in Europe. Matchmaking is based upon the EIP Water Innovation Thesaurus (GEMET-based).
http://www.eip-water.eu
43. CTCN Semantic Search
Helps organisations in the climate technology field explore and find relevant content from thousands of Drupal nodes and several sources, using PoolParty, PowerTagging, and sOnr webMining. CTCN is backed by the CTCN Climate Technology Thesaurus.
https://www.ctc-n.org/semantic-search
44. healthdirect Australia
Integrated views and semantic search over more than 100 trusted sources. Harmonization of various metadata systems through the use of a central vocabulary hub: the Australian Health Thesaurus.
http://www.healthdirect.gov.au
45. SUMMARY: WHY TAXONOMISTS AND INFORMATION ARCHITECTS LIKE POOLPARTY
Read more
Different project stakeholders expect specific qualities from a semantic technology platform:
'I am a taxonomist. I need a tool that provides convenient functionalities and intuitive user interfaces for my daily work.'
'I am an information architect. Enterprise metadata management deserves scalable technologies, which provide semantic services on top of rich APIs based on standards.'
Welcome - 3’ - Martin & Timea
SWC & PP - 10’ max - Martin
Using PP - 10 - Timea
Demos - 12’ - Timea
Customer Stories - 10’ (max) - Martin
TOTAL = 45’ plus Q&A
At the core of each application built upon a semantic information architecture, we clearly distinguish between the content layer, the metadata layer, the semantic layer, and the navigation logic on top.
Metadata layer: not actionable on its own.
Semantic layer: adds meaning to the metadata. Strings become 'things' that can be linked with each other and enriched with more data.
By adding a semantic layer that contains facts, one can quickly find the document when searching for 'museum', even if the word 'museum' does not appear in the text.
A semantic layer is a network (or graph) of things, including their relations and attributes such as their various names. This layer serves as glue that links all information available about a certain business object ('thing' or 'resource') scattered across various repositories and data silos, in order to create a complete picture of it.
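A small runnable illustration of the 'museum' example, assuming invented example.org URIs: the document is tagged only with the 'Peggy Guggenheim Collection' concept, yet a SPARQL query that walks skos:broader finds it when searching for 'museum'.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS, DCTERMS

EX = Namespace("http://example.org/")
g = Graph()

# Facts in the semantic layer (names and URIs invented for the sketch):
# the document is tagged with the 'Peggy Guggenheim Collection' concept,
# and that concept is narrower than 'museum'.
doc, pgc, museum = EX.doc1, EX.pgc, EX.museum
g.add((doc, DCTERMS.subject, pgc))
g.add((pgc, SKOS.prefLabel, Literal("Peggy Guggenheim Collection", lang="en")))
g.add((pgc, SKOS.broader, museum))
g.add((museum, SKOS.prefLabel, Literal("museum", lang="en")))

# Searching for 'museum' finds the document although the word never
# occurs in its text: the query walks skos:broader in the graph.
query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?doc WHERE {
  ?doc dct:subject/skos:broader* ?concept .
  ?concept skos:prefLabel ?label .
  FILTER (LCASE(STR(?label)) = "museum")
}
"""
for row in g.query(query):
    print(row.doc)   # -> http://example.org/doc1
```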
* Disambiguation: 'Jaguar is owned by Tata Motors' - Jaguar is a homograph.
* Synonyms: different terms that refer to the same thing.
* A polyhierarchy describes an entity or concept as a child of at least two parent concepts.
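In SKOS terms, these three phenomena look roughly as follows (concept URIs are invented for the sketch):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/concepts/")
g = Graph()

# Synonyms: one concept, several labels that refer to the same thing.
g.add((EX["jaguar-car"], RDF.type, SKOS.Concept))
g.add((EX["jaguar-car"], SKOS.prefLabel, Literal("Jaguar", lang="en")))
g.add((EX["jaguar-car"], SKOS.altLabel, Literal("Jaguar Cars", lang="en")))

# Homograph: the same string is the label of a different concept.
g.add((EX["jaguar-animal"], SKOS.prefLabel, Literal("Jaguar", lang="en")))

# Polyhierarchy: one concept with two parent concepts.
g.add((EX["jaguar-car"], SKOS.broader, EX["british-brands"]))
g.add((EX["jaguar-car"], SKOS.broader, EX["tata-motors-subsidiaries"]))

print(g.serialize(format="turtle"))
```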
Taxonomies based on the SKOS standard
‘Setting the rules’ for text mining & entity extraction via thesaurus
Ontologies based on RDF Schema or OWL
Word form dictionaries
Blacklists and stop word lists
Annotated concepts are then compared with the surroundings of the potentially ambiguous extracted entities in a given text; see the sketch below.
Domain-specific reference document corpus
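A toy version of that comparison, with invented context-word sets per concept; PoolParty derives such context from the knowledge model and reference corpus rather than from hard-coded sets:

```python
# Toy word-overlap disambiguation: choose the concept whose annotated
# context best matches the surroundings of an ambiguous mention.
# Concept URIs and context word sets are invented for the sketch.
CONTEXTS = {
    "http://example.org/concepts/jaguar-car":    {"tata", "motors", "vehicle", "brand"},
    "http://example.org/concepts/jaguar-animal": {"cat", "predator", "amazon", "wildlife"},
}

def disambiguate(window):
    """Return the concept URI with the largest context-word overlap."""
    return max(CONTEXTS, key=lambda uri: len(CONTEXTS[uri] & window))

window = {"jaguar", "is", "owned", "by", "tata", "motors"}
print(disambiguate(window))
# -> http://example.org/concepts/jaguar-car
```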
Show drupal.poolparty.biz/PoolParty
Show drupal.poolparty.biz