Welcome to
this webinar!
Andreas Blumauer
CEO of Semantic Web Company
About Semantic Web Company (SWC)
SWC was founded in 2001, headquartered in Vienna
25 experts in Linked Data technologies
PoolParty Suite based on RDF Graph Data Model
Serving customers from all over the world
EU- & US-based consulting services
Our Ecosystem: Customers & Partners
Some of our Customers
● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation (CBC)
● Red Bull Media House
● Wolters Kluwer
● TC Media
● Techtarget
● BMJ Publishing Group
● CafePress
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank (IADB)
● Renewable Energy Partnership
● Wood MacKenzie
● Development Initiatives
● International Atomic Energy Agency (IAEA)
Finance / Automotive / Publisher / Health Care / Public Administration / Energy / Education
Selected Partners
● PwC
● EPAM Systems
● iQuest
● EBCONT
● Gravity Zero
● MarkLogic
● OpenLink Software
● Ontotext
● Wolters Kluwer
● Data to Value
● Digirati
● Term Management
● Altotech
We are all working on the
replacement of data chaos
by networking information
● Norwegian Directorate
of Immigration
● Ministry of Finance (A)
● Council of the E.U.
● Australian National
Data Service
PoolParty Core Modules
Bain Capital is a venture capital
company based in Boston, MA.
Since inception it has invested in
hundreds of companies including
AMC Entertainment, Brookstone,
and Burger King. The company was
co-founded by Mitt Romney.
Taxonomy &
Ontology Server
Entity Extraction &
Text Mining
Semantic Search,
Analytics & Visualization
Why Graph
Databases?
The Enterprise Perspective:
The End of the Document
“Life is no longer as
simple as making
PDF documents.”
John Walker
Business Analyst at NXP Semiconductors
The Enterprise Perspective:
Graph Databases are Smart Data Lakes
“Data in a large corporation is
often scattered over various tools,
comes in different formats and with
different levels of quality.”
Fabian Heinemann
Data Scientist at Roche
The NPO Perspective:
Using common Definitions and Standards
“Very few datasets
tell a story
in isolation.”
The Data Manifesto
Development Initiatives
The warehouse approach
seems to be broken in a complex world
Data Warehouse
- structures and categories predefine the kind of analysis that is possible
- excludes data to simplify the data model
- does not efficiently handle new types of data
- supports efficient indexing
- enforces consistency
Data Lake
- includes all data that may be used and even data that may never be used
- all data regardless of source and structure is kept
- data kept in its raw form and only transformed when used
- handles structured and unstructured data
- data models emerge with usage over time
The Analyst’s Perspective: Data Lakes don’t
fix the problem of lacking semantics
“Organizations should focus on
semantic consistency and
performance in upstream applications
and data stores instead of information
consolidation in a data lake.”
Gartner
Beware of the Data Lake Fallacy
Data Lakes have all the information to
answer complex queries, but….
Country  GDP    Pop
AUS      1,560  23.14
SVE      580    9.60
WITH A COMBINED NUMBER of
357,100 registered asylum claims
in 2013, Germany, the United
States of America, France, Sweden
and Turkey were the top five
receiving countries, together
accounting for nearly six out of ten
asylum claims submitted in the 44
industrialized countries covered by
this report.
Place      Asylum seekers  Year
Australia  24,300          2013
Sweden     54,300          2013
Show me all reports that discuss the asylum policies of
EU member countries with more than 10 asylum-seekers
per 1,000 inhabitants.
...taxonomies link constantly changing data
sources while analytic needs are evolving
Countries
  European Union
    Sweden (SVE)
    France (FRA)
    Austria (AUT)
  Oceania
Country  GDP    Pop
AUS      1,560  23.14
SVE      580    9.60

Place      Asylum seekers  Year
Australia  24,300          2013
Sweden     54,300          2013
WITH A COMBINED NUMBER of
357,100 registered asylum claims
in 2013, Germany, the United
States of America, France, Sweden
and Turkey were the top five
receiving countries, together
accounting for nearly six out of ten
asylum claims submitted in the 44
industrialized countries covered by
this report.
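How a taxonomy can link the two tables above is sketched below in Python. This is only an illustration, not PoolParty code: the names TAXONOMY, asylum_rate_per_1000 and countries_in are hypothetical, and the Pop column is assumed to be in millions because the slide does not state its unit.

```python
# Taxonomy: one concept per country, holding the code used by the statistics
# table and its broader concept. Structure and names are illustrative assumptions.
TAXONOMY = {
    "Sweden": {"code": "SVE", "broader": {"European Union"}},
    "Australia": {"code": "AUS", "broader": {"Oceania"}},
}

# Source 1: statistics keyed by country code (Pop ASSUMED to be in millions).
POPULATION_MILLIONS = {"AUS": 23.14, "SVE": 9.60}

# Source 2: asylum seekers in 2013, keyed by place name.
ASYLUM_SEEKERS_2013 = {"Australia": 24_300, "Sweden": 54_300}


def asylum_rate_per_1000(country: str) -> float:
    """Join both sources through the taxonomy; return seekers per 1,000 inhabitants."""
    code = TAXONOMY[country]["code"]
    inhabitants = POPULATION_MILLIONS[code] * 1_000_000
    return ASYLUM_SEEKERS_2013[country] / inhabitants * 1000


def countries_in(group: str, min_rate: float = 0.0) -> list[str]:
    """Countries under a broader concept whose rate exceeds min_rate."""
    return sorted(
        name
        for name, concept in TAXONOMY.items()
        if group in concept["broader"] and asylum_rate_per_1000(name) > min_rate
    )


if __name__ == "__main__":
    for name in TAXONOMY:
        print(name, round(asylum_rate_per_1000(name), 2))
    print(countries_in("European Union", min_rate=10))
```

Neither table knows about the other: the join and the EU filter both run through the taxonomy, so adding a source means adding a label mapping rather than changing the tables.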
Linked Data Warehouses are Smart Data Lakes
Data Warehouse
- supports efficient indexing
- enforces consistency
Data Lake
- handles structured & unstructured data
- data models emerge with usage over time
Linked Data Warehouse (in addition)
- standards-based
- unified data model
- powerful query language
What if questions
emerge when one
starts analyzing the
data?
The power of knowledge graphs:
Agility, flexibility, complexity
Query: Show me all documents about European countries
[Diagram: in both the traditional and the graph-based approach, documents (doc) are tagged with the countries Norway, France, Austria, Canada]
The power of knowledge graphs:
Agility, flexibility, complexity
Query: Show me all documents about European countries
Traditional approach: each document carries its own tags (Europe, Norway / Europe, France / Europe, Austria / America, Canada)
Graph-based approach: documents are tagged with countries only (Norway, France, Austria, Canada); the countries are linked to the concept Europe
The power of knowledge graphs:
Agility, flexibility, complexity
Queries: Show me all documents about European countries
         Show me all documents about EU member countries
Traditional approach: each document carries its own tags (Europe, Norway / Europe, France / Europe, Austria / America, Canada)
Graph-based approach: documents are tagged with countries only (Norway, France, Austria, Canada); the countries are linked to the concept Europe
The power of knowledge graphs:
Agility, flexibility, complexity
Queries: Show me all documents about European countries
         Show me all documents about EU member countries
Traditional approach: each document carries its own tags (Europe, Norway / EU, Europe, France / EU, Europe, Austria / America, Canada)
Graph-based approach: documents are tagged with countries only (Norway, France, Austria, Canada); the countries are linked to the concepts Europe and EU
The power of knowledge graphs:
Agility, flexibility, complexity
Queries: Show me all documents about European countries
         Show me all documents about EU member countries
         French-speaking?
Traditional approach: each document carries its own tags (Europe, Norway / French, EU, Europe, France / EU, Europe, Austria / French, America, Canada)
Graph-based approach: documents are tagged with countries only (Norway, France, Austria, Canada); the countries are linked to the concepts Europe, EU and French-speaking
The power of knowledge graphs:
Agility, flexibility, complexity
Queries: Show me all documents about European countries
         Show me all documents from EU member countries
         French-speaking?
Traditional approach: each document carries its own tags (Europe, Norway / French, EU, Europe, France / EU, Europe, Austria / French, America, Canada)
Graph-based approach: documents are tagged with countries only (Norway, France, Austria, Canada); the countries are linked to the concepts Europe, EU and French-speaking
Metadata per
document
1. No or little network effects
2. No reuse of metadata
3. Metadata resides in silos
4. Data quality hard to measure
5. Not machine-readable
Knowledge about
metadata
1. Explicit knowledge models
2. Reusable and measurable
3. Metadata is machine-processable
4. Standards-based metadata
5. Linkable metadata opens silos
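The contrast between the two lists can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PoolParty code: the document IDs, the predicate names (partOf, memberOf, speaks) and the helper functions are all hypothetical. Documents carry only a country tag, and every further question is answered from knowledge held once in the graph.

```python
# Documents are tagged with a country only (document IDs are hypothetical).
DOCUMENTS = {"doc1": "Norway", "doc2": "France", "doc3": "Austria", "doc4": "Canada"}

# Knowledge about the metadata, stored once as (subject, predicate, object)
# triples. The predicate names are made up for this sketch.
GRAPH = {
    ("Norway", "partOf", "Europe"),
    ("France", "partOf", "Europe"),
    ("Austria", "partOf", "Europe"),
    ("Canada", "partOf", "America"),
    ("France", "memberOf", "EU"),
    ("Austria", "memberOf", "EU"),
    ("France", "speaks", "French"),
    ("Canada", "speaks", "French"),
}


def countries_where(predicate: str, obj: str) -> set[str]:
    """All countries that have the given link in the graph."""
    return {s for (s, p, o) in GRAPH if p == predicate and o == obj}


def documents_about(countries: set[str]) -> list[str]:
    """Documents whose country tag is in the given set."""
    return sorted(doc for doc, country in DOCUMENTS.items() if country in countries)


if __name__ == "__main__":
    print(documents_about(countries_where("partOf", "Europe")))
    print(documents_about(countries_where("memberOf", "EU")))
    # A question that emerges later needs new triples, not re-tagged documents.
    french_eu = countries_where("memberOf", "EU") & countries_where("speaks", "French")
    print(documents_about(french_eu))
```

When a new question comes up, only GRAPH grows; DOCUMENTS stays untouched, which is the reuse the right-hand list describes.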
Better Together:
Unstructured and
Structured Data.
Towards a Linked Data based search
Bringing structure to text:
PoolParty GraphSearch
PoolParty GraphSearch =
Semantic Search + Analytics
Complex Queries based on
SPARQL and Linked Data
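PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# Assumption: the slide omits its prefixes; dbpedia: is taken to be the DBpedia ontology namespace
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?personname ?picture ?countryname ?hdi
WHERE
{
  ?person skos:prefLabel ?personname .
  ?country skos:prefLabel ?countryname .
  ?person a dbpedia:Person .
  ?country a dbpedia:Country .
  ?person skos:related ?country .
  ?country <http://dbpedia.org/property/hdi> ?hdi .
  FILTER ( ?hdi < 0.6 )
  # the picture is optional: persons without one are still returned
  OPTIONAL { ?person foaf:depiction ?picture . }
}
ORDER BY DESC(?hdi)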
I want to explore medical
research trends in relation
to regional prosperity.
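For readers less familiar with SPARQL, the logic of the query above can be approximated in plain Python. This is a rough sketch only: the data is invented placeholder data (no real persons, countries or HDI values), the function name is hypothetical, and DISTINCT is not modelled.

```python
from typing import Optional

# Placeholder data: labels and HDI values are invented for illustration only.
COUNTRIES = {
    "country:A": {"label": "Country A", "hdi": 0.45},
    "country:B": {"label": "Country B", "hdi": 0.58},
    "country:C": {"label": "Country C", "hdi": 0.91},
}
PERSONS = [
    {"label": "Person 1", "related": "country:A", "picture": None},
    {"label": "Person 2", "related": "country:B", "picture": "http://example.org/p2.png"},
    {"label": "Person 3", "related": "country:C", "picture": None},
]

Row = tuple[str, str, float, Optional[str]]


def persons_by_country_hdi(max_hdi: float = 0.6) -> list[Row]:
    """Mimic the query: join persons to countries, filter on HDI, sort descending."""
    rows: list[Row] = []
    for person in PERSONS:
        country = COUNTRIES[person["related"]]      # ?person skos:related ?country
        if country["hdi"] < max_hdi:                # FILTER ( ?hdi < 0.6 )
            rows.append((
                person["label"],
                country["label"],
                country["hdi"],
                person["picture"],                  # may be None, like OPTIONAL
            ))
    return sorted(rows, key=lambda row: row[2], reverse=True)  # ORDER BY DESC(?hdi)


if __name__ == "__main__":
    for row in persons_by_country_hdi():
        print(row)
```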
Organizing data in graphs using links
Graph nervous_system_diseases-abstracts
Graph en.dbpedia.org
Graph www.nlm.nih.gov/mesh
Graph www.geonames.org
PoolParty Semantic Integrator
System Architecture
Classified documents +
Linked taxonomies +
Knowledge graphs
● Dynamic filter criteria
● BI-like interface
● Large scale RDF store
● Fully RDF compatible
● All queries via SPARQL
UnifiedViews as part of
PoolParty Semantic Integrator
UnifiedViews differs
from other ETL
frameworks by natively
supporting RDF data and
ontologies.
UnifiedViews has a
graphical user interface
for the administration,
debugging, and
monitoring of the ETL
process.
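What an ETL step that targets RDF does can be illustrated with a small Python sketch. It does not use or reproduce the UnifiedViews API: the function names and the http://example.org/ namespace are hypothetical. It turns tabular rows into N-Triples lines, one triple per non-empty cell.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text: str, key_column: str) -> list[str]:
    """Emit one triple per non-empty cell; the key column names the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}country/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            predicate = f"<{BASE}schema/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    table = "Country,GDP,Pop\nAUS,1560,23.14\nSVE,580,9.60\n"
    for line in rows_to_ntriples(table, key_column="Country"):
        print(line)
```

All values are written as plain string literals here; a real pipeline would typically add datatypes and map columns to an existing vocabulary.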
Use Cases
Success story: Healthdirect Australia
Over 120 information partners and sources
Great variety of category and metadata systems
One central vocabulary hub:
Australian Health Thesaurus (AHT)
Single point of access incl. harmonized search facets:
http://www.healthdirect.gov.au/
Clean Energy Data - Country Profiles
sOnr webMining for Confluence
Complex queries with SPARQL
PREFIX mrv-schema: <http://gbpn.org/mrv-schema/>
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT *
WHERE {
GRAPH <http://gbpn.org/mrv> {
?observation mrv-schema:year ?year.
?observation mrv-schema:region ?region.
?observation mrv-schema:region <http://gbpn.org/mrv-thes/region/India>.
?observation mrv-schema:scenario ?scenario.
?observation mrv-schema:scenario <http://gbpn.org/mrv-thes/scenario/deep-efficiency>.
{
?observation mrv-schema:urbanizationType ?urbanizationType.
?observation mrv-schema:urbanizationType <http://gbpn.org/mrv-thes/urbanization-type/urban>.
?observation mrv-schema:buildingType ?buildingType.
?observation mrv-schema:buildingType <http://gbpn.org/mrv-thes/building-type/MF>.
?observation mrv-schema:publicBuildingType ?publicBuildingType.
?observation mrv-schema:publicBuildingType <http://gbpn.org/mrv-thes/public-building-type/NO>.
}
UNION
{
?observation mrv-schema:urbanizationType ?urbanizationType.
?observation mrv-schema:urbanizationType <http://gbpn.org/mrv-thes/urbanization-type/urban>.
?observation mrv-schema:buildingType ?buildingType.
?observation mrv-schema:buildingType <http://gbpn.org/mrv-thes/building-type/Slums>.
?observation mrv-schema:publicBuildingType ?publicBuildingType.
?observation mrv-schema:publicBuildingType <http://gbpn.org/mrv-thes/public-building-type/NO>.
}
UNION
{
…….
More PoolParty Applications & Demos
Thesaurus Publishing Business Intelligence Content Recommendation Semantic Expert Finder
Web Mining Semantic Search Linked Data Visualization Symptom Checker
PoolParty 5.1
Highly precise entity extraction
Domain-specific extraction, highly
performant, language-agnostic,
disambiguation rules, REST API
Providing context in the knowledge graph
Activating disambiguation
Semantic Records Management: Integration
with Confluence Blueprints
⇒ Solution for Semantic
Records Management
Fully integrated web crawler
Make use of text corpus analysis:
Retrieve documents from various
sources, like RSS or from websites
Web Crawler extracts candidate terms
from any website
Extended ontology management &
semantic reasoning
From SKOS taxonomies to full-blown
ontologies:
PoolParty supports various levels of
knowledge modeling
Publishing custom schemes
Further extension of PoolParty API
● API method for skos:notes
● API method for skosxl:labels
● API methods for skos:collections
● API method to collect custom properties, attributes and types
● API method to R/W workflow status
● Retrieve history API method
● Retrieve SKOS subtree
Developer
Get started with PoolParty. Try it out now!
Get your PoolParty 5.1
Thesaurus Server &
Entity Extractor trial:
http://www.poolparty.biz/test-demo/
Contact points & further information
Andreas Blumauer, MSc IT
a.blumauer@semantic-web.at
https://www.linkedin.com/in/andreasblumauer
Semantic Web Company GmbH
Mariahilfer Strasse 70/8, A-1070 Vienna
+43-1-4021235
http://www.semantic-web.at
http://www.poolparty-software.com
Social Media Channels
http://slideshare.net/semwebcompany
http://youtube.com/semwebcompany
https://www.linkedin.com/groups?home=&gid=4059165

Dive deep into your Data Pools
