Karma, a Data Integration Tool
Pedro Szekely, Project Leader/Research Associate Professor, Information Sciences Institute, University of Southern California
4. Karma’s Goals
Photo: dsearles/Flickr
tear down data silos
connect information in separate databases
expose untapped value of database content
Pedro Szekely CC-By 2.0 4
5. Karma’s Audience
Cultural heritage
Entertainment
Intelligence
Science
... anyone who has data silo problems
10. Web pages are machine processable,
but not machine understandable
impractical for building applications using the data
11. Solution: Linked Data
A method of publishing structured data
so that it can be interlinked
and become more useful
Builds upon standard Web technologies
such as HTTP and URIs
to share information
in a way that can be read automatically by computers
(from Wikipedia)
12. Represent Resources Using URIs
That guy has first name “Pedro”
That guy: http://szekelys.com/family#pedro
first name: http://xmlns.com/foaf/0.1/firstName
“Pedro”: the literal “Pedro”
13. Represent Information as Triples
Subject: http://szekelys.com/family#pedro (the resource being described)
Predicate: http://xmlns.com/foaf/0.1/firstName (a property of the resource)
Object: “Pedro” (the value of the property)
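The subject/predicate/object structure can be sketched in a few lines of Python. This is only an illustration, not part of Karma: the `Triple` class and the `to_ntriples` helper are names assumed here, and the serializer handles only plain string literals in the object position.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of a property of the resource
    obj: str        # literal value of the property


def to_ntriples(t: Triple) -> str:
    """Serialize one triple with a plain-literal object as an N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


pedro = Triple(
    subject="http://szekelys.com/family#pedro",
    predicate="http://xmlns.com/foaf/0.1/firstName",
    obj="Pedro",
)
print(to_ntriples(pedro))
```

A real pipeline would normally rely on an RDF library rather than hand-written serialization; the sketch only makes the three roles concrete.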
17. Steps to Create Linked Open Data
• Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from data sources to the ontologies
• Identify links
… to other Linked Data datasets
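The third step, identifying links, can be illustrated with a deliberately naive sketch: two resources are linked when their labels match exactly, ignoring case. The function name `find_links`, the `example.org` and `other.example.com` URIs, and the matching rule are all assumptions made for illustration; the slides do not say how links are found, and `owl:sameAs` is used here only as a common way to state that two URIs denote the same thing.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def find_links(local, external):
    """Link local resources to external ones whose labels match (case-insensitive).

    Both arguments map a resource URI to a human-readable label.
    Returns a list of (local_uri, owl:sameAs, external_uri) triples.
    """
    by_label = {label.casefold(): uri for uri, label in external.items()}
    links = []
    for uri, label in local.items():
        match = by_label.get(label.casefold())
        if match is not None:
            links.append((uri, OWL_SAME_AS, match))
    return links


# Hypothetical URIs, used only to exercise the function.
local = {
    "http://example.org/org/1": "Microsoft",
    "http://example.org/org/2": "Google",
}
external = {"http://other.example.com/resource/Microsoft": "microsoft"}

print(find_links(local, external))
```

Exact label matching misses spelling variants and confuses homonyms, so it is a starting point at best.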
18. • Select ontologies
… that define classes and properties for our data
e.g. CIDOC CRM
http://www.cidoc-crm.org/
19. • Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from data sources to the ontologies
20. RDF Mapping Tools
Tool | Shortcomings | Benefits
custom code | labor intensive, error prone | flexible
R2RML | difficult to learn, only for SQL databases | W3C standard, good documentation, multiple vendors
RDF Refine | only for tabular data | graphical user interface, support for reconciliation, open source
Karma | | semi-automatic, graphical user interface, supports tabular data, XML and JSON, multiple export formats
22. Karma
Interactive tool for rapidly extracting, cleaning, transforming, integrating and publishing data
[Diagram labels: Tabular Sources, Hierarchical Sources, Services, Karma, Model, RDF, Database, JSON, …]
23. Inputs: Ontologies and Data Sources
Domain Ontology
[Diagram legend: object property, data property, subClassOf]
Classes: Person, Organization, Place, City, State, Event
Properties: name, birthdate, phone, postalCode, bornIn, worksFor, livesIn, state, ceo, location, organizer, nearby, startDate, title, isPartOf
Data Source
Column 1 | Column 2 | Column 3 | Column 4 | Column 5
Bill Gates | Oct 1955 | Microsoft | Seattle | WA
Mark Zuckerberg | May 1984 | Facebook | White Plains | NY
Larry Page | Mar 1973 | Google | East Lansing | MI
24. Semantic Model: maps source to domain ontology
Domain Ontology
[Diagram legend: object property, data property, subClassOf]
Classes: Person, Organization, Place, City, State, Event
Properties: name, birthdate, phone, postalCode, bornIn, worksFor, livesIn, state, ceo, location, organizer, nearby, startDate, title, isPartOf
Source
Column 1 | Column 2 | Column 3 | Column 4 | Column 5
Bill Gates | Oct 1955 | Microsoft | Seattle | WA
Mark Zuckerberg | May 1984 | Facebook | White Plains | NY
Larry Page | Mar 1973 | Google | East Lansing | MI
25. Semantic Model = Semantic Types + Relationships
26. Semantic Types
Class | Person | Person | Organization | City | State
Property | name | birthdate | name | name | name
Column | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
 | Bill Gates | Oct 1955 | Microsoft | Seattle | WA
 | Mark Zuckerberg | May 1984 | Facebook | White Plains | NY
 | Larry Page | Mar 1973 | Google | East Lansing | MI
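One way to picture semantic types is as a lookup from each source column to a (class, property) pair in the domain ontology. The sketch below assumes that representation; the dictionary layout and the `annotate_row` helper are illustrative names and do not reflect how Karma stores its models.

```python
# Semantic types from the slide: each source column maps to a
# (class, data property) pair in the domain ontology.
SEMANTIC_TYPES = {
    "Column 1": ("Person", "name"),
    "Column 2": ("Person", "birthdate"),
    "Column 3": ("Organization", "name"),
    "Column 4": ("City", "name"),
    "Column 5": ("State", "name"),
}


def annotate_row(row):
    """Pair each cell of a row with the semantic type of its column."""
    return [
        (cls, prop, row[column])
        for column, (cls, prop) in SEMANTIC_TYPES.items()
    ]


row = {
    "Column 1": "Bill Gates",
    "Column 2": "Oct 1955",
    "Column 3": "Microsoft",
    "Column 4": "Seattle",
    "Column 5": "WA",
}

for cls, prop, value in annotate_row(row):
    print(f"{cls}.{prop} = {value}")
```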
27. Relationships
Person worksFor Organization
Person bornIn City
City state State
Class | Person | Person | Organization | City | State
Property | name | birthdate | name | name | name
Column | Column 1 | Column 2 | Column 3 | Column 4 | Column 5
 | Bill Gates | Oct 1955 | Microsoft | Seattle | WA
 | Mark Zuckerberg | May 1984 | Facebook | White Plains | NY
 | Larry Page | Mar 1973 | Google | East Lansing | MI
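Taken together, semantic types and relationships are enough to turn a table row into triples. The following sketch shows one possible way to do that by hand. It is not Karma's implementation: the `http://example.org/` namespace, the per-row node URIs, and the `row_to_triples` function are assumptions chosen for illustration.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the slides

# Semantic types: column -> (class, data property)
SEMANTIC_TYPES = {
    "Column 1": ("Person", "name"),
    "Column 2": ("Person", "birthdate"),
    "Column 3": ("Organization", "name"),
    "Column 4": ("City", "name"),
    "Column 5": ("State", "name"),
}

# Relationships: (source class, object property, target class)
RELATIONSHIPS = [
    ("Person", "worksFor", "Organization"),
    ("Person", "bornIn", "City"),
    ("City", "state", "State"),
]


def row_to_triples(row_id, row):
    """Build (subject, predicate, object) triples for one table row.

    Each class instance gets a hypothetical per-row URI such as
    http://example.org/row0/Person.
    """
    def node(cls):
        return f"{BASE}row{row_id}/{cls}"

    triples = []
    # Data properties: one triple per column, with the cell as a literal.
    for column, (cls, prop) in SEMANTIC_TYPES.items():
        triples.append((node(cls), BASE + prop, row[column]))
    # Object properties: one triple per relationship, linking two nodes.
    for source, prop, target in RELATIONSHIPS:
        triples.append((node(source), BASE + prop, node(target)))
    return triples


row = {
    "Column 1": "Bill Gates",
    "Column 2": "Oct 1955",
    "Column 3": "Microsoft",
    "Column 4": "Seattle",
    "Column 5": "WA",
}

for triple in row_to_triples(0, row):
    print(triple)
```

Minting one node per class per row is the simplest choice; it does not merge rows that mention the same organization or city, which a real conversion would need to address.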
29. Karma uses semantic models to create Linked Data
30. Karma uses semantic models to create Linked Data
Karma semi-automatically builds semantic models
31. Karma uses semantic models to create Linked Data
Karma semi-automatically builds semantic models
… and provides a nice GUI to edit them
45. Storage Options
Technology | Shortcomings | Benefits
SPARQL endpoint | low reliability, esoteric, slow | sophisticated query language
RDF dump | no query capability, esoteric | flexibility: clients can download and use in applications, easy to publish
JSON-LD + ElasticSearch | restricted query language | very high performance, mainstream technology, easy to publish
Karma supports the three options
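To give a feel for the JSON-LD option, here is a sketch of how one row of the example table might look as a JSON-LD document. The `@vocab` URI and the `person_to_jsonld` function are hypothetical, and the nesting is just one plausible layout; the slides do not show Karma's actual JSON-LD output. Indexing the document in ElasticSearch is not shown.

```python
import json


def person_to_jsonld(row):
    """Build a nested JSON-LD document for one (name, birthdate, org, city, state) row."""
    name, birthdate, organization, city, state = row
    return {
        # Hypothetical vocabulary URI; terms below resolve against it.
        "@context": {"@vocab": "http://example.org/ontology/"},
        "@type": "Person",
        "name": name,
        "birthdate": birthdate,
        "worksFor": {"@type": "Organization", "name": organization},
        "bornIn": {
            "@type": "City",
            "name": city,
            "state": {"@type": "State", "name": state},
        },
    }


doc = person_to_jsonld(("Bill Gates", "Oct 1955", "Microsoft", "Seattle", "WA"))
print(json.dumps(doc, indent=2))
```

Because the result is ordinary JSON, mainstream JSON tooling can consume it without any RDF-specific knowledge, which is the appeal of this option.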
46. Thanks for your attention
https://github.com/usc-isi-i2/Web-Karma
Open Source, Apache 2 License