Speaker note (federated queries): Example scenario with the DBpedia and New York Times collections: DBpedia serves as a structured knowledge base and the New York Times as a news provider.
    1. Data integration and visualization using RDF
        Ivan Ermilov, University of Leipzig
    2. Agenda
        • Data discovery
        • Data conversion
        • Data integration
    3. Linked Data Lifecycle
        http://stack.lod2.eu/blog/
    4. DATA DISCOVERY
    5. Data Discovery
        • Ontologies
        • Vocabularies
        • Documents
    6. Data Discovery: Ontologies
        An ontology is a specification of a conceptualization.
    7. Data Discovery: Ontologies
    8. Data Discovery: Ontologies
        • Swoogle: http://swoogle.umbc.edu/
        • Watson: http://watson.kmi.open.ac.uk/WatsonWUI/
    9. Data Discovery: Vocabularies
        FOAF (Friend of a Friend):
        • A Semantic Web vocabulary for describing people, their activities, and their relationships to one another.
        • Setting up a personal FOAF profile has become popular among people who discover the vocabulary.
        • FOAF serves as a base that other vocabularies extend.
    10. Data Discovery: Vocabularies
        FOAF specification: http://xmlns.com/foaf/spec/
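To make the FOAF description concrete, here is a minimal Python sketch that writes a small FOAF profile as N-Triples using only the standard library. The class and properties (foaf:Person, foaf:name, foaf:knows) come from the FOAF vocabulary. The person URIs, the helper names, and the choice of hand-rolled serialization instead of an RDF library are illustrative assumptions.

```python
# Sketch: serialize a tiny FOAF profile as N-Triples with the standard library.
# The person URIs used below are hypothetical examples.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain literal, escaping backslashes, double quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def foaf_profile(person: str, name: str, knows: list[str]) -> list[str]:
    """Return N-Triples lines describing a foaf:Person, its name and acquaintances."""
    lines = [
        f"{iri(person)} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .",
        f"{iri(person)} {iri(FOAF + 'name')} {literal(name)} .",
    ]
    for friend in knows:
        lines.append(f"{iri(person)} {iri(FOAF + 'knows')} {iri(friend)} .")
    return lines


if __name__ == "__main__":
    for line in foaf_profile("http://example.org/alice#me", "Alice",
                             ["http://example.org/bob#me"]):
        print(line)
```

Each foaf:knows triple links to another person's URI. Publishing such profiles lets crawlers follow those links from one profile to the next.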
    11. Data Discovery: Vocabularies
    12. Data Discovery: Vocabularies
        Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
    13. Data Discovery: Documents
        Example RDF document (N-Triples):
        <http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
        _:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
        _:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
        <http://www.linkedin.com/in/timbl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#vcalendar> .
        _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#Vevent> .
        _:node1 <http://www.w3.org/2002/12/cal/icaltzd#summary> "MIT" .
        _:node1 <http://www.w3.org/2002/12/cal/icaltzd#description> "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
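N-Triples puts one triple on each line, terminated by " .". The sketch below splits a document like the one above into (subject, predicate, object) terms with a regular expression. It handles only the subset of N-Triples seen in this example: IRIs, blank nodes, and literals with an optional language tag or datatype. It is an illustration and not a conforming parser. For example, it does not decode escapes or accept comments.

```python
import re

# Simplified term patterns for the subset of N-Triples used in the example.
IRI = r"<[^>]*>"
BNODE = r"_:[A-Za-z0-9]+"
# A quoted string (allowing backslash escapes) plus an optional @lang or ^^<datatype>.
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split an N-Triples document into (subject, predicate, object) strings."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.append(match.groups())
    return triples
```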
    14. Data Discovery: Documents
        Sindice: http://sindice.com/
    15. Data Catalogs
        • A community-maintained registry of data catalogs exists
        • It contains 362 data catalogs (and growing)
        • It is based on the CKAN data catalog platform
        http://datacatalogs.org/
    16. Data Catalogs
        http://datacatalogs.org/
    17. What is CKAN?
        • A metadata repository with crowd-sourcing enabled
        • Anyone can register and publish metadata about their datasets
        • A developer-friendly web application
        • Provides a well-documented API
        • Easy to install and easy to use as your own metadata repository
    18. CKAN Architecture
        Packages contain resources, and you can search for them.
    19. The Data Hub
    20. The Data Hub
    21. Hub of Data
    22. Hub of Data
    23. CKAN API
        • Well documented: http://docs.ckan.org/en/latest/api.html
        • Covers everything you can do through the web interface, so you can write your own web interface
        • OKFN maintains a Python library for accessing the API: ckanclient
    24. CKAN API: Methods
        • Retrieve existing data
        • Create new data
        • Update existing data
        • Delete existing data
        • Data includes packages, resources, groups, tags, users, etc.
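The four kinds of methods correspond to functions of CKAN's Action API, which a CKAN site exposes under /api/3/action/<function>. The following sketch builds the HTTP request for each package operation without sending it. The base URL and the operation-to-function dictionary are hypothetical. The use of GET for reads, POST with a JSON body for writes, and an API key in the Authorization header follow the CKAN API documentation as we understand it, so check them against your CKAN version.

```python
import json
import urllib.parse
import urllib.request

# Assumed mapping from the slide's CRUD operations to CKAN Action API functions
# for packages (datasets). Other entities have their own functions in the CKAN docs.
PACKAGE_ACTIONS = {
    "retrieve": "package_show",
    "create": "package_create",
    "update": "package_update",
    "delete": "package_delete",
}


def build_request(base_url, operation, params, api_key=None):
    """Build (but do not send) a CKAN Action API request for a package operation."""
    action = PACKAGE_ACTIONS[operation]
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    headers = {}
    if api_key:
        headers["Authorization"] = api_key  # key of the user making the change
    if operation == "retrieve":
        # Read-only functions accept GET with query-string parameters.
        query = urllib.parse.urlencode(params)
        return urllib.request.Request(f"{url}?{query}", headers=headers, method="GET")
    # Functions that modify data are sent as POST with a JSON body.
    headers["Content-Type"] = "application/json"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to urllib.request.urlopen would send the request. The JSON response wraps its payload in "success" and "result" fields.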
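    25. CKAN API: Examples
        Collect the sorted list of resource formats used across all packages (ckanclient, Python):

            from ckanclient import CkanClient

            def list_formats(ckan_api_url, ckan_api_key):
                ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
                formats = set()
                # Assumes package_list() yields package dicts with a 'resources' list;
                # if your client version returns only package names, fetch each package first.
                for package in ckan.package_list():
                    for resource in package['resources']:
                        formats.add(resource['format'])
                return sorted(formats)

        https://github.com/okfn/ckanclient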
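The ckanclient example depends on that library's interface. As an alternative, the sketch below talks to the Action API directly with the Python standard library. package_list returns only dataset names, so each dataset is fetched with package_show before its resources are inspected. The base URL is a placeholder. The `call` parameter is a hypothetical hook that lets the aggregation run without network access.

```python
import json
import urllib.parse
import urllib.request


def ckan_action(base_url, action, **params):
    """Call a CKAN Action API function via GET and return its 'result' field."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    query = urllib.parse.urlencode(params)
    if query:
        url = f"{url}?{query}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


def collect_formats(call=ckan_action, base_url="http://example.org/ckan"):
    """Return the sorted set of non-empty resource formats across all packages."""
    formats = set()
    for name in call(base_url, "package_list"):        # dataset names only
        package = call(base_url, "package_show", id=name)
        for resource in package.get("resources", []):
            if resource.get("format"):                 # skip missing/empty formats
                formats.add(resource["format"])
    return sorted(formats)
```

Passing a stub function as `call` returns canned results, which is convenient for testing the aggregation offline. With a real CKAN site, the default `ckan_action` makes one request per dataset.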
    26. Use Case: CSV2RDF Conversion
        • A framework for converting CSV to RDF
        • Crowd-sourcing enabled
        • RDF visualizations
        https://github.com/earthquakesan/CSV2RDF-WIKI
    27. CSV2RDF Conversion: Why CSV?
    28. CSV2RDF Conversion: Data Quality
    29. Data Conversion
    30. Data Conversion
        • Structured: relational databases
        • Semi-structured: XML, HTML, XLS, CSV, APIs
        • Unstructured: raw text
        Figure: PublicData.eu statistics
    31. Data Conversion
        Sources: XML, RDB, spreadsheet. Typical questions:
        • How does government spending in certain sectors relate to my company's earnings?
        • How does historic spending relate to the current figures?
        • Give me a report about all of my customers across the whole organization.
    32. Data Conversion
        Custom scripts per source (XPath for XML, SQL for the RDB, spreadsheets), followed by result aggregation
    33. Merging Data with RDF
        Sources: XML, RDB, spreadsheet. Once in RDF, you can:
        • Easily integrate your data
        • Map concepts to one another
        • Query everything with one W3C standard language (SPARQL)
    34. Merging Data with RDF: Example
        • The Blue app has a model
    35. Merging Data with RDF: Example
        • The Red app has a model
        • We need to integrate the Red and Blue models
    36. Merging Data with RDF: Example
        • Step 1: Merge the RDF
        • Identical nodes (URIs) join automatically
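Step 1 can be modeled with triples as plain Python tuples. Merging two graphs is the set union of their triples. Because both apps use the same URI for a shared resource, the Blue and Red descriptions end up attached to one node. The Blue and Red data and the prefixed names are hypothetical.

```python
# Triples are (subject, predicate, object) tuples; the data is a made-up example.
blue = {
    ("ex:alice", "blue:worksFor", "ex:acme"),
    ("ex:acme", "blue:name", "ACME Corp"),
}
red = {
    ("ex:acme", "red:revenue", "1000000"),
    ("ex:bob", "red:customerOf", "ex:acme"),
}


def merge(*graphs):
    """Merge RDF graphs whose nodes are URIs: the result is the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, node):
    """Return every triple in which the node appears as subject or object."""
    return {t for t in graph if node in (t[0], t[2])}


merged = merge(blue, red)
# ex:acme now carries facts from both apps because both used the same URI.
```

This shortcut holds for URI nodes only. Blank nodes from different graphs must be renamed apart before merging, because blank node labels are local to their graph.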
    37. Merging Data with RDF: Example
        • Step 2: Add relationships and rules (relationships are also RDF)
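In Step 2, the relationships between the two models are themselves triples, and a rule turns them into new facts. This sketch uses a single RDFS-style rule (rdfs9: if C is a subclass of D and x has type C, then x has type D). It applies the rule until no new triples appear. The class names, including the Green model's green:Person, are hypothetical. A real system would use an RDFS/OWL reasoner or a rule engine.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical instance data from the Blue and Red apps.
data = {
    ("ex:alice", RDF_TYPE, "blue:Employee"),
    ("ex:bob", RDF_TYPE, "red:Customer"),
}
# Relationships between the models are ordinary RDF triples too.
mappings = {
    ("blue:Employee", SUBCLASS, "green:Person"),
    ("red:Customer", SUBCLASS, "green:Person"),
}


def apply_subclass_rule(graph):
    """Forward-chain rdfs9 (subClassOf + type => type) until a fixpoint is reached."""
    graph = set(graph)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (c, p1, d) in graph if p1 == SUBCLASS
            for (x, p2, c2) in graph if p2 == RDF_TYPE and c2 == c
        }
        if new <= graph:  # nothing new was derived
            return graph
        graph |= new


inferred = apply_subclass_rule(data | mappings)
```

After inference, both people are instances of green:Person. A Green model can query that class without knowing about the Blue or Red classes.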
    38. Merging Data with RDF: Example
        • Step 3: Define the Green model (making use of the Red and Blue models)
    39. Merging Data with RDF: Example
        • What the Blue app sees: no difference!
    40. Merging Data with RDF: Example
        • What the Red app sees: no difference!
    41. RDF Helps Bridge Other Formats/Models
        • Producers and consumers may use different formats/models
        • Rules can specify transformations
        • An inference engine finds a path to the desired result model
        Figure: RDF model transformations between models A1, A2, A3, B1, B2, C1, C2 and X, Y, Z, each linked via ontologies and rules
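Here is a toy analogue of an inference engine finding a path to the desired result model. Each transformation is treated as an edge between models, and a breadth-first search looks for the shortest chain from the producer's model to the consumer's. The rule list is hypothetical and hand-written. In the setting described above, such steps would be derived from ontologies and rules.

```python
from collections import deque

# Hypothetical transformations, as (source model, target model) pairs.
RULES = [("A1", "RDF"), ("B1", "RDF"), ("RDF", "X"), ("RDF", "Y"), ("C1", "C2")]


def find_transform_path(rules, source, target):
    """Breadth-first search for the shortest chain of transformations, or None."""
    edges = {}
    for src, dst in rules:
        edges.setdefault(src, []).append(dst)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

For example, `find_transform_path(RULES, "A1", "Y")` goes through the shared RDF model. The search returns None when no chain of rules connects the two models.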
    42. RDB2RDF
    43. Extract, Transform, Load (ETL)
    44. Automatic Mapping
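One way to map a relational database automatically is the W3C Direct Mapping, which derives RDF from the schema alone. Each row becomes a resource typed by its table, and each non-NULL column value becomes a property of that resource. The sketch below applies a simplified version to an SQLite table. The base IRI is a placeholder. The sketch assumes a single-column primary key, emits every value as a plain literal, and skips the reference triples that the specification generates for foreign keys.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def direct_mapping(conn, table, pk, base="http://example.org/db/"):
    """Map the rows of `table` to N-Triples, loosely following the W3C Direct Mapping."""
    table_iri = base + quote(table)
    # Identifier quoting here is naive; only pass trusted table names.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        # Row node: <base/Table/pk=value>, typed with the table's class <base/Table>.
        subject = f"<{table_iri}/{quote(pk)}={quote(str(record[pk]))}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{table_iri}> .")
        for column, value in record.items():
            if value is not None:  # NULLs produce no triple
                triples.append(
                    f'{subject} <{table_iri}#{quote(column)}> "{escape_literal(value)}" .'
                )
    return triples
```

Because the mapping uses only table and column names, it runs without configuration. The resulting vocabulary mirrors the schema, which is why semi-automatic mapping and R2RML exist to map data onto shared vocabularies instead.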
    45. Semi-Automatic Mapping
    46. R2RML
    47. Sparqlify: Examples
    48. Sparqlify: Examples
    49. Sparqlify: Examples
    50. Sparqlify: Examples
    51. Sparqlify: Examples
    52. Sparqlify: CSV2RDF

            Prefix pdd: <http://data.publicdata.eu/>
            Prefix pdo: <http://wiki.publicdata.eu/ontology/>

            Create View Template DefaultMapping As
              Construct {
                ?s ?p1 ?o1 ;
                   ?p2 ?o2 ...
              }
              With
                ?s  = uri(concat(pdd:, 'csv-path/', ?rowId))
                ?p1 = uri(concat(pdo:, ?headingName1))
                ?o1 = plainLiteral(?1)
                ?p2 = ...

        http://sparqlify.org/
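The DefaultMapping template can be emulated in a few lines of Python to see which triples it yields. Each row gets one subject, each column heading gives one predicate, and the cell value becomes a plain literal. This is not Sparqlify itself. The 1-based row numbering, the percent-encoding of headings, and passing the CSV path as a parameter are assumptions of the sketch.

```python
import csv
import io
from urllib.parse import quote

PDD = "http://data.publicdata.eu/"
PDO = "http://wiki.publicdata.eu/ontology/"


def csv_default_mapping(csv_text, csv_path):
    """Emulate DefaultMapping: ?s = pdd:<csv_path>/<rowId>, ?p = pdo:<heading>, ?o = literal."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader)  # first row holds the column headings
    triples = []
    for row_id, row in enumerate(reader, start=1):  # assumed 1-based row ids
        subject = f"<{PDD}{csv_path}/{row_id}>"
        for heading, cell in zip(headings, row):
            predicate = f"<{PDO}{quote(heading.strip())}>"  # assumed URI-safe encoding
            value = cell.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            triples.append(f'{subject} {predicate} "{value}" .')
    return triples
```

Since every value comes out as a plain literal, numbers and dates lose their types. Refining the default mapping, for example through the crowd-sourced CSV2RDF wiki, addresses this.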
    53. Raw Text Processing: ConTEXT
        • No installation or configuration required
        • Access content from a variety of sources
        • Instantly show the results of text analysis to users in a variety of visualizations
        • Allow refinement of automatic annotations and take feedback into account
        • Provide a generic architecture in which different modules for content acquisition, natural language processing, and visualization can be plugged together
        http://rdface.aksw.org/nlp/hub.php
    54. Processing Raw Text: ConTEXT
    55. Data Integration
    56. Definition
        • In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system.
    57. Semantic Data Integration
    58. Federated SPARQL Queries
        • Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud
        • Example: query the DBpedia and New York Times collections in an integrated way
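One standard way to write such an integrated query is the SERVICE keyword from SPARQL 1.1 Federated Query, which sends each graph pattern to a named endpoint. The sketch below only builds the query text and the GET URL defined by the SPARQL protocol. The New York Times endpoint is a hypothetical placeholder. Joining the two sources through owl:sameAs links is an assumption about how the data sets are linked.

```python
from urllib.parse import urlencode

# DBPEDIA is DBpedia's public SPARQL endpoint; NYT_ENDPOINT is a hypothetical placeholder.
DBPEDIA = "https://dbpedia.org/sparql"
NYT_ENDPOINT = "http://example.org/nyt/sparql"


def federated_query(nyt_endpoint, dbpedia_endpoint):
    """Build a SPARQL 1.1 query that joins two remote sources via SERVICE clauses."""
    return f"""PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?nytTopic ?person ?birthDate WHERE {{
  SERVICE <{nyt_endpoint}> {{ ?nytTopic owl:sameAs ?person . }}
  SERVICE <{dbpedia_endpoint}> {{ ?person dbo:birthDate ?birthDate . }}
}}"""


def protocol_get_url(endpoint, query):
    """Encode a query for the SPARQL protocol's GET binding (?query=...)."""
    return f"{endpoint}?{urlencode({'query': query})}"
```

The shared variable ?person is what joins the two SERVICE blocks. A SPARQL 1.1 engine evaluating this query contacts both endpoints and combines the answers.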
    59. Federated Query Processing
        • Federation mediator at the server
        • Virtual integration of (remote) data sources
        • Communication via the SPARQL protocol
        Figure: a query goes to the federation mediator, which communicates with several SPARQL data sources
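A core job of the mediator is combining partial answers, which can be sketched as a join over solution mappings. Each subquery's results are dictionaries of variable bindings. Compatible bindings, those that agree on shared variables, are merged. In this sketch the sources are plain Python functions standing in for remote SPARQL endpoints. Real federation engines also perform source selection and query planning and ship subqueries over the SPARQL protocol, which is omitted here.

```python
def join(left, right):
    """Join two lists of solution mappings on the variables they share."""
    results = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[v] == right_row[v] for v in shared):
                results.append({**left_row, **right_row})
    return results


def mediate(subqueries):
    """Evaluate each (source, pattern) pair and join the partial results in order."""
    results = [{}]  # one empty mapping: the neutral element of the join
    for source, pattern in subqueries:
        results = join(results, source(pattern))
    return results
```

The nested-loop join is the simplest correct choice. Engines typically reduce network traffic by pushing already-known bindings into later subqueries instead of fetching everything and joining locally.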
    60. Federated Query Engines
        Engine     Implementation language   License
        FedX       Java                      GNU AGPL
        SPLENDID   Java                      LGPL
        LHD        Java                      MIT
        DARQ       Java                      GPL
        ANAPSID    Python                    GNU GPL
        ADERIS     Java                      Apache
    61. Data Visualization
    62. LD Visualization Techniques
    63. LD Visualization Techniques
    64. LD Visualization Techniques
    65. LD Visualization Techniques
    66. Classification of Visualization Techniques
    67. Comparison of Values/Attributes
        http://goo.gl/IvsGbU
        http://goo.gl/JeFhlM
    68. Analysis of Relationships and Hierarchies
    69. Analysis of Relationships and Hierarchies
        http://rhizomik.net/dbpedia/treemap.jsp
        http://lov.okfn.org/dataset/lov/
    70. Analysis of Temporal and Geographical Events
        http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html
    71. Analysis of Multidimensional Data
        http://mbostock.github.io/protovis/ex/cars.html
    72. Other Visualization Techniques
    73. Applications of LD Visualization Techniques
    74. Tool Types
    75. Tool Types
    76. CubeViz
    77. Facete
    78. Thank you for your attention
        Ivan Ermilov, University of Leipzig
        iermilov@informatik.uni-leipzig.de