Data Integration And Visualization

Using RDF

Published in: Education, Technology
  • Example scenario DBpedia and New York Times collections DBpedia as structured knowledge base New York Times as a news provider

    1. 1. Data integration and visualization using RDF. Ivan Ermilov, University of Leipzig
    2. 2. Agenda • Data discovery • Data conversion • Data integration
    3. 3. Linked Data Lifecycle http://stack.lod2.eu/blog/
    4. 4. DATA DISCOVERY
    5. 5. Data Discovery • Ontologies • Vocabularies • Documents
    6. 6. Data Discovery: Ontologies Specification of a conceptualization
    7. 7. Data Discovery: Ontologies
    8. 8. Data Discovery: Ontologies http://swoogle.umbc.edu/ http://watson.kmi.open.ac.uk/WatsonWUI/
    9. 9. Data Discovery: Vocabularies FOAF (Friend of a Friend): • A Semantic Web vocabulary used to describe people, their activities and their relationships to one another. • It is becoming popular for people who discover it to set up their own FOAF profile. • Many other vocabularies extend FOAF as their base.
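A FOAF profile is just a handful of triples about a person. The following is a minimal sketch that writes one as N-Triples lines; the helper name `foaf_profile` and the example.org URIs are hypothetical, while `foaf:Person`, `foaf:name` and `foaf:knows` are terms from the FOAF specification. It assumes the name contains no characters that need escaping.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def foaf_profile(person_uri, name, knows=()):
    """Return N-Triples lines describing one person (hypothetical helper)."""
    lines = [
        # The person is typed as foaf:Person.
        "<%s> <%s> <%sPerson> ." % (person_uri, RDF_TYPE, FOAF),
        # Assumes `name` needs no escaping (no quotes, backslashes, newlines).
        '<%s> <%sname> "%s" .' % (person_uri, FOAF, name),
    ]
    for friend_uri in knows:
        # One foaf:knows triple per acquaintance.
        lines.append("<%s> <%sknows> <%s> ." % (person_uri, FOAF, friend_uri))
    return lines


profile = foaf_profile(
    "http://example.org/people/alice#me",
    "Alice",
    knows=["http://example.org/people/bob#me"],
)
print("\n".join(profile))
```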
    10. 10. Data Discovery: Vocabularies http://xmlns.com/foaf/spec/
    11. 11. Data Discovery: Vocabularies
    12. 12. Data Discovery: Vocabularies http://lov.okfn.org/dataset/lov/
    13. 13. Data Discovery: Documents
        <http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
        _:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
        _:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
        <http://www.linkedin.com/in/timbl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#vcalendar> .
        _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#Vevent> .
        _:node1 <http://www.w3.org/2002/12/cal/icaltzd#summary> "MIT" .
        _:node1 <http://www.w3.org/2002/12/cal/icaltzd#description> "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
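The document above is in N-Triples, where each line holds one subject, predicate and object followed by a dot. As a rough sketch of how such a document can be read, the snippet below splits lines with a simplified regular expression. It is not a full N-Triples parser (a library such as rdflib would be the usual choice), and the names `TRIPLE` and `parse_ntriples` are made up for this illustration.

```python
import re

# Simplified N-Triples line: subject is an IRI or blank node, predicate is
# an IRI, object is an IRI, blank node, or literal with an optional
# language tag or datatype.
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\w+)\s+'
    r'(<[^>]*>)\s+'
    r'(<[^>]*>|_:\w+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) string tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a valid N-Triples line: " + line)
        triples.append(match.groups())
    return triples


SAMPLE = '''
<http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
_:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
_:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
'''

triples = parse_ntriples(SAMPLE)
subjects = sorted({s for s, _, _ in triples})
print(len(triples), "triples about", subjects)
```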
    14. 14. Data Discovery: Documents http://sindice.com/
    15. 15. Data Catalogs • Community maintained registry exists • Contains 362 data catalogs (growing) • Based on CKAN data catalog platform http://datacatalogs.org/
    16. 16. Data Catalogs http://datacatalogs.org/
    17. 17. What is CKAN? • Metadata repository with crowd-sourcing enabled • Everybody can register and publish data about their datasets • Developer-friendly web application • Provides a well-documented API • Easy to install, easy to use as your own metadata repository
    18. 18. CKAN Architecture • Packages contain resources • And you can search for them
    19. 19. The Data Hub
    20. 20. The Data Hub
    21. 21. Hub of Data
    22. 22. Hub of Data
    23. 23. CKAN API • Well-documented • http://docs.ckan.org/en/latest/api.html • Covers everything you can do with the web interface • You can write your own web interface • OKFN-maintained library for accessing the API: ckanclient (Python)
    24. 24. CKAN API: Methods • Retrieving data • Creating new data • Updating existing data • Deleting existing data • Data is: packages, resources, groups, tags, users etc.
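    25. 25. CKAN API: Examples
        from ckanclient import CkanClient

        def list_resource_formats(ckan_api_url, ckan_api_key):
            # Collect the distinct resource formats across all packages.
            # Assumes, as on the original slide, that package_list() yields
            # package dicts that each carry a 'resources' list.
            ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
            formats = []
            for package in ckan.package_list():
                for resource in package['resources']:
                    if resource['format'] not in formats:
                        formats.append(resource['format'])
            return sorted(formats)

        https://github.com/okfn/ckanclient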
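The same kind of lookup can also go through CKAN's HTTP Action API without a client library. The sketch below only builds request URLs and unwraps a response, so it runs offline; the host `ckan.example.org`, the helper names and the canned response are assumptions for illustration, while the `/api/3/action/` path and the `success`/`result` response fields follow the CKAN API documentation.

```python
import json
from urllib.parse import urlencode


def action_url(base_url, action, **params):
    """Build the URL of a CKAN Action API call (API version 3)."""
    url = base_url.rstrip("/") + "/api/3/action/" + action
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


def unwrap(response_text):
    """Return the 'result' field of an Action API response, or raise."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN call failed: %r" % payload.get("error"))
    return payload["result"]


def resource_formats(package):
    """Sorted distinct formats of the resources of one package dict."""
    return sorted({res.get("format", "") for res in package.get("resources", [])})


# Canned (made-up) response in the shape package_show returns.
CANNED = json.dumps({
    "success": True,
    "result": {
        "name": "example-dataset",
        "resources": [{"format": "CSV"}, {"format": "RDF"}, {"format": "CSV"}],
    },
})

print(action_url("https://ckan.example.org", "package_show", id="example-dataset"))
print(resource_formats(unwrap(CANNED)))
```

In a real script the URL would be fetched with an HTTP client and the response body passed to `unwrap`.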
    26. 26. Use Case: CSV2RDF Conversion • Framework for CSV2RDF conversion • Crowd-sourcing enabled • RDF Visualizations https://github.com/earthquakesan/CSV2RDF-WIKI
    27. 27. CSV2RDF Conversion: Why CSV?
    28. 28. CSV2RDF Conversion: Data Quality
    29. 29. Data conversion
    30. 30. Data Conversion • Structured: Relational Databases • Semi-structured: XML, HTML, XLS, CSV, APIs • Unstructured: Raw text (Figure: PublicData.eu statistics)
    31. 31. Data Conversion XML RDB Spreadsheet ? • How does government spending in certain sectors relate to my company's earnings? • How does historic spending relate to the current figures? • Give me a report about all of my customers across the whole organization
    32. 32. Data Conversion: Custom scripts XML RDB Spreadsheet ? • XPath • SQL • Result aggregation
    33. 33. Merging data with RDF XML RDB Spreadsheet Once in RDF: • Easily integrate your data • Concepts can be mapped to one another • Query everything with one W3C standard language (SPARQL)
    34. 34. Merging Data with RDF: Example • Blue App has model
    35. 35. Merging Data with RDF: Example • Red App has model • Need to integrate Red & Blue models
    36. 36. Merging Data with RDF: Example • Step 1: Merge RDF • Same nodes (URIs) join automatically
    37. 37. Merging Data with RDF: Example • Step 2: Add relationships and rules • (Relationships are also RDF)
    38. 38. Merging Data with RDF: Example • Step 3: Define Green model • (Making use of Red & Blue models)
    39. 39. Merging Data with RDF: Example • What the Blue app sees: • No difference!
    40. 40. Merging Data with RDF: Example • What the Red app sees: • No difference!
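The three merge steps can be sketched with plain Python sets of triples. Everything here is illustrative: the example.org namespaces, the sample data and the single-pass `rdfs:subPropertyOf` rule are assumptions chosen to mirror the Blue, Red and Green models, not the models shown on the slides.

```python
BLUE = "http://example.org/blue#"
RED = "http://example.org/red#"
GREEN = "http://example.org/green#"
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

blue_graph = {
    ("http://example.org/id/alice", BLUE + "name", "Alice"),
    ("http://example.org/id/alice", BLUE + "worksFor", "http://example.org/id/acme"),
}
red_graph = {
    ("http://example.org/id/acme", RED + "label", "ACME Corp"),
    ("http://example.org/id/acme", RED + "city", "Leipzig"),
}

# Step 1: merging is set union; the shared URI .../id/acme joins both graphs.
merged = blue_graph | red_graph

# Step 2: relationships between the models are triples as well.
merged |= {
    (BLUE + "name", SUBPROP, GREEN + "title"),
    (RED + "label", SUBPROP, GREEN + "title"),
}


def apply_subproperty_rule(graph):
    """One pass of: (p subPropertyOf q) and (s p o) implies (s q o)."""
    parents = {}
    for s, p, o in graph:
        if p == SUBPROP:
            parents.setdefault(s, set()).add(o)
    inferred = {(s, q, o) for s, p, o in graph for q in parents.get(p, ())}
    return graph | inferred


def view(graph, namespace):
    """Triples whose predicate belongs to one application's namespace."""
    return {t for t in graph if t[1].startswith(namespace)}


# Step 3: the Green model is derived from the Red and Blue models.
full_graph = apply_subproperty_rule(merged)
green_titles = {(s, o) for s, p, o in view(full_graph, GREEN)}
print(sorted(green_titles))
```

Filtering by namespace shows why each application sees no difference: `view(full_graph, BLUE)` still equals the original Blue graph, and the same holds for Red.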
    41. 41. RDF helps bridge other formats/models • Producers and consumers may use different formats/models • Rules can specify transformations • Inference engine finds path to desired result model (Diagram: models A1-A3, B1-B2, C1-C2 and X, Y, Z are transformed through a central RDF model using ontologies and rules.)
    42. 42. RDB2RDF
    43. 43. Extract, Transform, Load (ETL)
    44. 44. Automatic Mapping
    45. 45. Semi-Automatic Mapping
    46. 46. R2RML
    47. 47. Sparqlify: Examples
    48. 48. Sparqlify: Examples
    49. 49. Sparqlify: Examples
    50. 50. Sparqlify: Examples
    51. 51. Sparqlify: Examples
    52. 52. Sparqlify: CSV2RDF
        Prefix pdd: <http://data.publicdata.eu/>
        Prefix pdo: <http://wiki.publicdata.eu/ontology/>
        Create View Template DefaultMapping As
          Construct {
            ?s ?p1 ?o1 ;
               ?p2 ?o2 ...
          }
          With
            ?s = uri(concat(pdd:, 'csv-path/', ?rowId))
            ?p1 = uri(concat(pdo:, ?headingName1))
            ?o1 = plainLiteral(?1)
            ?p2 = ...
        http://sparqlify.org/
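To make the default mapping concrete, here is a rough Python equivalent of what the template describes: one subject URI per row, one predicate per column heading, and every cell as a plain literal. This is not Sparqlify itself; the function names, the `csv_path` argument (standing in for the `csv-path/` placeholder) and the choice to number rows from 1 are assumptions.

```python
import csv
import io
from urllib.parse import quote

PDD = "http://data.publicdata.eu/"
PDO = "http://wiki.publicdata.eu/ontology/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, csv_path):
    """Map each CSV cell to one triple, following the default mapping idea."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader)  # first row holds the column headings
    lines = []
    for row_id, row in enumerate(reader, start=1):  # row numbering is assumed
        subject = "<%s%s/%d>" % (PDD, quote(csv_path), row_id)
        for heading, cell in zip(headings, row):
            predicate = "<%s%s>" % (PDO, quote(heading.strip(), safe=""))
            lines.append('%s %s "%s" .' % (subject, predicate, escape_literal(cell)))
    return lines


ntriples = csv_to_ntriples("city,population\nLeipzig,520000\n", "demo")
print("\n".join(ntriples))
```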
    53. 53. Raw Text Processing: ConTEXT • No installation and configuration required • Access content from a variety of sources • Instantly show the results of text analysis to users in a variety of visualizations • Allow refinement of automatic annotations and take feedback into account • Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together http://rdface.aksw.org/nlp/hub.php
    54. 54. Processing Raw Text: ConTEXT
    55. 55. Data Integration
    56. 56. Definition • In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system
    57. 57. Semantic Data Integration
    58. 58. Federated SPARQL Queries • Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud • Example sources: DBpedia, New York Times • Query both data collections in an integrated way
    59. 59. Federated Query Processing • Federation mediator at the server • Virtual integration of (remote) data sources • Communication via SPARQL protocol (Diagram: a query goes to the federation mediator, which talks to several SPARQL data sources.)
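The mediator idea can be illustrated with a toy in-memory version: each source answers a triple pattern, and the mediator collects the answers and joins them on shared variables. Real engines talk to remote endpoints over the SPARQL protocol and optimize source selection; here the sources are plain Python sets, and the prefixed names and all data values are made up for the DBpedia and New York Times scenario.

```python
def match(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant does not match
        else:
            yield binding


def federated_match(sources, pattern):
    """Mediator step: ask every source and collect all bindings."""
    results = []
    for triples in sources.values():
        results.extend(match(triples, pattern))
    return results


def join(left, right):
    """Combine bindings that agree on their shared variables."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                merged = dict(a)
                merged.update(b)
                joined.append(merged)
    return joined


# Made-up sample data standing in for two remote SPARQL data sources.
SOURCES = {
    "dbpedia": {("dbpedia:Leipzig", "dbo:country", "dbpedia:Germany")},
    "nyt": {
        ("nyt:leipzig", "owl:sameAs", "dbpedia:Leipzig"),
        ("nyt:leipzig", "nyt:articleCount", "42"),
    },
}

rows = join(
    join(
        federated_match(SOURCES, ("?topic", "owl:sameAs", "?place")),
        federated_match(SOURCES, ("?place", "dbo:country", "dbpedia:Germany")),
    ),
    federated_match(SOURCES, ("?topic", "nyt:articleCount", "?count")),
)
print(rows)
```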
    60. 60. Federated Query Engines
        Engine Name   Implementation language   License
        FedX          Java                      GNU AGPL
        SPLENDID      Java                      LGPL
        LHD           Java                      MIT
        DARQ          Java                      GPL
        ANAPSID       Python                    GNU GPL
        ADERIS        Java                      Apache
    61. 61. Data Visualization
    62. 62. LD Visualization Techniques
    63. 63. LD Visualization Techniques
    64. 64. LD Visualization Techniques
    65. 65. LD Visualization Techniques
    66. 66. Classification of Visualization Techniques
    67. 67. Comparison of Values/Attributes http://goo.gl/IvsGbU http://goo.gl/JeFhlM
    68. 68. Analysis of Relationships and Hierarchies
    69. 69. Analysis of Relationships and Hierarchies http://rhizomik.net/dbpedia/treemap.jsp http://lov.okfn.org/dataset/lov/
    70. 70. Analysis of Temporal and Geographical Events http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html
    71. 71. Analysis of Multidimensional Data http://mbostock.github.io/protovis/ex/cars.html
    72. 72. Other Visualization Techniques
    73. 73. Applications of LD Visualization Techniques
    74. 74. Tool Types
    75. 75. Tool Types
    76. 76. CubeViz
    77. 77. Facete
    78. 78. Thank you for your attention. Ivan Ermilov, University of Leipzig, iermilov@informatik.uni-leipzig.de
