Data Integration And Visualization
 

Using RDF

Usage Rights

CC Attribution License

  • Example scenario: DBpedia and New York Times collections • DBpedia as structured knowledge base • New York Times as a news provider

Data Integration And Visualization Presentation Transcript

  • Data Integration and Visualization Using RDF • Ivan Ermilov, University of Leipzig
  • Agenda • Data discovery • Data conversion • Data integration
  • Linked Data Lifecycle http://stack.lod2.eu/blog/
  • DATA DISCOVERY
  • Data Discovery • Ontologies • Vocabularies • Documents
  • Data Discovery: Ontologies Specification of a conceptualization
  • Data Discovery: Ontologies
  • Data Discovery: Ontologies http://swoogle.umbc.edu/ http://watson.kmi.open.ac.uk/WatsonWUI/
  • Data Discovery: Vocabularies FOAF – Friend of a Friend: • A Semantic Web vocabulary used to describe people, their activities and their relationships to one another. • It is becoming popular for people who discover it to set up their own FOAF profile. • Other vocabularies extend this vocabulary as their base.
  • Data Discovery: Vocabularies http://xmlns.com/foaf/spec/
  • Data Discovery: Vocabularies
  • Data Discovery: Vocabularies http://lov.okfn.org/dataset/lov/
  • Data Discovery: Documents
    <http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
    _:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
    _:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
    <http://www.linkedin.com/in/timbl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#vcalendar> .
    _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/12/cal/icaltzd#Vevent> .
    _:node1 <http://www.w3.org/2002/12/cal/icaltzd#summary> "MIT" .
    _:node1 <http://www.w3.org/2002/12/cal/icaltzd#description> "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
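To make the structure of such a document concrete, the following minimal sketch splits N-Triples lines like the ones above into (subject, predicate, object) tuples. The regular expression is a deliberate simplification and an assumption of this example: it handles IRIs, blank nodes, and plain or language-tagged literals without embedded quotes, not the full N-Triples grammar. The names `parse_ntriples` and `SAMPLE` are hypothetical.

```python
import re

# Simplified N-Triples line pattern (assumption: no datatypes, no escaped
# quotes inside literals). Subject: IRI or blank node; predicate: IRI;
# object: IRI, blank node, or literal with an optional language tag.
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\w+)\s+(<[^>]*>)\s+(<[^>]*>|_:\w+|"[^"]*"(?:@[\w-]+)?)\s*\.\s*$'
)


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) string tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a simple N-Triples line: " + line)
        triples.append(match.groups())
    return triples


# First three statements from the slide.
SAMPLE = """
<http://www.linkedin.com/in/timbl> <http://purl.org/dc/terms/title> "Tim Berners-Lee - LinkedIn"@en .
_:node0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/vcard/ns#Address> .
_:node0 <http://www.w3.org/2006/vcard/ns#locality> "Greater Boston Area" .
"""

triples = parse_ntriples(SAMPLE)
# Blank nodes (_:node0) identify resources that have no URI of their own.
blank_node_subjects = sorted({s for s, _, _ in triples if s.startswith("_:")})
```

For real data, a dedicated RDF library is the safer choice; the sketch only shows that every statement is a three-part line ending in a dot.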
  • Data Discovery: Documents http://sindice.com/
  • Data Catalogs • Community maintained registry exists • Contains 362 data catalogs (growing) • Based on CKAN data catalog platform http://datacatalogs.org/
  • Data Catalogs http://datacatalogs.org/
  • What is CKAN? • Metadata repository with crowd-sourcing enabled • Everybody can register and publish data about their datasets • Developer-friendly web application • Provides a well-documented API • Easy to install, easy to use as your own metadata repository
  • CKAN Architecture • Packages contain Resources • And you can search for them
  • The Data Hub
  • The Data Hub
  • Hub of Data
  • Hub of Data
  • CKAN API • Well-documented • http://docs.ckan.org/en/latest/api.html • Covers everything you can do with the web interface • You can write your own web interface • OKFN-maintained library for accessing the API • ckanclient (Python)
  • CKAN API: Methods • Retrieving data • Creating new data • Update existing data • Delete existing data • Data is: packages, resources, groups, tags, users etc.
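The four kinds of methods listed above map naturally onto HTTP requests. The sketch below builds (but does not send) such requests, assuming the CKAN Action API URL layout `/api/3/action/<action>`, JSON bodies for write actions, and the API key in the `Authorization` header. The helper `build_action_request` and the host `ckan.example.org` are hypothetical; check the API documentation of your CKAN version before relying on any of this.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_action_request(base_url, action, params=None, api_key=None):
    """Build a urllib Request for a CKAN action without sending it."""
    url = base_url.rstrip("/") + "/api/3/action/" + action
    headers = {}
    if api_key:
        # Assumption: the key is passed in the Authorization header.
        headers["Authorization"] = api_key
    if action.endswith(("_create", "_update", "_delete")):
        # Creating, updating, deleting: POST the parameters as JSON.
        headers["Content-Type"] = "application/json"
        body = json.dumps(params or {}).encode("utf-8")
        return Request(url, data=body, headers=headers, method="POST")
    # Retrieving: GET with the parameters in the query string.
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers=headers, method="GET")


BASE = "http://ckan.example.org"  # hypothetical CKAN instance
show = build_action_request(BASE, "package_show", {"id": "my-dataset"})
create = build_action_request(
    BASE, "package_create", {"name": "my-dataset"}, api_key="SECRET"
)
```

Passing either request to `urllib.request.urlopen` would perform the call; the response is a JSON object.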
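  • CKAN API: Examples
    from ckanclient import CkanClient

    def list_resource_formats(ckan_api_url, ckan_api_key):
        # Collect the distinct resource formats across all packages.
        ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
        formats = set()
        # package_register_get() returns package names, so each package
        # is fetched separately to reach its list of resources.
        for package_name in ckan.package_register_get():
            package = ckan.package_entity_get(package_name)
            for resource in package['resources']:
                formats.add(resource['format'])
        return sorted(formats)
    https://github.com/okfn/ckanclient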
  • Use Case: CSV2RDF Conversion • Framework for CSV2RDF conversion • Crowd-sourcing enabled • RDF Visualizations https://github.com/earthquakesan/CSV2RDF-WIKI
  • CSV2RDF Conversion: Why CSV?
  • CSV2RDF Conversion: Data Quality
  • Data conversion
  • Data Conversion • Structured: Relational Databases • Semi-structured: XML, HTML, XLS, CSV, APIs • Unstructured: Raw text • (Chart: PublicData.eu Statistics)
  • Data Conversion • Sources: XML, RDB, Spreadsheet • How does government spending in certain sectors relate to my company's earnings? • How does the historic spending relate to the current figures? • Give me a report about all of my customers across the whole organization
  • Data Conversion: Custom scripts • XML: XPath • RDB: SQL • Spreadsheet: ? • Result aggregation
  • Merging data with RDF • Sources: XML, RDB, Spreadsheet • Once in RDF: • Easily integrate your data • Concepts can be mapped to one another • Query everything with one W3C standard language (SPARQL)
  • Merging Data with RDF: Example • Blue App has model
  • Merging Data with RDF: Example • Red App has model • Need to integrate Red & Blue models
  • Merging Data with RDF: Example • Step 1: Merge RDF • Same nodes (URIs) join automatically
  • Merging Data with RDF: Example • Step 2: Add relationships and rules • (Relationships are also RDF)
  • Merging Data with RDF: Example • Step 3: Define Green model • (Making use of Red & Blue models)
  • Merging Data with RDF: Example • What the Blue app sees: • No difference!
  • Merging Data with RDF: Example • What the Red app sees: • No difference!
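The three steps above can be sketched with plain Python sets of (subject, predicate, object) tuples. This is an illustration under stated assumptions, not the models from the slides: every URI under `http://example.org/`, the `equivalentProperty` rule predicate, and the helper names `apply_rules` and `view` are hypothetical.

```python
# Triples are (subject, predicate, object) tuples; all URIs are made up.
EX = "http://example.org/"

blue = {
    (EX + "alice", EX + "blue/name", "Alice"),
    (EX + "alice", EX + "blue/worksFor", EX + "acme"),
}
red = {
    (EX + "alice", EX + "red/email", "alice@example.org"),
    (EX + "acme", EX + "red/label", "ACME Corp"),
}

# Step 1: merging is a set union; nodes with the same URI join automatically.
merged = blue | red

# Step 2: relationships between the models are themselves triples.
EQUIVALENT = EX + "rules/equivalentProperty"
merged.add((EX + "blue/name", EQUIVALENT, EX + "green/fullName"))
merged.add((EX + "red/email", EQUIVALENT, EX + "green/contact"))


def apply_rules(graph):
    """Step 3: derive Green-model triples from the equivalence rules."""
    mapping = {s: o for s, p, o in graph if p == EQUIVALENT}
    derived = {(s, mapping[p], o) for s, p, o in graph if p in mapping}
    return graph | derived


def view(graph, namespace):
    """What one app sees: only triples whose predicate is in its namespace."""
    return {t for t in graph if t[1].startswith(namespace)}


green_graph = apply_rules(merged)
```

Filtering `green_graph` by the Blue or Red namespace returns exactly the original Blue or Red triples, which is the "No difference!" point: each app keeps reading its own predicates while the Green model is layered on top.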
  • RDF helps bridge other formats/models • Producers and consumers may use different formats/models • Rules can specify transformations • Inference engine finds path to desired result model • (Diagram: formats A1, A2, A3, B1, B2, C1, C2 and X, Y, Z connected through an RDF model transform driven by ontologies & rules)
  • RDB2RDF
  • Extract, Transform, Load (ETL)
  • Automatic Mapping
  • Semi-Automatic Mapping
  • R2RML
  • Sparqlify: Examples
  • Sparqlify: Examples
  • Sparqlify: Examples
  • Sparqlify: Examples
  • Sparqlify: Examples
  • Sparqlify: CSV2RDF
    Prefix pdd: <http://data.publicdata.eu/>
    Prefix pdo: <http://wiki.publicdata.eu/ontology/>
    Create View Template DefaultMapping As
      Construct { ?s ?p1 ?o1 ; ?p2 ?o2 ... }
      With
        ?s = uri(concat(pdd:, 'csv-path/', ?rowId))
        ?p1 = uri(concat(pdo:, ?headingName1))
        ?o1 = plainLiteral(?1)
        ?p2 = ...
    http://sparqlify.org/
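The default mapping above gives each row one subject URI and each column one predicate URI, with cell values as plain literals. The following sketch imitates that idea in plain Python; it is not Sparqlify and does not use its API. Numbering rows from 1, replacing spaces in headings with underscores, and the sample CSV content are assumptions of this example.

```python
import csv
import io

PDD = "http://data.publicdata.eu/"           # pdd: prefix from the slide
PDO = "http://wiki.publicdata.eu/ontology/"  # pdo: prefix from the slide


def csv_to_triples(csv_text, csv_path="csv-path/"):
    """Yield N-Triples lines: one subject per row, one predicate per column."""
    reader = csv.reader(io.StringIO(csv_text))
    headings = next(reader)
    # Assumption: row ids start at 1 for the first data row.
    for row_id, row in enumerate(reader, start=1):
        subject = "<%s%s%d>" % (PDD, csv_path, row_id)
        for heading, value in zip(headings, row):
            # Assumption: spaces in headings become underscores.
            predicate = "<%s%s>" % (PDO, heading.strip().replace(" ", "_"))
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            yield '%s %s "%s" .' % (subject, predicate, literal)


# Made-up sample data, for illustration only.
SAMPLE = "name,amount\nA,10\nB,20\n"
lines = list(csv_to_triples(SAMPLE))
```

Because every value becomes a plain literal and every heading a fresh predicate, the result is valid RDF but semantically shallow.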
  • Raw Text Processing: ConTEXT ● No installation and configuration required. ● Access content from a variety of sources ● Instantly show the results of text analysis to users in a variety of visualizations. ● Allow refinement of automatic annotations and take feedback into account ● Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together. http://rdface.aksw.org/nlp/hub.php
  • Processing Raw Text: ConTEXT
  • Data Integration
  • Definition • In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system
  • Semantic Data Integration
  • Federated SPARQL Queries • Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud • Example sources: DBpedia, New York Times • Query both data collections in an integrated way
  • Federated Query Processing • Federation mediator at the server • Virtual integration of (remote) data sources • Communication via SPARQL protocol • (Diagram: a query is sent to the federation mediator, which contacts several SPARQL data sources)
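The mediator idea can be sketched in a few lines: each triple pattern of a query is sent to every source, and the partial answers are joined on shared variables. In this sketch the remote SPARQL endpoints are replaced by in-memory triple sets, so no SPARQL protocol traffic happens. The class names, the `ex:` identifiers, and the sample triples are hypothetical, and the join strategy is a naive one, not that of any engine named in this deck.

```python
def is_var(term):
    """Variables are written like SPARQL variables, e.g. '?city'."""
    return term.startswith("?")


class InMemorySource:
    """Stand-in for a remote SPARQL data source (assumption of this sketch)."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = set(triples)

    def match(self, pattern):
        """Return one binding dict per triple that matches the pattern."""
        results = []
        for triple in self.triples:
            binding = {}
            for want, have in zip(pattern, triple):
                if is_var(want):
                    # A repeated variable must bind to the same value.
                    if binding.setdefault(want, have) != have:
                        break
                elif want != have:
                    break
            else:
                results.append(binding)
        return results


class FederationMediator:
    """Sends each pattern to all sources and joins the partial results."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, patterns):
        solutions = [{}]
        for pattern in patterns:
            found = [b for src in self.sources for b in src.match(pattern)]
            # Join: keep combinations that agree on every shared variable.
            solutions = [
                {**left, **right}
                for left in solutions
                for right in found
                if all(left.get(k, v) == v for k, v in right.items())
            ]
        return solutions


dbpedia = InMemorySource("DBpedia", {("ex:Berlin", "ex:country", "ex:Germany")})
nyt = InMemorySource("New York Times", {("ex:article1", "ex:about", "ex:Berlin")})
mediator = FederationMediator([dbpedia, nyt])
answers = mediator.query([
    ("?city", "ex:country", "ex:Germany"),
    ("?article", "ex:about", "?city"),
])
```

The query is answered although neither source alone holds both facts, which is the "virtual integration" on the slide. Real engines additionally select relevant sources per pattern and optimize the join order.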
  • Federated Query Engines
    Engine Name | Implementation language | License
    FedX        | Java                    | GNU AGPL
    SPLENDID    | Java                    | LGPL
    LHD         | Java                    | MIT
    DARQ        | Java                    | GPL
    ANAPSID     | Python                  | GNU GPL
    ADERIS      | Java                    | Apache
  • Data Visualization
  • LD Visualization Techniques
  • LD Visualization Techniques
  • LD Visualization Techniques
  • LD Visualization Techniques
  • Classification of Visualization Techniques
  • Comparison of Values/Attributes http://goo.gl/IvsGbU http://goo.gl/JeFhlM
  • Analysis of Relationships and Hierarchies
  • Analysis of Relationships and Hierarchies http://rhizomik.net/dbpedia/treemap.jsp http://lov.okfn.org/dataset/lov/
  • Analysis of Temporal and Geographical Events http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html
  • Analysis of Multidimensional Data http://mbostock.github.io/protovis/ex/cars.html
  • Other Visualization Techniques
  • Applications of LD Visualization Techniques
  • Tool Types
  • Tool Types
  • CubeViz
  • Facete
  • Thank you for your attention • Ivan Ermilov, iermilov@informatik.uni-leipzig.de, University of Leipzig