Data Integration And Visualization
Using RDF

Usage Rights: CC Attribution License

  • Example scenario: DBpedia and New York Times collections • DBpedia as a structured knowledge base • New York Times as a news provider

Data Integration And Visualization: Presentation Transcript

  • Data integration and visualization using RDF • Ivan Ermilov, University of Leipzig
  • Agenda • Data discovery • Data conversion • Data integration
  • Linked Data Lifecycle
  • Data Discovery • Ontologies • Vocabularies • Documents
  • Data Discovery: Ontologies Specification of a conceptualization
  • Data Discovery: Ontologies
  • Data Discovery: Vocabularies • FOAF (Friend of a Friend): • A Semantic Web vocabulary used to describe people, their activities and their relationships to one another. • It is becoming popular for people who discover it to set up their own FOAF profile. • This vocabulary is the base from which other vocabularies are extended.
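As a rough illustration of how FOAF terms describe people and their relationships, the sketch below writes a few N-Triples by hand. The FOAF and RDF namespace URIs are the published ones; the `example.org` person URIs and the helper names (`uri`, `literal`, `foaf_profile`) are made up for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def foaf_profile(person_uri, name, knows=()):
    """Return N-Triples lines describing one person with FOAF terms."""
    lines = [
        f"{uri(person_uri)} {uri(RDF_TYPE)} {uri(FOAF + 'Person')} .",
        f"{uri(person_uri)} {uri(FOAF + 'name')} {literal(name)} .",
    ]
    for friend_uri in knows:
        lines.append(f"{uri(person_uri)} {uri(FOAF + 'knows')} {uri(friend_uri)} .")
    return lines


# Hypothetical people; the example.org URIs are placeholders.
alice = "http://example.org/people/alice"
bob = "http://example.org/people/bob"
profile = foaf_profile(alice, "Alice", knows=[bob])
print("\n".join(profile))
```

Each printed line is one statement: a type, a name, and a `foaf:knows` link to another person.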
  • Data Discovery: Vocabularies
  • Data Discovery: Documents
        <> <> "Tim Berners-Lee - LinkedIn"@en .
        _:node0 <> <> .
        _:node0 <> "Greater Boston Area" .
        <> <> <> .
        _:node1 <> <> .
        _:node1 <> "MIT" .
        _:node1 <> "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
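Statements of this shape can be read with very little code. The following is a minimal sketch, not a full N-Triples parser: it only handles IRIs, blank nodes and plain literals with an optional language tag. The sample data uses hypothetical `example.org` URIs because the URIs on the slide were not preserved.

```python
import re

# Simplified N-Triples terms: <iri>, _:blank, or "literal" with optional @lang.
TERM = r'(<[^>]*>|_:\w+|"(?:[^"\\]|\\.)*"(?:@[\w-]+)?)'
TRIPLE = re.compile(r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.\s*$")


def parse_ntriples(text):
    """Return (subject, predicate, object) tuples for each well-formed line."""
    triples = []
    for line in text.splitlines():
        match = TRIPLE.match(line)
        if match:
            triples.append(match.groups())
    return triples


# Placeholder data shaped like the slide's statements (URIs are hypothetical).
sample = """
<http://example.org/doc> <http://example.org/title> "Tim Berners-Lee - LinkedIn"@en .
_:node0 <http://example.org/locality> "Greater Boston Area" .
_:node1 <http://example.org/org> "MIT" .
"""
triples = parse_ntriples(sample)
for s, p, o in triples:
    print(s, p, o)
```

Lines that do not match the pattern are skipped silently; a production reader would report them instead.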
  • Data Discovery: Documents
  • Data Catalogs • Community maintained registry exists • Contains 362 data catalogs (growing) • Based on CKAN data catalog platform
  • Data Catalogs
  • What is CKAN? • Metadata repository with crowd-sourcing enabled • Everybody can register and publish data about their datasets • Developer-friendly web application • Provides a well-documented API • Easy to install, easy to use as your own metadata repository
  • CKAN Architecture • Packages contain resources • And you can search for them
  • The Data Hub
  • Hub of Data
  • CKAN API • Well-documented • Covers everything you can do with the web interface • You can write your own web interface • OKFN-maintained library for accessing the API: ckanclient (Python)
  • CKAN API: Methods • Retrieving data • Creating new data • Updating existing data • Deleting existing data • Data is: packages, resources, groups, tags, users etc.
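To make the retrieve and create operations concrete, here is a minimal sketch that builds (but does not send) HTTP requests, assuming a CKAN instance that exposes the version 3 action API under `/api/3/action/`. The host name, dataset name, API key and the helper `ckan_request` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def ckan_request(base_url, action, params=None, payload=None, api_key=None):
    """Build (but do not send) a request for a CKAN action API call.

    Read actions such as package_show take query parameters; write actions
    such as package_create take a JSON body and normally need an API key.
    """
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    headers = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        headers["Authorization"] = api_key
    return Request(url, data=data, headers=headers)


# Hypothetical CKAN instance and key, for illustration only.
base = "http://ckan.example.org/"
retrieve = ckan_request(base, "package_show", params={"id": "my-dataset"})
create = ckan_request(base, "package_create",
                      payload={"name": "my-dataset"}, api_key="secret-key")
print(retrieve.get_method(), retrieve.full_url)
print(create.get_method(), create.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access.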
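  • CKAN API: Examples
        from ckanclient import CkanClient

        def list_resource_formats(ckan_api_url, ckan_api_key):
            ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
            # Assumes, as on the original slide, that each entry returned by
            # package_list() is a package dict with a 'resources' key.
            package_list = ckan.package_list()
            formats = []
            for package in package_list:
                for resource in package['resources']:
                    # Collect each distinct resource format once.
                    if resource['format'] not in formats:
                        formats.append(resource['format'])
            return sorted(formats)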
  • Use Case: CSV2RDF Conversion • Framework for CSV2RDF conversion • Crowd-sourcing enabled • RDF Visualizations
  • CSV2RDF Conversion: Why CSV?
  • CSV2RDF Conversion: Data Quality
  • Data conversion
  • Data Conversion • Structured: Relational Databases • Semi-structured: XML, HTML, XLS, CSV, APIs • Unstructured: Raw text Statistics
  • Data Conversion • Sources: XML, RDB, Spreadsheet • How does government spending in certain sectors relate to my company’s earnings? • How does the historic spending relate to the current figures? • Give me a report about all of my customers across the whole organization
  • Data Conversion: Custom scripts • XML (XPath) • RDB (SQL) • Spreadsheet (?) • Result aggregation
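The custom-script approach can be sketched as follows: every source needs its own access code before results can be aggregated. The customer records are hypothetical, an in-memory SQLite table stands in for the relational database, and a CSV string stands in for the spreadsheet.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical customer records held in three different systems.
XML_DATA = "<customers><customer id='1'><name>Acme</name></customer></customers>"
CSV_DATA = "id,name\n3,Initech\n"

# XML source: ElementTree supports a limited XPath subset.
xml_rows = [
    (c.get("id"), c.findtext("name"))
    for c in ET.fromstring(XML_DATA).findall("./customer")
]

# Relational source: an in-memory SQLite table queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id TEXT, name TEXT)")
db.execute("INSERT INTO customers VALUES ('2', 'Globex')")
sql_rows = db.execute("SELECT id, name FROM customers").fetchall()
db.close()

# Spreadsheet source: exported as CSV and read with the csv module.
csv_rows = [(r["id"], r["name"]) for r in csv.DictReader(io.StringIO(CSV_DATA))]

# Result aggregation: only possible after three separate access paths.
report = sorted(xml_rows + sql_rows + csv_rows)
print(report)
```

Three query mechanisms for one simple report is the maintenance cost that a common data model is meant to remove.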
  • Merging data with RDF • Sources: XML, RDB, Spreadsheet • Once in RDF: • Easily integrate your data • Concepts can be mapped to one another • Query everything with one W3C standard language (SPARQL)
  • Merging Data with RDF: Example • Blue App has model
  • Merging Data with RDF: Example • Red App has model • Need to integrate Red & Blue models
  • Merging Data with RDF: Example • Step 1: Merge RDF • Same nodes (URIs) join automatically
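Step 1 can be pictured with plain Python sets. In this sketch triples are `(subject, predicate, object)` tuples, and all URIs and model contents are hypothetical; the point is only that statements sharing a URI end up attached to the same node after a union.

```python
# Triples are (subject, predicate, object) tuples; all URIs are hypothetical.
EX = "http://example.org/"

blue_model = {
    (EX + "alice", EX + "blue/worksFor", EX + "acme"),
    (EX + "acme", EX + "blue/name", "Acme Corp"),
}
red_model = {
    (EX + "acme", EX + "red/locatedIn", EX + "leipzig"),
    (EX + "bob", EX + "red/employer", EX + "acme"),
}

# Step 1: merging is a set union; no schema alignment is needed up front.
merged = blue_model | red_model

# Both models use the same URI for Acme, so its statements now sit together.
about_acme = sorted(t for t in merged if t[0] == EX + "acme")
for triple in about_acme:
    print(triple)
```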
  • Merging Data with RDF: Example • Step 2: Add relationships and rules • (Relationships are also RDF)
  • Merging Data with RDF: Example • Step 3: Define Green model • (Making use of Red & Blue models)
  • Merging Data with RDF: Example • What the Blue app sees: • No difference!
  • Merging Data with RDF: Example • What the Red app sees: • No difference!
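Steps 2 and 3, and the "no difference" observation, can be sketched with the same set-of-tuples representation. The `mapsTo` predicate, the Green property and the single hand-written rule are assumptions made for illustration; a real deployment would typically use an ontology language and a reasoner instead.

```python
EX = "http://example.org/"
MAPS_TO = EX + "rules/mapsTo"  # hypothetical mapping predicate

blue = {(EX + "alice", EX + "blue/worksFor", EX + "acme")}
red = {(EX + "bob", EX + "red/employer", EX + "acme")}

# Step 2: the relationships between the models are themselves triples.
mappings = {
    (EX + "blue/worksFor", MAPS_TO, EX + "green/employedBy"),
    (EX + "red/employer", MAPS_TO, EX + "green/employedBy"),
}
merged = blue | red | mappings


def apply_mappings(triples):
    """Rule: if p maps to q and (s, p, o) holds, then (s, q, o) holds."""
    targets = {s: o for s, p, o in triples if p == MAPS_TO}
    derived = {(s, targets[p], o) for s, p, o in triples if p in targets}
    return triples | derived


def view(triples, namespace):
    """What one application sees: statements whose predicate is in its namespace."""
    return {t for t in triples if t[1].startswith(namespace)}


# Step 3: the Green model is defined in terms of Red and Blue.
graph = apply_mappings(merged)
green_view = view(graph, EX + "green/")
print(sorted(green_view))

# The Blue and Red applications still see exactly their original statements.
print(view(graph, EX + "blue/") == blue, view(graph, EX + "red/") == red)
```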
  • RDF helps bridge other formats/models • Producers and consumers may use different formats/models • Rules can specify transformations • Inference engine finds path to desired result model • [Figure: models A1, A2, A3, B1, B2, C1, C2 and X, Y, Z connected through an RDF model by transforms driven by ontologies & rules]
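The idea of finding a path to the desired result model can be approximated by a graph search. The sketch below uses breadth-first search as a simple stand-in for an inference engine; the model names come from the slide, but the edges in `TRANSFORMS` are invented for the example.

```python
from collections import deque

# Hypothetical rules: each edge says "model X can be transformed into model Y".
TRANSFORMS = {
    "A1": ["RDF"],
    "B1": ["RDF"],
    "RDF": ["X", "Y"],
    "Y": ["Z"],
}


def find_transform_path(source, target, transforms=TRANSFORMS):
    """Breadth-first search for the shortest chain of transformations."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in transforms.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of rules reaches the target model


print(find_transform_path("A1", "Z"))
```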
  • Extract, Transform, Load (ETL)
  • Automatic Mapping
  • Semi-Automatic Mapping
  • R2RML
  • Sparqlify: Examples
  • Sparqlify: CSV2RDF
        Prefix pdd: <>
        Prefix pdo: <>
        Create View Template DefaultMapping As
          Construct { ?s ?p1 ?o1 ; ?p2 ?o2 ... }
          With
            ?s = uri(concat(pdd:, 'csv-path/', ?rowId))
            ?p1 = uri(concat(pdo:, ?headingName1))
            ?o1 = plainLiteral(?1)
            ?p2 = ...
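The effect of such a default mapping can be imitated in a few lines of Python. This is a sketch of the idea only, not how Sparqlify works internally. The two namespace URIs are placeholders (the prefix URIs on the slide were not preserved), and numbering rows from 1 and replacing spaces in headings with underscores are assumptions.

```python
import csv
import io

# Placeholder namespaces: the prefix URIs on the slide were not preserved.
PDD = "http://example.org/data/"
PDO = "http://example.org/ontology/"


def csv_to_triples(csv_text, csv_path="csv-path/"):
    """Map every cell to one triple, mirroring the default mapping.

    subject   = PDD + csv_path + row number (assumed to start at 1)
    predicate = PDO + column heading (spaces replaced by underscores)
    object    = the cell value as a plain literal
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_id, row in enumerate(reader, start=1):
        subject = f"<{PDD}{csv_path}{row_id}>"
        for heading, value in row.items():
            predicate = f"<{PDO}{heading.strip().replace(' ', '_')}>"
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


sample = "city,population\nLeipzig,520000\n"
triples = csv_to_triples(sample)
print("\n".join(triples))
```

Every value becomes a plain literal here; assigning datatypes or linking to existing resources is where a crowd-sourced refinement of the mapping would come in.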
  • Raw Text Processing: ConTEXT ● No installation and configuration required. ● Access content from a variety of sources ● Instantly show the results of text analysis to users in a variety of visualizations. ● Allow refinement of automatic annotations and take feedback into account ● Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together.
  • Processing Raw Text: ConTEXT
  • Data Integration
  • Definition • In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system
  • Semantic Data Integration
  • Federated SPARQL Queries • Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud • Example sources: DBpedia, New York Times • Query both data collections in an integrated way
  • Federated Query Processing • Federation mediator at the server • Virtual integration of (remote) data sources • Communication via SPARQL protocol • [Figure: a query is sent to the federation mediator, which communicates with three SPARQL data sources]
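A toy version of the mediator idea is sketched below. In-memory triple sets stand in for remote SPARQL endpoints, and the function `federated_articles_by_country`, the URIs and the data are all hypothetical. Real engines such as those listed next also perform source selection and join optimisation, which this sketch omits.

```python
# Each "endpoint" is an in-memory triple set standing in for a remote source.
EX = "http://example.org/"
dbpedia = {
    (EX + "Leipzig", EX + "country", EX + "Germany"),
    (EX + "Boston", EX + "country", EX + "USA"),
}
nytimes = {
    (EX + "article/1", EX + "about", EX + "Boston"),
    (EX + "article/2", EX + "about", EX + "Leipzig"),
}


def match(source, pattern):
    """Evaluate one triple pattern against a source; None acts as a variable."""
    return [t for t in source
            if all(p is None or p == v for p, v in zip(pattern, t))]


def federated_articles_by_country(country, sources):
    """Mediator: send each pattern to every source, then join the answers."""
    places = {s for src in sources
              for s, _, _ in match(src, (None, EX + "country", country))}
    articles = [(s, o) for src in sources
                for s, _, o in match(src, (None, EX + "about", None))
                if o in places]
    return sorted(articles)


result = federated_articles_by_country(EX + "Germany", [dbpedia, nytimes])
print(result)
```

The caller never states which source holds which kind of statement; the mediator asks all of them, which is what gives the impression of a single integrated collection.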
  • Federated Query Engines
        Engine Name   Implementation language   License
        FedX          Java                      GNU AGPL
        SPLENDID      Java                      LGPL
        LHD           Java                      MIT
        DARQ          Java                      GPL
        ANAPSID       Python                    GNU GPL
        ADERIS        Java                      Apache
  • Data Visualization
  • LD Visualization Techniques
  • Classification of Visualization Techniques
  • Comparison of Values/Attributes
  • Analysis of Relationships and Hierarchies
  • Analysis of Temporal and Geographical Events
  • Analysis of Multidimensional Data
  • Other Visualization Techniques
  • Applications of LD Visualization Techniques
  • Tool Types
  • CubeViz
  • Facete
  • Thank you for your attention • Ivan Ermilov, University of Leipzig