
Information-Rich Programming in F# with Semantic Data

1,802
-1

Published on

Programming with rich data frequently means that one
needs to search for, understand, integrate and program with
new data, and each of these steps constitutes a major
obstacle to successful data use.

In this talk, we explain and demonstrate how our approach,
LITEQ (Language Integrated Types, Extensions and Queries for
RDF Graphs), which is realized as part of the F#/Visual Studio
environment, supports the software developer. Using the extended
IDE, the developer can now

a. explore new, previously unseen data sources,
which are either natively in RDF or mapped into RDF;
b. use the exploration of schemata and data in order to
construct types and objects in the F# environment;
c. automatically map between data and programming language objects in
order to make them persistent in the data source;
d. benefit from extended typing functionality in the F#
environment that results from the exploration of the data source
and its mapping into F#.

Core to this approach is the novel Node Path Query Language, NPQL,
which allows for interactive, intuitive exploration of data schemata and
of the data proper, as well as for the mapping and definition
of types, object collections and individual objects.
Going beyond the existing type provider mechanism for F#,
our approach also allows for property-based navigation
and runtime querying for data objects.


  1. Web Science & Technologies, University of Koblenz ▪ Landau, Germany: Information-Rich Programming in F# with Semantic Data
  2. Linked Open Data Cloud. “Where’s the Data in the Big Data Wave?” (Gerhard Weikum, SIGMOD Blog, 6.3.2013, http://wp.sigmod.org/): “… the Web of Linked Data consisting of more than 30 Billion RDF triples from hundreds of data sources …” WeST, Steffen Staab, staab@uni-koblenz.de
  3. Some “Bubbles” of the LOD Cloud
  4. RDF: Simple Foundations
  5. Example RDF Graph: a native RDF graph, or a relational database mapped with R2RML: RDB to RDF Mapping Language (W3C Recommendation)
  6. Agenda • LiteQ – Language integrated types, extensions and queries for RDF graphs: exploring; programming, typing • Evaluation of LITEQ (NPQL) against SPARQL: understandability, ease of use • SchemEX: construction of a schema-based index, schema induction
  7. Programming against an unknown data source: exploring a data source; using a data source
  8. Example application • Goal: an application that helps to collect the dog license fee • Send email reminders to dog owners • Data is given as an RDF graph
  9. Programmer’s Task 1: Schema Exploration. Schema exploration & identification of important RDF types • Find RDF types representing dogs and persons
  10. Naive Approach Task 1: Schema Exploration. Schema exploration & identification of important RDF types • Find RDF types representing dogs and persons. Tooling for the naive approach: SPARQL query formulation
  11. Programmer’s Task 2: Code Type Creation. Code type creation in the host language • Convert the identified dog and person RDF types to code types in the host language: type exCreature(uri) = class member this.hasName : string = … member this.hasAge : int = … end type exDog(uri) = class inherit exCreature(uri) member this.hasOwner : exPerson = … member this.TaxNo : int = … end and exPerson(uri) = class inherit exCreature(uri) end
  12. Programmer’s Task 3: Data Querying • Write a query that returns all dog owners
  13. Naive Approach Task 3: Data Querying • Write a query that returns all dog owners. Tooling for the naive approach: SPARQL query formulation
  14. Naive Approach Task 4: Object Manipulation. Create the objects, manipulate them & make them persistent • Develop functionality around the query to send a reminder: let queryString = "SELECT ?owner WHERE { ?dog rdf:type ex:Dog . ?dog ex:hasOwner ?owner }" dbConnection.evaluate(queryString) |> Seq.iter (fun uri -> let p = new exPerson(uri) sendReminderEmail(p))
  15. The LITEQ approach
  16. Node Path Query Language (NPQL)
  17. Graph Traversal with NPQL: Subtype Navigation (>). NPQL: rdf:Resource > ex:Creature
  18. Graph Traversal with NPQL: Property Navigation (.). NPQL: ex:Dog . ex:hasOwner
  19. Extensional Semantics: Task 3 – Querying for Owners. NPQL: rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Extension • Select ex:Dog • Walk through ex:hasOwner to ex:Person • Use the extension to retrieve all persons who own dogs: ex:Bob
  20. Intensional Semantics: Task 2 – Creating the Person Code Type. NPQL: rdf:Resource > ex:Creature > ex:Dog . ex:hasOwner -> Intension • Select the ex:Person node • Use the “intension” to get a code type based on the RDF type: type exCreature(uri) = class member this.hasName : string = … member this.hasAge : int = … end type exPerson(uri) = class inherit exCreature(uri) end
  21. Autocompletion Semantics: Task 1 – Exploration. NPQL: rdf:Resource > ex:Creature > (suggested: ex:Person, ex:Dog). Suggestions during query writing • Instances based on extensional semantics • Types & properties based on intensional semantics
  22. Extensional Semantics: Left-Associative Conjunctive Queries. NPQL: ex:Dog <- ex:hasOwner, a left-associative conjunctive query with projection
  23. Host Language Extension: Task 4 – Create Objects. Create the objects, manipulation & persistence • Develop the functionality around the query that will send the reminder, using LITEQ in F#. Preliminary implementation in F#: http://west.uni-koblenz.de/Research/systems/liteq
  24. Live demo of LITEQ in Visual Studio/F#
  25. Related Work (columns: LINQ | XML Type Provider | Freebase Type Provider | LITEQ, current version | LITEQ, concept)
      1 Schema exploration: - | (✔) per doc | (✔) only trees | ✔ | ✔
      2 Code type creation: - | (✔) erased types? | (✔) erased types | (✔) erased types | ✔ full hierarchy
      3 Data querying: ✔ | - | ((✔)) very limited expressiveness | (✔) limited expressiveness | ✔ no full SPARQL
      4 Object manipulation & persistence: (✔) | ✔ | - | ✔ no new object creation | ✔
  26. Future work with respect to LITEQ • The current implementation is a prototype • The current implementation uses erased types → at runtime, no type hierarchy is present • Switch to generated types in the future → higher expressiveness in the host language by exploiting the type hierarchy • Optimizations of the LITEQ implementation are necessary • Lazy evaluation • Distinguish between design time and runtime • Not all types created at design time are needed at runtime • Formalize the query language and investigate its expressiveness
  27. Challenge: Joint Type Inference. Data modeling world (Description Logics) vs. program modeling world (ML type inference); RDF, UML class diagrams
  28. Agenda • LiteQ – Language integrated types, extensions and queries for RDF graphs: exploring; programming, typing • Evaluation of LITEQ (NPQL) vs. SPARQL: understandability, ease of use • SchemEX: Where do I find relevant data? Efficient construction of a schema-level index
  29. Preliminary Evaluation of LITEQ/NPQL. Focused on NPQL • Reason: test subjects lacked knowledge of F# and functional programming for evaluating LITEQ in full • Comparing NPQL against SPARQL. Main hypothesis of the evaluation • NPQL with autocompletion allows for effective query writing in a more efficient manner than SPARQL. Thus, some of the advantages of LITEQ cannot show up in the evaluation!
  30. Evaluation Subjects. Evaluation with 11 participants • 1 subject was eliminated a posteriori from the analysis, because he could not deal with SPARQL at all • 10 subjects remaining for analysis. Participants • Undergraduate students • PhD students • PostDocs
  31. Evaluation: Setup. 1. Pre-questionnaire 2. Training in RDF, SPARQL & NPQL 3. Experimental tasks to be solved by the subjects 4. Post-questionnaire
  32. Phase 1: Pre-Questionnaire – Knowledge & skills • Programming: all “intermediate” or above • Object orientation: 8 “intermediate” or above • Functional programming: 4 “intermediate” or above (Lisp, Haskell, F# (once)), 4 “none” • .NET: 1 “expert”, 2 “beginner”, 7 “none” • SPARQL: 3 “intermediate” or above [SPARQL experts], 7 below “intermediate” [SPARQL novices]
  33. Phase 2: Training in RDF, SPARQL, NPQL. Training in RDF & SPARQL • Presentation of RDF & SPARQL (20 minutes) • Practical exercise writing SPARQL queries in the Web interface (5 minutes). Training in NPQL • Practical exercise writing NPQL queries in Visual Studio (5 minutes)
  34. Phase 3: Solving experimental tasks. 9 different experimental tasks to solve • Half of the tasks in NPQL using Visual Studio • The other half using SPARQL and a web interface. Task types • Navigation and exploration of a data source (Task 1) • Retrieving data and answering questions about it (Task 3) • 2 tasks were not solvable in NPQL, to investigate how users deal with the limits of the language. Evaluation measure • Time taken to complete each task
  35. Evaluation across different user types
  36. Evaluations per Task
  37. Phase 4: Post-Questionnaire. “Do you want to explore a data source in your IDE?”: 4 “yes”, 3 “no, prefer separation of steps”, 3 “no preference”. “NPQL is easier to use than SPARQL”: 7 “agree” or above. Other feedback • Better support when writing queries in SPARQL • Better response times for interactive working with NPQL. My conclusion: though LITEQ is still in a pre-alpha status, its advantages became visible in the preliminary user evaluation
  38. Agenda • LiteQ – Language integrated types, extensions and queries for RDF graphs: exploring; programming, typing • Evaluation of LITEQ (NPQL) against SPARQL: understandability, ease of use • SchemEX: construction of a schema-based index, schema induction
  39. Searching the LOD cloud: SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist } (figure: ?x of types foaf:Document and swrc:InProceedings, linked by dc:creator to ?y of type fb:Computer_Scientist; which data sources contain such entities?)
  40. Searching the LOD cloud: the same query is answered via an index that points to the relevant data sources • ACM • DBLP
  41. Schema-level index. Schema information on LOD • Explicit: assigning class types (Entity rdf:type Class) • Implicit: modelling attributes (Entity Property Entity 2)
  42. Schema-level index (figure: classes C1, C2, C3 and properties P1, P2 over entities E1, E2 in data source DS1)
  43. Typecluster • Entities with the same set of types (figure: typecluster TCj for classes C1 … Cn, pointing to data sources DS1, DS2, …, DSm)
  44. Typecluster: Example (figure: tc2309 for foaf:Document and swrc:InProceedings, pointing to data sources DBLP and ACM)
  45. Bi-Simulation • Entities are equivalent if they refer with the same attributes to equivalent entities • Restriction: 1-bi-simulation (figure: bi-simulation class BSi for properties P1 … Pn, pointing to data sources DS1, DS2, …, DSm)
  46. Bi-Simulation: Example (figure: bs2608 for dc:creator, pointing to data sources BBC and DBLP)
  47. SchemEX: Combination of TC and Bi-Simulation • Partition of each TC based on 1-bi-simulation, with restrictions on the destination TC (figure: schema level with typeclusters TCj, TCk, bi-simulation class BSi and equivalence class EQCj; payload level linking EQCj to data sources DS1, DS2, …, DSm)
  48. SchemEX: Example (figure: tc2309 {foaf:Document, swrc:InProceedings} and bs2608 via dc:creator to tc2101 {fb:Computer_Scientist} form eqc707; payload: DBLP). SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type swrc:InProceedings . ?x dc:creator ?y . ?y rdf:type fb:Computer_Scientist }
  49. SchemEX: Computation • Precise computation: brute force (figure: schema level TCj, TCk, BSi, EQCj; payload level DS1 … DSm)
  50. Stream-based Computation of SchemEX • LOD crawler: stream of N-Quads (triple + data source) … Q16, Q15, …, Q2, Q1, processed through a FIFO (figure: FIFO cache feeding the classes C1, C2, C3)
  51. Quality of Approximated Index: stream-based computation vs. brute force on a data set of 11 million triples
  52. SchemEX @ BTC 2011 • Allows complex queries (star, chain) • Scalable computation • High quality. Index over BTC 2011 data: 2.17 billion triples; index size: 55 million triples. Commodity hardware: VM with 1 core, 4 GB RAM. Throughput: 39,500 triples/second. Computation of the full index: 15 h
  53. Future work with respect to SchemEX. Further exploration of • schema induction • query federation. Federation vs. link-traversal-based query execution • Granularity of query execution • Too fine-grained: URI dereferencing • Too expressive: SPARQL • Sweet spot → NPQL??
  54. Agenda • LiteQ – Language integrated types, extensions and queries for RDF graphs: exploring; programming, typing • Evaluation of LITEQ (NPQL) against SPARQL: understandability, ease of use • SchemEX: construction of a schema-based index, schema induction
  55. Future: 1. Searching for distributed data 2. Understanding distributed data 3. Intelligent queries on distributed data 4. Programming with distributed data • Type reuse • Type induction
  56. Thank you for your attention!
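Slide 11 lists the hand-written code types with their data access elided (…). The following is a minimal, runnable F# sketch of the same types. It assumes a hypothetical in-memory list of (subject, property, value) triples in place of a real RDF store. The resources ex:Rex and ex:Bob, the helper `lookup`, and the storage of literals as strings are illustrative assumptions, not part of LITEQ.

```fsharp
// Hypothetical in-memory triples (subject, property, value) standing in for the RDF source.
let store =
    [ ("ex:Rex", "ex:hasName", "Rex")
      ("ex:Rex", "ex:hasAge", "3")
      ("ex:Rex", "ex:hasOwner", "ex:Bob")
      ("ex:Rex", "ex:TaxNo", "4711")
      ("ex:Bob", "ex:hasName", "Bob")
      ("ex:Bob", "ex:hasAge", "42") ]

// Returns the first value of a property for a subject; throws if there is none.
let lookup (subject: string) (property: string) : string =
    store
    |> List.pick (fun (s, p, o) -> if s = subject && p = property then Some o else None)

type exCreature(uri: string) =
    member this.Uri = uri
    member this.hasName : string = lookup uri "ex:hasName"
    member this.hasAge : int = int (lookup uri "ex:hasAge")

// `and` makes exDog and exPerson mutually recursive, so exDog can refer to exPerson.
type exDog(uri: string) =
    inherit exCreature(uri)
    member this.hasOwner : exPerson = exPerson(lookup uri "ex:hasOwner")
    member this.TaxNo : int = int (lookup uri "ex:TaxNo")

and exPerson(uri: string) =
    inherit exCreature(uri)

let rex = exDog("ex:Rex")
printfn "%s (%d) is owned by %s" rex.hasName rex.hasAge rex.hasOwner.hasName
```

Every member re-reads the triple list on access, so the objects stay thin wrappers around a URI rather than copies of the data. A real binding would also need handling for missing values, because `List.pick` throws when nothing matches.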
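The SPARQL string on slide 14 is invisible to the F# compiler, so a misspelled class or property only surfaces at runtime. This sketch makes the query's result concrete by evaluating the same two triple patterns over a hypothetical in-memory triple list, in place of the slide's `dbConnection`. The data and the printing stand-in for `sendReminderEmail` are assumptions. `List.distinct` (SELECT DISTINCT) is an addition so that an owner of several dogs receives only one reminder.

```fsharp
// Hypothetical data: two dogs with owners, plus a person without a dog.
let graph =
    [ ("ex:Rex", "rdf:type", "ex:Dog")
      ("ex:Rex", "ex:hasOwner", "ex:Bob")
      ("ex:Fido", "rdf:type", "ex:Dog")
      ("ex:Fido", "ex:hasOwner", "ex:Alice")
      ("ex:Bob", "rdf:type", "ex:Person")
      ("ex:Alice", "rdf:type", "ex:Person")
      ("ex:Carol", "rdf:type", "ex:Person") ]

// SELECT DISTINCT ?owner WHERE { ?dog rdf:type ex:Dog . ?dog ex:hasOwner ?owner },
// evaluated as a nested-loop join of the two triple patterns on ?dog.
let dogOwners =
    [ for (dog, p, cls) in graph do
        if p = "rdf:type" && cls = "ex:Dog" then
            for (s, p2, owner) in graph do
                if s = dog && p2 = "ex:hasOwner" then
                    yield owner ]
    |> List.distinct

// Stand-in for the reminder e-mail: just prints the owner's URI.
let sendReminderEmail (ownerUri: string) =
    printfn "Reminder: dog license fee due (%s)" ownerUri

dogOwners |> List.iter sendReminderEmail
```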
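Slides 17 and 18 introduce NPQL's two navigation operators. The sketch below is a minimal model of them under two assumptions:

- `C > D` steps from a class to one of its direct rdfs:subClassOf subclasses.
- `C . p` follows a property whose rdfs:domain is C to that property's rdfs:range.

The schema triples and function names are hypothetical, and domains inherited from superclasses are ignored for brevity.

```fsharp
// Hypothetical schema triples for the running example.
let schema =
    [ ("ex:Creature", "rdfs:subClassOf", "rdf:Resource")
      ("ex:Dog", "rdfs:subClassOf", "ex:Creature")
      ("ex:Person", "rdfs:subClassOf", "ex:Creature")
      ("ex:hasOwner", "rdfs:domain", "ex:Dog")
      ("ex:hasOwner", "rdfs:range", "ex:Person") ]

let subjectsOf (p: string) (o: string) =
    schema |> List.choose (fun (s, p', o') -> if p' = p && o' = o then Some s else None)

// ">" : direct subclasses of a class.
let subclasses (cls: string) = subjectsOf "rdfs:subClassOf" cls

// "." : if prop is declared on cls, move to its range.
let dot (cls: string) (prop: string) =
    if List.contains prop (subjectsOf "rdfs:domain" cls) then
        schema |> List.tryPick (fun (s, p, o) -> if s = prop && p = "rdfs:range" then Some o else None)
    else
        None

// "C0 > C1 > ... > Cn": Some Cn if every step is a direct subclass step, otherwise None.
let navigate (path: string list) =
    let step current next =
        match current with
        | Some c when List.contains next (subclasses c) -> Some next
        | _ -> None
    match path with
    | [] -> None
    | start :: rest -> List.fold step (Some start) rest

printfn "%A" (navigate [ "rdf:Resource"; "ex:Creature"; "ex:Dog" ])
printfn "%A" (dot "ex:Dog" "ex:hasOwner")
```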
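Slide 19 evaluates `... > ex:Dog . ex:hasOwner` extensionally, that is, to the set of resources the path actually reaches (ex:Bob). Below is a minimal sketch of that reading over hypothetical instance triples. It does no RDFS inference, so instances of subclasses are not included, which a real implementation would likely need.

```fsharp
// Hypothetical instance data; ex:Alice is a person without a dog.
let data =
    [ ("ex:Rex", "rdf:type", "ex:Dog")
      ("ex:Rex", "ex:hasOwner", "ex:Bob")
      ("ex:Alice", "rdf:type", "ex:Person")
      ("ex:Bob", "rdf:type", "ex:Person") ]

// Extension of a class: all resources typed with it.
let extensionOfClass (cls: string) =
    data |> List.choose (fun (s, p, o) -> if p = "rdf:type" && o = cls then Some s else None)

// Extension of "cls . prop": every value reached via prop from an instance of cls.
let extensionOfPath (cls: string) (prop: string) =
    let instances = extensionOfClass cls
    data
    |> List.choose (fun (s, p, o) -> if p = prop && List.contains s instances then Some o else None)
    |> List.distinct

printfn "%A" (extensionOfPath "ex:Dog" "ex:hasOwner")
```

Only ex:Bob is returned. ex:Alice is also an ex:Person, but she is not reached from any dog.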
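Slide 20 uses the intension of the selected node to obtain a code type. The sketch below derives member signatures, as strings, from rdfs:domain and rdfs:range declarations. It makes several simplifying assumptions:

- LITEQ itself creates types inside the compiler (the slides mention erased and generated types), which this string-based sketch does not model.
- The xsd-to-F# type mapping is hypothetical.
- Inherited properties are flattened into one member list, whereas slide 20 keeps ex:Person as a subtype of ex:Creature.

```fsharp
// Hypothetical schema for the running example.
let schema =
    [ ("ex:Dog", "rdfs:subClassOf", "ex:Creature")
      ("ex:Person", "rdfs:subClassOf", "ex:Creature")
      ("ex:hasName", "rdfs:domain", "ex:Creature")
      ("ex:hasName", "rdfs:range", "xsd:string")
      ("ex:hasAge", "rdfs:domain", "ex:Creature")
      ("ex:hasAge", "rdfs:range", "xsd:int")
      ("ex:hasOwner", "rdfs:domain", "ex:Dog")
      ("ex:hasOwner", "rdfs:range", "ex:Person") ]

let valueOf (s: string) (p: string) =
    schema |> List.tryPick (fun (s', p', o) -> if s' = s && p' = p then Some o else None)

// The class itself followed by its superclasses (single inheritance assumed).
let rec ancestors (cls: string) =
    match valueOf cls "rdfs:subClassOf" with
    | Some parent -> cls :: ancestors parent
    | None -> [ cls ]

// Hypothetical mapping from RDF ranges to F# type names.
let fsType (range: string) =
    match range with
    | "xsd:string" -> "string"
    | "xsd:int" -> "int"
    | cls -> cls.Replace(":", "")   // e.g. ex:Person -> exPerson

// Intension of a class: one member signature per property declared on it or a superclass.
let intension (cls: string) =
    let classes = ancestors cls
    schema
    |> List.choose (fun (prop, p, o) ->
        if p = "rdfs:domain" && List.contains o classes then
            let range = defaultArg (valueOf prop "rdfs:range") "obj"
            Some (sprintf "member this.%s : %s" (prop.Replace("ex:", "")) (fsType range))
        else
            None)

intension "ex:Person" |> List.iter (printfn "%s")
```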
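Slide 21 shows NPQL suggesting ex:Person and ex:Dog after `rdf:Resource > ex:Creature >`. This is a minimal sketch of the type and property part of that autocompletion. It assumes suggestions are the direct subclasses (after `>`) or the properties declared on the class (after `.`), filtered by the prefix typed so far. The schema and the function `suggest` are hypothetical, and instance suggestions based on extensional semantics are omitted.

```fsharp
open System

// Hypothetical schema for the running example.
let schema =
    [ ("ex:Creature", "rdfs:subClassOf", "rdf:Resource")
      ("ex:Dog", "rdfs:subClassOf", "ex:Creature")
      ("ex:Person", "rdfs:subClassOf", "ex:Creature")
      ("ex:hasName", "rdfs:domain", "ex:Creature")
      ("ex:hasOwner", "rdfs:domain", "ex:Dog") ]

let subjectsOf (p: string) (o: string) =
    schema |> List.choose (fun (s, p', o') -> if p' = p && o' = o then Some s else None)

// Suggestions after "cls >" (subclasses) or "cls ." (properties declared on cls),
// filtered by what the user has typed so far.
let suggest (cls: string) (op: string) (typed: string) =
    let candidates =
        match op with
        | ">" -> subjectsOf "rdfs:subClassOf" cls
        | "." -> subjectsOf "rdfs:domain" cls
        | _ -> []
    candidates
    |> List.filter (fun c -> c.StartsWith(typed, StringComparison.Ordinal))
    |> List.sort

printfn "%A" (suggest "ex:Creature" ">" "")
```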
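Slide 22 only states that `ex:Dog <- ex:hasOwner` is a left-associative conjunctive query with projection. The sketch below adopts one reading of that, which is our assumption and not a definition taken from the talk. Under it, the expression keeps the instances of the left-hand class that have at least one value for the property, i.e. `SELECT DISTINCT ?x WHERE { ?x rdf:type ex:Dog . ?x ex:hasOwner ?y }` projected onto ?x. A chain `C <- p1 <- p2` then adds one conjunct per step from the left. The data and function names are hypothetical.

```fsharp
// Hypothetical instance data: ex:Stray has no owner.
let data =
    [ ("ex:Rex", "rdf:type", "ex:Dog")
      ("ex:Rex", "ex:hasOwner", "ex:Bob")
      ("ex:Stray", "rdf:type", "ex:Dog") ]

let instancesOf (cls: string) =
    data |> List.choose (fun (s, p, o) -> if p = "rdf:type" && o = cls then Some s else None)

// Keeps the candidates that have at least one value for prop (conjunct ?x prop ?y).
let hasValueFor (prop: string) (candidates: string list) =
    candidates |> List.filter (fun x -> data |> List.exists (fun (s, p, _) -> s = x && p = prop))

// "cls <- p1 <- ... <- pn" read left-associatively as (((cls <- p1) <- p2) ...);
// only ?x is projected, the property values themselves are dropped.
let evalLeftArrow (cls: string) (props: string list) =
    List.fold (fun xs prop -> hasValueFor prop xs) (instancesOf cls) props

printfn "%A" (evalLeftArrow "ex:Dog" [ "ex:hasOwner" ])
```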
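Slides 43 and 44 group entities by their exact set of types and record the data sources in which each group occurs. Below is a minimal brute-force sketch over hypothetical quads (subject, predicate, object, data source); the entities and grouping code are illustrative. ex:p3, typed only as foaf:Document, ends up in a different typecluster from the papers that carry both classes.

```fsharp
// Hypothetical quads (subject, predicate, object, data source), in the spirit of slide 44.
let quads =
    [ ("ex:p1", "rdf:type", "foaf:Document", "DBLP")
      ("ex:p1", "rdf:type", "swrc:InProceedings", "DBLP")
      ("ex:p2", "rdf:type", "foaf:Document", "ACM")
      ("ex:p2", "rdf:type", "swrc:InProceedings", "ACM")
      ("ex:p3", "rdf:type", "foaf:Document", "ACM") ]

let typeQuads = quads |> List.filter (fun (_, p, _, _) -> p = "rdf:type")

// The full type set of every typed entity.
let typeSets =
    typeQuads
    |> List.groupBy (fun (s, _, _, _) -> s)
    |> List.map (fun (s, qs) -> (s, qs |> List.map (fun (_, _, o, _) -> o) |> Set.ofList))
    |> Map.ofList

// Typecluster: entities sharing the same type set; payload = data sources they occur in.
let typeClusters =
    typeQuads
    |> List.groupBy (fun (s, _, _, _) -> Map.find s typeSets)
    |> List.map (fun (types, qs) -> (types, qs |> List.map (fun (_, _, _, ds) -> ds) |> Set.ofList))
    |> Map.ofList

typeClusters |> Map.iter (fun types sources -> printfn "%A -> %A" types sources)
```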
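Slides 45 and 46 restrict bi-simulation to depth 1. This sketch makes three assumptions:

- Only outgoing edges are considered.
- All targets count as equivalent at depth 0.
- rdf:type edges are excluded, since types are handled by the typeclusters.

Under these assumptions, two entities are 1-bi-similar exactly when they have the same set of outgoing property labels. The sketch computes that partition over hypothetical quads.

```fsharp
// Hypothetical quads (subject, predicate, object, data source), loosely following slide 46.
let quads =
    [ ("ex:p1", "dc:creator", "ex:a1", "DBLP")
      ("ex:show1", "dc:creator", "ex:a2", "BBC")
      ("ex:p2", "dc:creator", "ex:a3", "DBLP")
      ("ex:p2", "dc:title", "Some title", "DBLP") ]

let propertyQuads = quads |> List.filter (fun (_, p, _, _) -> p <> "rdf:type")

// 1-bi-simulation on outgoing edges: the set of property labels leaving each entity.
let propertySets =
    propertyQuads
    |> List.groupBy (fun (s, _, _, _) -> s)
    |> List.map (fun (s, qs) -> (s, qs |> List.map (fun (_, p, _, _) -> p) |> Set.ofList))
    |> Map.ofList

// Bi-simulation classes with the data sources their members occur in.
let bisimClasses =
    propertyQuads
    |> List.groupBy (fun (s, _, _, _) -> Map.find s propertySets)
    |> List.map (fun (props, qs) -> (props, qs |> List.map (fun (_, _, _, ds) -> ds) |> Set.ofList))
    |> Map.ofList

bisimClasses |> Map.iter (fun props sources -> printfn "%A -> %A" props sources)
```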
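Slides 47 and 48 combine both partitions: an equivalence class depends on an entity's own typecluster together with the typeclusters its properties point to. The brute-force computation of slide 49 can be sketched under the following assumptions:

- The equivalence-class key is (own type set, set of (property, type set of target)).
- The quads are hypothetical.
- The lookup matches keys exactly. An entity with additional types or properties would fall into another class, which a real query evaluation would also have to consider.

```fsharp
// Hypothetical quads (subject, predicate, object, data source).
let quads =
    [ ("ex:p1", "rdf:type", "foaf:Document", "DBLP")
      ("ex:p1", "rdf:type", "swrc:InProceedings", "DBLP")
      ("ex:p1", "dc:creator", "ex:alice", "DBLP")
      ("ex:alice", "rdf:type", "fb:Computer_Scientist", "DBLP")
      ("ex:p2", "rdf:type", "foaf:Document", "ACM")
      ("ex:p2", "rdf:type", "swrc:InProceedings", "ACM")
      ("ex:p2", "dc:creator", "ex:bob", "ACM")
      ("ex:bob", "rdf:type", "fb:Computer_Scientist", "ACM")
      ("ex:p3", "rdf:type", "foaf:Document", "BBC")
      ("ex:p3", "dc:creator", "ex:carol", "BBC") ]

// Type set of an entity (empty if it has no rdf:type in the data).
let typesOf (e: string) =
    quads
    |> List.choose (fun (s, p, o, _) -> if s = e && p = "rdf:type" then Some o else None)
    |> Set.ofList

// Equivalence-class key: own type set plus (property, type set of target) per outgoing edge.
let eqcKey (e: string) =
    let links =
        quads
        |> List.choose (fun (s, p, o, _) -> if s = e && p <> "rdf:type" then Some (p, typesOf o) else None)
        |> Set.ofList
    (typesOf e, links)

// Brute-force index: equivalence class -> data sources (payload) of its members' quads.
let index =
    quads
    |> List.groupBy (fun (s, _, _, _) -> eqcKey s)
    |> List.map (fun (key, qs) -> (key, qs |> List.map (fun (_, _, _, ds) -> ds) |> Set.ofList))
    |> Map.ofList

// The query of slides 40 and 48 as an index key.
let queryKey =
    (set [ "foaf:Document"; "swrc:InProceedings" ],
     set [ ("dc:creator", set [ "fb:Computer_Scientist" ]) ])

printfn "%A" (Map.tryFind queryKey index)
```

ex:p3 is excluded from the answer. It lacks swrc:InProceedings, and its dc:creator target has no type in the data.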
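Slide 50 feeds the crawler's quad stream through a FIFO, which is why the resulting index is only approximate (slide 51). The slides do not give the details of SchemEX's cache, so the following is a hypothetical simplification restricted to typeclusters:

- When an entity's oldest quad leaves a window of `capacity` quads, the entity is evicted together with all of its buffered quads.
- Types of the same entity that arrive further apart therefore end up in separate, incomplete type sets.

```fsharp
type Quad = string * string * string * string   // subject, predicate, object, data source

// Removes the subject of the oldest quad together with all of its quads still in the
// window, and adds its (possibly incomplete) type set with its data sources to the index.
let evict (window: Quad list) (index: Map<Set<string>, Set<string>>) =
    match window with
    | [] -> (window, index)
    | (subject, _, _, _) :: _ ->
        let mine, rest = window |> List.partition (fun (s, _, _, _) -> s = subject)
        let types = mine |> List.map (fun (_, _, o, _) -> o) |> Set.ofList
        let sources = mine |> List.map (fun (_, _, _, ds) -> ds) |> Set.ofList
        let known = defaultArg (Map.tryFind types index) Set.empty
        (rest, Map.add types (Set.union known sources) index)

// Hypothetical FIFO approximation of typeclusters over the rdf:type quads of a stream.
let approximateTypeClusters (capacity: int) (stream: Quad list) =
    let rec run window index remaining =
        match remaining with
        | q :: tail ->
            let window = window @ [ q ]               // newest quad at the end
            if List.length window > capacity then
                let window, index = evict window index
                run window index tail
            else
                run window index tail
        | [] when List.isEmpty window -> index
        | [] ->
            let window, index = evict window index   // end of stream: flush the window
            run window index []
    let typeStream = stream |> List.filter (fun (_, p, _, _) -> p = "rdf:type")
    run [] Map.empty typeStream

let stream : Quad list =
    [ ("ex:p1", "rdf:type", "foaf:Document", "DBLP")
      ("ex:p2", "rdf:type", "foaf:Document", "ACM")
      ("ex:p3", "rdf:type", "foaf:Document", "ACM")
      ("ex:p1", "rdf:type", "swrc:InProceedings", "DBLP") ]

let exact = approximateTypeClusters 4 stream    // window holds the whole stream
let approx = approximateTypeClusters 2 stream   // ex:p1's two types never meet in the window
```

With the large window, the result matches the brute-force typeclusters. With the small one, ex:p1 is split into {foaf:Document} and {swrc:InProceedings}, which illustrates the kind of error that slide 51 measures against the brute-force index.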
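The BTC 2011 figures on slide 52 are mutually consistent, as a quick check shows. At the stated throughput, 2.17 billion triples take about 15 hours to process, and the index amounts to roughly 2.5% of the input size.

```latex
\[
t = \frac{2.17 \times 10^{9}\ \text{triples}}{39\,500\ \text{triples/s}}
  \approx 54\,937\ \text{s} \approx 15.3\ \text{h},
\qquad
\frac{55 \times 10^{6}}{2.17 \times 10^{9}} \approx 0.025
\]
```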