E-Commerce and Graph-driven Applications: Experiences and Optimizations while moving to Linked Data
1. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 1
Andreas Both, Head of Research and Development
UNISTER GmbH, Germany
2. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 2
Unister Group
e-commerce company
founded 2002
major B2C web portals in Germany (and Europe)
verticals: travel, flights, travel packages, retail, . . .
integrated business model
10 million unique users per month (Germany, AGOF)
growing number of employees: 1 (2003) → 1,600 (2015)
4. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 3
Use Case
Agenda for e-commerce companies:
take advantage of linked data
unchain datastores from schema
Requirements:
fast
robust
scalable
→ Users: I want it all. I want it now.
7. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 4
Typical Data Structures and Queries
hierarchical (directed) region graph
hotels and regions might have many features
typical queries: select several features of hotels
8. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 5
Example Query
PREFIX uo: <http://ontology.unister.de/ontology#>
PREFIX uor: <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos: <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56 ,
                   uorf:18 ,
                   uorf:21 ,
                   uorf:210 ,
                   uorf:5 ,
                   uorf:211 ,
                   uorf:34 ,
                   uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10;
9. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 6
Experiences: standard search process
A search for attributes
   1 very selective
   2 less selective
B pick a region
C sort the results
D limit the selection
Setting:
Dataset: 71,600 hotels, resources: 278,277, literals: 3,022,583
Virtuoso: version 7.1 (fast track [1]), 824 MB, buffer size: 70,000
Experiments: 20 runs, charts show median
[1] https://github.com/v7fasttrack/virtuoso-opensource/tree/feature/emergent
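The measurement protocol named in the setting (20 runs, median reported) could be reproduced with a small harness like the one below. This is only a sketch: `median_runtime_ms` and the `run_query` callable are hypothetical names, and the actual tooling used for the talk is not described in the slides.

```python
import statistics
import time
from typing import Callable, List


def median_runtime_ms(run_query: Callable[[], object], runs: int = 20) -> float:
    """Execute run_query `runs` times and return the median wall-clock time in ms."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # assumed to send one query and wait for the full result
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; a real run would issue the SPARQL query here.
calls: List[int] = []
elapsed = median_runtime_ms(lambda: calls.append(1))
```

The median is less sensitive than the mean to a few slow outliers, for example the first cold-cache runs.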
12. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 7
Requirements for Industrial Applicability (in e-commerce)
requirements for replacing traditional databases:
fast: short response time
search query refinement → shorter response time
robust: similar answer times
easy to scale up
system resource efficient
→ requirements not fulfilled
14. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 8
Example Query
PREFIX uo: <http://ontology.unister.de/ontology#>
PREFIX uor: <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos: <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56 ,
                   uorf:18 ,
                   uorf:21 ,
                   uorf:210 ,
                   uorf:5 ,
                   uorf:211 ,
                   uorf:34 ,
                   uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10;
15. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 9
Data Preparation
hotel entity   p1   p2   p3   ...   pn
hotel1         0    0    1    ...   0
hotel2         1    0    1    ...   1
hotel3         1    1    1    ...   0
hotel4         1    0    1    ...   1
...            ...  ...  ...  ...   ...
hotelm         0    0    1    ...   0
BitSet representation of (hotel) properties:
p ≙ 0010...0
Advantages:
no index
very small
operations in-memory
easy update
easy insert
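As a minimal sketch of the idea, the snippet below keeps one bit vector per property, where bit i is set if hotel i has that property, and answers "has all of these properties" with a bitwise AND. Python integers stand in for the BitSet; the function names and the toy data (the first three columns of the table above) are assumptions for illustration, not the implementation from the talk.

```python
from typing import Dict, Iterable, List


def build_property_bitsets(hotel_features: List[Iterable[str]]) -> Dict[str, int]:
    """Return one bit vector per property; bit i is set if hotel i has it."""
    bitsets: Dict[str, int] = {}
    for hotel_index, features in enumerate(hotel_features):
        for feature in features:
            bitsets[feature] = bitsets.get(feature, 0) | (1 << hotel_index)
    return bitsets


def hotels_with_all(bitsets: Dict[str, int], wanted: Iterable[str]) -> List[int]:
    """AND the bit vectors of all wanted properties; return matching hotel indices."""
    wanted = list(wanted)
    if not wanted:
        return []
    result = -1  # all bits set, neutral element for AND
    for feature in wanted:
        result &= bitsets.get(feature, 0)  # unknown property matches no hotel
    return [i for i in range(result.bit_length()) if (result >> i) & 1]


# Rows hotel1..hotel4 of the table, restricted to p1..p3 (index 0 is hotel1).
hotels = [["p3"], ["p1", "p3"], ["p1", "p2", "p3"], ["p1", "p3"]]
bitsets = build_property_bitsets(hotels)
matches = hotels_with_all(bitsets, ["p1", "p3"])  # indices of hotel2, hotel3, hotel4
```

Inserting a hotel or updating a property only flips single bits, which is what makes updates and inserts cheap in this representation.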
18. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 10
Data Preparation
BitSet setting, Virtuoso adaptations:
16,507 stored properties → 63,109,198 B RAM used
Virtuoso: 824 MB → 706 MB
Virtuoso set-up update: buffer size = 60,000
19. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 11
Implemented Process: Virtuoso plugin
(with kind help of the OpenLink team, GeoKnow project [2])
1. interpret bif:contains (workaround!)
2. request bitsets from memcache via JNI (workaround!)
3. compute hotels using bit operations on addressed bitsets
4. map hotel IDs to Virtuoso literal IDs (workaround!)
   - query IDs from Virtuoso via literal selection
   - requires special predicate for each hotel resource
5. return cursor on result set
[2] This work has been supported by grants from the European Union's 7th Framework Programme provided for the project GeoKnow (GA no. 318159), cf. http://geoknow.eu
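Steps 2 to 4 of the process above can be sketched as follows. This is an illustration under stated assumptions, not the plugin itself: a plain dictionary stands in for memcache, Python integers stand in for the bitsets, and `answer_feature_query`, `fetch_bitset`, and `hotel_id_at` are hypothetical names. The JNI bridge and the Virtuoso literal-ID lookup are not modelled.

```python
from typing import Callable, Dict, List


def answer_feature_query(
    feature_ids: List[str],
    fetch_bitset: Callable[[str], int],
    hotel_id_at: List[str],
) -> List[str]:
    """Fetch one bitset per feature, AND them, and map set bits to hotel IDs."""
    if not feature_ids:
        return []
    combined = fetch_bitset(feature_ids[0])
    for feature_id in feature_ids[1:]:
        if combined == 0:
            break  # no hotel can match any more, skip remaining lookups
        combined &= fetch_bitset(feature_id)
    upper = min(combined.bit_length(), len(hotel_id_at))
    return [hotel_id_at[i] for i in range(upper) if (combined >> i) & 1]


# Hypothetical cache content: key = facility ID, value = bitset over four hotels.
cache: Dict[str, int] = {"56": 0b1011, "18": 0b0011, "21": 0b1110}
hotel_id_at = ["hotel1", "hotel2", "hotel3", "hotel4"]
result = answer_feature_query(
    ["56", "18", "21"], lambda key: cache.get(key, 0), hotel_id_at
)
```

Each additional feature can only clear bits, so every extra constraint narrows the candidate set with one more AND.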
22. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 12
Preliminary Results of A: properties in BitSets
Observations:
more complex queries → shorter response times
stable response times
warm-up required
23. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 13
Preliminary Results of B: non-selective property in Virtuoso
Observations:
less selective feature answered within Virtuoso has largest impact on computation time
24. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 14
Preliminary Results of C: order by
Observations:
sorting is not done in BitSet, but might be possible to implement in the future
25. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 15
Preliminary Results of D: limit 10
Observations:
limit is not done in BitSet, but might be possible to implement in the future
26. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 16
Discussion
Summary:
proven good performance
query time is robust
very resource efficient
no schema required
→ if a star pattern is recognizable, then use bitset optimization
ToDos (not production ready):
overcome workarounds
tighten the integration
generalize interface
extend to ElasticSearch (→ Virtuoso with full-text index cluster)
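One way to decide whether a query qualifies for the bitset path is to check that all triple patterns share a single subject variable and have constant objects. The check below is a simplified sketch: triple patterns are plain string tuples, `star_center` is a hypothetical helper, and a real integration would inspect the parsed query inside the store rather than strings.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def star_center(patterns: List[Triple]) -> Optional[str]:
    """Return the shared subject variable if the patterns form a star, else None."""
    if not patterns:
        return None
    subjects = {subject for subject, _, _ in patterns}
    if len(subjects) != 1:
        return None  # more than one subject: a chain or a general graph pattern
    center = next(iter(subjects))
    if not center.startswith("?"):
        return None  # subject is a constant, nothing to search for
    if any(obj.startswith("?") for _, _, obj in patterns):
        return None  # variable objects are joins, not simple membership tests
    return center


# Patterns taken from the example query earlier in the talk.
star = [
    ("?s", "a", "uo:Hotel"),
    ("?s", "uo:hasFeature", "uorf:56"),
    ("?s", "uo:locatedIn", "uor:Europe"),
    ("?s", "uo:suitableFor", "uos:Family"),
]
# A hypothetical non-star pattern: ?r introduces a second subject.
chain = [("?s", "uo:locatedIn", "?r"), ("?r", "a", "uo:Region")]
```

Here `star_center(star)` yields `"?s"`, while `star_center(chain)` yields `None`, so only the first query would be routed to the bitsets.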
31. Dr. Andreas Both, Head of R & D, Unister — LDBC, Barcelona, 2015-03-20 Slide 17
Take Away Messages
e-commerce use case requires short and robust request times
BitSet-driven extension has proven its value
→ basic requirements of e-commerce scenario fulfilled
→ still flexible (schemaless), but performant
taking advantage of external data structures is possible (in Virtuoso)
Dr. Andreas Both
Head of Research and Development
Unister GmbH, Leipzig, Germany
andreas.both@unister.de
+49 341 65050 24496
http://www.unister.de