Efficient Distributed
In-Memory Processing of
RDF Datasets
Gezim Sejdiu
PhD Colloquium, Bonn 29.09.2020
Supervisor: Prof. Dr. Jens Lehmann
Introduction
Large-Scale RDF Dataset Statistics
Quality Assessment of RDF Datasets at Scale
Scalable RDF Querying
Use Cases and Applications
Conclusion & Future Directions
Outline
2
Introduction
Get me there!
3
No single definition
Extremely large data sets that may be analysed computationally to
reveal patterns, trends, and associations, especially relating to human
behaviour and interactions
Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to deal
with them
What is Big Data?
4
Its relevance is increasing drastically, and Big Data Analytics is an
emerging field to explore
Why is ‘Big Data’ so important?
5
https://trends.google.com/trends/explore?date=all&q=%22big%20data%22
6
7
© Sorpresa meme on Memegen
Big Data Europe (BDE) Platform
8
https://github.com/big-data-europe
Layers of the BDE platform:
- Support Layer: Init Daemon, GUIs, Monitor
- App Layer: Traffic Forecast, Satellite Image Analysis, Real-time Stream Monitoring, ...
- Platform Layer: Spark, Flink, Kafka, ...; Semantic Layer: Ontario, SANSA, Semagrow
- Data Layer: Hadoop, NoSQL Stores (Cassandra, Elasticsearch, ...), RDF Store
- Resource Management Layer (Swarm)
- Hardware Layer: Premises or Cloud (AWS, GCP, MS Azure, …)
Fast and general-purpose cluster computing engine
Apache Spark
9
Spark Core Engine (RDD)
- Core APIs & Libraries: Spark SQL & DataFrames, Spark Streaming, MLlib (Machine Learning), GraphX (Graph processing)
- Deploy: Local (single JVM), Cluster (Standalone, Mesos, YARN), Containers (docker-compose)
Allows for massive parallel processing of
collections of records
- RDD - Resilient Distributed Dataset
- DataFrame - Conceptually a table
- Dataset - Unified access to data as objects
and/or tables
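As a concrete illustration of the three abstractions listed above, here is a minimal, self-contained Scala sketch (not taken from the slides); the Person record, the sample values and the local[*] master are invented for illustration only.

import org.apache.spark.sql.SparkSession

object SparkAbstractions {
  // Hypothetical record type used only for this illustration
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-abstractions-sketch")
      .master("local[*]")               // local, single-JVM deploy mode
      .getOrCreate()
    import spark.implicits._

    // RDD: a resilient distributed collection of records
    val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))

    // DataFrame: conceptually a table with named columns
    val df = rdd.toDF()
    df.filter($"age" > 26).show()

    // Dataset: typed, object-oriented access to the same table
    val ds = df.as[Person]
    ds.map(_.name).show()

    spark.stop()
  }
}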
Heterogeneity aka Variety
Key Observation From BDE
10
Figure: a single customer's data footprint spread across domains: banking and finance (VISA, Chase, SAP, IBM), purchases (Nordstrom, Amazon, Lowe's), entertainment (Netflix, Hulu, NFL Network), gaming (Zynga, Xbox 360), social media (Facebook, Pinterest, Twitter), and our known history
Modelling entities and their relationships
The RDF (Resource Description Framework) model
Knowledge Graphs
11
Figure: an example knowledge graph: DPDHL has full name "Deutsche Post DHL Group", industry Logistics, label "Logistik" and headquarters PostTower, which is located in Bonn
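To make the triple model concrete, here is a small, hedged Scala sketch (not from the slides) that encodes the DPDHL facts above as an in-memory RDD of subject-predicate-object records; the simplified Triple case class and the property names (fullName, locatedIn, ...) are placeholders rather than actual vocabulary terms.

import org.apache.spark.sql.SparkSession

object KnowledgeGraphExample {
  // Simplified triple representation used only for this sketch
  case class Triple(s: String, p: String, o: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kg-sketch").master("local[*]").getOrCreate()

    // The DPDHL facts from the slide, encoded as plain triples
    val triples = spark.sparkContext.parallelize(Seq(
      Triple("DPDHL", "fullName", "Deutsche Post DHL Group"),
      Triple("DPDHL", "industry", "Logistics"),
      Triple("DPDHL", "label", "Logistik"),
      Triple("DPDHL", "headquarters", "PostTower"),
      Triple("PostTower", "locatedIn", "Bonn")
    ))

    // Example: follow two hops -- DPDHL's headquarters, and where it is located
    val hq = triples.filter(t => t.s == "DPDHL" && t.p == "headquarters").map(_.o).first()
    val city = triples.filter(t => t.s == hq && t.p == "locatedIn").map(_.o).first()
    println(s"DPDHL is headquartered in $hq, located in $city")

    spark.stop()
  }
}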
Modelling entities and their relationships
Analysis: finding the underlying structure of the graph, e.g. to predict
unknown relationships
Examples: Google Knowledge Graph, DBpedia, Facebook, YAGO,
Twitter, LinkedIn, MS Academic Graph, IBM Graph, WikiData
Knowledge Graphs
12
Knowledge Graphs are everywhere
13
Entity Search and Summarization
Discovering Related Entities
Tasks that are hard to solve on single machines (>1 TB memory
consumption):
- Querying and processing LinkedGeoData
- Dataset statistics and quality assessment of the LOD Cloud
- Vandalism and outlier detection in DBpedia
- Inference on life science data (e.g. UniProt, EggNOG, StringDB)
- Clustering of DBpedia data
- Large-scale enrichment and link prediction for e.g. DBpedia →
LinkedGeoData
Why Distributed RDF Data Processing?
14
Main Research Question
Is it possible to process large-scale RDF
datasets efficiently and effectively?
15
RQ1: How can we efficiently explore the structure of large-scale RDF
datasets?
RQ2: Can we scale RDF dataset quality assessment horizontally?
RQ3: Can distributed RDF datasets be queried efficiently and
effectively?
Research Questions
16
RC1: A Scalable Distributed Approach for Computation of RDF Dataset
Statistics
RC2: A Scalable Framework for Quality Assessment of RDF Datasets
RC3: A Scalable Framework for SPARQL Evaluation of Large RDF Data
Contributions
17
SANSA
Scalable Semantic Analytics Stack
18
SANSA [1] is a processing data flow engine that provides data
distribution and fault tolerance for distributed computation over
large-scale RDF datasets
SANSA includes several libraries:
- Read / Write RDF / OWL library
- Querying library
- Inference library
- ML library
SANSA
19
Figure: the SANSA stack: layers for Knowledge Distribution & Representation, Querying, Inference and Machine Learning, exposed as Core APIs & Libraries and deployable locally or on a cluster (standalone resource manager); developed within BigDataEurope
RQ1: How can we efficiently explore the structure of large-scale RDF
datasets?
RQ2: Can we scale RDF dataset quality assessment horizontally?
RQ3: Can distributed RDF datasets be queried efficiently and
effectively?
Research Questions
20
Large-Scale RDF Dataset
Statistics
A Scalable Distributed Approach for
Computation of RDF Dataset Statistics [2]
21
To obtain an overview of the Web of Data, it is important to gather
statistical information describing the characteristics of the internal
structure of datasets
This process is both data-intensive and computing-intensive, and it is a
challenge to develop fast and efficient algorithms that can handle
large-scale RDF datasets
There is no existing approach for RDF that computes these statistical
criteria and scales to large datasets
Motivation
22
A statistical criterion C is a triple C = (F, D, P), where:
- F is a SPARQL filter condition
- D is a derived dataset from the main dataset (RDD of triples) after
applying F
- P is a post-processing operation on the data structure D
RDDs are in-memory collections of records that can be operated on in
parallel on large clusters
- We use RDDs to represent RDF triples (a minimal sketch follows below)
Approach
23
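The sketch below is a simplified illustration, not the actual DistLODStats code: it shows how one criterion, counting how often each class is used, maps onto C = (F, D, P) over an RDD of triples. The Triple case class and the example data are assumptions made for the sketch.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object UsedClassesCriterion {
  case class Triple(s: String, p: String, o: String)   // simplified triple type for this sketch

  val RdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

  // C = (F, D, P) for a "class usage count" criterion
  def classUsageCount(triples: RDD[Triple]): RDD[(String, Long)] = {
    val filtered = triples.filter(_.p == RdfType)        // F: filter condition
    val derived  = filtered.map(t => (t.o, 1L))          // D: derived dataset
    derived.reduceByKey(_ + _)                           // P: post-processing (count per class)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("criterion-sketch").master("local[*]").getOrCreate()
    val triples = spark.sparkContext.parallelize(Seq(
      Triple("ex:Bonn", RdfType, "ex:City"),
      Triple("ex:Berlin", RdfType, "ex:City"),
      Triple("ex:DPDHL", RdfType, "ex:Company")
    ))
    classUsageCount(triples).collect().foreach(println)  // (ex:City,2), (ex:Company,1)
    spark.stop()
  }
}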
Architecture Overview
24
Experimental Setup
- Cluster configuration
- 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
(32 Cores), 128 GB RAM, 12 TB SATA RAID-5
- Spark-2.2.0, Hadoop 2.8.0, Scala 2.11.11 and Java 8
- Datasets (all in nt format)
Evaluation
25
                 LinkedGeoData   DBpedia_en    DBpedia_de    DBpedia_fr    BSBM_2GB    BSBM_20GB   BSBM_200GB
#nr. of triples  1,292,933,812   812,545,486   336,714,883   340,849,556   8,289,484   81,980,472  817,774,057
size (GB)        191.17          114.4         48.6          49.77         2           20          200
Distributed Processing on Large-Scale Datasets
* e) = c) / d) - 1
Evaluation
26
Runtime (in hours)
                LODStats                  DistLODStats
                a) files    b) bigfile    c) local    d) cluster    e) speedup ratio
LinkedGeoData   n/a         n/a           36.65       4.37          7.4x
DBpedia_en      24.63       fail          25.34       2.97          7.6x
DBpedia_de      n/a         n/a           10.34       1.2           7.3x
DBpedia_fr      n/a         n/a           10.49       1.27          7.3x
Performance evaluation of DistLODStats
Evaluation
27
Figures: node scalability (BSBM-50GB) and sizeup scalability
RQ1: How can we efficiently explore the structure of large-scale RDF
datasets?
RQ2: Can we scale RDF dataset quality assessment horizontally?
RQ3: Can distributed RDF datasets be queried efficiently and
effectively?
Research Questions
28
Quality Assessment of RDF
Datasets at Scale
A Scalable Framework for Quality Assessment of
RDF Datasets [3]
29
Assessing data quality is of paramount importance to judge its fitness
for a particular use case
Existing solutions cannot evaluate data quality metrics on medium- or
large-scale datasets
→ This is actually where they are most important
Motivation
30
Quality Assessment Pattern (QAP)
- A reusable template to implement and design scalable quality
metrics
Approach
31
QualityMetric (QM) := Action | (QM OP Action)
OP := ∗ | − | / | +
Action := Count(Transformation)
Transformation := Rule(Filter) | (Transformation BOP Transformation)
Filter := getPredicates∼?p | getSubjects∼?s | getObjects∼?o | getDistinct(Filter)
          | Filter or Filter | Filter && Filter
Rule := isURI(Filter) | isIRI(Filter) | isInternal(Filter) | isLiteral(Filter)
        | !isBroken(Filter) | hasPredicateP | hasLicenceAssociated(Filter)
        | hasLicenceIndications(Filter) | isExternal(Filter) | hasType(Filter)
        | isLabeled(Filter)
BOP := ∩ | ∪
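As an illustration only (this is not the framework's actual API), the following Scala sketch expresses a simple "ratio of literal objects" metric in QAP terms: a Transformation that filters triples, an Action that counts them, and a metric combining two actions with the '/' operator. The Triple case class with an oIsLiteral flag is an assumption of the sketch.

import org.apache.spark.rdd.RDD

object QapSketch {
  case class Triple(s: String, p: String, o: String, oIsLiteral: Boolean)  // simplified for this sketch

  // Transformation := Rule(Filter): keep triples whose object is a literal
  def literalObjects(triples: RDD[Triple]): RDD[Triple] = triples.filter(_.oIsLiteral)

  // Action := Count(Transformation)
  def count(transformed: RDD[Triple]): Double = transformed.count().toDouble

  // QualityMetric := Action OP Action, here a ratio (OP = '/')
  def literalRatio(triples: RDD[Triple]): Double = {
    val a1 = count(literalObjects(triples))
    val a2 = count(triples)
    if (a2 == 0) 0.0 else a1 / a2
  }
}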
Architecture Overview
32
Definition
● Define quality dimensions
● Define quality metrics, thresholds and other configurations
Figure: the SANSA engine ingests the RDF data into distributed data structures, runs the QAP-based quality assessment, and exposes the results for analysis via SANSA-Notebooks and the Data Quality Vocabulary (DQV)
Experimental Setup
- Cluster configuration
- 7 machines (1 master, 6 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
(32 Cores), 128 GB RAM, 12 TB SATA RAID-5, Spark-2.4.0, Hadoop 2.8.0, Scala
2.11.11 and Java 8
Local mode: single instance of the cluster
- Datasets (all in .nt format)
Evaluation
33
                 LinkedGeoData   DBpedia_en    DBpedia_de    DBpedia_fr    BSBM_2GB    BSBM_20GB   BSBM_200GB
#nr. of triples  1,292,933,812   812,545,486   336,714,883   340,849,556   8,289,484   81,980,472  817,774,057
size (GB)        191.17          114.4         48.6          49.77         2           20          200
Evaluation
34
Runtime (in minutes)
                 Luzzu                     DistQualityAssessment
                 a) single    b) joint     c) local    d) cluster
Large-scale:
LinkedGeoData    Fail         Fail         446.9       7.79
DBpedia_en       Fail         Fail         274.31      1.99
DBpedia_de       Fail         Fail         61.4        0.46
DBpedia_fr       Fail         Fail         195.3       0.38
BSBM_200GB       Fail         Fail         454.46      7.27
Small to medium:
BSBM_0.01GB      2.64         2.65         0.04        0.42
BSBM_0.05GB      16.38        15.39        0.05        0.46
BSBM_0.1GB       40.59        37.94        0.06        0.44
BSBM_0.5GB       459.19       468.64       0.15        0.48
BSBM_1GB         1454.16      1532.95      0.4         0.56
BSBM_2GB         Timeout      Timeout      3.19        0.62
BSBM_10GB        Timeout      Timeout      29.44       0.52
BSBM_20GB        Fail         Fail         34.32       0.75
Performance evaluation of DistQualityAssessment
Evaluation
35
Figures: node scalability (BSBM-200GB) and sizeup scalability
RQ1: How can we efficiently explore the structure of large-scale RDF
datasets?
RQ2: Can we scale RDF dataset quality assessment horizontally?
RQ3: Can distributed RDF datasets be queried efficiently and
effectively?
Research Questions
36
Scalable RDF Querying
Sparklify: A Scalable Software Component for
Efficient evaluation of SPARQL queries over
distributed RDF datasets* [4]
37
* Joint work with Claus Stadler, a PhD student at the University of Leipzig.
Existing solutions are limited to simple RDF constructs only;
hence, they do not exploit the full potential of the knowledge, i.e. RDF
terms
Can we reuse existing Ontology-Based Data Access (OBDA) tooling to
facilitate running SPARQL queries on RDF kept in Apache Spark?
Motivation
38
Sparklify: Architecture Overview
39
Figure: Sparklify inside the SANSA engine: the RDF layer handles data ingestion and partitioning of the RDF data, the query layer ("Sparklifying") uses Sparqlify views over the distributed data structures, and the results are returned to the user
SELECT ?s ?w WHERE {
?s a dbp:Person .
?s ex:workPage ?w .
}
SPARQL
Prefix dbp:<http://dbpedia.org/ontology/>
Prefix ex:<http://ex.org/>
Create View view_person As
Construct {
?s a dbp:Person .
?s ex:workPage ?w .
}
With
?s = uri('http://mydomain.org/person', ?id)
?w = uri(?work_page)
Constrain
?w prefix "http://my-organization.org/user/"
From
person;
SELECT id, work_page
FROM view_person ;
Query rewriting pipeline: SPARQL query → SPARQL Algebra Expression Tree (AET) → normalize AET → SQL
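A hedged Scala sketch of the underlying idea, not Sparqlify's actual rewriting machinery: triples are partitioned by predicate and registered as Spark SQL tables, and the basic graph pattern from the SPARQL query above is answered with a SQL join. The view names, the simplified Triple type and the sample data are assumptions of the sketch.

import org.apache.spark.sql.SparkSession

object SparqlToSqlSketch {
  case class Triple(s: String, p: String, o: String)  // simplified triple type for this sketch

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sparql-to-sql-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val triples = Seq(
      Triple("ex:alice", "rdf:type", "dbp:Person"),
      Triple("ex:alice", "ex:workPage", "http://my-organization.org/user/alice"),
      Triple("ex:bonn",  "rdf:type", "dbp:City")
    ).toDS()

    // Partition by predicate and register each partition as a (s, o) table
    triples.filter($"p" === "rdf:type").select($"s", $"o").createOrReplaceTempView("rdf_type")
    triples.filter($"p" === "ex:workPage").select($"s", $"o").createOrReplaceTempView("work_page")

    // The SPARQL BGP { ?s a dbp:Person . ?s ex:workPage ?w } becomes a SQL join
    spark.sql(
      """SELECT t.s AS s, w.o AS w
        |FROM rdf_type t JOIN work_page w ON t.s = w.s
        |WHERE t.o = 'dbp:Person'""".stripMargin).show()

    spark.stop()
  }
}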
Experimental Setup
- Cluster configuration
- 7 nodes (1 master, 6 workers), each with Intel(R) Xeon(R) CPU E5-2620 v4 @
2.10GHz (32 cores), 128 GB RAM, 12 TB SATA RAID-5, connected via a Gigabit network
- Each experiment executed 3 times; results averaged
- Datasets (all in .nt format)
Evaluation
40
                 LUBM-1K       LUBM-5K       LUBM-10K        WatDiv-10M   WatDiv-100M   WatDiv-1B
#nr. of triples  138,280,374   690,895,862   1,381,692,508   10,916,457   108,997,714   1,099,208,068
size (GB)        24            116           232             1.5          15            150
Evaluation
41
Runtime (s) (mean)
              SPARQLGX-SDE    Sparklify
              a) total        b) partitioning   c) querying   d) total
WatDiv-10M
QC            103.24          134.81            61            195.84
QF            157.8           236.06            107.33        349.51
QL            102.51          241.24            134           370.3
QS            131.16          237.12            108.56        346
WatDiv-1B
QC            partial fail    778.62            2043.66       2829.56
QF            6734.68         1295.3            2576.52       3871.97
QL            2575.72         1275.22           610.66        1886.73
QS            4841.85         1290.72           1552.05       2845.3
Evaluation
42
Runtime (s) (mean), LUBM-10K
       SPARQLGX-SDE    Sparklify
       a) total        b) partitioning   c) querying   d) total
Q1     1056.83         627.72            718.11        1346.8
Q2     fail            595.76            fail          n/a
Q3     1038.62         615.95            648.63        1267.37
Q4     2761.11         632.93            1670.18       2303.18
Q5     1026.94         641.53            564.13        1206.67
Q6     537.65          695.74            267.48        963.62
Q7     2080.67         630.44            1331.13       1967.25
Q8     2636.12         639.93            1647.57       2288.48
Q9     3124.52         583.86            2126.03       2711.24
Q10    1002.56         593.68            693.73        1287.71
Q11    1023.32         594.41            522.24        1118.58
Q12    2027.59         576.31            1088.25       1665.87
Q13    1007.39         626.57            6.66          633.26
Q14    526.15          633.39            258.32        891.89
Performance evaluation of Sparklify
Evaluation
43
Figures: node scalability (WatDiv-100M) and sizeup scalability
Sparklify vs SPARQLGX-SDE per query type performance on WatDiv
100M
Evaluation
44
Query types: QS: star pattern, QL: linear pattern, QF: snowflake, QC: complex pattern
Scalable RDF Querying
Towards A Scalable Semantic-based Distributed
Approach for SPARQL query evaluation [5]
45
Are existing solutions more effective, e.g. when using property tables,
which reduces the number of necessary joins and unions?
What happens when not all subjects in a cluster use all properties?
- Wide property tables may be very sparse, containing many NULL
values, and thus impose a large storage overhead
What about a flatter approach, i.e. partitioning into
subject-based groups (e.g. all triples associated with a
unique subject)?
Motivation
46
Semantic-Based: Architecture Overview
47
Figure: the SANSA engine ingests the RDF data, the RDF layer applies semantic-based (subject) partitioning, and the query layer evaluates SPARQL queries with map operations over the distributed data structures to produce results
SELECT ?p WHERE {
  ?p :owns ?c .
  ?c :madeIn :Ingolstadt .
}
SPARQL
Joy :owns Car1
Joy :livesIn Bonn
Car1 :typeOf Car
Car1 :madeBy Audi
Car1 :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memberOf Volkswagen
Ingolstadt :cityOf Germany
Joy :owns Car1 :livesIn Bonn
Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt
Bonn :cityOf Germany
Audi :memberOf Volkswagen
Ingolstadt :cityOf Germany
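A minimal Scala sketch (an illustration, not the SANSA.Semantic implementation) of the subject-based grouping shown above: all predicate-object pairs of a subject are concatenated into one line per subject. The Triple case class and the toy data mirror the example.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SemanticPartitionSketch {
  case class Triple(s: String, p: String, o: String)  // simplified triple type for this sketch

  // One line per subject: "<subject> <p1> <o1> <p2> <o2> ..."
  def semanticPartition(triples: RDD[Triple]): RDD[String] =
    triples.map(t => (t.s, s"${t.p} ${t.o}"))
      .reduceByKey((a, b) => s"$a $b")
      .map { case (subject, rest) => s"$subject $rest" }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("semantic-partition-sketch").master("local[*]").getOrCreate()
    val triples = spark.sparkContext.parallelize(Seq(
      Triple("Joy", ":owns", "Car1"), Triple("Joy", ":livesIn", "Bonn"),
      Triple("Car1", ":typeOf", "Car"), Triple("Car1", ":madeBy", "Audi"),
      Triple("Car1", ":madeIn", "Ingolstadt")
    ))
    semanticPartition(triples).collect().foreach(println)
    // e.g. "Joy :owns Car1 :livesIn Bonn" and "Car1 :typeOf Car :madeBy Audi :madeIn Ingolstadt"
    spark.stop()
  }
}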
Experimental Setup
- Cluster configuration
- 6 machines (1 master, 5 workers): Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
(32 Cores), 128 GB RAM, 12 TB SATA RAID-5, Spark-2.4.0, Hadoop 2.8.0, Scala
2.11.11 and Java 8
- Datasets (all in nt format)
- Distributed SPARQL query evaluators we compare with:
- SHARD, SPARQLGX-SDE, and Sparklify
Evaluation
48
                 LUBM-1K       LUBM-2K       LUBM-3K       WatDiv-10M   WatDiv-100M
#nr. of triples  138,280,374   276,349,040   414,493,296   10,916,457   108,997,714
size (GB)        24            49            70            1.5          15
Evaluation
49
Runtime (s) (mean)
Queries    SHARD    SPARQLGX-SDE    SANSA.Sparklify    SANSA.Semantic
WatDiv-10M
C3         n/a      38.79           72.94              90.48
F3         n/a      38.41           74.69              n/a
L3         n/a      21.05           73.16              72.84
S3         n/a      26.27           70.1               79.7
WatDiv-100M
C3         n/a      181.51          96.59              300.82
F3         n/a      162.86          91.2               n/a
L3         n/a      84.09           82.17              189.89
S3         n/a      123.6           93.02              176.2
Evaluation
50
Runtime (s) (mean), LUBM-1K
Queries    SHARD     SPARQLGX-SDE    SANSA.Sparklify    SANSA.Semantic
Q1         774.93    103.74          103.57             226.21
Q2         fail      fail            3348.51            329.69
Q3         772.55    126.31          107.25             235.31
Q4         988.28    182.52          111.89             294.8
Q5         771.69    101.05          100.37             226.21
Q6         fail      73.05           100.72             207.06
Q7         fail      160.94          113.03             277.08
Q8         fail      179.56          114.83             309.39
Q9         fail      204.62          114.25             326.29
Q10        780.05    106.26          110.18             232.72
Q11        783.2     112.23          105.13             231.36
Q12        fail      159.65          105.86             283.53
Q13        778.16    100.06          90.87              220.28
Q14        688.44    74.64           100.58             204.43
Performance evaluation of Semantic-based approach
Evaluation
51
Figures: node scalability (LUBM-1K) and sizeup scalability
Powered By
Project and Organizations using our proposed
approaches
52
53
<https://aleth.io/>
Blockchain – Alethio
Use Case
Alethio is using SANSA in order to
perform large-scale batch
analytics, e.g. computing the
asset turnover for sets of
accounts, computing attack
pattern frequencies and Opcode
usage statistics. SANSA was run
on a 100 node cluster with 400
cores
<https://www.big-data-europe.eu/>
Big Data Platform –
BDE
Within the BDE platform, the Mu
Swarm Logger service detects
Docker events and converts their
representation to RDF. SANSA is
then used to compute statistics
over these logs; to generate
visualisations of the log
statistics, BDE calls
DistLODStats from
SANSA-Notebooks
<http://slipo.eu/>
Categorizing Areas of Interest (AOI)
SLIPO focuses on designing
efficient pipelines dealing with
large semantic datasets of POIs.
In this project, Sparklify is used
through the SANSA query layer
to refine, filter and select the
relevant POIs which are needed
by the pipelines
10+ more use cases
http://sansa-stack.net/powered-by/
Powered By
The Hubs and Authorities Transaction
Network Analysis
54
Figure: EthOn RDF triples are read from Amazon S3 buckets into the SANSA engine (data ingestion, data partitioning, SPARQL querying); PageRank, connected components and hubs & authorities analyses yield top accounts, hubs & authorities and wallet/exchange behaviour, visualised with Databricks notebooks or SANSA notebooks
More than 18,000,000,000 facts*
*https://medium.com/alethio/ethereum-linked-data-b72e6283812f
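For the graph-analytics part of such a pipeline, a hedged GraphX sketch is shown below; the toy account IDs and transfer edges are invented, and this is not Alethio's actual code.

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object TransactionGraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("tx-graph-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Toy transaction graph: vertices are accounts, edges are transfers (invented data)
    val accounts = sc.parallelize(Seq((1L, "0xaaa"), (2L, "0xbbb"), (3L, "0xccc")))
    val transfers = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 0.5), Edge(3L, 1L, 2.0)))
    val graph = Graph(accounts, transfers)

    // PageRank highlights influential accounts; connected components group related wallets
    val ranks = graph.pageRank(tol = 0.0001).vertices
    val components = graph.connectedComponents().vertices

    ranks.join(accounts).sortBy(_._2._1, ascending = false).take(3).foreach(println)
    components.join(accounts).take(3).foreach(println)

    spark.stop()
  }
}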
Analyze game performance and customer behaviors at scale
Profiting from Kitties on Ethereum
55
Pipe different clustering algorithms at once
Scalable Integration of Big POI Data
56
Pipeline: RDF POI data → preprocessing → SPARQL filtering → word embedding → semantic clustering and geo clustering

POI_ID   Cat1   Cat2
1        0      1
2        1      0
3        0      1
4        1      1
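A hedged Spark MLlib sketch (not the SLIPO/SANSA pipeline itself) clustering POIs from a one-hot category matrix like the one above with k-means; the column names and the choice of k are assumptions made for the sketch.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object PoiClusteringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("poi-clustering-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // One-hot encoded POI categories, as in the table above
    val pois = Seq((1, 0.0, 1.0), (2, 1.0, 0.0), (3, 0.0, 1.0), (4, 1.0, 1.0))
      .toDF("POI_ID", "Cat1", "Cat2")

    // Assemble the category columns into a feature vector and cluster with k-means
    val features = new VectorAssembler()
      .setInputCols(Array("Cat1", "Cat2"))
      .setOutputCol("features")
      .transform(pois)

    val model = new KMeans().setK(2).setSeed(42L).fit(features)
    model.transform(features).select("POI_ID", "prediction").show()

    spark.stop()
  }
}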
Conclusion and Future
Directions
57
RQ1: How can we efficiently explore the structure of large-scale RDF
datasets?
- First algorithm for computing RDF dataset statistics at scale using
Apache Spark
- An analysis of the complexity of the computational steps and the
data exchange between nodes in the cluster
- Integrated the approach into the SANSA framework
- A REST Interface for triggering RDF statistics calculation
Review of the Contributions
58
RQ2: Can we scale RDF dataset quality assessment horizontally?
- A Quality Assessment Pattern (QAP) to characterize scalable quality
metrics
- A distributed (open source) implementation of quality metrics using
Apache Spark
- Analysis of the complexity of the metric evaluation
- An evaluation of our approach, demonstrating empirically its
superiority over a previous centralized approach
- Integrated the approach into the SANSA framework
Review of the Contributions
59
RQ3: Can distributed RDF datasets be queried efficiently and
effectively?
- A novel approach for vertical partitioning including RDF terms, and a
scalable query system (Sparklify) using a SPARQL-to-SQL rewriter on
top of Apache Spark
- A scalable semantic-based partitioning and semantic-based query
engine (SANSA.Semantic) on top of Apache Spark
- An evaluation of the proposed approaches against state-of-the-art
engines, demonstrating their performance empirically
- Integrated the approaches into the SANSA framework
Review of the Contributions
60
Large-scale RDF Dataset Statistics
- Our approach is purely batch processing, in which the data chunks
are normally very large; therefore we plan to investigate additional
techniques for lowering the network overhead and I/O footprint, e.g.
HDT compression
- Near real-time computation of RDF dataset statistics using Spark
Streaming
Limitations and Future Directions
61
Assessment of RDF Datasets at Scale
- Intelligent partitioning strategies and dependency analysis in order
to evaluate multiple metrics simultaneously
- Real-time interactive quality assessment of large-scale RDF data
using Spark Streaming
- A declarative plugin using Quality Metric Language (QML), with the
ability to express, customize and enhance quality metrics
- Quality Assessment As a Service
- Quality check over LODStats
Limitations and Future Directions
62
Scalable RDF Querying
- Combine OBDA tools with dictionary encoding of RDF terms as
integers and evaluate the effects
- Extend our parser to support more SPARQL fragments, and add
statistics to the query engine while evaluating queries
- Investigate the re-ordering of BGPs and evaluate the effects on
query execution time
- Consider other management operations (additions, updates,
deletions), e.g. Delta Lake as an alternative storage layer that
brings ACID transactions to RDF data management solutions
Limitations and Future Directions
63
Adaptive Distributed RDF Querying
- Optimize index structures and distribute data based on anticipated
query workloads of particular inference or ML algorithms
Efficient Recommendation System for RDF Partitioners
- A recommender to suggest the “best partitioner” for our SPARQL
query evaluators based on the structure of the data (statistics)
A Powerful Benchmarking Suite
Limitations and Future Directions
64
With the increasing amount of RDF data, processing large-scale RDF
datasets constantly faces new challenges
We have shown the benefits of using distributed computing frameworks
for scalable and efficient processing of RDF datasets
Future research can build upon the contributions presented in this
thesis towards comprehensive scalable processing of RDF datasets
The main contributions of this thesis have been integrated into the
SANSA framework, making an impact on the Semantic Web community
Closing Remarks
65
66
@Gezim_Sejdiu
https://gezimsejdiu.github.io/
That’s all folks
>> SANSA: https://github.com/SANSA-Stack
[1]. Distributed Semantic Analytics using the SANSA Stack. Jens Lehmann; Gezim Sejdiu; Lorenz Bühmann; Patrick
Westphal; Claus Stadler; Ivan Ermilov; Simon Bin; Nilesh Chakraborty; Muhammad Saleem; Axel-Cyrille Ngonga Ngomo;
and Hajira Jabeen. In Proceedings of 16th International Semantic Web Conference - Resources Track (ISWC'2017), 2017.
[2]. DistLODStats: Distributed Computation of RDF Dataset Statistics. Gezim Sejdiu; Ivan Ermilov; Jens Lehmann; and
Mohamed Nadjib-Mami. In Proceedings of 17th International Semantic Web Conference, 2018.
[3]. A Scalable Framework for Quality Assessment of RDF Datasets. Gezim Sejdiu; Anisa Rula; Jens Lehmann; and Hajira
Jabeen. In Proceedings of 18th International Semantic Web Conference, 2019.
[4]. Sparklify: A Scalable Software Component for Efficient evaluation of SPARQL queries over distributed RDF datasets.
Claus Stadler; Gezim Sejdiu; Damien Graux; and Jens Lehmann. In Proceedings of 18th International Semantic Web
Conference, 2019.
[5]. Towards A Scalable Semantic-based Distributed Approach for SPARQL query evaluation. Gezim Sejdiu; Damien
Graux; Imran Khan; Ioanna Lytra; Hajira Jabeen; and Jens Lehmann. In 15th International Conference on Semantic
Systems (SEMANTiCS), 2019.
References
67
Backup slides
68
SPARQL is a standard query language for retrieving and manipulating
RDF data
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?hq ?location
WHERE {
dbr:Deutsche_Post foaf:name ?name.
dbr:Deutsche_Post dbo:location ?hq.
?hq foaf:name ?location.
}
Querying Knowledge Graphs
69
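A minimal Scala sketch (for illustration, not part of the thesis code) that executes the query above against the public DBpedia SPARQL endpoint using Apache Jena; endpoint availability and the returned bindings are not guaranteed.

import org.apache.jena.query.{QueryExecutionFactory, QueryFactory}

object SparqlQueryExample {
  def main(args: Array[String]): Unit = {
    val queryString =
      """PREFIX dbr: <http://dbpedia.org/resource/>
        |PREFIX dbo: <http://dbpedia.org/ontology/>
        |PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        |SELECT ?name ?hq ?location
        |WHERE {
        |  dbr:Deutsche_Post foaf:name ?name .
        |  dbr:Deutsche_Post dbo:location ?hq .
        |  ?hq foaf:name ?location .
        |}""".stripMargin

    val query = QueryFactory.create(queryString)
    // Remote execution against the public endpoint (assumes network access)
    val qexec = QueryExecutionFactory.sparqlService("https://dbpedia.org/sparql", query)
    try {
      val results = qexec.execSelect()
      while (results.hasNext) {
        val row = results.next()
        println(s"${row.get("name")} | ${row.get("hq")} | ${row.get("location")}")
      }
    } finally qexec.close()
  }
}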
Over the last years, the size of the Semantic Web has increased and
several large-scale datasets were published
> As of March 2019
~10,000 datasets
Openly available online
using Semantic Web standards
+ many datasets
RDFized and kept private
Motivation
70
Source: LOD-Cloud (http://lod-cloud.net/ )
Speedup Ratio and Efficiency of DistLODStats
Evaluation
71
Overall Breakdown of DistLODStats by Criterion Analysis (log scale)
Evaluation
72
STATisfy: A REST Interface for DistLODStats
73
Figure: collaborative analytics services and a marketplace call a REST server (STATisfy), which triggers SANSA DistLODStats on the BigDataEurope cluster (standalone resource manager with a master and workers 1..n)
A QAP consists of transformations and actions
- Transformation: a rule set or a union/intersection of transformations
- Rule: defines conditional criteria for a triple, e.g. isIRI()
- Filter: retrieves a subset of an RDF triple, e.g. getPredicates
- Shortcuts ?s, ?p, ?o are frequently used for filters
- Action: maps a triple set to a numerical value, e.g. count(r)
Quality Assessment Patterns (QAPs)
74
Metric: External Linkage
Transformation τ:
  r_1 = isIRI(?s) ∩ internal(?s) ∩ isIRI(?o) ∩ external(?o)
  r_2 = isIRI(?s) ∩ external(?s) ∩ isIRI(?o) ∩ internal(?o)
  r_3 = r_1 ∪ r_2
Action α:
  α_1 = count(r_3)
  α_2 = count(triples)
  α = α_1 / α_2
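A hedged Scala sketch of the External Linkage metric above, assuming a simplified Triple type with IRI flags and a hypothetical isInternal helper based on the dataset's base URI; this is an illustration, not the DistQualityAssessment implementation.

import org.apache.spark.rdd.RDD

object ExternalLinkageSketch {
  // Simplified triple type; sIsIRI / oIsIRI flag whether the terms are IRIs
  case class Triple(s: String, p: String, o: String, sIsIRI: Boolean, oIsIRI: Boolean)

  // Hypothetical helper: a resource is "internal" if it starts with the dataset's base URI
  def isInternal(term: String, baseUri: String): Boolean = term.startsWith(baseUri)

  def externalLinkage(triples: RDD[Triple], baseUri: String): Double = {
    val r1 = triples.filter(t => t.sIsIRI && isInternal(t.s, baseUri) && t.oIsIRI && !isInternal(t.o, baseUri))
    val r2 = triples.filter(t => t.sIsIRI && !isInternal(t.s, baseUri) && t.oIsIRI && isInternal(t.o, baseUri))
    val a1 = r1.union(r2).count().toDouble      // count(r_3), with r_3 = r_1 ∪ r_2
    val a2 = triples.count().toDouble           // count(triples)
    if (a2 == 0) 0.0 else a1 / a2               // α = α_1 / α_2
  }
}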
Overall analysis of DistQualityAssessment by metric in the cluster mode
(log scale)
Evaluation
75
Overall analysis of queries on LUBM-1K dataset (cluster mode) using
Semantic-based approach
Evaluation
76
