Scaling up Linked Data
Presented by:
Marin Dimitrov (Ontotext)
EUCLID Objective
[Figure: architecture of the EUCLID music application. Data acquisition (streaming providers, downloads, metadata, musical and other content) through physical, LD and R2R wrappers; an integrated dataset built by vocabulary mapping, interlinking and cleansing; LD dataset access via SPARQL endpoint, publishing and RDFa; application modules for visualization and for analysis & mining.]
EUCLID – Scaling up Linked Data
• Our aim: build a music-based portal using Linked
Data technologies
• So far, we have studied different mechanisms for:
• Linked Data management via SPARQL queries
• Reasoning over Linked Data
• Linked Data access (RDF dumps, endpoints, RDFa)
• Linked Data storage in repositories
• In this chapter, we will study current research and
technologies to scale up to very large volumes of
Linked Data
Motivation: Music!
Agenda
1. Introduction to Big (Linked) Data
2. NoSQL databases for Linked Data
3. Hadoop for Linked Data
4. Stream processing for Linked Data
5. … and more
INTRODUCTION TO BIG (LINKED) DATA
Introduction to Big Data
Big Data: management of data which is “too complex” to be processed with
traditional solutions
• Big does not stand primarily for size,
but as an analogy for “overwhelming”
• Big can mean “high variety”, “high
volume” or “high velocity”
The 3Vs of Big Data
• Variety: different forms of data
• Volume: petabytes of data
• Velocity: real-time data streams
The 3Vs of Big Data
• Variety
– Data characteristic: structured, semi-structured and unstructured
– Challenge: data integration
– Solution: semantic technologies are a good fit
• Volume
– Data characteristic: large volumes of data
– Challenge: reasoning and querying
– Solution: distributed storage & processing, parallel processing
• Velocity
– Data characteristic: streams, sensors, near real-time data, IoT
– Challenge: reasoning & querying
– Solution: stream reasoning & querying
The Extended Vs of Big Data
• Veracity: Uncertainty of the data
• Variability: Variation in meaning in different contexts
• Value: Turning data into information into insight
• Not easy to measure
• Depend on context and intended use
• Linked Data & Semantic Technologies can help
Beyond Big Data
Source: Gartner Inc. “Gartner Identifies Top Technology Trends Impacting Information
Infrastructure in 2013”
Semantic Technologies
Semantic technologies extract meaning from data, ranging from quantitative
data and text, to video, voice and images. Many of these techniques have
existed for years and are based on advanced statistics, data mining, machine
learning and knowledge management. One reason they are garnering more
interest is the renewed business requirement for monetizing information as a
strategic asset. Even more pressing is the technical need. Increasing volumes,
variety and velocity — big data — in IM and business operations, requires
semantic technology that makes sense out of data for humans, or
automates decisions
Beyond Big Data (2)
Towards Big Linked Data
• Variety
– This characteristic is the most inherent to Linked Data
– Agile data model
– Different vocabularies
• Volume
– [Figure: timeline 2007, 2008, 2009, 2010, 2011]
• Velocity
– RDF Streams
– Semantic Sensors
Towards Big Linked Data (2)
Big Linked Data & Linked Big Data
Big Linked Data
• Exponential growth of Linked Data in the last five years
• Big Data approach adopted by the Linked Data community, especially to handle Volume and Velocity
Linked Big Data
• Linked Data approach adopted by the Big Data community
• RDF data model for Variety
• Enrich Big Data with metadata and semantics
• Interlink Big Data sets & reduce duplication
• Simplify data access, discovery & integration
Source: M. Dimitrov. “Semantic Technologies for Big Data”
NOSQL DATABASES FOR LINKED DATA
RDF Databases
• Native or RDBMS based RDF databases
– OWLIM (http://www.ontotext.com/owlim)
– Virtuoso Universal Server (http://virtuoso.openlinksw.com/ )
– Stardog (http://stardog.com)
– AllegroGraph (http://www.franz.com/agraph/allegrograph/ )
– Systap Bigdata (http://www.systap.com/)
– Jena TDB (http://jena.apache.org/documentation/tdb/)
– Oracle, DB2
RDF Database Advantages
• RDF (graph) based data model
– Global identifiers of resources/entities
– Agile schema
• Inference of implicit facts
– Forward, backward, hybrid reasoning strategy
• Expressive query language (SPARQL)
• Compliance to standards
NoSQL Databases
• “Not Only SQL”
• A group of database technologies that do not follow the relational data model
• Typical requirements
– Distributed
– High availability
– Handle big data & query volumes (scalability)
– Hierarchical or graph data structures
– Flexible schema
NoSQL Taxonomy
• Key/value stores
– Each key associated with a value (DHT)
• Wide-column stores
– Each key is associated with many attributes,
columns are stored together
• Document databases
– Each key associated with a complex data
structure
• Graph databases
– Data is represented as nodes and edges
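As a rough illustration of the four families, the sketch below models the same small music catalogue in each shape using plain Python structures. The variable names, the "contains" edge label and the sample records are assumptions made for this example; no real database API is involved.

```python
# The same facts expressed in the four NoSQL shapes.
# Plain Python only; all names are illustrative assumptions.

# Key/value store: an opaque value behind each key.
key_value = {"artist:beatles": "The Beatles|Let it be|Get back"}

# Wide-column store: a row key mapped to many named columns.
wide_column = {
    "beatles": {"artist": "The Beatles", "album": "Let it be", "song": "Get back"},
    "queen": {"artist": "Queen", "album": "Jazz", "song": "Fun it"},
}

# Document database: a key mapped to a nested, self-describing document.
document = {
    "beatles": {
        "name": "The Beatles",
        "albums": [{"title": "Let it be", "songs": ["Get back"]}],
    },
}

# Graph database: nodes plus labelled edges between them.
nodes = {"beatles", "let_it_be", "get_back"}
edges = [
    ("beatles", "created", "let_it_be"),
    ("let_it_be", "contains", "get_back"),
]


def neighbours(node, label):
    """Return the nodes reachable from `node` over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


# A column-oriented lookup and a one-hop graph traversal.
albums_of_queen = [row["album"] for row in wide_column.values() if row["artist"] == "Queen"]
print(albums_of_queen)
print(neighbours("beatles", "created"))
```

The key/value entry can only be fetched as a whole, while the other three shapes expose progressively more structure to the query layer.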
[Figure: conceptual structures of the four families: a key/value pair, a wide-column table (Artist, Album, Song), keyed structured documents, and data nodes connected by a relationship]
Key/Value Stores
• Efficient key/value lookups
• Schema-less
• Simpler read/write operations
– Low latency & high throughput
• Examples
– DynamoDB, Azure Table Storage, Riak, Redis, MemcacheDB,
Voldemort
Wide-Column Stores
• A key is associated with several attributes
• Data in the same column is stored together
• Efficient for complex aggregations over data
• Schema-less / dynamic schema
• Easy to add new columns
• Columns can be grouped together (column family)
• Examples:
– HBase (http://hbase.apache.org)
– Cassandra (http://cassandra.apache.org)
Artist        Album       Song
The Beatles   Let it be   Get back
Queen         Jazz        Fun it
HBase
• Open source column-oriented store
• Based on Google’s BigTable
• Built on top of HDFS and Hadoop
• Horizontally scalable, automatic sharding
• High availability / automatic failover
• Strongly consistent reads/writes
• Java/REST API
Document Databases
• Each key associated with a complex data structure
(document)
• Documents can contain key/value pairs, key/array
pairs, or even nested structures
• Schema-less / dynamic schema
– New fields can be easily added to the document structure
• Typical document formats
– JSON, XML
• Examples:
– Couchbase (http://www.couchbase.com)
– MongoDB (http://www.mongodb.org)
Document Databases (2)
Example:
The Beatles
{
  "Homepage": "thebeatles.com",
  "Origin": "Liverpool",
  "Albums": [
    {"Title": "Let it be", "Year": "1970", "Duration": "35:16"},
    {"Title": "Help!", "Year": "1965"},
    {"Title": "Revolver", "Year": "1966", "Duration": "35:01"}
  ]
}
Elvis Presley
{
  "FullName": "Elvis Aaron Presley",
  "Homepage": "elvis.com",
  "Origin": "Memphis",
  "Albums": [
    {"Title": "Blue Hawaii", "Year": "1961", "Duration": "32:02"}
  ]
}
Couchbase
• Document-oriented database
– Documents are stored as JSON
• Flexible schema
– Document structure easy to change
• Optimised to run in-memory and on several
nodes
– Ejection and eventual persistence
• Incremental views & indexes
• Scalability, rebalancing, replication, failover
• RESTful API
Graph Databases: Motivation
Graphs: Representation of highly connected data
• Network of friends in a high school
• Relationships among artists in Last.fm (http://sixdegrees.hu/last.fm/)
• A fragment of Facebook
• Relationships between tweets
Graph Databases
• Based on the property graph model
• Support for query languages and core graph-based
tasks
– reachability, traversal, adjacency and pattern matching
• Examples
– Neo4j (http://neo4j.org)
– Dex (http://sparsity-technologies.com/dex.php)
– HyperGraphDB (http://www.hypergraphdb.org)
Graph Databases
Example: Property Graph Model
• Nodes and edges may have properties
• Properties: Key-value pairs
• Node “The Beatles” (Homepage: thebeatles.com, Origin: Liverpool)
– created → “Let it be” (Year: 1970, Duration: 35:16)
– created → “Help!” (Year: 1965)
– created → “Revolver” (Year: 1966, Duration: 35:01)
• Node “Elvis Presley” (Fullname: Elvis Aaron Presley, Homepage: elvis.com, Origin: Memphis)
– created → “Blue Hawaii” (Year: 1961, Duration: 32:02)
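The property graph model can be sketched in a few lines of Python. The `Node` and `Edge` classes and the `created_by` helper are hypothetical names chosen for this illustration; they do not correspond to the API of Neo4j or any other graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with a label and arbitrary key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled edge that may carry its own properties."""
    source: Node
    target: Node
    label: str
    properties: dict = field(default_factory=dict)


beatles = Node("The Beatles", {"Homepage": "thebeatles.com", "Origin": "Liverpool"})
let_it_be = Node("Let it be", {"Year": 1970, "Duration": "35:16"})
help_album = Node("Help!", {"Year": 1965})

edges = [
    Edge(beatles, let_it_be, "created"),
    Edge(beatles, help_album, "created"),
]


def created_by(artist):
    """Traverse outgoing 'created' edges of the given artist node."""
    return [e.target for e in edges if e.source is artist and e.label == "created"]


titles = sorted(node.label for node in created_by(beatles))
print(titles)
```

Because properties are plain dictionaries, a node such as "Help!" can omit the Duration key without any schema change.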
Neo4j
• Graph database
– Nodes, Relationships, Properties, Paths
– Indexes over properties
• Flexible schema
• Cypher graph query language
• ACID transactions
• High availability, distributed clusters
• RESTful and Java APIs
Rya
• RDF store based on Accumulo
– Column-store, HDFS
– Sesame query parser, SAIL
implementation
• 3 table index
– SPO, POS, OSP
– Sufficient for all triple patterns
– All triple parts (S, P, O) encoded in
the RowID
– Clustered index
Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
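To see why three orderings are sufficient for all triple patterns, the following in-memory sketch keeps three sorted key lists and answers each pattern with a single prefix range scan. It is only an analogue of the idea: the `TripleIndex` class, the separator byte and the sample triples are assumptions of this example, and nothing here uses the Accumulo or Rya APIs.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator; must not occur inside any term


class TripleIndex:
    """Three sorted tables (SPO, POS, OSP); each key encodes a whole triple."""

    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        self.tables = {
            name: sorted("".join(t[i] + SEP for i in order) for t in triples)
            for name, order in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Answer a triple pattern; None marks an unbound position."""
        pattern = (s, p, o)
        n_bound = sum(term is not None for term in pattern)
        for name, order in self.ORDERS.items():
            prefix_terms = [pattern[i] for i in order[:n_bound]]
            # Use the table whose leading key parts are exactly the bound terms.
            if all(term is not None for term in prefix_terms):
                return self._range_scan(name, order, prefix_terms)
        raise AssertionError("unreachable: the three orders cover every pattern")

    def _range_scan(self, name, order, prefix_terms):
        table = self.tables[name]
        prefix = "".join(term + SEP for term in prefix_terms)
        results = []
        for key in table[bisect_left(table, prefix):]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous in a sorted table
            parts = key.split(SEP)[:3]
            triple = [None, None, None]
            for position, value in zip(order, parts):
                triple[position] = value
            results.append(tuple(triple))
        return results


index = TripleIndex([
    ("beatles", "created", "let_it_be"),
    ("beatles", "origin", "liverpool"),
    ("elvis", "created", "blue_hawaii"),
])
print(index.match(p="created"))
print(index.match(s="beatles", o="liverpool"))
```

Each of the eight possible bound/unbound combinations maps to a key prefix in one of the three tables, so no pattern requires a full scan except the fully unbound one.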
Rya (2)
• Query processing
– Sesame (SPARQL) query plan translated to Accumulo range
scans & lookups
– Parallel scans for joins (x10-20 speedup)
– Batch scans (Accumulo) to reduce number of range scans
– Statistics for triple patterns selectivity, query re-ordering
• Performance evaluation (LUBM)
– No significant degradation when the data grows by 2-3 orders
of magnitude
Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
“NoSQL Databases for RDF: An Empirical Evaluation”
• Goal
– Store RDF data in HBase, Couchbase, Hive & Cassandra
– Benchmark query performance against a native
distributed RDF database (4store)
• HBase prototype
– Jena for SPARQL queries
– 3 index tables (SPO, POS, OSP)
– Row key encodes S+P+O, cells are empty
– Jena query plan translated to HBase filters & lookups
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
“NoSQL Databases for RDF: An Empirical Evaluation” (2)
• Hive+HBase prototype
– SPARQL to HiveQL translation
– Property table
• Row key is S
• a column for each P
• cell value stores O
• Multi-valued attributes have different timestamps
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
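The property-table layout of the Hive+HBase prototype can be approximated with nested dictionaries. In this sketch the `PropertyTable` class and its logical clock are assumptions for illustration; a real HBase table would assign cell timestamps itself and would be accessed through its own client API.

```python
from collections import defaultdict
from itertools import count


class PropertyTable:
    """Row key = subject, one column per predicate, cell value = object.

    Multi-valued attributes are kept as several versions of the same cell,
    distinguished by timestamp.
    """

    def __init__(self):
        # row key -> column -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))
        self._clock = count(1)  # assumed logical clock standing in for timestamps

    def put(self, s, p, o):
        self.rows[s][p][next(self._clock)] = o

    def get(self, s, p):
        """Return all values of column p in row s, oldest version first."""
        versions = self.rows.get(s, {}).get(p, {})
        return [versions[ts] for ts in sorted(versions)]


table = PropertyTable()
table.put("beatles", "album", "Let it be")
table.put("beatles", "album", "Help!")
table.put("beatles", "origin", "Liverpool")
print(table.get("beatles", "album"))
```

All properties of one subject sit in a single row, so a star-shaped query around that subject needs only one row lookup.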
“NoSQL Databases for RDF: An Empirical Evaluation” (3)
• CumulusRDF prototype
– Sesame for SPARQL queries, Cassandra for data management
– 3 index tables (SPO, POS, OSP)
– Sesame query plan translated to Cassandra index lookups
• Couchbase prototype
– Map RDF into JSON documents
• all triples with the same S stored in the same document (molecule)
• 2 JSON arrays for Ps and Os
– Jena as a SPARQL query engine
– 3 indexes (Couchbase views): SPO, POS, OSP
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
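The mapping of RDF into per-subject JSON documents can be sketched as follows. The field names "p" and "o" and both helper functions are assumptions of this example; the slides only state that each document holds two arrays, one for predicates and one for objects.

```python
import json


def triples_to_documents(triples):
    """Group triples by subject into one document with parallel P and O arrays."""
    documents = {}
    for s, p, o in triples:
        doc = documents.setdefault(s, {"p": [], "o": []})
        doc["p"].append(p)
        doc["o"].append(o)
    return documents


def document_to_triples(subject, doc):
    """Rebuild the triples of one subject from its document."""
    return [(subject, p, o) for p, o in zip(doc["p"], doc["o"])]


triples = [
    ("beatles", "origin", "Liverpool"),
    ("beatles", "created", "let_it_be"),
    ("elvis", "origin", "Memphis"),
]
documents = triples_to_documents(triples)
print(json.dumps(documents["beatles"]))
```

The i-th predicate pairs with the i-th object, so the two arrays must always be appended to together.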
“NoSQL Databases for RDF: An Empirical Evaluation” (4)
• Benchmarks
– BSBM 10M, 100M and 1B triples
– 1, 2, 4, 8, 16 node cluster
– AWS cost & query execution time
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
“NoSQL Databases for RDF: An Empirical Evaluation” (5)
• Results
– Simple SPARQL queries can be executed more
efficiently on a NoSQL datastore
– Data loading time for some NoSQL datastores
comparable or better than the native RDF store
– Complex SPARQL queries perform significantly slower
on NoSQL systems
• Query optimisations are required
– MapReduce operations (Hive & Couchbase) introduce
high latency for view maintenance / query execution
Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
HADOOP FOR LINKED DATA
Working with Distributed Data
• Apache Hadoop (http://hadoop.apache.org) is an open source implementation of MapReduce
• MapReduce
– Distributed batch processing
– Map phase partitions the input set (K/V pairs), Reduce phase performs aggregated processing over the partitions in parallel
– Shuffle intermediate results (from Map nodes to Reduce nodes)
• Allows for the processing of large distributed data sets across clusters of computers
– On a distributed file system (HDFS)
– Scales up to thousands of nodes, each offering local processing power and storage
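The map, shuffle and reduce phases can be imitated on a single machine to make the data flow concrete. This sketch counts triples per predicate; the function names and sample data are assumptions, and the code does not use Hadoop itself.

```python
from collections import defaultdict


def map_phase(triple):
    """Emit one (key, value) pair per input triple: the predicate and a count of 1."""
    _, predicate, _ = triple
    yield predicate, 1


def shuffle(pairs):
    """Group intermediate values by key, as happens between Map and Reduce nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)


def run_job(triples):
    intermediate = [pair for triple in triples for pair in map_phase(triple)]
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


triples = [
    ("beatles", "created", "let_it_be"),
    ("beatles", "origin", "liverpool"),
    ("elvis", "created", "blue_hawaii"),
]
counts = run_job(triples)
print(counts)
```

In a real cluster each call to `map_phase` and `reduce_phase` could run on a different node; only the shuffle step moves data across the network.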
“Scalable Distributed Reasoning with MapReduce”
• Goal
– Utilise Hadoop for large scale reasoning
• Approach
– Implement each RDFS rule (join) via a Map & Reduce function
– Map outputs original triple as value, and the join term as key
– Reducer receives all needed triples to perform the join
Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
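As an illustration of the approach, the sketch below encodes one RDFS rule (subclass inheritance: if x has type C and C is a subclass of D, then x has type D) as a map function that emits the join term as key and a reduce function that performs the join. The function names and the `ex:` sample data are assumptions of this example, not code from the cited paper.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_triple(triple):
    """Emit the join term as key and the original triple as value."""
    s, p, o = triple
    if p == TYPE:
        yield o, triple  # join on the class of the instance
    elif p == SUBCLASS:
        yield s, triple  # join on the subclass


def reduce_join(key, triples):
    """All triples sharing a join term arrive together; derive new type triples."""
    instances = [s for s, p, _ in triples if p == TYPE]
    superclasses = [o for _, p, o in triples if p == SUBCLASS]
    for instance in instances:
        for superclass in superclasses:
            yield (instance, TYPE, superclass)


def apply_subclass_rule(triples):
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = set()
    for key, values in groups.items():
        derived.update(reduce_join(key, values))
    return derived


data = [
    ("ex:let_it_be", TYPE, "ex:Album"),
    ("ex:Album", SUBCLASS, "ex:MusicalWork"),
    ("ex:beatles", TYPE, "ex:Band"),
]
print(apply_subclass_rule(data))
```

A single pass derives only one level of the class hierarchy; longer subclass chains need further iterations over the output.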
“Scalable Distributed Reasoning with MapReduce” (2)
Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
“Scalable Distributed Reasoning with MapReduce” (3)
• Challenge
– Too many duplicates (unique to derived
triple ratio of 1:50)
• Optimisations
– Replicate schema triples on each node
(in memory)
• Needed for each join; usually a small set
– Rule re-ordering
• Which rule may be triggered by another
rule?
• Reduce the number of required iterations
Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
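The schema-replication optimisation amounts to a map-side join: every mapper holds the small set of schema triples in memory and joins each instance triple against it locally. The sketch below is a single-process approximation under that assumption; the helper names are hypothetical, and a Python set stands in for the duplicate-elimination step.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def build_schema_index(schema_triples):
    """In-memory lookup (subclass -> superclasses), replicated on every node."""
    index = {}
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            index.setdefault(s, set()).add(o)
    return index


def map_instance(triple, schema_index):
    """Join one instance triple against the local schema copy; no shuffle needed."""
    s, p, o = triple
    if p == TYPE:
        for superclass in schema_index.get(o, ()):
            yield (s, TYPE, superclass)


schema = [("ex:Album", SUBCLASS, "ex:MusicalWork")]
instances = [
    ("ex:let_it_be", TYPE, "ex:Album"),
    ("ex:let_it_be", TYPE, "ex:Album"),  # duplicate input triple
    ("ex:help", TYPE, "ex:Album"),
]

schema_index = build_schema_index(schema)
emitted = [t for triple in instances for t in map_instance(triple, schema_index)]
derived = set(emitted)  # duplicate removal
print(len(emitted), len(derived))
```

The gap between the number of emitted and unique triples is a small-scale view of the duplicate problem listed under Challenge.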
“Scalable Distributed Reasoning with MapReduce” (4)
• Results
– Throughput of 4.5M triples / sec on a 16-node cluster
– More than 16 nodes do not improve performance significantly
Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
Lessons Learned from Large-scale Reasoning (J. Urbani)
• 1st Law: Treat schema triples differently
– Replicate on all nodes to minimise subsequent data transfer
• 2nd Law: Data skew dominates data distribution
– No universal partitioning scheme for input data
– Computation tasks moved to the nodes storing the data
(data locality)
• 3rd Law: Certain problems only appear at a very large
scale
– Proof-of-concept prototypes are often not representative
Source: Jacopo Urbani “Three Laws Learned from Web-scale Reasoning”
STREAM PROCESSING FOR LINKED DATA
Streaming Data
• A large amount of new data is constantly being created or
data is being updated at a rapid rate
– Traffic data, sensor networks, social networks, financial markets
• Many data sources create a constant “stream of information”
– Not always practical to store all data and then query it
– Continuous queries over transient data
• More recent data is more important
– Describes the current state of a dynamic system
Stream Processing
• Streams are observed through windows
• Continuous queries can be registered over the stream
• Continuous queries are iteratively evaluated over the data in the
current window
– Can leverage static background knowledge (e.g., schema information)
• Generates a stream of answers
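The evaluation loop described above can be sketched as a generator that re-runs a query over a sliding time window. The function names, the toll-gate sample data and the window convention (each window covers the interval from t - width, exclusive, to t, inclusive) are assumptions made for this illustration.

```python
def continuous_query(stream, width, step, query):
    """Evaluate `query` over each window (t - width, t], advancing t by `step`."""
    if not stream:
        return
    last = max(ts for _, ts in stream)
    t = step
    while t - step < last:
        window = [item for item, ts in stream if t - width < ts <= t]
        yield t, query(window)
        t += step


# Static background knowledge: which district each toll gate belongs to.
TOLL_DISTRICT = {"toll1": "north", "toll2": "south"}


def cars_per_district(window):
    """Join the triples in the current window with the background knowledge."""
    return {(TOLL_DISTRICT[toll], car) for toll, _, car in window}


# Timestamped triples: (triple, timestamp in seconds).
stream = [
    (("toll1", "registers", "car1"), 3),
    (("toll2", "registers", "car2"), 12),
    (("toll1", "registers", "car3"), 27),
]

answers = list(continuous_query(stream, width=20, step=10, query=cars_per_district))
for t, result in answers:
    print(t, sorted(result))
```

Because the window is wider than the step, consecutive windows overlap and an item such as car2 appears in more than one answer.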
[Figure: a continuous query evaluated over a window on the stream, using background knowledge, and producing a stream of answers over time]
Linked Stream Data
• A representation of sensor/stream data following the
Linked Data principles
– Sensor data can be enriched with semantics
– Facilitates data discovery and integration of heterogeneous data
sources
• Challenges
– RDF Triples must be annotated with timestamps
– Extensions to the SPARQL language – windows, continuous queries,
streaming operators
– Continuous semantics
– Scalability (Volume)
– High throughput and low latency (Velocity)
– Approximate reasoning
Querying Streams with SPARQL Extensions
• Queries over streaming data are specified as continuous queries
• The results of a continuous query are updated as new data arrives
• Several SPARQL extensions with streaming operators based on
CQL (Continuous Query Language)
– C-SPARQL
– SPARQLStream
– EP-SPARQL, CQELS, INSTANS
C-SPARQL (1)
C-SPARQL is an extension of SPARQL 1.1
1. RDF Streams: Sequence of RDF triples annotated with timestamps:
<(s,p,o), timestamp>
2. FROM STREAM extension for stream sources and windows
FromStrClause → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[RANGE' Window ']'
Window → LogicalWindow | PhysicalWindow
LogicalWindow → Number TimeUnit WindowOverlap
TimeUnit → 'MSEC' | 'SEC' | 'MIN' | 'HOUR' | 'DAY'
WindowOverlap → 'STEP' Number TimeUnit | 'TUMBLING'
PhysicalWindow → 'TRIPLES' Number
C-SPARQL (2)
3. Registration
• Creates a continuous query over the data source
• The query output is variable bindings, an RDF graph, or a new stream
Registration → 'REGISTER' ('QUERY'|'STREAM') QName 'AS' Query
C-SPARQL (3)
Example
REGISTER QUERY CarsEnteringInDistricts AS
SELECT DISTINCT ?district ?car
FROM STREAM <www.uc.eu/tollgates.trdf> [RANGE 40 SEC STEP 10 SEC]
WHERE {
?toll t:registers ?car .
?toll c:placedIn ?street .
?district c:contains ?street . }
Query: Retrieve each car registered at a toll gate, together with the district in which that toll gate is located.
Source: Barbieri, Davide Francesco, et al. "Querying rdf streams with c-sparql." ACM SIGMOD
Record 39.1 (2010): 20-26.
C-SPARQL (4)
Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
SPARQLStream (1)
• Utilizes the same definition of RDF streams as in C-SPARQL:
<(s,p,o), timestamp>
• The language is defined as follows:
NamedStream → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[' Window ']'
Window → 'NOW-' Integer TimeUnit [UpperBound] [Slide]
UpperBound → 'TO NOW-' Integer TimeUnit
Slide → 'SLIDE' Integer TimeUnit
TimeUnit → 'MS' | 'S' | 'MINUTES' | 'HOURS' | 'DAY'
Select → 'SELECT' [XStream] [DISTINCT | REDUCED] …
XStream → 'ISTREAM' | 'DSTREAM' | 'RSTREAM'
Source: Jean-Paul Calbimonte and Oscar Corcho. ”SPARQLStream: Ontology-based access to data
streams." Tutorial at ISWC 2013
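The three relation-to-stream operators can be illustrated with set arithmetic over two consecutive window evaluations. This follows the usual CQL reading (ISTREAM emits what is new, DSTREAM what has disappeared, RSTREAM the whole current result); whether SPARQLStream matches it in every detail is an assumption of this sketch, and the function names are hypothetical.

```python
def istream(previous, current):
    """Results present now that were absent at the previous evaluation."""
    return current - previous


def dstream(previous, current):
    """Results present at the previous evaluation that are gone now."""
    return previous - current


def rstream(previous, current):
    """The complete current result, regardless of the previous one."""
    return set(current)


# Query results (sensor, observation) at two consecutive evaluations.
previous = {("sensor1", "obs1"), ("sensor2", "obs2")}
current = {("sensor2", "obs2"), ("sensor3", "obs3")}

print(istream(previous, current))
print(dstream(previous, current))
print(rstream(previous, current))
```

ISTREAM and DSTREAM send only the changes between evaluations, which keeps the output small when consecutive windows overlap heavily.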
SPARQLStream (2)
Example
Query: Retrieve an RSTREAM with the observations captured by all sensors in the last
10 minutes.
PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT RSTREAM ?sensor ?observation
FROM STREAM <www.semsorgrid4env.eu/SensorReadings.srdf>
[FROM NOW - 10 MINUTES TO NOW STEP 1 MINUTE]
WHERE {
  ?observation a ssn:Observation;
    ssn:observedBy ?sensor .
}
Classification of Existing Systems
Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
W3C Semantic Sensor Networks
• SSN Ontology
– http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
– OWL DL ontology
– used to semantically describe sensors and sensor networks & data
– Recommendations for applying the ontology for Linked Sensor Data
W3C Semantic Sensor Networks (2)
• Different perspectives
– Sensor, data/observation, system
… AND MORE
A Trillion RDF Triples
• Use case
– Use RDF and Linked Data for the customer management
database of a big telecom
– Franz Inc / AllegroGraph
uRiKA Appliance
• YarcData
• Big Data appliance for graph
analytics
– 8K processors, 1TB RAM
– In-memory RDF database
– SPARQL 1.1 support
RDFS Reasoning on GPUs
• Similar approach to Urbani et al. for large scale
reasoning with Hadoop
– Handle rules with 2 antecedents
– Rule reordering
– Dictionary encoding
• Shared-memory architecture
– Efficient GPU algorithm implementation is challenging
Source: Norman Heino & Jeff Z. Pan ”RDFS Reasoning on Massively Parallel Hardware" ISWC 2012
RDFS Reasoning on GPUs (2)
• Data parallelism
– Apply one rule (thread) on one instance triple, join to a schema triple
if possible
– Hundreds / thousands of threads working in parallel
• Challenge
– Duplicate removal
• Benchmark
– x5 speedup of computation
– But… memory transfer overhead is significant
Source: Norman Heino & Jeff Z. Pan ”RDFS Reasoning on Massively Parallel Hardware" ISWC 2012
Benchmarks
• BSBM v3.1 (April 2013)
– http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
– Includes benchmarks with up to 150 billion triples
– x750 scale increase since the last BSBM result (200M triples)
• LDBC
– Industry neutral, non-profit organisation
– Benchmarks for RDF and graph databases, similar to TPC
– Big data volume, complex queries
SUMMARY
Summary
• Linked Data is a good fit for the Variety
challenge of Big Data
• Linked Data can simplify data discovery, data
access, data integration challenges for Big Data
• Exponential growth of Linked Data
• Linked Data benchmarks target bigger
workloads
Summary (2)
• Ongoing R&D towards scaling up Linked Data
for high data Volume and Velocity
– NoSQL datastores for RDF data management
– Hadoop for scalable RDF reasoning
– GPUs for scalable RDF reasoning
• Adapting Linked Data & SPARQL for streaming
data scenarios
For exercises, quizzes and further material visit our website:
http://www.euclid-project.eu (eBook, Course)
Other channels: @euclid_project, euclidproject
EUCLID – Scaling up Linked Data

More Related Content

What's hot

DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublinm_ackermann
 
LDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionLDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionSören Auer
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4Marin Dimitrov
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Marcus Smith
 
Linked Open Data and DANS
Linked Open Data and DANSLinked Open Data and DANS
Linked Open Data and DANSvty
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsJoshua Shinavier
 
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryPeter Haase
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Humphrey Southall
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureMichele Pasin
 
LODLAM Landscape NOTES
LODLAM Landscape NOTESLODLAM Landscape NOTES
LODLAM Landscape NOTESShana McDanold
 
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.Matthias Arnold
 
Linked Data from a Digital Object Management System
Linked Data from a Digital Object Management SystemLinked Data from a Digital Object Management System
Linked Data from a Digital Object Management SystemUldis Bojars
 
Mind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvestingMind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvestingSimeon Warner
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry sopekmir
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportPascal-Nicolas Becker
 

What's hot (20)

Graph database
Graph database Graph database
Graph database
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
LDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and DiscussionLDOW2015 Position Talk and Discussion
LDOW2015 Position Talk and Discussion
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
 
Linked Open Data and DANS
Linked Open Data and DANSLinked Open Data and DANS
Linked Open Data and DANS
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of Agents
 
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
 
ResourceSync Tutorial
ResourceSync TutorialResourceSync Tutorial
ResourceSync Tutorial
 
LODLAM Landscape
LODLAM LandscapeLODLAM Landscape
LODLAM Landscape
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer Nature
 
LODLAM Landscape NOTES
LODLAM Landscape NOTESLODLAM Landscape NOTES
LODLAM Landscape NOTES
 
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
Embedding Linked Data Invisibly into Web Pages: Strategies and Workflows for ...
 
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research DataNISO/DCMI Webinar: Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: Metadata for Managing Scientific Research Data
 
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
VRA Core 4 in Transcultural Studies - Adopting Core 4 XML in a DH Environment.
 
Linked Data from a Digital Object Management System
Linked Data from a Digital Object Management SystemLinked Data from a Digital Object Management System
Linked Data from a Digital Object Management System
 
Mind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvestingMind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvesting
 
Structured Data for the Financial Industry
Structured Data for the Financial Industry Structured Data for the Financial Industry
Structured Data for the Financial Industry
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 

Viewers also liked

Accumulo Summit 2016: Effective Testing of Apache Accumulo Iterators
Accumulo Summit 2016: Effective Testing of Apache Accumulo IteratorsAccumulo Summit 2016: Effective Testing of Apache Accumulo Iterators
Accumulo Summit 2016: Effective Testing of Apache Accumulo IteratorsAccumulo Summit
 
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...








Scaling up Linked Data

  • 1. Scaling up Linked Data Presented by: Marin Dimitrov (Ontotext)
  • 2. EUCLID Objective
      [Architecture diagram. Labels: Data acquisition, Streaming providers, Downloads, Metadata, Other content, Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, Integrated Dataset, Vocabulary Mapping, Interlinking, Cleansing, LD Dataset Access, SPARQL Endpoint, Publishing, RDFa, Musical Content Application, Analysis & Mining Module, Visualization Module]
  • 3. Motivation: Music!
      • Our aim: build a music-based portal using Linked Data technologies
      • So far, we have studied different mechanisms for:
          – Linked Data management via SPARQL queries
          – Reasoning over Linked Data
          – Linked Data access (RDF dumps, endpoints, RDFa)
          – Linked Data storage in repositories
      • In this chapter, we will study current research and technologies to scale up to very large volumes of Linked Data
      • Chapter labels shown on the slide: CH 2, CH 3, CH 1, CH 5
  • 4. Agenda
      1. Introduction to Big (Linked) Data
      2. NoSQL databases for Linked Data
      3. Hadoop for Linked Data
      4. Stream processing for Linked Data
      5. … and more
  • 5. INTRODUCTION TO BIG (LINKED) DATA
  • 6. Introduction to Big Data
      • Big Data: management of data which is “too complex” to be processed with traditional solutions
      • Big does not stand primarily for size, but is an analogy for “overwhelming”
      • Big can mean “high variety”, “high volume” or “high velocity”
  • 7. The 3Vs of Big Data
      • Variety: different forms of data
      • Volume: petabytes of data
      • Velocity: real-time data streams
  • 8. The 3Vs of Big Data
      • Variety
          – Data characteristic: structured, semi-structured and unstructured
          – Challenge: data integration
          – Solution: semantic technologies are a good fit
      • Volume
          – Data characteristic: large volumes of data
          – Challenge: reasoning and querying
          – Solution: distributed storage & processing, parallel processing
      • Velocity
          – Data characteristic: streams, sensors, near real-time data, IoT
          – Challenge: reasoning & querying
          – Solution: stream reasoning & querying
  • 9. The Extended Vs of Big Data
      • Veracity: uncertainty of the data
      • Variability: variation in meaning in different contexts
      • Value: turning data into information into insight
      • Not easy to measure
      • Depends on context and intended use
      • Linked Data & Semantic Technologies can help
  • 10. Beyond Big Data
  • 11. Beyond Big Data (2)
      Semantic Technologies: “Semantic technologies extract meaning from data, ranging from quantitative data and text, to video, voice and images. Many of these techniques have existed for years and are based on advanced statistics, data mining, machine learning and knowledge management. One reason they are garnering more interest is the renewed business requirement for monetizing information as a strategic asset. Even more pressing is the technical need. Increasing volumes, variety and velocity – big data – in IM and business operations, requires semantic technology that makes sense out of data for humans, or automates decisions”
      Source: Gartner Inc. “Gartner Identifies Top Technology Trends Impacting Information Infrastructure in 2013”
  • 12. Towards Big Linked Data
      • Variety
          – This characteristic is the most inherent to Linked Data
          – Agile data model
          – Different vocabularies
      • Volume
          – [Timeline figure: 2007, 2008, 2009, 2010, 2011]
      • Velocity
          – RDF Streams
          – Semantic Sensors
  • 13. Towards Big Linked Data (2)
  • 14. Big Linked Data & Linked Big Data
      • Big Linked Data
          – Exponential growth of Linked Data in the last five years
          – Big Data approach adopted by the Linked Data community, especially to handle Volume and Velocity
      • Linked Big Data
          – Linked Data approach adopted by the Big Data community
          – RDF data model for Variety
          – Enrich Big Data with metadata and semantics
          – Interlink Big Data sets & reduce duplication
          – Simplify data access, discovery & integration
      Source: M. Dimitrov. “Semantic Technologies for Big Data”
  • 15. NOSQL DATABASES FOR LINKED DATA
  • 16. RDF Databases
      • Native or RDBMS-based RDF databases
          – OWLIM (http://www.ontotext.com/owlim)
          – Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
          – Stardog (http://stardog.com)
          – AllegroGraph (http://www.franz.com/agraph/allegrograph/)
          – Systap Bigdata (http://www.systap.com/)
          – Jena TDB (http://jena.apache.org/documentation/tdb/)
          – Oracle, DB2
  • 17. RDF Database Advantages
      • RDF (graph) based data model
          – Global identifiers of resources/entities
          – Agile schema
      • Inference of implicit facts
          – Forward, backward or hybrid reasoning strategies
      • Expressive query language (SPARQL)
      • Compliance with standards
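
The "inference of implicit facts" point can be illustrated with a minimal sketch of forward chaining in plain Python. It is not how OWLIM or any other store listed above implements reasoning; the `ex:` resource names are hypothetical, and only two RDFS-style rules (subclass transitivity and type propagation) are applied.

```python
def forward_chain(triples):
    """Materialise implicit triples by applying two RDFS-style rules
    repeatedly until no new triple is produced (forward chaining)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != "rdfs:subClassOf" or s2 != o:
                    continue
                if p == "rdfs:subClassOf":
                    # A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # x type A, A subClassOf B  =>  x type B
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical example data (not taken from the slides).
facts = {
    ("ex:GetBack", "rdf:type", "ex:RockSong"),
    ("ex:RockSong", "rdfs:subClassOf", "ex:Song"),
    ("ex:Song", "rdfs:subClassOf", "ex:MusicalWork"),
}
closure = forward_chain(facts)
```

A backward-chaining strategy would instead leave the stored triples untouched and derive such facts at query time; a hybrid strategy mixes both.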
  • 18. NoSQL Databases • “Not Only SQL” • A group of database technologies that do not follow the relational data model • Typical requirements – Distributed – High availability – Handle big data & query volumes (scalability) – Hierarchical or graph data structures – Flexible schema
  • 19. NoSQL Taxonomy • Key/value stores – Each key is associated with a value (DHT) • Wide-column stores – Each key is associated with many attributes; columns are stored together • Document databases – Each key is associated with a complex data structure • Graph databases – Data is represented as nodes and edges
  • 20. Key/Value Stores • Efficient key/value lookups • Schema-less • Simpler read/write operations – Low latency & high throughput • Examples – DynamoDB, Azure Table Storage, Riak, Redis, MemcacheDB, Voldemort
  • 21. Wide-Column Stores • A key is associated with several attributes • Data in the same column is stored together • Efficient for complex aggregations over data • Schema-less / dynamic schema • Easy to add new columns • Columns can be grouped together (column family) • Examples: – HBase (http://hbase.apache.org) – Cassandra (http://cassandra.apache.org) • Example table (Artist | Album | Song): The Beatles | Let it be | Get back; Queen | Jazz | Fun it
  • 22. HBase • Open source column-oriented store • Based on Google’s BigTable • Built on top of HDFS and Hadoop • Horizontally scalable, automatic sharding • High availability / automatic failover • Strongly consistent reads/writes • Java/REST API
  • 23. Document Databases • Each key is associated with a complex data structure (document) • Documents can contain key/value pairs, key/array pairs, or even nested structures • Schema-less / dynamic schema – New fields can be easily added to the document structure • Typical document formats – JSON, XML • Examples: – Couchbase (http://www.couchbase.com) – MongoDB (http://www.mongodb.org)
  • 24. Document Databases (2) • Example: two documents, stored under the keys “The Beatles” and “Elvis Presley” • The Beatles: { Homepage: "thebeatles.com", Origin: "Liverpool", Albums: [ {Title: "Let it be", Year: "1970", Duration: "35:16"}, {Title: "Help!", Year: "1965"}, {Title: "Revolver", Year: "1966", Duration: "35:01"} ] } • Elvis Presley: { FullName: "Elvis Aaron Presley", Homepage: "elvis.com", Origin: "Memphis", Albums: [ {Title: "Blue Hawaii", Year: "1961", Duration: "32:02"} ] }
  • 25. Couchbase • Document-oriented database – Documents are stored as JSON • Flexible schema – Document structure easy to change • Optimised to run in-memory and on several nodes – Ejection and eventual persistence • Incremental views & indexes • Scalability, rebalancing, replication, failover • RESTful API
  • 26. Graph Databases: Motivation • Graphs: representation of highly connected data • Figures: – Network of friends in a high school – Relationships among artists in Last.fm (http://sixdegrees.hu/last.fm/) – A fragment of Facebook – Relationships between tweets
  • 27. Graph Databases • Based on the property graph model • Support for query languages and core graph-based tasks – reachability, traversal, adjacency and pattern matching • Examples – Neo4j (http://neo4j.org) – Dex (http://sparsity-technologies.com/dex.php) – HyperGraphDB (http://www.hypergraphdb.org)
  • 28. Graph Databases • Example: Property Graph Model • Nodes and edges may have properties • Properties: key-value pairs • Figure: – The Beatles (Homepage: thebeatles.com, Origin: Liverpool) created Let it be (Year: 1970, Duration: 35:16), Help! (Year: 1965) and Revolver (Year: 1966, Duration: 35:01) – Elvis Presley (Fullname: Elvis Aaron Presley, Homepage: elvis.com, Origin: Memphis) created Blue Hawaii (Year: 1961, Duration: 32:02)
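The property graph model on this slide can be sketched with plain Python dictionaries. This is only an illustration, not how any particular graph database stores data: the node identifiers and the `neighbours` helper are hypothetical, while the property values are taken from the example.

```python
# Nodes carry key-value properties (property graph model).
nodes = {
    "beatles": {"Name": "The Beatles", "Homepage": "thebeatles.com", "Origin": "Liverpool"},
    "let_it_be": {"Title": "Let it be", "Year": 1970, "Duration": "35:16"},
    "help": {"Title": "Help!", "Year": 1965},
    "revolver": {"Title": "Revolver", "Year": 1966, "Duration": "35:01"},
}

# Edges are (source, label, target, properties); edges may carry properties too.
edges = [
    ("beatles", "created", "let_it_be", {}),
    ("beatles", "created", "help", {}),
    ("beatles", "created", "revolver", {}),
]


def neighbours(node_id, label):
    """Adjacency lookup: follow outgoing edges that carry the given label."""
    return [dst for src, lbl, dst, _props in edges if src == node_id and lbl == label]


# One-hop traversal: titles of all albums created by The Beatles.
titles = sorted(nodes[n]["Title"] for n in neighbours("beatles", "created"))
print(titles)
```

A real graph database adds indexes over properties and a query language on top of this structure, as the Neo4j slide describes.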
  • 29. Neo4j • Graph database – Nodes, Relationships, Properties, Paths – Indexes over properties • Flexible schema • Cypher graph query language • ACID transactions • High availability, distributed clusters • RESTful and Java APIs
  • 30. Rya • RDF store based on Accumulo – Column store on top of HDFS – Sesame query parser, SAIL implementation • Three index tables – SPO, POS, OSP – Sufficient for all triple patterns – All triple parts (S, P, O) encoded in the RowID – Clustered index Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
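Why three index tables are enough for every triple pattern can be illustrated with a small in-memory sketch. This is not Rya's code and does not use Accumulo: sorted Python lists stand in for the sorted tables, and the `TripleIndex` class, the `SEP` separator and the sample triples are assumptions made for illustration. For each of the eight patterns, the bound parts form a prefix of one of the three key orders, so a single prefix (range) scan answers it.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator between triple parts; terms must not contain it


class TripleIndex:
    """Three sorted 'tables' (SPO, POS, OSP); each row key encodes a whole triple."""

    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        self.tables = {
            name: sorted("".join(t[i] + SEP for i in order) for t in triples)
            for name, order in self.ORDERS.items()
        }

    @staticmethod
    def choose_table(s, p, o):
        """Pick the table whose key order starts with the bound parts."""
        if s is not None:
            # SPO serves (S), (S,P) and (S,P,O); OSP serves (S,O).
            return "osp" if (o is not None and p is None) else "spo"
        if p is not None:
            return "pos"  # serves (P) and (P,O)
        return "osp"  # serves (O); with nothing bound this is a full scan

    def match(self, s=None, p=None, o=None):
        """Answer one triple pattern with a prefix scan; None marks a variable."""
        name = self.choose_table(s, p, o)
        order = self.ORDERS[name]
        pattern = (s, p, o)
        bound = []
        for i in order:
            if pattern[i] is None:
                break
            bound.append(pattern[i])
        prefix = "".join(part + SEP for part in bound)
        rows = self.tables[name]
        results = []
        for row in rows[bisect_left(rows, prefix):]:
            if not row.startswith(prefix):
                break  # rows are sorted, so the matching range has ended
            parts = row.split(SEP)
            triple = [None] * 3
            for pos, i in enumerate(order):
                triple[i] = parts[pos]
            results.append(tuple(triple))
        return results


index = TripleIndex([
    ("TheBeatles", "created", "LetItBe"),
    ("TheBeatles", "created", "Help"),
    ("TheBeatles", "origin", "Liverpool"),
    ("Queen", "created", "Jazz"),
])
print(index.match(p="created", o="Jazz"))
```

The cost of this design is storage: every triple is written three times, once per key order.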
  • 31. Rya (2) • Query processing – Sesame (SPARQL) query plan translated to Accumulo range scans & lookups – Parallel scans for joins (10-20x speedup) – Batch scans (Accumulo) to reduce the number of range scans – Statistics on triple pattern selectivity, query re-ordering • Performance evaluation (LUBM) – No significant degradation when data grows by 2-3 orders of magnitude Source: R. Punnoose, A. Crainiceanu, D. Rapp “Rya: A Scalable RDF Triple Store for the Clouds”
  • 32. “NoSQL Databases for RDF: An Empirical Evaluation” • Goal – Store RDF data in HBase, Couchbase, Hive & Cassandra – Benchmark query performance against a native distributed RDF database (4store) • HBase prototype – Jena for SPARQL queries – 3 index tables (SPO, POS, OSP) – Row key encodes S+P+O, cells are empty – Jena query plan translated to HBase filters & lookups Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 33. “NoSQL Databases for RDF: An Empirical Evaluation” (2) • Hive+HBase prototype – SPARQL to HiveQL translation – Property table • Row key is S • A column for each P • Cell value stores O • Multi-valued attributes have different timestamps Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 34. “NoSQL Databases for RDF: An Empirical Evaluation” (3) • CumulusRDF prototype – Sesame for SPARQL queries, Cassandra for data management – 3 index tables (SPO, POS, OSP) – Sesame query plan translated to Cassandra index lookups • Couchbase prototype – Map RDF into JSON documents • All triples with the same S stored in the same document (molecule) • 2 JSON arrays for Ps and Os – Jena as a SPARQL query engine – 3 indexes (Couchbase views): SPO, POS, OSP Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
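The Couchbase mapping above (one document per subject, with two parallel arrays) can be sketched as follows. This is a guess at the general shape only: the field names `p` and `o`, the `triples_to_documents` helper and the sample triples are hypothetical, and the prototype's actual document layout is not given on the slide.

```python
import json
from collections import defaultdict


def triples_to_documents(triples):
    """Group triples by subject; each document keeps parallel arrays of Ps and Os."""
    docs = defaultdict(lambda: {"p": [], "o": []})
    for s, p, o in triples:
        docs[s]["p"].append(p)  # the i-th predicate ...
        docs[s]["o"].append(o)  # ... pairs with the i-th object
    return dict(docs)


docs = triples_to_documents([
    ("TheBeatles", "origin", "Liverpool"),
    ("TheBeatles", "created", "Help"),
    ("Queen", "created", "Jazz"),
])
print(json.dumps(docs["TheBeatles"]))
```

Looking up everything about one subject then needs a single document fetch, while patterns without a bound subject rely on the POS and OSP views.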
  • 35. “NoSQL Databases for RDF: An Empirical Evaluation” (4) • Benchmarks – BSBM with 10M, 100M and 1B triples – Clusters of 1, 2, 4, 8 and 16 nodes – AWS cost & query execution time Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 36. “NoSQL Databases for RDF: An Empirical Evaluation” (5) • Results – Simple SPARQL queries can be executed more efficiently on a NoSQL datastore – Data loading time for some NoSQL datastores is comparable to or better than that of the native RDF store – Complex SPARQL queries perform significantly slower on NoSQL systems • Query optimisations are required – MapReduce operations (Hive & Couchbase) introduce high latency for view maintenance / query execution Source: Cudre-Mauroux et al. “NoSQL Databases for RDF: An Empirical Evaluation”
  • 37. HADOOP FOR LINKED DATA
  • 38. Working with Distributed Data • Apache Hadoop (http://hadoop.apache.org) is an open source implementation of MapReduce • MapReduce – Distributed batch processing – Map phase partitions the input set (K/V pairs); Reduce phase performs aggregated processing over the partitions in parallel – Intermediate results are shuffled from Map nodes to Reduce nodes • Allows the processing of large distributed data sets across clusters of computers – On a distributed file system (HDFS) – Scales up to thousands of nodes, each offering local processing power and storage
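The Map, Shuffle and Reduce phases can be illustrated with a single-process simulation. It does not use Hadoop; `run_mapreduce` and the sample job (counting triples per predicate) are hypothetical, chosen only to show how key/value pairs flow between the phases.

```python
from collections import defaultdict


def map_fn(triple):
    """Map: emit one (key, value) pair per input record."""
    _s, p, _o = triple
    yield p, 1


def reduce_fn(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate the three phases in one process."""
    groups = defaultdict(list)
    for record in records:                  # Map phase
        for key, value in map_fn(record):
            groups[key].append(value)       # Shuffle: group values by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())  # Reduce phase


counts = run_mapreduce(
    [
        ("TheBeatles", "created", "LetItBe"),
        ("TheBeatles", "created", "Help"),
        ("TheBeatles", "origin", "Liverpool"),
        ("Queen", "created", "Jazz"),
    ],
    map_fn,
    reduce_fn,
)
print(counts)
```

On a cluster, the map calls run on the nodes that hold the input blocks and each reduce call receives one key group over the network; the grouping dictionary here plays the role of that shuffle.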
  • 39. “Scalable Distributed Reasoning with MapReduce” • Goal – Utilise Hadoop for large-scale reasoning • Approach – Implement each RDFS rule (join) via a Map & Reduce function – Map outputs the original triple as value and the join term as key – Reducer receives all triples needed to perform the join Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
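As a sketch of this approach, the RDFS subclass rule (if ?c rdfs:subClassOf ?d and ?x rdf:type ?c, then ?x rdf:type ?d) can be written as one map function and one reduce function. The code is a single-process illustration, not the authors' implementation; `apply_rule` and the `ex:` terms are hypothetical.

```python
from collections import defaultdict

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def map_fn(triple):
    """Emit the join term (?c) as key and the original triple as value."""
    s, p, o = triple
    if p == SUBCLASS:
        yield s, triple  # ?c is the subject of the schema triple
    elif p == TYPE:
        yield o, triple  # ?c is the object of the instance triple


def reduce_fn(_cls, triples):
    """All triples sharing the join term arrive together; perform the join."""
    superclasses = [o for _s, p, o in triples if p == SUBCLASS]
    instances = [s for s, p, _o in triples if p == TYPE]
    for x in instances:
        for d in superclasses:
            yield (x, TYPE, d)


def apply_rule(triples):
    """One MapReduce pass over the input (map, shuffle, reduce)."""
    groups = defaultdict(list)
    for t in triples:
        for key, value in map_fn(t):
            groups[key].append(value)
    return {derived for k, vs in groups.items() for derived in reduce_fn(k, vs)}


derived = apply_rule([
    ("ex:Band", SUBCLASS, "ex:Artist"),
    ("ex:TheBeatles", TYPE, "ex:Band"),
    ("ex:Queen", TYPE, "ex:Band"),
])
print(sorted(derived))
```

A single pass only follows one subclass step; longer subclass chains need the pass to be repeated until no new triples appear.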
  • 40. “Scalable Distributed Reasoning with MapReduce” (2) Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
  • 41. “Scalable Distributed Reasoning with MapReduce” (3) • Challenge – Too many duplicates (unique to derived triple ratio of 1:50) • Optimisations – Replicate schema triples on each node (in memory) • Needed for each join; usually a small set – Rule re-ordering • Which rule may be triggered by another rule? • Reduce the number of required iterations Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
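The first optimisation can be sketched as a map-side join: the small set of schema triples is held in memory on every node, so instance triples are joined without shuffling schema data. This is an illustration of the idea under simplifying assumptions (only the subclass rule, a single process), not the authors' code; all function names and `ex:` terms are hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def load_schema(schema_triples):
    """Build the in-memory lookup that would be replicated on every node."""
    superclasses = {}
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            superclasses.setdefault(s, set()).add(o)
    return superclasses


def map_with_schema(instance_triples, superclasses):
    """Join each instance triple against the in-memory schema during the map phase."""
    for s, p, o in instance_triples:
        if p == TYPE:
            for d in superclasses.get(o, ()):
                yield (s, TYPE, d)


def deduplicate(derived):
    """What is left for the reduce phase: removing duplicate derivations."""
    return set(derived)


schema = load_schema([
    ("ex:Band", SUBCLASS, "ex:Artist"),
    ("ex:RockBand", SUBCLASS, "ex:Artist"),
])
raw = list(map_with_schema(
    [("ex:Queen", TYPE, "ex:Band"), ("ex:Queen", TYPE, "ex:RockBand")],
    schema,
))
unique = deduplicate(raw)
print(len(raw), len(unique))
```

The example also shows where duplicates come from: two different instance triples derive the same conclusion, which is why duplicate elimination is listed as the main challenge.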
  • 42. “Scalable Distributed Reasoning with MapReduce” (4) • Results – Throughput of 4.5M triples / sec on a 16-node cluster – More than 16 nodes do not improve performance significantly Source: Urbani et al. “Scalable Distributed Reasoning with MapReduce”
  • 43. Lessons Learned from Large-scale Reasoning (J. Urbani) • 1st Law: Treat schema triples differently – Replicate on all nodes to minimise subsequent data transfer • 2nd Law: Data skew dominates data distribution – No universal partitioning scheme for input data – Computation tasks moved to the nodes storing the data (data locality) • 3rd Law: Certain problems only appear at a very large scale – Proof-of-concept prototypes are often not representative Source: Jacopo Urbani “Three Laws Learned from Web-scale Reasoning”
  • 44. STREAM PROCESSING FOR LINKED DATA
  • 45. Streaming Data • Large amounts of new data are constantly being created, or existing data is being updated at a rapid rate – Traffic data, sensor networks, social networks, financial markets • Many data sources create a constant “stream of information” – Not always practical to store all data and then query it – Continuous queries over transient data • More recent data is more important – Describes the current state of a dynamic system
  • 46. Stream Processing • Streams are observed through windows • Continuous queries can be registered over the stream • Continuous queries are iteratively evaluated over the data in the current window – Can leverage static background knowledge (e.g., schema information) • Generates a stream of answers • Figure: a window moving along the stream over time; the continuous query combines the window contents with background knowledge and emits a stream of answers
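The window mechanism can be sketched as a time-based sliding window over a timestamped stream, with a continuous query re-evaluated on each window. Everything here is hypothetical (the helper names, the window boundary convention, the sensor readings and the sensor-to-room background knowledge); real stream engines differ in how they align and report windows.

```python
def sliding_windows(stream, range_s, step_s):
    """Yield (window_end, items) for time-based sliding windows.

    stream: list of (timestamp_seconds, item), sorted by timestamp.
    A window closing at time `end` covers timestamps in [end - range_s, end).
    """
    if not stream:
        return
    end = stream[0][0] + step_s
    last = stream[-1][0]
    while end - step_s <= last:
        yield end, [item for ts, item in stream if end - range_s <= ts < end]
        end += step_s


def max_per_room(items, sensor_room):
    """Continuous query: highest reading per room in the current window."""
    result = {}
    for sensor, value in items:
        room = sensor_room[sensor]  # join with static background knowledge
        result[room] = max(value, result.get(room, value))
    return result


background = {"s1": "kitchen", "s2": "hall"}  # static background knowledge
stream = [(0, ("s1", 20)), (5, ("s2", 22)), (12, ("s1", 25)), (27, ("s2", 19))]

# Stream of answers: one result per window (range 20 s, step 10 s).
answers = [(end, max_per_room(items, background))
           for end, items in sliding_windows(stream, range_s=20, step_s=10)]
for answer in answers:
    print(answer)
```

Because the range is larger than the step, consecutive windows overlap and a reading can contribute to more than one answer.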
  • 47. Linked Stream Data • A representation of sensor/stream data following the Linked Data principles – Sensor data can be enriched with semantics – Facilitates data discovery and integration of heterogeneous data sources • Challenges – RDF triples must be annotated with timestamps – Extensions to the SPARQL language: windows, continuous queries, streaming operators – Continuous semantics – Scalability (Volume) – High throughput and low latency (Velocity) – Approximate reasoning
  • 48. Querying Streams with SPARQL Extensions • Queries over streaming data are specified as continuous queries • The results of a continuous query are updated as new data arrives • Several SPARQL extensions with streaming operators are based on CQL (Continuous Query Language) – C-SPARQL – SPARQLStream – EP-SPARQL, CQELS, Instants
  • 49. C-SPARQL (1) • C-SPARQL is an extension of SPARQL 1.1 • 1. RDF Streams: sequence of RDF triples annotated with timestamps: <(s,p,o), timestamp> • 2. FROM STREAM extension for stream sources and windows: – FromStrClause → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[RANGE' Window ']' – Window → LogicalWindow | PhysicalWindow – LogicalWindow → Number TimeUnit WindowOverlap – TimeUnit → 'MSEC' | 'SEC' | 'MIN' | 'HOUR' | 'DAY' – WindowOverlap → 'STEP' Number TimeUnit | 'TUMBLING' – PhysicalWindow → 'TRIPLES' Number
  • 50. C-SPARQL (2) • 3. Registration – Creates a continuous query over the data source – The query output is variable bindings, an RDF graph, or a new stream – Registration → 'REGISTER' ('QUERY'|'STREAM') QName 'AS' Query
  • 51. C-SPARQL (3) • Example query: retrieve each car registered at a toll gate, together with the district that contains the street where the toll gate is placed • REGISTER QUERY CarsEnteringInDistricts AS SELECT DISTINCT ?district ?car FROM STREAM <www.uc.eu/tollgates.trdf> [RANGE 40 SEC STEP 10 SEC] WHERE { ?toll t:registers ?car . ?toll c:placedIn ?street . ?district c:contains ?street . } Source: Barbieri, Davide Francesco, et al. "Querying rdf streams with c-sparql." ACM SIGMOD Record 39.1 (2010): 20-26.
  • 52. C-SPARQL (4) Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
  • 53. SPARQLStream (1) • Utilizes the same definition of RDF streams as in C-SPARQL: <(s,p,o), timestamp> • The language is defined as follows: – NamedStream → 'FROM' ['NAMED'] 'STREAM' StreamIRI '[' Window ']' – Window → 'NOW-' Integer TimeUnit [UpperBound] [Slide] – UpperBound → 'TO NOW-' Integer TimeUnit – Slide → 'SLIDE' Integer TimeUnit – TimeUnit → 'MS' | 'S' | 'MINUTES' | 'HOURS' | 'DAY' – Select → 'SELECT' [XStream] [DISTINCT | REDUCED] … – XStream → 'ISTREAM' | 'DSTREAM' | 'RSTREAM' Source: Jean-Paul Calbimonte and Oscar Corcho. “SPARQLStream: Ontology-based access to data streams.” Tutorial at ISWC 2013
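The three stream operators in the grammar above are the relation-to-stream operators of CQL. Assuming SPARQLStream follows the usual CQL semantics, they can be sketched by comparing the query result of the previous window evaluation with the current one. The function names and sample bindings are hypothetical.

```python
def rstream(previous, current):
    """RSTREAM: emit every result of the current evaluation."""
    return set(current)


def istream(previous, current):
    """ISTREAM: emit only results that are new since the previous evaluation."""
    return set(current) - set(previous)


def dstream(previous, current):
    """DSTREAM: emit only results that were present before and are now gone."""
    return set(previous) - set(current)


# Query results (sensor, observation) at two consecutive window evaluations.
previous = {("sensor1", "obs1"), ("sensor2", "obs2")}
current = {("sensor2", "obs2"), ("sensor3", "obs3")}

print(rstream(previous, current))
print(istream(previous, current))
print(dstream(previous, current))
```

RSTREAM repeats results that stay inside the window across evaluations, whereas ISTREAM and DSTREAM report only the changes.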
  • 54. SPARQLStream (2) • Example query: retrieve an RSTREAM of the observations captured by all sensors in the last 10 minutes • PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> SELECT RSTREAM ?sensor ?observation FROM STREAM <www.semsorgrid4env.eu/SensorReadings.srdf> [FROM NOW - 10 MINUTES TO NOW STEP 1 MINUTE] WHERE { ?observation a ssn:Observation; ssn:observedBy ?sensor . }
  • 55. Classification of Existing Systems Source: M. Balduini et al. “Tutorial on Stream Reasoning for Linked Data (ISWC’2013)”
  • 56. W3C Semantic Sensor Networks • SSN Ontology – http://www.w3.org/2005/Incubator/ssn/ssnx/ssn – OWL DL ontology – Used to semantically describe sensors, sensor networks and their data – Recommendations for applying the ontology to Linked Sensor Data
  • 57. W3C Semantic Sensor Networks (2) • Different perspectives – Sensor, data/observation, system
  • 58. … AND MORE
  • 59. A Trillion RDF Triples • Use case – Use RDF and Linked Data for the customer management database of a big telecom – Franz Inc / AllegroGraph
  • 60. uRiKA Appliance • YarcData • Big Data appliance for graph analytics – 8K processors, 1TB RAM – In-memory RDF database – SPARQL 1.1 support
  • 61. RDFS Reasoning on GPUs • Similar approach to Urbani et al. for large-scale reasoning with Hadoop – Handle rules with 2 antecedents – Rule reordering – Dictionary encoding • Shared-memory architecture – Efficient GPU algorithm implementation is challenging Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware” ISWC 2012
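Dictionary encoding, mentioned above, replaces each RDF term with a compact integer so that joins compare numbers instead of strings. A minimal sketch follows; the `encode` and `decode` helpers and the `ex:` terms are hypothetical and say nothing about how the paper lays out its dictionary.

```python
def encode(triples):
    """Replace every term by an integer ID; return the encoded triples and the dictionary."""
    term_to_id = {}
    id_to_term = []

    def term_id(term):
        if term not in term_to_id:
            term_to_id[term] = len(id_to_term)  # next free ID
            id_to_term.append(term)
        return term_to_id[term]

    encoded = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
    return encoded, id_to_term


def decode(encoded, id_to_term):
    """Map integer triples back to their original terms."""
    return [tuple(id_to_term[i] for i in triple) for triple in encoded]


triples = [
    ("ex:TheBeatles", "rdf:type", "ex:Band"),
    ("ex:Queen", "rdf:type", "ex:Band"),
]
encoded, dictionary = encode(triples)
print(encoded)
```

Fixed-width integer triples are also easier to pack into arrays for parallel hardware than variable-length strings.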
  • 62. RDFS Reasoning on GPUs (2) • Data parallelism – Apply one rule (thread) to one instance triple, join to a schema triple if possible – Hundreds / thousands of threads working in parallel • Challenge – Duplicate removal • Benchmark – 5x speedup of computation – But… memory transfer overhead is significant Source: Norman Heino & Jeff Z. Pan “RDFS Reasoning on Massively Parallel Hardware” ISWC 2012
  • 63. Benchmarks • BSBM v3.1 (April 2013) – http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/ – Includes benchmarks with up to 150 billion triples – 750x scale increase since the last BSBM result (200M triples) • LDBC – Industry neutral, non-profit organisation – Benchmarks for RDF and graph databases, similar to TPC – Big data volume, complex queries
  • 65. Summary • Linked Data is a good fit for the Variety challenge of Big Data • Linked Data can simplify the data discovery, data access and data integration challenges of Big Data • Exponential growth of Linked Data • Linked Data benchmarks target bigger workloads
  • 66. Summary (2) • Ongoing R&D towards scaling up Linked Data for high data Volume and Velocity – NoSQL datastores for RDF data management – Hadoop for scalable RDF reasoning – GPUs for scalable RDF reasoning • Adapting Linked Data & SPARQL for streaming data scenarios
  • 67. For exercises, quiz and further material visit our website: http://www.euclid-project.eu • Other channels: @euclid_project, euclidproject • eBook • Course