The Promise of a Better Connected Digital World:

Data Registry Systems Without the Web

Philippe Cudré-Mauroux

eXascale Infolab, University of Fribourg
Switzerland

Christophe Guéret
VU University / DANS
The Netherlands

Verisign Labs Distinguished Speakers Series
Verisign Labs, Reston–USA
December 13, 2013
Entities

Entity Data
• Semi-structured, interlinked descriptions of shared instances
– Persons
– Objects
– Software
– Locations
– Sensors
– …
Entities as Mediation
• Rising paradigm
– Store information at the entity granularity
– Integrate information by inter-linking entities

• Advantages?
– Coarser granularity compared to keywords
• More natural, e.g., brain functions similarly (or is it the other way
around?)

– Denormalized information compared to RDBMSs
• Schema-later, heterogeneity, sparsity
• Pre-computed joins, “Semantic” linking

• Drawbacks?
Prominence of Entity-Powered Apps
– Collaborative Editing (Wikipedia’s Wikidata)
– Social Networks (Facebook’s Open Graph)
– Serious Networks (LinkedIn’s Business Graph)
– Web Search (Google’s Knowledge Graph)
– Software Integration (Yahoo!’s WOO)
– Question Answering (IBM’s Watson)
– Dynamic Websites (BBC’s London Olympics)
– Open Data (data.gov.uk, linkeddata.org)
– Most of our own applications (exascale.info)
– etc. etc.
Problem: Limited Access to Entities (1)
• 70+% of the world’s population has no or
very limited access to the Web

[Ahmed Shams 2013]

Problem: Limited Access to Entities (2)
• Even in developed countries, deploying
collaborative entity-editing platforms is
technically exceedingly challenging
– Local/Global QoS to serve arbitrary entity data
• Performance, scale-out

– Collaborative aspects
• Transactions, versioning, integration

– Offline / mobile concerns
• Caching / replication / serializability

Potential Building Blocks?
• … for a hybrid online/offline, collaborative entity registry:
– DNS3 (never meant for entity data)
– DOA (awkward Web integration, limited features)
– RDBMSs (ACID? Impedance mismatch, limited perf.)
– P2P / decentralized CDNs (performance issues)
– Native RDF Stores (too expressive; scalability / perf. issues)
– (Structured) Inverted Indices (no transactions; slow updates)
– NoSQL key-value / document stores (wrong PACELC trade-offs; (some) performance issues)

[Iliya Enchev 2012, ISWC 2013]
Our Solution: ERS, the Entity Registry System
• Three-tier solution to deploy entity-powered apps
– Flexible
• Seamlessly reconcile entities in local / ad-hoc / global modes

– Collaborative
• Transactional consistency, data versioning

– Scalable
• Bridges, scale-out servers, tunable consistency

– Open-source
• https://github.com/ers-devs
ERS Architecture (1)
• Contributors: Contributors read and edit the contents of
the registry. They may create and delete entities, look for
entities, and contribute to the entities’ descriptions.
• Bridges: Bridges do not directly contribute to the
contents of the registry. They are used to connect
isolated closed networks and improve the availability of
the descriptions shared by the contributors.

• Aggregators: Some use cases may require the
presence of global servers that contain a copy of all the
data provided individually by the contributors. A global
server provides a single entry point to the registry.
ERS Architecture (2)

[Figure: ERS architecture diagram; only the “www” labels survive in this transcript]
Sample Deployment [Videos]

ERS Data & API
• Data: flexible RDF quads serialized as
– JSON documents (contributors, bridges)
– Key-value pairs (aggregators)

• Atomic & serializable operations through
various locking granularities
– Insert entity (IE), Insert property (IP), Update property (UP), Delete
property (DP), Delete entity (DE), Shallow entity copy (SC), Deep entity
copy (DC), Insert link between two entities (IL), Delete link between two
entities (DL)
=> Consistency
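
The operations listed above can be sketched roughly as follows. This is a minimal, single-process illustration and not the actual ERS API: the class name `EntityStore`, the method names, the `urn:ers:` identifiers, and the document layout (one JSON object per entity, mapping each property to a list of values) are all assumptions made for this example. The copy operations (SC, DC) are left out because their exact semantics are not given here, and locking is ignored.

```python
import json


class EntityStore:
    """Toy in-memory registry (hypothetical); one JSON-like document per entity."""

    def __init__(self):
        self.docs = {}  # entity id -> {property: [values]}

    def insert_entity(self, eid):                      # IE
        if eid in self.docs:
            raise KeyError(f"{eid} already exists")
        self.docs[eid] = {}

    def insert_property(self, eid, prop, value):       # IP
        self.docs[eid].setdefault(prop, []).append(value)

    def update_property(self, eid, prop, old, new):    # UP
        values = self.docs[eid][prop]
        values[values.index(old)] = new

    def delete_property(self, eid, prop):              # DP
        del self.docs[eid][prop]

    def delete_entity(self, eid):                      # DE
        del self.docs[eid]

    def insert_link(self, src, prop, dst):             # IL
        # A link is modelled as a property whose value is another entity id.
        if dst not in self.docs:
            raise KeyError(f"unknown target entity {dst}")
        self.insert_property(src, prop, dst)

    def delete_link(self, src, prop, dst):             # DL
        values = self.docs[src][prop]
        values.remove(dst)
        if not values:
            del self.docs[src][prop]

    def to_json(self, eid):
        """Serialize one entity description as a JSON document."""
        return json.dumps(self.docs[eid], sort_keys=True)

    def to_quads(self, eid, graph):
        """Flatten one entity into (entity, property, value, graph) quads."""
        return [(eid, prop, value, graph)
                for prop, values in self.docs[eid].items()
                for value in values]
```

The JSON form corresponds to what contributors and bridges would exchange, while the flattened quads are closer to the key-value layout mentioned for aggregators; both mappings are only one plausible reading of the slide.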
Unique Technical Features
I. Seamless, best-effort entity synchronization
– Local, ad-hoc, global modes

II. Fault-tolerance and decentralization
– Property replication, no single point of failure

III. Built-in versioning and provenance
– Collaborative entity editing made easy

IV. Linear scalability
– Tunable consistency levels
Performance: Distributed Locking (1)
• Decentralized, multi-granular locking protocol
for transactional consistency on top of
persistency layers
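
To make the idea of multi-granular locking concrete, here is a minimal sketch with two granularities, entity and property. It is a single-process toy, not the decentralized ERS protocol: the `LockTable` class, its method names, and the conflict rules (an entity lock excludes other owners from every property of that entity; property locks on different properties are compatible) are assumptions chosen for illustration.

```python
class LockTable:
    """Toy lock table (hypothetical) with entity-level and property-level locks."""

    def __init__(self):
        self.entity_locks = {}    # entity id -> owner
        self.property_locks = {}  # (entity id, property) -> owner

    def lock_entity(self, owner, eid):
        """Coarse lock: fails if anyone else holds the entity or one of its properties."""
        if self.entity_locks.get(eid) not in (None, owner):
            return False
        for (locked_eid, _prop), holder in self.property_locks.items():
            if locked_eid == eid and holder != owner:
                return False
        self.entity_locks[eid] = owner
        return True

    def lock_property(self, owner, eid, prop):
        """Fine lock: fails if anyone else holds the entity or this same property."""
        if self.entity_locks.get(eid) not in (None, owner):
            return False
        if self.property_locks.get((eid, prop)) not in (None, owner):
            return False
        self.property_locks[(eid, prop)] = owner
        return True

    def release_all(self, owner):
        """Drop every lock held by the given owner, e.g. at commit or abort."""
        self.entity_locks = {
            eid: o for eid, o in self.entity_locks.items() if o != owner}
        self.property_locks = {
            key: o for key, o in self.property_locks.items() if o != owner}
```

With this layout, two transactions editing different properties of the same entity do not block each other, whereas deleting or copying an entity would take the coarser entity lock.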

Performance: Distributed Locking (2)
• Fault-tolerant, though Paxos-like algorithms
limit horizontal scalability

Performance: Optimistic Concurrency (1)
• ERS typically operates on insert-heavy low-conflict workloads
– Most of the time new entities are inserted and properties added

• Goal: separate validation from write operations
– Per worker TX management
– Distributed ID generator for consistent commits

Tunable Consistency in ERS
• Weak Writes: For each TX a CID is acquired and a new record
is written, write is not validated
• Strong Writes: For each TX a CID is acquired and a new
record is written. After the write, a read verifies the visibility; if
the record is not visible the write is performed again
• Write validation: Forward chaining of records based on the
highest CID, last writer wins
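
The difference between weak and strong writes can be sketched as below. This is an illustrative toy under stated assumptions, not the ERS implementation: `VersionedStore`, the `drop_next` hook that simulates a lost write, and `max_retries` are hypothetical, a local counter stands in for the distributed ID generator, and the "highest CID wins" read is a simplified stand-in for the forward chaining of records.

```python
import itertools


class VersionedStore:
    """Toy append-only store (hypothetical); every write adds a (CID, value) record."""

    def __init__(self):
        self._cids = itertools.count(1)  # stand-in for a distributed ID generator
        self.records = {}                # key -> list of (cid, value)
        self.drop_next = 0               # test hook: number of writes to lose silently

    def _append(self, key, cid, value):
        if self.drop_next > 0:           # simulate a write that never becomes visible
            self.drop_next -= 1
            return
        self.records.setdefault(key, []).append((cid, value))

    def _visible(self, key, cid):
        return any(c == cid for c, _ in self.records.get(key, []))

    def weak_write(self, key, value):
        """Acquire a CID and write once; the write is not validated."""
        cid = next(self._cids)
        self._append(key, cid, value)
        return cid

    def strong_write(self, key, value, max_retries=3):
        """Acquire a CID, write, then read back; repeat the write until visible."""
        cid = next(self._cids)
        for _ in range(max_retries + 1):
            self._append(key, cid, value)
            if self._visible(key, cid):
                return cid
        raise RuntimeError("write not visible after retries")

    def read(self, key):
        """Resolve to the record with the highest CID (last writer wins)."""
        versions = self.records.get(key)
        if not versions:
            return None
        return max(versions, key=lambda record: record[0])[1]
```

A weak write costs a single round trip but may be lost without the writer noticing; a strong write pays for an extra read (and possibly a retry) in exchange for knowing that its record is visible.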
[Figure: Inserting 111 is possible using weak writes, but the write cannot be validated]
OC – Execution Stack
• Breakdown of a single
write operation with
tunable consistency
Performance: Optimistic Concurrency (2)

=> Linear scalability even for write-heavy workloads
Ongoing Deployments (1)
• Swiss-Dutch local/global social messaging
– Cloud-hosted Aggregator
– Bridge in Amsterdam (VUA), with Contributors on the VUA internal network
– Bridge in Switzerland (Exascale), with Contributors on the Exascale Infolab internal network
Ongoing Deployments (2)
• Test deployments on new affordable devices
– SmilePlug (cloud-based learning)
– Wandboard (ultra low power computer)
– Earl (backcountry survival tablet)
Ongoing Deployments (3)
• Entity-powered apps for the Sugar Learning
Platform

Ongoing Deployments (4)
• ERS for Ambient Assisted Living of elderly
persons in tropical environments
[AAL research group @ VU]

Conclusions
• The Web is becoming entity-centric
– Land of opportunities for new registries
– Urgent needs for developing countries

• ERS is a unique, open-source entity registry solution supporting
– Local / ad-hoc / global modes
– Collaborative editing and entity versioning
– Tunable consistency levels
– Linear scalability

• Series of ongoing deployments
– Stay tuned for more results and lessons learnt
Big Thanks to the whole ERS Team
Dutch Team @ DANS
Dr. Christophe Guéret

Swiss team @ XI
Prof. Dr. Philippe Cudré-Mauroux

Dr. Marat Charlaganov

C. Dinu & Pepijn Kroes

Dr. Martin Grund

Teodor Macicas

… and to our MSc students:
– Iliya Enchev and Ahmed Shams
And Special Thanks to…
• Scott Hollenbeck, Debra Anderson, Allison
Mankin & the Internet Infrastructures Grant
team
• Dr. Burt Kaliski and his team
• Vincenzo Russo, Benoit Perroud, Romain
Cholat and the whole Verisign Fribourg office

… for their continued support
References
• P. Cudré-Mauroux, G. Demartini, D.E. Difallah, A.E. Mostafa, V. Russo, and M. Thomas: A Demonstration of DNS3: a Semantic-Aware DNS Service. ISWC 2011.

• P. Cudré-Mauroux, G. Demartini, I. Enchev, C. Guéret, and B. Perroud: Downscaling Entity Registries for Ad-Hoc Environments. Downscale 2012.

• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: Demonstrating The Entity Registry System: Implementing 5-Star Linked Data Without the Web. ISWC 2013.

• P. Cudré-Mauroux, I. Enchev, S. Fundatureanu, P.T. Groth, A. Haque, A. Harth, F. Keppmann, D.P. Miranker, J. Sequeda, and M. Wylot: NoSQL Databases for RDF: An Empirical Evaluation. ISWC 2013.

• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: The Entity Registry System: Implementing 5-Star Linked Data Without the Web. CoRR abs 2013.

• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, P. Kroes, and T. Macicas: Collaboratively Editing an Entity Registry in Poorly Connected Environments. CAiSE 2014 [submitted].
Further Entity Research @ XI
• R. Prokofyev, G. Demartini, and P. Cudré-Mauroux: Effective Named Entity Recognition for Idiosyncratic Web Collections. WWW 2014.

• G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: Large-scale linked data integration using probabilistic reasoning and crowdsourcing. The VLDB Journal, 2013.

• A. Tonon, M. Catasta, G. Demartini, P. Cudré-Mauroux, and K. Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013.

• A. Tonon, G. Demartini, and P. Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.

• G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. WWW 2012.
Thanks a lot for your attention
http://exascale.info

30

More Related Content

What's hot

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
Introduction NL-HUG (April)
Introduction NL-HUG (April)Introduction NL-HUG (April)
Introduction NL-HUG (April)Evert Lammerts
 
WoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataWoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataUniversity of Chicago
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASAIan Foster
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Vitus Masters Defense
Vitus Masters DefenseVitus Masters Defense
Vitus Masters DefensederDoc
 
My Final Project
My Final ProjectMy Final Project
My Final Projectaskkathir
 

What's hot (8)

07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Introduction NL-HUG (April)
Introduction NL-HUG (April)Introduction NL-HUG (April)
Introduction NL-HUG (April)
 
WoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific DataWoSC19: Serverless Workflows for Indexing Large Scientific Data
WoSC19: Serverless Workflows for Indexing Large Scientific Data
 
World Wide Web
World Wide WebWorld Wide Web
World Wide Web
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Vitus Masters Defense
Vitus Masters DefenseVitus Masters Defense
Vitus Masters Defense
 
My Final Project
My Final ProjectMy Final Project
My Final Project
 

Similar to The Entity Registry System @ Verisign Labs, 2013

The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...
The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...
The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...Christophe Guéret
 
OrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data RelationshipsOrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data RelationshipsFabrizio Fortino
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCObject Automation
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking FunctionalityNicholas Loulloudes
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Scalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with ParslScalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with ParslGlobus
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloudNational Institute of Informatics
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Dr. Anita Goel
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.pptNileshkuGiri
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Research Data Alliance
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Frederic Desprez
 

Similar to The Entity Registry System @ Verisign Labs, 2013 (20)

The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...
The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...
The Entity Registry System: Collaborative Editing of Entity Data in Poorly Co...
 
OrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data RelationshipsOrientDB: Unlock the Value of Document Data Relationships
OrientDB: Unlock the Value of Document Data Relationships
 
Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
Designing High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPCDesigning High performance & Scalable Middleware for HPC
Designing High performance & Scalable Middleware for HPC
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionality
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Scalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with ParslScalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with Parsl
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
grid computing
grid computinggrid computing
grid computing
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloud
 
Bertenthal
BertenthalBertenthal
Bertenthal
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Grid computing
Grid computingGrid computing
Grid computing
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
 
slides15-1.pdf
slides15-1.pdfslides15-1.pdf
slides15-1.pdf
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...
 
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
Grid'5000: Running a Large Instrument for Parallel and Distributed Computing ...
 

More from eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictioneXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex GraphseXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapeXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceanseXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutioneXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataeXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data ManagementeXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataeXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataeXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingeXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingeXascale Infolab
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 

More from eXascale Infolab (20)

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 

Recently uploaded

Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfUK Journal
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?Paolo Missier
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jNeo4j
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 

The Entity Registry System @ Verisign Labs, 2013

  • 1. The Promise of a Better Connected Digital World: Data Registry Systems Without the Web Philippe Cudré-Mauroux eXascale Infolab, University of Fribourg Switzerland Christophe Guéret VU University / DANS The Netherlands Verisign Labs Distinguished Speakers Series Verisign Labs, Reston–USA December 13, 2013
  • 3. Entity Data • Semi-structured, interlinked descriptions of shared instances – Persons – Objects – Software – Locations – Sensors – …
  • 4. Entities as Mediation • Rising paradigm – Store information at the entity granularity – Integrate information by inter-linking entities • Advantages? – Coarser granularity compared to keywords • More natural, e.g., brain functions similarly (or is it the other way around?) – Denormalized information compared to RDBMSs • Schema-later, heterogeneity, sparsity • Pre-computed joins, “Semantic” linking • Drawbacks?
  • 5. Prominence of Entity-Powered Apps – Collaborative Editing (Wikipedia’s Wikidata) – Social Networks (Facebook’s Open Graph) – Serious Networks (LinkedIn’s Business Graph) – Web Search (Google’s Knowledge Graph) – Software Integration (Yahoo!’s WOO) – Question Answering (IBM’s Watson) – Dynamic Websites (BBC’s London Olympics) – Open Data (data.gov.uk, linkeddata.org) – Most of our own applications (exascale.info) – etc.
  • 6. Problem: Limited Access to Entities (1) • 70+% of the world’s population has no or very limited access to the Web [Ahmed Shams 2013]
  • 7. Problem: Limited Access to Entities (2) • Even in developed countries, deploying collaborative entity-editing platforms is technically exceedingly challenging – Local/Global QoS to serve arbitrary entity data • Performance, scale-out – Collaborative aspects • Transactions, versioning, integration – Offline / mobile concerns • Caching / replication / serializability
  • 8. Potential Building Blocks? • … for a hybrid online/offline, collaborative entity registry: – DNS3 (never meant for entity data) – DOA (awkward Web integration, limited features) – RDBMSs (ACID? Impedance mismatch, limited perf.) – P2P / decentralized CDNs (performance issues) – Native RDF Stores (too expressive; scalability / perf. issues) – (Structured) Inverted Indices (no transactions; slow updates) – NoSQL key-value / document stores (wrong PACELC trade-offs; (some) performance issues) [Iliya Enchev 2012 ISWC 2013]
  • 9. Our Solution: ERS, the Entity Registry System • Three-tier solution to deploy entity-powered apps – Flexible • Seamlessly reconcile entities in local / ad-hoc / global modes – Collaborative • Transactional consistency, data versioning – Scalable • Bridges, scale-out servers, tunable consistency – Open-source • https://github.com/ers-devs
  • 10. ERS Architecture (1) • Contributors: Contributors read and edit the contents of the registry. They may create and delete entities, look for entities, and contribute to the entities’ descriptions. • Bridges: Bridges do not directly contribute to the contents of the registry. They are used to connect isolated closed networks and improve the availability of the descriptions shared by the contributors. • Aggregators: Some use-cases may require the presence of global servers that contain a copy of all the data provided individually by the contributors. The global server provides a single entry point to the registry.
  • 13. ERS Data & API • Data: flexible RDF quads serialized as – JSON documents (contributors, bridges) – Key, value pairs (aggregators) • Atomic & serializable operations through various locking granularities – Insert entity (IE), Insert property (IP), Update property (UP), Delete property (DP), Delete entity (DE), Shallow entity copy (SC), Deep entity copy (DC), Insert link between two entities (IL), Delete link between two entities (DL) => Consistency
  • 14. Unique Technical Features I. Seamless, best-effort entity synchronization – Local, ad-hoc, global modes II. Fault-tolerance and decentralization – Property replication, no single point of failure III. Built-in versioning and provenance – Collaborative entity editing made easy IV. Linear scalability – Tunable consistency levels
  • 15. Performance: Distributed Locking (1) • Decentralized, multi-granular locking protocol for transactional consistency on top of persistency layers
  • 16. Performance: Distributed Locking (2) • Fault-tolerant, though Paxos-like algorithms limit horizontal scalability
  • 17. Performance: Optimistic Concurrency (1) • ERS typically operates on insert-heavy low-conflict workloads – Most of the time new entities are inserted and properties added • Goal: separate validation from write operations – Per worker TX management – Distributed ID generator for consistent commits
  • 18. Tunable Consistency in ERS • Weak Writes: For each TX a CID is acquired and a new record is written; the write is not validated • Strong Writes: For each TX a CID is acquired and a new record is written. After the write, a read verifies the visibility; if the record is not visible the write is performed again • Write validation: Forward chaining of records based on the highest CID, last writer wins • Example: inserting 111 is possible using weak writes, but the write cannot be validated
  • 19. OC – Execution Stack • Breakdown of a single write operation with tunable consistency
  • 20. Performance: Optimistic Concurrency (2) => Linear scalability even for write-heavy workloads
  • 21. Ongoing Deployments (1) • Swiss-Dutch local/global social messaging – Cloud-hosted aggregator – Bridge in Amsterdam (VUA), with contributors on the VUA internal network – Bridge in Switzerland (Exascale), with contributors on the Exascale Infolab internal network
  • 22. Ongoing Deployments (2) • Test deployments on new affordable devices – SmilePlug (cloud-based learning) – Wandboard (ultra low power computer) – Earl (backcountry survival tablet)
  • 23. Ongoing Deployments (3) • Entity-powered apps for the Sugar Learning Platform
  • 24. Ongoing Deployments (4) • ERS for Ambient Assisted Living of elderly persons in tropical environments [AAL research group @ VU]
  • 25. Conclusions • The Web is becoming entity-centric – Land of opportunities for new registries – Urgent needs for developing countries • ERS is a unique, open-source entity registry solution supporting – Local / ad-hoc / global modes – Collaborative editing and entity versioning – Tunable consistency levels – Linear scalability • Series of ongoing deployments – Stay tuned for more results and lessons learnt
  • 26. Big Thanks to the whole ERS Team Dutch Team @ DANS Dr. Christophe Guéret Swiss team @ XI Prof. Dr. Philippe Cudré-Mauroux Dr. Marat Charlaganov C. Dinu & Pepijn Kroes Dr. Martin Grund Teodor Macicas … and to our MSc students: – Iliya Enchev and Ahmed Shams
  • 27. And Special Thanks to… • Scott Hollenbeck, Debra Anderson, Allison Mankin & the Internet Infrastructures Grant team • Dr. Burt Kaliski and his team • Vincenzo Russo, Benoit Perroud, Romain Cholat and the whole Verisign Fribourg office … for their continued support
  • 28. References • P. Cudré-Mauroux, G. Demartini, D.E. Difallah, A.E. Mostafa, V. Russo, and M. Thomas: A Demonstration of DNS3: a Semantic-Aware DNS Service. ISWC 2011. • P. Cudré-Mauroux, G. Demartini, I. Enchev, C. Guéret, and B. Perroud: Downscaling Entity Registries for Ad-Hoc Environments. Downscale 2012. • M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: Demonstrating The Entity Registry System: Implementing 5-Star Linked Data Without the Web. ISWC 2013. • P. Cudré-Mauroux, I. Enchev, S. Fundatureanu, P.T. Groth, A. Haque, A. Harth, F. Keppmann, D.P. Miranker, J. Sequeda, and M. Wylot: NoSQL Databases for RDF: An Empirical Evaluation. ISWC 2013. • M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: The Entity Registry System: Implementing 5-Star Linked Data Without the Web. CoRR abs 2013. • M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, P. Kroes, and T. Macicas: Collaboratively Editing an Entity Registry in Poorly Connected Environments. CAiSE 2014 [submitted].
  • 29. Further Entity Research @ XI • R. Prokofyev, G. Demartini, and P. Cudré-Mauroux: Effective Named Entity Recognition for Idiosyncratic Web Collections. WWW 2014. • G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: Large-scale linked data integration using probabilistic reasoning and crowdsourcing. The VLDB Journal, 2013. • A. Tonon, M. Catasta, G. Demartini, P. Cudré-Mauroux, and K. Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013. • A. Tonon, G. Demartini, and P. Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012. • G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. WWW 2012.
  • 30. Thanks a lot for your attention http://exascale.info
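
Slide 13 lists the entity operations exposed by ERS (IE, IP, UP, DP, DE, IL, DL, plus the two copy operations) and says contributors and bridges hold entities as JSON documents. The sketch below is one way such an interface could look in Python. It is not the ERS API: the class name `EntityStore`, the method names, the `urn:ers:` identifiers, and the `{"@id": ...}` link encoding are all assumptions made for illustration, a single coarse lock stands in for the "various locking granularities" on the slide, and the shallow and deep copy operations are left out because the slides do not define them.

```python
import json
import threading


class EntityStore:
    """Toy in-memory registry: entity id -> property -> list of values.

    Hypothetical illustration only; names and layout are not taken from ERS.
    """

    def __init__(self):
        self._entities = {}
        # One coarse re-entrant lock makes each operation atomic in this toy.
        self._lock = threading.RLock()

    def insert_entity(self, eid):  # IE
        with self._lock:
            if eid in self._entities:
                raise ValueError(f"entity already exists: {eid}")
            self._entities[eid] = {}

    def insert_property(self, eid, prop, value):  # IP
        with self._lock:
            self._entities[eid].setdefault(prop, []).append(value)

    def update_property(self, eid, prop, value):  # UP
        with self._lock:
            self._entities[eid][prop] = [value]

    def delete_property(self, eid, prop):  # DP
        with self._lock:
            self._entities[eid].pop(prop, None)

    def delete_entity(self, eid):  # DE
        with self._lock:
            self._entities.pop(eid, None)

    def insert_link(self, src, prop, dst):  # IL
        # The target may live on another node, so it is not checked locally.
        with self._lock:
            self.insert_property(src, prop, {"@id": dst})

    def delete_link(self, src, prop, dst):  # DL
        with self._lock:
            values = self._entities[src].get(prop)
            if values is not None:
                values[:] = [v for v in values if v != {"@id": dst}]

    def to_json(self, eid):
        """Serialize one entity description as a JSON document."""
        with self._lock:
            return json.dumps({"@id": eid, **self._entities[eid]}, sort_keys=True)


store = EntityStore()
store.insert_entity("urn:ers:alice")
store.insert_property("urn:ers:alice", "name", "Alice")
store.insert_link("urn:ers:alice", "knows", "urn:ers:bob")
print(store.to_json("urn:ers:alice"))
```

Keeping links as ordinary property values means the inter-linking of entities mentioned on slide 4 needs no separate storage structure in this sketch; a real registry would likely need provenance and versioning metadata per value as well.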

Editor's Notes

  1. The correct path is 101-105-110-112-113. If 111 is inserted it cannot be found, so the transaction has to be repeated.
  2. Scalability
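
The weak and strong writes of slide 18, together with the 101-105-110-112-113 path from the first editor's note, can be sketched as a small simulation. This is an interpretation and not the ERS implementation: the `VersionChain` class, the function names, and the idea of modelling a delayed write as a CID that reaches storage out of order are assumptions, and CIDs are assumed to be unique.

```python
class VersionChain:
    """Records for one entity, chained forward by increasing commit ID (CID)."""

    def __init__(self):
        self.records = {}    # cid -> value, every record ever written
        self.successor = {}  # cid -> cid of the next record in the chain
        self.first = None
        self.last = None     # highest chained CID: the last writer wins

    def write(self, cid, value):
        self.records[cid] = value
        if self.last is None:
            self.first = self.last = cid
        elif cid > self.last:
            self.successor[self.last] = cid
            self.last = cid
        # A CID below the current head is stored but never chained,
        # so readers following the chain cannot reach it.

    def path(self):
        out, cur = [], self.first
        while cur is not None:
            out.append(cur)
            cur = self.successor.get(cur)
        return out

    def is_visible(self, cid):
        return cid in self.path()


def weak_write(chain, cids, value):
    """Acquire a CID and write the record without validating it."""
    cid = next(cids)
    chain.write(cid, value)
    return cid


def strong_write(chain, cids, value):
    """Write, read back, and repeat with a fresh CID until the record is visible."""
    while True:
        cid = weak_write(chain, cids, value)
        if chain.is_visible(cid):
            return cid


# CIDs in the order in which their writes reach storage: 111 arrives after 112.
arrivals = iter([101, 105, 110, 112, 111, 113])
chain = VersionChain()
for _ in range(4):
    weak_write(chain, arrivals, "v")
committed = strong_write(chain, arrivals, "late update")
print(chain.path(), committed)
```

In this run the record with CID 111 is stored but cannot be reached from the chain, so the strong write repeats the transaction and commits under 113. A weak write in the same position would have returned 111 without noticing that the record is invisible.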