Talk on the Entity Registry System (ERS)
Verisign Headquarters, Distinguished Speakers Series
https://www.verisigninc.com/en_US/why-verisign/innovation-initiatives/labs/news/distinguished-speaker-series/advancing-technologies/index.xhtml
http://iswc2013.semanticweb.org/sites/default/files/iswc_demo_4.pdf
https://github.com/ers-devs
The Entity Registry System @ Verisign Labs, 2013
1. The Promise of a Better Connected Digital World:
Data Registry Systems Without the Web
Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg
Switzerland
Christophe Guéret
VU University / DANS
The Netherlands
Verisign Labs Distinguished Speakers Series
Verisign Labs, Reston–USA
December 13, 2013
4. Entities as Mediation
• Rising paradigm
– Store information at the entity granularity
– Integrate information by inter-linking entities
• Advantages?
– Coarser granularity compared to keywords
• More natural, e.g., brain functions similarly (or is it the other way
around?)
– Denormalized information compared to RDBMSs
• Schema-later, heterogeneity, sparsity
• Pre-computed joins, “Semantic” linking
• Drawbacks?
5. Prominence of Entity-Powered Apps
– Collaborative Editing (Wikipedia’s Wikidata)
– Social Networks (Facebook’s Open Graph)
– Serious Networks (LinkedIn’s Business Graph)
– Web Search (Google’s Knowledge Graph)
– Software Integration (Yahoo!’s WOO)
– Question Answering (IBM’s Watson)
– Dynamic Websites (BBC’s London Olympics)
– Open Data (data.gov.uk, linkeddata.org)
– Most of our own applications (exascale.info)
– etc. etc.
6. Problem: Limited Access to Entities (1)
• 70+% of the world’s population has no or
very limited access to the Web
[Ahmed Shams 2013]
7. Problem: Limited Access to Entities (2)
• Even in developed countries, deploying
collaborative entity-editing platforms is
technically exceedingly challenging
– Local/Global QoS to serve arbitrary entity data
• Performance, scale-out
– Collaborative aspects
• Transactions, versioning, integration
– Offline / mobile concerns
• Caching / replication / serializability
8. Potential Building Blocks?
• … for a hybrid online/offline, collaborative entity
registry:
– DNS3 (never meant for entity data)
– DOA (awkward Web integration, limited features)
– RDBMSs (ACID? Impedance mismatch, limited perf.)
– P2P / decentralized CDNs (performance issues)
– Native RDF Stores (too expressive; scalability / perf. issues)
– (Structured) Inverted Indices (no transactions; slow updates)
– NoSQL key-value / document stores (wrong PACELC trade-offs; (some) performance issues)
[Iliya Enchev 2012; ISWC 2013]
9. Our Solution: ERS, the Entity Registry System
• Three-tier solution to deploy entity-powered apps
– Flexible
• Seamlessly reconcile entities in local / ad-hoc / global modes
– Collaborative
• Transactional consistency, data versioning
– Scalable
• Bridges, scale-out servers, tunable consistency
– Open-source
• https://github.com/ers-devs
10. ERS Architecture (1)
• Contributors: Contributors read and edit the contents of
the registry. They may create and delete entities, look for
entities, and contribute to the entities’ descriptions.
• Bridges: Bridges do not directly contribute to the
contents of the registry. They are used to connect
isolated closed networks and improve the availability of
the descriptions shared by the contributors.
• Aggregators: Some use cases may require the
presence of global servers that contain a copy of all the
data provided individually by the contributors. The global
server provides a single entry point to the registry.
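A minimal Python sketch of the three roles, under a deliberately simplified data model: each peer maps an entity ID to a set of (property, value) pairs, and synchronization is plain set union. The class and method names (`Contributor`, `Bridge`, `Aggregator`, `describe`, `collect`, `deliver`, `pull`, `lookup`) are hypothetical, and merging everything into one set is a simplification rather than how ERS stores contributions.

```python
from collections import defaultdict


class Contributor:
    """Reads and edits its own local copy of the registry (hypothetical model)."""

    def __init__(self):
        self.store = defaultdict(set)  # entity id -> {(property, value), ...}

    def describe(self, entity, prop, value):
        self.store[entity].add((prop, value))


class Bridge:
    """Store-and-forward node between isolated networks; authors no data."""

    def __init__(self):
        self.cache = defaultdict(set)

    def collect(self, peer):
        # Copy every description the peer holds into the bridge's cache.
        for entity, pairs in peer.store.items():
            self.cache[entity] |= pairs

    def deliver(self, peer):
        # Hand everything cached so far to a peer on another network.
        for entity, pairs in self.cache.items():
            peer.store[entity] |= pairs


class Aggregator:
    """Global server holding a copy of all contributed data."""

    def __init__(self):
        self.store = defaultdict(set)

    def pull(self, peer):
        for entity, pairs in peer.store.items():
            self.store[entity] |= pairs

    def lookup(self, entity):
        # Single entry point: returns a copy of the merged description.
        return set(self.store.get(entity, ()))


# Usage: Alice and Bob sit on networks that never connect directly.
alice, bob = Contributor(), Contributor()
alice.describe("urn:ex:well7", "name", "Well 7")
bob.describe("urn:ex:well7", "depth", "12")

bridge = Bridge()
bridge.collect(alice)  # the bridge visits Alice's network...
bridge.deliver(bob)    # ...and later Bob's

hub = Aggregator()
hub.pull(alice)
hub.pull(bob)
print(sorted(hub.lookup("urn:ex:well7")))  # [('depth', '12'), ('name', 'Well 7')]
```

The bridge carries Alice's description to Bob without adding anything of its own, while the aggregator that pulls from both peers becomes the single place to look an entity up.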
13. ERS Data & API
• Data: flexible RDF quads serialized as
– JSON documents (contributors, bridges)
– Key-value pairs (aggregators)
• Atomic & serializable operations through
various locking granularities
– Insert entity (IE), Insert property (IP), Update property (UP), Delete property (DP), Delete entity (DE), Shallow entity copy (SC), Deep entity copy (DC), Insert link between two entities (IL), Delete link between two entities (DL)
=> Consistency
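The slide only states that quads are kept as JSON documents on contributors and bridges and as key-value pairs on aggregators. The sketch below shows one plausible mapping between the two, assuming the fourth quad element names the contributing graph; the `_graph` / `_id` field names and the `subject|predicate|graph` key layout are illustrative assumptions, not the ERS storage format.

```python
import json
from collections import defaultdict

# Hypothetical quad layout: (graph, subject, predicate, object), where the
# graph is assumed to identify the contributor that asserted the statement.


def to_json_documents(quads):
    """Contributor/bridge form: one JSON document per (graph, subject)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for graph, subject, predicate, obj in quads:
        grouped[(graph, subject)][predicate].append(obj)
    return {
        key: json.dumps({"_graph": key[0], "_id": key[1], **props}, sort_keys=True)
        for key, props in grouped.items()
    }


def to_key_value(quads):
    """Aggregator form: one key per subject|predicate|graph, listing its objects."""
    pairs = defaultdict(list)
    for graph, subject, predicate, obj in quads:
        pairs[f"{subject}|{predicate}|{graph}"].append(obj)
    return dict(pairs)


quads = [
    ("urn:ex:alice", "urn:ex:well7", "name", "Well 7"),
    ("urn:ex:alice", "urn:ex:well7", "depth", "12"),
    ("urn:ex:bob", "urn:ex:well7", "name", "Puits 7"),
]
print(to_json_documents(quads)[("urn:ex:alice", "urn:ex:well7")])
# {"_graph": "urn:ex:alice", "_id": "urn:ex:well7", "depth": ["12"], "name": ["Well 7"]}
print(to_key_value(quads)["urn:ex:well7|name|urn:ex:bob"])  # ['Puits 7']
```

Grouping by (graph, subject) keeps each contributor's description of an entity in its own document, while the flat keys suit a store that only offers get/put by key.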
14. Unique Technical Features
I. Seamless, best-effort entity synchronization
– Local, ad-hoc, global modes
II. Fault-tolerance and decentralization
– Property replication, no single point of failure
III. Built-in versioning and provenance
– Collaborative entity editing made easy
IV. Linear scalability
– Tunable consistency levels
15. Performance: Distributed Locking (1)
• Decentralized, multi-granular locking protocol
for transactional consistency on top of
persistence layers
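The slides do not spell out ERS's locking protocol, so as a point of reference here is the textbook multiple-granularity scheme (intention locks IS, IX, SIX plus shared S and exclusive X) over a registry → entity → property hierarchy. This sketch is centralized and non-blocking (a conflicting request simply fails); `LockTable`, `COMPATIBLE`, `INTENT`, and the path tuples are hypothetical, and a decentralized deployment would have to distribute this table.

```python
# Textbook multiple-granularity lock modes; COMPATIBLE[held] is the set of
# modes another transaction may be granted on the same resource.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}
# Intention mode required on every ancestor before locking a node in a mode.
INTENT = {"IS": "IS", "S": "IS", "IX": "IX", "SIX": "IX", "X": "IX"}


class LockTable:
    """Hypothetical centralized lock table over paths like ("reg", "e1", "name")."""

    def __init__(self):
        self.held = {}  # resource path -> {tx: set of granted modes}

    def lock(self, tx, path, mode):
        """All-or-nothing: lock `path` in `mode` plus intention locks on ancestors.

        Returns False, granting nothing, if any single request conflicts.
        """
        requests = [(path[:i], INTENT[mode]) for i in range(1, len(path))]
        requests.append((path, mode))
        for resource, wanted in requests:
            for other, modes in self.held.get(resource, {}).items():
                if other != tx and any(wanted not in COMPATIBLE[m] for m in modes):
                    return False
        for resource, wanted in requests:
            self.held.setdefault(resource, {}).setdefault(tx, set()).add(wanted)
        return True

    def release_all(self, tx):
        for holders in self.held.values():
            holders.pop(tx, None)


locks = LockTable()
print(locks.lock("t1", ("reg", "e1", "name"), "X"))  # True
print(locks.lock("t2", ("reg", "e1", "age"), "X"))   # True: only IX needed on e1
print(locks.lock("t3", ("reg", "e1"), "S"))          # False: S conflicts with IX
```

Two transactions can write different properties of the same entity at once, because each needs only IX on the entity, while a transaction reading the whole entity must wait until both writers release.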
17. Performance: Optimistic Concurrency (1)
• ERS typically operates on insert-heavy, low-conflict workloads
– Most of the time, new entities are inserted and properties added
• Goal: separate validation from write operations
– Per-worker TX management
– Distributed ID generator for consistent commits
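The slides mention a distributed ID generator for consistent commits without describing it. One common way to obtain unique, comparable commit identifiers (assumed here to be the CIDs used in the tunable-consistency slide) without coordination is a Lamport clock combined with a worker ID. This is a swapped-in technique, not necessarily what ERS does; `CidGenerator`, `WORKER_BITS`, `next_cid`, and `observe` are hypothetical names.

```python
import threading

WORKER_BITS = 10  # assumption: at most 1024 workers


class CidGenerator:
    """Lamport-clock commit IDs that stay unique across workers.

    CID = (counter << WORKER_BITS) | worker_id, so IDs issued by different
    workers never collide and compare by counter first, worker ID second.
    """

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.counter = 0
        self._lock = threading.Lock()  # one generator may serve several threads

    def next_cid(self):
        with self._lock:
            self.counter += 1
            return (self.counter << WORKER_BITS) | self.worker_id

    def observe(self, cid):
        """Lamport receive rule: never issue a CID below one already seen."""
        with self._lock:
            self.counter = max(self.counter, cid >> WORKER_BITS)


w1, w2 = CidGenerator(1), CidGenerator(2)
a = w1.next_cid()
b = w2.next_cid()
w1.observe(b)  # w1 learns about w2's commit
c = w1.next_cid()
print(a, b, c)  # 1025 1026 2049
```

Because every worker advances past any CID it observes, a commit issued after seeing another commit always receives a higher CID, which is what a "highest CID wins" rule needs.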
18. Tunable Consistency in ERS
• Weak Writes: For each TX, a CID is acquired and a new record
is written; the write is not validated
• Strong Writes: For each TX, a CID is acquired and a new
record is written. After the write, a read verifies the visibility; if
the record is not visible, the write is performed again
• Write validation: Forward chaining of records based on the
highest CID; last writer wins
[Figure: inserting record 111 is possible using weak writes, but the write cannot be validated]
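A small simulation of weak writes, strong writes, and forward-chaining validation, reproducing the scenario from the speaker notes (valid path 101-105-110-112-113, orphaned record 111). It assumes each record points to the head its writer last saw, and that validation walks forward from the root, always following the successor with the highest CID; `ChainStore`, `weak_write`, and `strong_write` are hypothetical names, not the ERS API.

```python
import itertools


class ChainStore:
    """Hypothetical record chain: cid -> (prev_cid, value)."""

    def __init__(self, root_cid, value):
        self.root = root_cid
        self.records = {root_cid: (None, value)}

    def write(self, cid, prev, value):
        if cid in self.records:
            raise ValueError("CID already used")
        if prev not in self.records or prev >= cid:
            raise ValueError("a record must extend an older, existing record")
        self.records[cid] = (prev, value)

    def path(self):
        """Forward-chain from the root, always taking the highest-CID successor."""
        successors = {}
        for cid, (prev, _) in self.records.items():
            if prev is not None:
                successors.setdefault(prev, []).append(cid)
        chain = [self.root]
        while chain[-1] in successors:
            chain.append(max(successors[chain[-1]]))  # last writer wins
        return chain

    def head(self):
        return self.path()[-1]


def weak_write(store, next_cid, value, prev=None):
    """Acquire a CID and write a record on top of `prev`; never validated."""
    prev = store.head() if prev is None else prev
    cid = next_cid()
    store.write(cid, prev, value)
    return cid


def strong_write(store, next_cid, value, prev=None, attempts=3):
    """Weak write plus read-back; retry on the current head if not visible."""
    for _ in range(attempts):
        cid = weak_write(store, next_cid, value, prev)
        if cid in store.path():  # is the record on the validated chain?
            return cid
        prev = None  # stale view: rebuild on whatever the head is now
    raise RuntimeError("write could not be validated")


s = ChainStore(101, "v0")
s.write(105, 101, "v1")
s.write(110, 105, "v2")
weak_write(s, lambda: 111, "mine")  # head is 110
s.write(112, 110, "theirs")         # concurrent writer had not yet seen 111
s.write(113, 112, "theirs")
print(s.path())                     # [101, 105, 110, 112, 113]
print(strong_write(s, itertools.count(114).__next__, "mine"))  # 114
```

Because 112's writer never saw 111, both records extend 110 and forward chaining picks 112, so 111 is off the path and its transaction is repeated, here as a strong write that lands on 113. The retry loop matters when a concurrent writer overtakes the record between the write and the read-back, which this single-threaded run does not exercise.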
19. OC – Execution Stack
• Breakdown of a single
write operation with
tunable consistency
24. Ongoing Deployments (4)
• ERS for Ambient Assisted Living of elderly
persons in tropical environments
[AAL research group @ VU]
25. Conclusions
• The Web is becoming entity-centric
– Land of opportunities for new registries
– Urgent needs in developing countries
• ERS is a unique, open-source entity registry solution supporting
– Local / ad-hoc / global modes
– Collaborative editing and entity versioning
– Tunable consistency levels
– Linear scalability
• Series of ongoing deployments
– Stay tuned for more results and lessons learnt
26. Big Thanks to the whole ERS Team
Dutch Team @ DANS
Dr. Christophe Guéret
Swiss Team @ XI
Prof. Dr. Philippe Cudré-Mauroux
Dr. Marat Charlaganov
C. Dinu & Pepijn Kroes
Dr. Martin Grund
Teodor Macicas
… and to our MSc students:
– Iliya Enchev and Ahmed Shams
27. And Special Thanks to…
• Scott Hollenbeck, Debra Anderson, Allison
Mankin & the Internet Infrastructures Grant
team
• Dr. Burt Kaliski and his team
• Vincenzo Russo, Benoit Perroud, Romain
Cholat and the whole Verisign Fribourg office
… for their continued support
28. References
• P. Cudré-Mauroux, G. Demartini, D.E. Difallah, A.E. Mostafa, V. Russo, and M. Thomas: A Demonstration of DNS3: a Semantic-Aware DNS Service. ISWC 2011.
• P. Cudré-Mauroux, G. Demartini, I. Enchev, C. Guéret, and B. Perroud: Downscaling Entity Registries for Ad-Hoc Environments. Downscale 2012.
• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: Demonstrating The Entity Registry System: Implementing 5-Star Linked Data Without the Web. ISWC 2013.
• P. Cudré-Mauroux, I. Enchev, S. Fundatureanu, P.T. Groth, A. Haque, A. Harth, F. Keppmann, D.P. Miranker, J. Sequeda, and M. Wylot: NoSQL Databases for RDF: An Empirical Evaluation. ISWC 2013.
• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, and T. Macicas: The Entity Registry System: Implementing 5-Star Linked Data Without the Web. CoRR abs 2013.
• M. Charlaganov, P. Cudré-Mauroux, C. Dinu, C. Guéret, M. Grund, P. Kroes, and T. Macicas: Collaboratively Editing an Entity Registry in Poorly Connected Environments. CAiSE 2014 [submitted].
29. Further Entity Research @ XI
• R. Prokofyev, G. Demartini, and P. Cudré-Mauroux: Effective Named Entity Recognition for Idiosyncratic Web Collections. WWW 2014.
• G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: Large-scale linked data integration using probabilistic reasoning and crowdsourcing. The VLDB Journal, 2013.
• A. Tonon, M. Catasta, G. Demartini, P. Cudré-Mauroux, and K. Aberer: TRank: Ranking Entity Types Using the Web of Data. ISWC 2013.
• A. Tonon, G. Demartini, and P. Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.
• G. Demartini, D.E. Difallah, and P. Cudré-Mauroux: ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. WWW 2012.
30. Thanks a lot for your attention
http://exascale.info
Editor's Notes
The correct path is 101-105-110-112-113. If 111 is inserted, it cannot be found; thus, the transaction has to be repeated.