The Entity Registry System: Collaborative Editing of Entity Data in Poorly Connected Environments
1. Data Archiving and Networked Services
The Entity Registry System
Collaborative Editing of Entity Data in
Poorly Connected Environments
Christophe Guéret (@cgueret)
Philippe Cudré-Mauroux
AAAI Spring Symposium #SD4HumTech15
March 23-25, 2015 Stanford University
2. The big question
“This symposium aims to address the
question of whether the technology is
mature enough to warrant further
investigation or whether the disadvantages
outweigh the utility of SD for this domain”
3. And the answer (for Linked Data) is…
Yes, it is mature enough!
But Linked Data platforms need to be
downscaled before they can deliver their full
potential in this specific context. So far, most of
what the community has to offer does not fit.
4. On the upscaling of platforms
●
General design approach
– Design a “one size fits all” data model for the common space
– Make a centralised store in the cloud
– Connect users to the store
●
Scale up to cater for more users
●
Have a hard time trying to fit in users when
– Limited or no infrastructures (connectivity, electricity, ...)
– Limited agreement on models / data heterogeneity
– Different levels of (computer) literacy
5. The opposite approach
●
Downscaling platforms to make them fit specific,
challenging, usage contexts and use-cases
http://worldwidesemanticweb.org/
6. Other WWSW aspects
●
Interfaces: non text-centric
interaction with data (SPARQL-Voice,
Icons, …)
●
Relevancy: find the subset of
structured data that is the most
relevant, contextualised
reasoning, local+global data
●
Data: publication of development
related data as Linked Open Data
(IATI, IDS, ...)
Short video on our website in “About”
9. Collaboratively describing entities
●
A single information space can be useful
●
But even when not done in a challenging context,
deploying collaborative entity-editing platforms is
technically exceedingly challenging
– Local/Global QoS to serve arbitrary entity data
●
Performance, scale-out
– Collaborative aspects
●
Transactions, versioning, integration
– Offline / mobile concerns
●
Caching / replication / serializability
10. One solution: ERS
●
Web-less Linked Data
●
Three-tier solution to deploy entity-powered apps
– Flexible
●
Seamlessly reconcile entities in local / ad-hoc / global modes
– Collaborative
●
Transactional consistency, data versioning
– Scalable
●
Shared data store, tunable completeness
– Open-source
●
https://github.com/ers-devs
12. Introducing the “Contributors”
●
The central store is removed
●
Contributors are their own trusted data store
●
They can cache content from other contributors
●
They have a private store for private data
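The three stores of a Contributor can be pictured with a minimal sketch. This is not the ERS API: the class `Contributor`, its fields `public`, `cache` and `private`, and both methods are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    """Hypothetical model of a contributor; not the real ERS API."""
    name: str
    public: dict = field(default_factory=dict)   # own, trusted statements
    cache: dict = field(default_factory=dict)    # copies of other contributors' content
    private: dict = field(default_factory=dict)  # private data, never shared

    def describe(self, entity, prop, value, private=False):
        # Statements go either to the private or to the public store.
        store = self.private if private else self.public
        store.setdefault(entity, []).append((prop, value))

    def cache_from(self, other):
        # Only the other contributor's public store is copied, and the
        # copy stays labelled with the name of its author.
        for entity, statements in other.public.items():
            self.cache[(other.name, entity)] = list(statements)


alice = Contributor("alice")
bob = Contributor("bob")
alice.describe("house1", "#people", "2")
alice.describe("house1", "#owner", "alice", private=True)
bob.cache_from(alice)
```

After this exchange `bob.cache` holds Alice's public statement about `house1`, while her private statement stays in her own private store.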
13. Adding a “Bridge”
●
Can only cache content from Contributors
●
Useful for asynchronous messaging
●
Convenient for groups (schools, clusters, ...)
14. And put it on a bus, or something else
●
Can be used to implement a sneakernet
●
Contributors can also do this when visiting different
bridges
15. Need to get all the data in one place?
●
Use the third component of ERS: the Aggregator
●
An Aggregator aggregates the content coming from
several Bridges
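How Bridges and an Aggregator relate can be sketched as follows. The classes `Bridge` and `Aggregator` and their methods are hypothetical and only illustrate the roles described in the slides; the actual ERS components may work differently.

```python
class Bridge:
    """Hypothetical bridge: holds only cached copies, never data of its own."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # (contributor, entity) -> list of (property, value)

    def receive(self, contributor, entity, statements):
        # Store a copy of a contributor's statements for later pick-up.
        self.cache[(contributor, entity)] = list(statements)


class Aggregator:
    """Hypothetical aggregator: gathers the content of several bridges."""

    def __init__(self):
        self.store = {}  # (contributor, entity) -> list of (property, value)

    def collect(self, bridges):
        for bridge in bridges:
            for key, statements in bridge.cache.items():
                self.store[key] = list(statements)

    def about(self, entity):
        # Every contributor's description of the entity, kept separate.
        return {c: s for (c, e), s in self.store.items() if e == entity}


school = Bridge("school")
bus = Bridge("bus")  # a bridge that physically travels (sneakernet)
school.receive("alice", "house1", [("#people", "1")])
bus.receive("bob", "house1", [("#people", "2")])

aggregator = Aggregator()
aggregator.collect([school, bus])
descriptions = aggregator.about("house1")
```

Keeping the contributor name in the key means the Aggregator does not merge differing statements; each description remains attributable to its author.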
16. About consistency of statements
●
Different points of view are, by design, found in
separate containers
●
Provenance data is available for all containers
●
Voting/consensus can resolve conflicts
<house1> “#people” “1”
<house1> “#people” “2”
<house1> “#people” “2”
<house1> “#people” “1”
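A simple majority vote over such statements could look like the sketch below. It assumes, for illustration only, that the four statements above come from four different contributors (`c1` to `c4`, hypothetical names); the function `vote` is not part of ERS.

```python
from collections import Counter


def vote(statements, entity, prop):
    """Return the most frequently asserted value(s) for entity/prop."""
    counts = Counter(
        value for _, e, p, value in statements if e == entity and p == prop
    )
    if not counts:
        return []
    top = max(counts.values())
    # Several values are returned when the vote is tied.
    return sorted(v for v, n in counts.items() if n == top)


# (contributor, entity, property, value); contributor names are assumed
statements = [
    ("c1", "house1", "#people", "1"),
    ("c2", "house1", "#people", "2"),
    ("c3", "house1", "#people", "2"),
    ("c4", "house1", "#people", "1"),
]
winners = vote(statements, "house1", "#people")
```

With these four statements the vote is tied, so both values are returned and the application has to decide, for instance by using the provenance data of the containers.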
17. About updates and suppressions
●
Statement containers are uniquely identified
●
Updates
– New versions of documents get automatically replicated
●
Deletes
– Only the creator of a given container can delete it
– Deletions in a cache store do not get replicated
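The update and delete rules above can be sketched as follows. The class `Node`, its methods and the use of random UUIDs as container identifiers are assumptions made for this example, not a description of the ERS implementation.

```python
import uuid


class Node:
    """Hypothetical ERS-like node; names and behaviour are illustrative."""

    def __init__(self, name):
        self.name = name
        self.own = {}    # container id -> (version, statements)
        self.cache = {}  # container id -> (creator, version, statements)

    def create(self, statements):
        cid = uuid.uuid4().hex  # unique container identifier (assumed scheme)
        self.own[cid] = (1, list(statements))
        return cid

    def update(self, cid, statements):
        version, _ = self.own[cid]  # KeyError if this node is not the creator
        self.own[cid] = (version + 1, list(statements))

    def replicate_to(self, other):
        # Newer versions overwrite older cached copies on the other node.
        for cid, (version, statements) in self.own.items():
            cached = other.cache.get(cid)
            if cached is None or cached[1] < version:
                other.cache[cid] = (self.name, version, list(statements))

    def delete(self, cid):
        if cid in self.own:
            del self.own[cid]
            return True   # deleted by its creator
        self.cache.pop(cid, None)
        return False      # cached copy dropped locally, nothing propagates


alice, bob = Node("alice"), Node("bob")
cid = alice.create([("house1", "#people", "1")])
alice.replicate_to(bob)
removed_at_source = bob.delete(cid)   # only Bob's cached copy goes away
alice.update(cid, [("house1", "#people", "2")])
alice.replicate_to(bob)               # the new version reaches Bob again
```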
18. What ERS does not solve (yet)
●
Minting of identifiers
– Every contributor can create their own identifiers. There is no
enforced scheme
●
Global search for existing identifiers
– Only local search is possible
●
Modeling of data
– Selection of vocabulary comes from the applications using ERS
19. Take away message
●
Linked Data is a good way to create a globally
integrated, yet decentralised, information space for
describing entities
●
ERS provides simple Linked Data without the
Web, without HTTP, without SPARQL, ...
●
Reference implementation is open source, based on
CouchDB/JSON-LD/Python/Avahi, lightweight, and
compatible with HXL hashtags approach ;-)