This document provides a summary of a presentation on using MongoDB as a high performance graph database. The presentation discusses:
1) The speaker's company, Talis Education, using MongoDB for 8 months to store and query graph data, and implementing a software wrapper (Tripod) on top of MongoDB to improve scalability and performance.
2) Common graph data models using nodes and edges, or resources and properties. Basic graph concepts like undirected vs. directed graphs are explained.
3) Storing graph data as RDF triples, with the standard subject-predicate-object structure. Benefits of RDF such as reusing common schemas and easy data merging are highlighted.
4) SPARQL, the query language for RDF, including its four main query types (SELECT, DESCRIBE, ASK, CONSTRUCT) and why its flexibility proved too expensive for the application's performance needs.
2. Using MongoDB as a high performance graph database
MongoDB UK, 20th June 2012
Chris Clarke
CTO, Talis Education Limited
Who is Talis?
Using Mongo for about 8 months (since 2.0)
5 months in production
3. What this talk is not about
A blueprint for what you should do
A pitch to encourage you to take our approach
Providing or proving performance benchmarks
Evangelism for the semantic web or linked data
Encouraging you to contribute/download/use an open source project
Optimised for your use case
Although we can talk to you about any of the above (see me after)
4. So, what is this talk about?
Our journey of using MongoDB as a high performance graph database
Specifically the software wrapper we implemented on top of Mongo to give us a leg up in terms of scalability and performance
To give you some ideas for how to work with graph data models if you'd like to use document databases
5. GRAPHS 101
Apologies
Nodes and edges
or
Resources and properties
Really easy to represent facts
6. John knows Jane
Ball and stick diagrams
This is an undirected graph. It implies that John knows Jane and Jane knows John. The property has no directional significance.
7. John knows Jane
Jane knows John
This is an undirected graph. It implies that John knows Jane and Jane knows John. The property has no directional significance.
8. John knows Jane
Jane ? John
This is a directed graph. The relationship is one way. To add "Jane knows John" we need a second property.
We will only use directed graphs from here on, as they are more specific.
9. John knows Jane
Jane knows John
[Diagram: John and Jane joined by two directed "knows" edges, one in each direction]
11. Subject Property Object
John knows Jane
This is a triple
Property = predicate
12. Subject Property Object
John knows Jane
Jane knows John
This is a second triple
The same resource can be a subject or an object
13. Subject Property Object
http://example.com/John http://xmlns.com/foaf/0.1/knows http://example.com/Jane
RDF
Resources and properties as URIs
URIs can be dereferenced
Can share common property descriptions (RDF Schemas)
Here using FOAF - billions if not trillions of triples defined using FOAF
14. Subject Property Object
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
Namespaces for readability
In RDF subjects are always URIs
But objects can be literals, i.e. plain text
Many RDF/graph databases allow you to further type literals as dates, numbers, etc.
15. Subject Property Object
http://example.com/John rdf:type foaf:Person
http://example.com/John foaf:name "John"
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane foaf:knows http://example.com/John
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
Here we type John and Jane as foaf:Person using rdf:type
Note both John and Jane appear as subjects and objects
This RDF graph represents six facts
16. [Diagram: example:John and example:Jane, each typed rdf:type foaf:Person, linked to each other by foaf:knows edges and to the literals "John" and "Jane"]
Here it is in ball and stick
17. FFS! I can do that in two minutes in BSON
18. > db.people.find()
{
  _id: ObjectID('123'),
  name: 'John',
  knows: [ObjectID('456')]
},
{
  _id: ObjectID('456'),
  name: 'Jane',
  knows: [ObjectID('123')]
}
Yes, you can!
Data only makes sense inside your db though
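As a quick sketch of how you'd actually walk that graph in the shell (mongo shell, using the hypothetical people collection above): fetch John, then resolve his knows array with a second query - one round trip per hop.

// Fetch John's document, then resolve the ObjectIDs in his "knows"
// array with a second query - one query per hop through the graph.
var john = db.people.findOne({ name: 'John' });
var friends = db.people.find({ _id: { $in: john.knows } }).toArray();
// friends -> [ { _id: ObjectID('456'), name: 'Jane', knows: [...] } ]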
20. Some useful stuff, using RDF
Let's look at some reasons why we think RDF is good
21. [Image: the Linked Open Data cloud diagram; attribution in the original slides]
This is the linked open data cloud
Linked data is a way of publishing RDF on the open web
Search for Tim Berners-Lee's TED talk on linked data to hear why he cares about this
Each blob on this diagram represents an open, interlinked dataset. The lines between them represent the interlinking between data sets
Billions of public "facts", growing exponentially, from sites such as the BBC, governments, Last.fm, Wikipedia
22. Merging data from different sources is really easy
Because the format is subject, predicate, object, the shape of RDF is always the same.
Because schemas are public and widely shared, the same properties are used all over the place.
Really easy to use this data in your own app and remix it
23. Dataset A and Dataset B
[Diagram: Dataset A contains example:John rdf:type foaf:Person; Dataset B contains example:John foaf:name "John"]
24. Dataset A+B
[Diagram: the merged graph - example:John with both rdf:type foaf:Person and foaf:name "John"]
Really easy to merge graphs
"Designed in" to the data format
Lots of existing tooling to do this
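A minimal sketch of why the merge is mechanical (plain JavaScript, hypothetical data): triples are just (subject, predicate, object) tuples, so merging two datasets is a deduplicating set union.

// Merging two RDF datasets is a set union keyed on the whole triple.
var datasetA = [['example:John', 'rdf:type', 'foaf:Person']];
var datasetB = [['example:John', 'foaf:name', '"John"']];
var seen = {};
var merged = datasetA.concat(datasetB).filter(function (t) {
  var key = t.join(' ');
  if (seen[key]) return false;
  seen[key] = true;
  return true;
});
// merged now holds both facts about example:John, with no duplicates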
26. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
ORDER BY ?name
LIMIT 50
SPARQL is mega flexible. Lots of functions for grouping, walking graphs, pattern matching, inference, UNIONs, Geo extensions etc. etc. - all that shit.
Most if not all of those datasets will have a SPARQL endpoint you can query
27. The 4 main query types and what they return:
SELECT - tabular
DESCRIBE - graph
ASK - boolean
CONSTRUCT - graph
28. FFS! That looks like SQL!
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
ORDER BY ?name
LIMIT 50
Yes it does. The WHERE clause is basically doing a shit load of joins. I'll come back to that.
29. [Diagram: Application -> DB (SQL or other) -> offline conversion process -> Triple store + SPARQL]
Most datasets on the LOD diagram don't exist natively as Linked Data and RDF. They are post-produced.
Data not held natively - so a conversion script needs to be maintained and updated every time the app schema changes
Data not up to date (1 hour, 1 day, 1 month behind?)
30. Our innovation: Native Linked Data Applications
We started working on these applications back in 2008
They are natively linked data so solve the conversion + currency issue
There is no other "format" or schema the data is stored in, it's native RDF
When you have no schema, and you can integrate data from elsewhere on the web, it's addictive
31. Our problem: FFS! For applications, we need humongous scale and performance
Those applications are becoming rather popular with our users...
Sub-50ms query time
Modern web apps need speed and data scale
Out-grown triple store and SPARQL
SPARQL is very flexible and expressive. It's also expensive
SPARQL is great for data sets where the questions you can ask are limitless, but our applications need a data layer where speed is measured in single digit ms.
Complex caching (w/Memcache) to achieve performance and scalability
90:10 read:write
32. Tripod
It’s a pod for our triples
A triple store designed for applications and scalability
Based on Mongo
33. Functional requirements:
• Order magnitude increase in perf/scale
• Graph-orientated interface
Non-functional requirements:
• Strong community
Existing code is very graph orientated
34. Core data format
Tripod API
Dealing with complex queries
TripodTables
Free text search
Walk through Tripod looking at 5 areas
35. {
  'http://example.com/John' : {
    'http://xmlns.com/foaf/0.1/name' : [
      { value: 'John', type: 'literal' }
    ],
    'http://xmlns.com/foaf/0.1/knows' : [
      { value: 'http://example.com/Jane', type: 'uri' }
    ]
  },
  'http://example.com/Jane' : {
    'http://xmlns.com/foaf/0.1/name' : [
      { value: 'Jane', type: 'literal' }
    ],
    'http://xmlns.com/foaf/0.1/knows' : [
      { value: 'http://example.com/John', type: 'uri' },
      { value: 'http://example.com/James', type: 'uri' }
    ]
  }
}
RDF/JSON - a serialisation of RDF in JSON
Neither disk space efficient nor readable
Fully-qualified property URIs are not compatible with Mongo key names (they contain dots, which clash with dot notation)
Even single values sit inside an array (problems for compound indexing)
36. > db.CBD_people.find()
{
  _id: 'http://example.com/John',
  'foaf:name': {l: 'John'},
  'foaf:knows': {u: 'http://example.com/Jane'}
},
{
  _id: 'http://example.com/Jane',
  'foaf:name': {l: 'Jane'},
  'foaf:knows': [
    {u: 'http://example.com/John'},
    {u: 'http://example.com/James'}
  ]
}
Same semantics
2 documents here
Concise Bounded Descriptions - all data known about a subject, one relationship deep
One document per subject per collection, keyed (and thus enforced) by subject URI
Property names are namespaced
CBD collections are deemed read/write in Tripod
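Because each document is a subject's complete CBD keyed by its URI, a describe is a single primary-key lookup, while reverse traversal needs an index on the property. A mongo shell sketch (the 'foaf:knows.u' index is our assumption about this layout, not confirmed Tripod internals):

// A "describe" is one _id lookup: the document *is* the subject's CBD.
db.CBD_people.findOne({ _id: 'http://example.com/John' });
// Reverse traversal ("who knows John?") matches both the single-value
// and array forms of foaf:knows; an index keeps it fast.
db.CBD_people.ensureIndex({ 'foaf:knows.u': 1 });
db.CBD_people.find({ 'foaf:knows.u': 'http://example.com/John' });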
37. class MongoGraph extends SimpleGraph {
  function add_tripod_array($tarray)
  function to_tripod_array($docId)
}
All of our app already uses SimpleGraph from a library called Moriarty (Google Code)
Simple extension which can ingest/output the data format on the previous slide
38. Core data format
Tripod API
Dealing with complex queries
TripodTables
Free text search
Walk through Tripod looking at 5 areas
39. interface ITripod
{
public function select($query,$fields,$sortBy=null,$limit=null);
public function describeResource($resource);
public function describeResources(Array $resources);
public function saveChanges($oldGraph, $newGraph);
public function search($query);
}
Almost the same as our existing data access API onto a generic triple store
All of these methods return graphs, all are mega-simple queries on the CBD collections
None of these methods support joins (the WHERE clause in SPARQL)
40. public function describeResource($resource)
{
  $query = array("_id" => $resource);
  $bson = $this->getCollection()->findOne($query);
  $graph = new MongoGraph();
  $graph->add_tripod_array($bson);
  return $graph;
}
These methods are mega simple to implement as they translate to really simple Mongo queries on the CBD collections, returning single objects
41. interface ITripod
{
public function select($query,$fields,$sortBy=null,$limit=null);
public function describeResource($resource);
public function describeResources(Array $resources);
public function saveChanges($oldGraph, $newGraph);
public function search($query);
public function getViewForResource($resource,$viewType);
public function getViewForResources(Array $resources,$viewType);
public function getViews(Array $filter,$viewType);
}
Some extra methods to deal with complex queries involving joins
42. Core data format
Tripod API
Dealing with complex queries
TripodTables
Free text search
2 things we realised when looking at our applications
44. DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
WHERE
{
OPTIONAL
{
<http://example.com/foo> resource:contains ?sectionOrItem .
OPTIONAL
{
?sectionOrItem resource:resource ?resource .
OPTIONAL { ?resource dcterms:isPartOf ?document . }
OPTIONAL
{
?resource bibo:authorList ?authorList .
OPTIONAL { ?authorList ?p ?author . }
}
OPTIONAL { ?resource dcterms:publisher ?publisher . }
}
OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
} .
OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
}
The only thing that changes at run time in this query is this URI
The flexibility of SPARQL is great for the developer but terrible here for system performance
The query engine needs to join 9 times! Flexibility costs us every time we run this query!
This is why we hid it behind a cache
45. join
count
follow sequences (n times)
join across databases
All the above with a condition
include certain properties
include all properties
2nd thing
We only make use of minimal SPARQL
And some of these aren't even well supported in SPARQL (sequences + joins across databases)
46. Materialised views, generated infrequently, read often
Remember 90:10 read:write
View specifications based on a subset of SPARQL
Views are for DESCRIBE-like queries where all the data is brought back in one hit (not tabular data)
47. {
_id: "v_resource_brief",
from: "CBD_harvest",
type: "http://talisaspire.com/schema#Resource",
include: ["rdf:type", "dct:subject", "dct:isVersionOf",
"searchterms:usedAt", "dc:identifier"],
joins: {
"acorn:preferredMetadata": [],
"acorn:listReferences": {
include: ["acorn:list"]
},
"acorn:bookmarkReferences": {
include: ["acorn:bookmark"]
},
"dcterms:isPartOf": [],
"acorn:partReferences": {
include: ["dct:hasPart"],
joins: {
"dct:hasPart": {
joins: {
"acorn:preferredMetadata": []
}
}
}
}
}
}
A view specification - itself a document that can be stored in Mongo
8 keywords: type, from, include, joins, ttl, followSequence, maxJoins, counts
48. Generated by incremental MapReduce when:
1) Data is changed
2) TTL expires
Tripod can take these specifications and manage views in a special collection within the DB.
They expire and are regenerated automatically (and incrementally)
Incremental map reduce inside the DB
Fast, interleaves with reads
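A hedged sketch of the incremental part (mongo shell; mapViewFn, reduceViewFn and changedSubjects are stand-ins, not Tripod's actual code): only re-map the subjects that changed, and fold the output into the existing views collection.

// The "reduce" output mode merges new results into documents already
// in db.views instead of rebuilding the whole collection.
db.CBD_harvest.mapReduce(
  mapViewFn,     // assumed to emit { _id, value: { graphs, impactIndex } }
  reduceViewFn,  // assumed to combine partial view values
  {
    query: { _id: { $in: changedSubjects } },  // only touched subjects
    out: { reduce: 'views' }
  }
);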
49. > db.views.findOne()
{
  "_id" : {
    "rdf:resource" : "http://talisaspire.com/examples/1",
    "type" : "v_resource_full"
  },
  "value" : {
    "graphs" : [
      {
        "_id" : "http://talisaspire.com/examples/1",
        "rdf:type" : {
          "type" : "uri",
          "value" : "http://talisaspire.com/schema#Resource"
        }
      }
    ],
    "impactIndex" : [
      { "rdf:resource" : "http://talisaspire.com/examples/1" }
    ]
  }
}
This is what a view looks like
The ID is a composite key of the view type and root resource
Graphs is a collection of CBDs
The MongoGraph class we showed earlier can take this and present it as a unified graph to the application
Impact index - a watch list of resources. When resources are saved, the impact index is queried to find views that need invalidating
TTL is an alternative: if specified in the viewspec, a timestamp is stored in the view to determine when it can be invalidated
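The invalidation itself can then be a single query against that watch list. A sketch (mongo shell; the field path comes from the view document above, the delete-then-regenerate policy is our assumption):

// When a subject is saved, drop every view whose impact index watches
// it; those views are then rebuilt incrementally or on demand.
var changed = 'http://talisaspire.com/examples/1';
db.views.remove({ 'value.impactIndex.rdf:resource': changed });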
50. [Diagram: four example views regenerated at different rates; image attribution in the original slides]
Match views to data update rate
51. Core data format
Tripod API
Dealing with complex queries
TripodTables
Free text search
Tripod Tables are for larger datasets which cannot be brought back in one hit
They can be paged or have individual columns indexed for fast sort capability
52. SELECT ?listName ?listUri
WHERE
{
  ?resource bibo:isbn10 "$isbn"
  UNION
  {
    ?resource bibo:isbn10 "$isbnLowerCase" .
  }
  ?item resource:resource ?resource .
  UNION
  {
    ?resourcePartOf bibo:isbn10 "$isbn" .
    UNION
    {
      ?resourcePartOf bibo:isbn10 "$isbnLowerCase" .
    }
    ?resourcePartOf dct:hasPart ?resource .
    ?item resource:resource ?resource .
  }
  ?listUri resource:contains ?item .
  ?listUri sioc:name ?listName .
  ?listUri rdf:type resource:List
}
LIMIT 10
OFFSET 40
This is a SELECT query that brings back a two-column result, paged with LIMIT and OFFSET
54. > db.t_resource.findOne()
{
  "_id" : "http://talisaspire.com/resources/3SplCtWGPqEyXcDiyhHQpA-2",
  "value" : {
    "type" : [
      "http://purl.org/ontology/bibo/Book",
      "http://talisaspire.com/schema#Resource"
    ],
    "isbn" : "9780393929690",
    "isbn13" : [
      "9780393929691",
      "9780393929691-2",
      "9780393929691-3"
    ],
    "impactIndex" : [
      "http://talisaspire.com/works/4d101f63c10a6"
    ]
  }
}
This time our map reduce doesn't create one doc as with materialised views
We get one doc per row
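One doc per row is what makes the LIMIT/OFFSET query above cheap. A sketch (mongo shell; the choice of index is our assumption, based on the example document):

// The SPARQL SELECT with LIMIT 10 OFFSET 40 becomes a plain indexed
// find with skip/limit - no joins at query time.
db.t_resource.ensureIndex({ 'value.isbn': 1 });
db.t_resource.find({ 'value.isbn': '9780393929690' })
             .sort({ _id: 1 })
             .skip(40)
             .limit(10);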
55. Core data format
Tripod API
Dealing with complex queries
TripodTables
Free text search
Our triple store included free text search
We wanted to stream updates into ElasticSearch or A N Other search solution
When documents are saved, the same specification language is used to build Search Document Format docs and submit them to an endpoint
We like ElasticSearch but you could use Amazon CloudSearch
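A minimal sketch of that save-time hook (plain JavaScript; the output shape is an assumption, not Tripod's real Search Document Format):

// Flatten a CBD document into a denormalised search document; this is
// what would be streamed to ElasticSearch (or another engine) on save.
function toSearchDoc(cbd) {
  var names = [].concat(cbd['foaf:name'] || []);  // single value or array
  return {
    id: cbd._id,
    name: names.map(function (v) { return v.l; })  // unwrap {l: ...} literals
  };
}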
56. Limitations
Map Reduce as a non-blocking db.eval(), and also to work around the synchronous PHP programming model
PHP only for now - our web apps were PHP
To get a SPARQL endpoint we are exporting data out to Fuseki - this solves the mapping problem but not the currency problem (for SPARQL)
57. Future
Node JS port
Use as a server, not a library
Eliminate dependency on map reduce
Specification version control
Tap into the oplog for a streaming approach into Fuseki and other locations
Named graph support
Further optimisation of the data model
Maybe open source