• Perler since 2000
• CPAN contributor (MAJENSEN) since 2009
• BioPerl Core Developer
• Director, Genomic Data Programs,
Leidos Biomedical Research Inc (FNLCR)
• @thinkinator, LinkedIn
Not my sponsor, but could be yours!
• http://www.perlfoundation.org/how_to_write_a_proposal
Motivation
• Cancer Genomics:
Biospecimen, Clinical, Analysis Data
– complex
– growing
– evolving technologies
– evolving policies
– need for precise accounting
• Graph models are well-suited to this world
select bar.name
from bar, bar_baz, baz, baz_goob, goob,
goob_squirrel, squirrel, squirrel_spam, spam, spam_eggs,
eggs, eggs_foo, foo
where bar.id = bar_baz.bar_id and bar_baz.baz_id = baz.id and
baz.id = baz_goob.baz_id and baz_goob.goob_id = goob.id and
goob.id = goob_squirrel.goob_id and goob_squirrel.squirrel_id = squirrel.id
and
squirrel.id = squirrel_spam.squirrel_id and
squirrel_spam.spam_id = spam.id and spam.id = spam_eggs.spam_id and
spam_eggs.eggs_id = eggs.id and eggs_foo.eggs_id = eggs.id and
eggs_foo.foo_id = foo.id and foo.name = 'zloty';
match (f:foo)-[*5..8]-(b:bar) where f.name = 'zloty' return b.name
Neo4j
• “Native” graph DB engine (currently in v2.2)
– Written in Java, but
– Very complete REST API
– Custom query language: Cypher
– Free community edition
– Lots of community support, including many
“language drivers”
• Not the only one out there, but probably the
most widely used (certainly the best
marketed)
Design Goals
• "OGM" – Perl 5 objects backed by the graph
• User should never have to deal with a REST endpoint*
*Unless she wants to.
• User should never/only have to deal with Cypher
queries†
†Unless he wants/doesn’t want to.
• Robust enough for production code
– System should approach complete coverage of the REST
service
– System should be robust to REST API changes and
backward-compatible with older servers (or at least version-aware)
• Take advantage of the self-describing features of the API
REST::Neo4p core objects
• Are Node, Relationship, Index
– Index objects represent legacy (v1.0) indexes
– v2.0 “background” indexes handled in Schema
• Are blessed scalar refs : "Inside-out object" pattern
– the scalar value is the item ID (or index name)
– For any object $obj, $$obj (the ID) is exactly what you need
for constructing the API calls
• Are subclasses of Entity
– Entity does the object table handling, JSON-to-object
conversion and HTTP agent calls
– Isolates most of the kludges necessary to handle the few
API inconsistencies that exist(ed)
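A minimal sketch of the blessed-scalar-ref, inside-out idea described above. This is not REST::Neo4p's own code: the package name My::Entity, the %PROPS table, the url() method and the example base URL are all assumptions made for illustration.

```perl
use strict;
use warnings;

{
    package My::Entity;    # hypothetical class, for illustration only

    my %PROPS;             # "object table": per-ID data lives outside the object

    sub new {
        my ($class, $id, %props) = @_;
        my $self = bless \$id, $class;    # the object is just a ref to its ID
        $PROPS{$id} = {%props};
        return $self;
    }

    sub id { my $self = shift; return $$self }

    sub get_property {
        my ($self, $key) = @_;
        return $PROPS{$$self}{$key};
    }

    # $$self drops straight into the URL for an API call
    sub url {
        my ($self, $base) = @_;
        return "$base/node/$$self";
    }
}

my $node = My::Entity->new(42, name => 'zloty');
print $node->url('http://localhost:7474/db/data'), "\n";
```

Because the object carries nothing but its ID, two objects for the same ID see the same entry in the table, and building a request never requires unpacking a hash.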
Batch Calls
• In certain situations (database loading, e.g.)
it makes sense to batch: do many things in one
API call rather than many single calls
• REST API provides this functionality
• How to make it "natural" in the context of
working with objects?
– Use Perl prototyping sugar to create a "batch
block"
Example:
Rather than call the server for every line, you can mix in
REST::Neo4p::Batch, and then use a batch {} block:
Calls within block are collected and deferred
You can execute more complex logic within the
batch block, and keep the objects beyond it:
How does that work?
• Agent module isolates all bona fide calls
– very few kludges to core object modules req'd
• batch() puts the agent into “batch mode” and
executes wrapped code
– agent stores incoming calls as JSON in a queue
• After wrapped code is executed, batch() switches
agent back to normal mode and has it call the
batch endpoint with the queue contents
• Batch processes the response and creates objects
if requested
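The mechanism above can be sketched in a few lines of plain Perl. This is an illustration under stated assumptions, not the module's implementation: My::Agent, its request() and flush_batch() methods and the $AGENT variable are hypothetical, the "server" is just an in-memory log, and the second argument the real batch {} accepts (such as 'discard_objs') is left out.

```perl
use strict;
use warnings;

{
    package My::Agent;    # hypothetical stand-in for the real agent

    sub new { return bless { batch_mode => 0, queue => [], sent => [] }, shift }

    # Every "server call" funnels through here.
    sub request {
        my ($self, $method, $path, $body) = @_;
        my $call = { method => $method, to => $path, body => $body };
        if ($self->{batch_mode}) {
            push @{ $self->{queue} }, $call;    # defer the call
            return $#{ $self->{queue} };        # placeholder job id
        }
        push @{ $self->{sent} }, [$call];       # one round trip per call
        return;
    }

    # Send everything queued as a single round trip.
    sub flush_batch {
        my $self = shift;
        my @jobs = @{ $self->{queue} };
        $self->{queue} = [];
        push @{ $self->{sent} }, \@jobs if @jobs;
        return scalar @jobs;
    }
}

my $AGENT = My::Agent->new;

# The (&) prototype lets callers write: batch { ... };
sub batch (&) {
    my ($code) = @_;
    $AGENT->{batch_mode} = 1;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    $AGENT->{batch_mode} = 0;                   # always return to normal mode
    unless ($ok) {
        $AGENT->{queue} = [];                   # drop the partial queue
        die $err;
    }
    return $AGENT->flush_batch;                 # number of deferred calls sent
}

my $count = batch {
    $AGENT->request(POST => '/node', { name => "node_$_" }) for 1 .. 3;
};
print "sent $count calls in one request\n";
```

The design point is that the calling code inside the block does not change at all; only the agent knows whether it is queueing or sending.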
Agent
• Is transparent
– But can always see it with REST::Neo4p->agent
– Agent module alone meant to be useful and independent
• Elicits and uses the API self-discovery feature on
connect()
• Isolates all HTTP requests and responses
• Captures and distinguishes API and HTTP errors
– emits REST::Neo4p::Exceptions objects
• [Instance] Is a subclass of a "real" user agent:
– LWP::UserAgent
– Mojo::UserAgent, or
– HTTP::Thin
Working within API Self-Description
• Get the list of actions with
– $agent->available_actions
• And AUTOLOAD will provide (see pod for args):
– $agent->get_<action>()
– $agent->put_<action>()
– $agent->post_<action>()
– $agent->delete_<action>()
• Other accessors, e.g. node(), return the
appropriate URL for your server
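A rough sketch of how AUTOLOAD can turn a discovered action table into get_/put_/post_/delete_ methods. Everything here is assumed for illustration: the package My::Discovering::Agent, the hard-coded action table (a real agent would fill it from the server's self-description on connect()), the argument handling, and the fact that it returns a description string instead of making an HTTP request.

```perl
use strict;
use warnings;

{
    package My::Discovering::Agent;    # hypothetical, for illustration only
    use Carp qw(croak);

    our $AUTOLOAD;

    sub new {
        my ($class, $base) = @_;
        return bless {
            base    => $base,
            # Hard-coded here; assumed to come from the server in real life.
            actions => { node => "$base/node", cypher => "$base/cypher" },
        }, $class;
    }

    sub available_actions {
        my $self = shift;
        return sort keys %{ $self->{actions} };
    }

    sub AUTOLOAD {
        my ($self, @args) = @_;
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;                       # strip the package name
        return if $name eq 'DESTROY';
        my ($verb, $action) = $name =~ /^(get|put|post|delete)_(\w+)$/
            or croak "No such method: $name";
        my $url = $self->{actions}{$action}
            or croak "Server does not advertise action '$action'";
        # A real agent would issue the HTTP request here; we only describe it.
        return join ' ', uc($verb), join('/', $url, @args);
    }
}

my $agent = My::Discovering::Agent->new('http://localhost:7474/db/data');
print $agent->get_node(42), "\n";
```

Since the method names are derived from what the server advertises, an action the server does not offer fails early with a clear message rather than with a 404.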
Schemas - Use Case
You start out with a set of well categorized things, that
have some well defined relationships.
Each thing will be represented as a node, that's fine. But,
You want to guarantee (to your client, for example) that
1. You can classify every node you add or read
unambiguously into a well-defined group
(you know everything that’s in there);
2. You never relate two nodes belonging to particular
groups in a way that doesn't make sense according to
your well-defined relationships
(you can find everything that’s in there).
Schema Helps
• REST::Neo4p::Schema – Access the (limited)
schema functionality of Neo4j server
– Create indexes
– Maintain uniqueness of nodes within Label classes
• REST::Neo4p::Constrain - An add-in for
constraining (or validating)
– property values
– connections (relationships) based on node properties
– relationship types
according to flexible specifications
Constrain/Constraint
• Multiple modes:
– Automatic (throws exception if constraint
violated)
– Manual (validation function returns false if
constraint violated)
– Suspended (lift constraint processing when
desired)
• Freeze/Thaw (in JSON) constraint
specifications for reuse
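The three modes and the JSON freeze/thaw can be illustrated with a toy validator. This is a sketch only: My::Constraints, its spec format (label, then property name, then a regex string) and its method names are assumptions, it covers property values but not relationships, and it is not the REST::Neo4p::Constrain interface.

```perl
use strict;
use warnings;
use JSON::PP;

{
    package My::Constraints;    # hypothetical, for illustration only

    sub new {
        my ($class, $spec) = @_;
        return bless { spec => $spec, mode => 'automatic' }, $class;
    }

    sub mode {
        my $self = shift;
        $self->{mode} = shift if @_;
        return $self->{mode};
    }

    # True if $props satisfies every property rule for $label.
    sub validate {
        my ($self, $label, $props) = @_;
        my $rules = $self->{spec}{$label} or return 0;    # unknown group
        for my $key (keys %$rules) {
            return 0 unless defined $props->{$key}
                && $props->{$key} =~ /$rules->{$key}/;
        }
        return 1;
    }

    # Gatekeeper to call before a node is created.
    sub check {
        my ($self, $label, $props) = @_;
        return 1 if $self->{mode} eq 'suspended';         # processing lifted
        my $ok = $self->validate($label, $props);
        die "Constraint violated for '$label'\n"
            if !$ok && $self->{mode} eq 'automatic';      # automatic: throw
        return $ok;                                       # manual: true/false
    }

    sub freeze {
        my $self = shift;
        return JSON::PP->new->canonical->encode($self->{spec});
    }

    sub thaw {
        my ($class, $json) = @_;
        return $class->new(JSON::PP->new->decode($json));
    }
}

my $c = My::Constraints->new(
    { sample => { barcode => '^TCGA-', type => '^(tumor|normal)$' } }
);
my $copy = My::Constraints->thaw($c->freeze);
```

Keeping the specification as plain data is what makes the freeze/thaw step trivial: the same JSON can be stored alongside the project and loaded by every application that writes to the graph.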
Cypher Queries
• REST::Neo4p::Query takes a familiar, DBI-like
approach
– Prepare, execute, fetch
– "rows" returned are arrays containing scalars,
Node objects, and/or Relationship objects
• Simple Perl data structures can be requested instead if
desired
– If a query returns a path, a Path object (a simple
container) is returned
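The prepare, execute, fetch rhythm looks roughly like this. So that the sketch runs without a server, My::StubQuery is a hypothetical stand-in whose execute() and fetch() are assumed to have the same shape as the real query object; the column_for() helper, the sample rows and the {name} parameter placeholder are likewise assumptions for illustration.

```perl
use strict;
use warnings;

{
    package My::StubQuery;    # hypothetical stand-in, no server needed

    sub new {
        my ($class, $statement, $rows) = @_;
        return bless { statement => $statement, rows => $rows, pending => [] },
            $class;
    }

    # "Run" the statement: keep rows whose first column matches the parameter.
    sub execute {
        my ($self, %params) = @_;
        $self->{pending} =
            [ grep { $_->[0] eq $params{name} } @{ $self->{rows} } ];
        return 1;
    }

    # Hand back one row (an array ref) per call, undef when exhausted.
    sub fetch { my $self = shift; return shift @{ $self->{pending} } }
}

# Execute a prepared query with parameters, then drain one column of the rows.
sub column_for {
    my ($query, $col, %params) = @_;
    $query->execute(%params);
    my @values;
    while (my $row = $query->fetch) {
        push @values, $row->[$col];
    }
    return @values;
}

my $q = My::StubQuery->new(
    'MATCH (f:foo)-[*5..8]-(b:bar) WHERE f.name = {name} RETURN f.name, b.name',
    [ [ zloty => 'alpha' ], [ rupee => 'beta' ], [ zloty => 'gamma' ] ],
);
my @bars = column_for($q, 1, name => 'zloty');
print "@bars\n";
```

One query object is prepared once and re-executed with different parameters, which is the same habit DBI encourages with placeholders.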
Cypher Queries
• Transactions are supported when you have
v2.0.1 server or greater
– started with REST::Neo4p->begin_work()
– committed with REST::Neo4p->commit()
– canceled with REST::Neo4p->rollback()
(here, the class looks like the database handle in
DBI, in fact…)
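Since the class plays the role of the database handle, a commit-or-rollback wrapper can be written once and reused. The sketch below assumes only the three method names listed above; in_transaction() and the My::StubDB class (which merely logs the calls so the example runs without a server) are hypothetical.

```perl
use strict;
use warnings;

{
    package My::StubDB;    # hypothetical stand-in for the handle-like class
    our @LOG;
    sub begin_work { push @LOG, 'begin';    return 1 }
    sub commit     { push @LOG, 'commit';   return 1 }
    sub rollback   { push @LOG, 'rollback'; return 1 }
}

# Run $work inside a transaction on $handle: commit on success,
# roll back and rethrow on failure.
sub in_transaction {
    my ($handle, $work) = @_;
    $handle->begin_work;
    my @result;
    my $ok  = eval { @result = $work->(); 1 };
    my $err = $@;
    unless ($ok) {
        $handle->rollback;
        die $err;
    }
    $handle->commit;
    return @result;
}

my @out = in_transaction('My::StubDB', sub { return 'ok' });
print "@out\n";
```

Passing the class name as the handle mirrors the slide: the same wrapper would be called with 'REST::Neo4p' in place of the stub, assuming a connected v2.0.1+ server.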
Future Directions/Contribution Ideas
• Test on v2.2 server and fix any issues
• Make Neo4p closer to an ORM (require explicit
push/pull from backend server)
• Sunset v1.0 support
– Completely touch-free testing within transactions
– Integrate node labels better
• Make batch response parsing more efficient
– e.g., don't stream if response is not huge
• Add traversal functionality
• Beautify and deodorize
Real (high level) model for a cancer genomics study.
Think of every node as representing a table, and every edge as a foreign key (and potentially a linking table). Imagine the join you would have to write to find records on the far left that are related to records on the far lower right. Because of that complexity, you would probably not build the structure you see here in an RDBMS. But then your model is serving the technology at the expense of representing the real-world relationships and items.
The batch {} sugar indicates that the calls that would have been made immediately are instead deferred and kept in a queue, to be sent after the code inside the block is done. 'discard_objs' means don't preserve the object information in memory.
From the connect() method of REST::Neo4p::Agent.
This is a little against the NoSQL and graph grain. But the fact is that data stewardship requirements may put you in the position of explaining how your apps will positively maintain the integrity and connectivity of the data.
Schemas are not dead. If your team is >1 member, there needs to be some externally established and consultable way to know what is in your datastore. If you have a client that wants to make sure you have stored everything she wants stored, you have to be able to report, validate and verify that.
Parameterized queries are the right thing to do in principle. They also avoid creating thousands of query objects, and the server tends to bork on thousands of new queries.
SomaFM DefCon radio snip: “Devs won’t write parameterized queries unless there’s a gun to their head. We know, we hold the gun.”
The transaction REST endpoint is different from the cypher REST endpoint. Neo4p pays attention to whether you're in a transaction or not, and informs the Agent which endpoint to use.
In adding transaction support to Neo4p, I identified a bug in server v2.0.0. I submitted a ticket and it got fixed! Now it is my responsibility not to lead users down the rosy path: check the server version and throw if the user has a server <2.0.1.
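The version guard described in this note could look something like the following. It is a sketch, not the module's code: version_cmp() and require_transaction_support() are hypothetical names, and the comparison assumes purely numeric dotted versions (no suffixes such as milestone tags).

```perl
use strict;
use warnings;

# Compare dotted version strings numerically, field by field.
# Returns -1, 0 or 1, like <=>. Missing fields count as 0.
sub version_cmp {
    my ($x, $y) = @_;
    my @x = split /\./, $x;
    my @y = split /\./, $y;
    while (@x || @y) {
        my $cmp = (shift(@x) // 0) <=> (shift(@y) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Throw unless the server is new enough for transactions.
sub require_transaction_support {
    my ($server_version) = @_;
    die "Transactions need server v2.0.1 or later (found v$server_version)\n"
        if version_cmp($server_version, '2.0.1') < 0;
    return 1;
}

print "ok\n" if require_transaction_support('2.2');
```

Comparing field by field matters: a plain string comparison would rank '2.10' below '2.9'.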