Neo4p dcbpw-2015


Overview of Neo4p : A Perl driver for the graph database engine Neo4j

Talk at the DC/Baltimore Perl Workshop 2015

  1. (Perl)-[:speaks]->(Neo4j), Mark A. Jensen
  2. • Perler since 2000 • CPAN contributor (MAJENSEN) since 2009 • BioPerl Core Developer • Director, Genomic Data Programs, Leidos Biomedical Research, Inc. (FNLCR) • @thinkinator, LinkedIn
  3. Not my sponsor, but could be yours!
  4. Motivation • Cancer genomics: biospecimen, clinical, and analysis data – complex – growing – evolving technologies – evolving policies – need for precise accounting • Graph models are well suited to this world
  5. [Diagram: an example graph with nodes Patient, Clinical, Tumor Sample, Normal Sample, Extract, and Data File; relationships such as derived_from and analysis_of; properties such as age, diagnosis, stage, and date shipped. Legend: Nodes, Relationships, Properties.]
  6. Graph vs RDBMS [Diagram: nodes foo, bar, baz, spam, eggs, squirrel, goob.]
  7. SQL: select … from bar, bar_baz, baz, baz_goob, goob, goob_squirrel, squirrel, squirrel_spam, spam, spam_eggs, eggs, eggs_foo, foo where … = bar_baz.bar_id and bar_baz.baz_id = … and … = baz_goob.baz_id and baz_goob.goob_id = … and … = goob_squirrel.goob_id and … = … and … = squirrel_spam.squirrel_id and squirrel_spam.spam_id = … and … = spam_eggs.spam_id and spam_eggs.eggs_id = … and eggs_foo.eggs_id = … and eggs_foo.foo_id = … and … = 'zloty'; versus Cypher: match (f:foo)-[*5..8]-(b:bar) where … = 'zloty' return …
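The Cypher pattern on slide 7 asks for every path of 5 to 8 relationships between a foo node and a bar node. The toy, in-memory Perl sketch below mimics only that matching idea. It assumes a graph with one undirected relationship per link table named in the SQL (bar_baz, baz_goob, goob_squirrel, squirrel_spam, spam_eggs, eggs_foo); var_length_paths and _walk are hypothetical helpers, not Neo4j or Neo4p code.

```perl
use strict;
use warnings;

# Hypothetical toy graph: one relationship per link table in the SQL.
my @edges = (
    [qw(bar baz)],       [qw(baz goob)],  [qw(goob squirrel)],
    [qw(squirrel spam)], [qw(spam eggs)], [qw(eggs foo)],
);

# Undirected adjacency list: node => [ [neighbour, relationship index], ... ]
my %adj;
for my $i (0 .. $#edges) {
    my ($x, $y) = @{ $edges[$i] };
    push @{ $adj{$x} }, [ $y, $i ];
    push @{ $adj{$y} }, [ $x, $i ];
}

# All paths from $from to $to using between $min and $max relationships,
# never reusing a relationship within one path, roughly what
# (f)-[*min..max]-(b) asks Cypher for.
sub var_length_paths {
    my ($from, $to, $min, $max) = @_;
    my @found;
    _walk($from, $to, $min, $max, [$from], {}, \@found);
    return @found;
}

sub _walk {
    my ($node, $to, $min, $max, $path, $used, $found) = @_;
    my $len = @$path - 1;                          # relationships traversed so far
    push @$found, [@$path] if $node eq $to && $len >= $min;
    return if $len >= $max;
    for my $step (@{ $adj{$node} || [] }) {
        my ($next, $rel) = @$step;
        next if $used->{$rel};                     # no relationship twice per path
        $used->{$rel} = 1;
        _walk($next, $to, $min, $max, [ @$path, $next ], $used, $found);
        delete $used->{$rel};
    }
    return;
}

my @paths = var_length_paths('foo', 'bar', 5, 8);
print join('-', @$_), "\n" for @paths;   # foo-eggs-spam-squirrel-goob-baz-bar
```

With this chain, only the 6-relationship path falls inside the 5..8 window; the SQL version has to spell out every one of those hops as an explicit join.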
  8. Neo4j • “Native” graph DB engine (currently at v2.2) – written in Java, but with a very complete REST API – custom query language: Cypher – free community edition – lots of community support, including many “language drivers” • Not the only one out there, but probably the most widely used (certainly the best marketed)
  9. Neo4p
  10. Neo4p
  11. Neo4p [Annotated code example; callouts: Create Node, Label Node, Create Unique Node, Add a Prop, Link Nodes, Load/Use Index]
  12. Neo4p
  13. Design Goals • "OGM": Perl 5 objects backed by the graph • User should never have to deal with a REST endpoint* *Unless she wants to. • User should never/only have to deal with Cypher queries† †Unless he wants/doesn’t want to. • Robust enough for production code – system should approach complete coverage of the REST service – system should be robust to REST API changes and backward-compatible with older servers (or at least version-aware) • Take advantage of the self-describing features of the API
  14. REST::Neo4p core objects • Are Node, Relationship, and Index – Index objects represent legacy (v1.0) indexes – v2.0 “background” indexes are handled in Schema • Are blessed scalar refs: the "inside-out object" pattern – the scalar value is the item ID (or index name) – for any object $obj, $$obj (the ID) is exactly what you need to construct the API calls • Are subclasses of Entity – Entity does the object table handling, JSON-to-object conversion, and HTTP agent calls – it isolates most of the kludges needed to handle the few API inconsistencies that exist(ed)
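A minimal pure-Perl sketch of the blessed-scalar-ref, inside-out arrangement described on slide 14, under stated assumptions: My::Entity and My::Node are hypothetical stand-ins for REST::Neo4p::Entity and REST::Neo4p::Node, and the /db/data/node/ path is assumed to be the Neo4j 2.x REST location of a node. The object holds only its ID; everything else lives in a class-private table keyed by the object's address.

```perl
use strict;
use warnings;

package My::Entity;    # hypothetical stand-in for REST::Neo4p::Entity
use Scalar::Util qw(refaddr);

# Inside-out storage: per-object data lives in this class-private table,
# keyed by the object's address, not inside the object itself.
my %ENTITY_TABLE;

sub new {
    my ($class, $id, $props) = @_;
    my $scalar = $id;
    my $self = bless \$scalar, $class;     # the object is just a ref to its ID
    $ENTITY_TABLE{ refaddr $self } = { props => { %{ $props || {} } } };
    return $self;
}

sub get_property {
    my ($self, $name) = @_;
    return $ENTITY_TABLE{ refaddr $self }{props}{$name};
}

# Clean up the table entry when the object goes away.
sub DESTROY { delete $ENTITY_TABLE{ refaddr $_[0] }; return }

package My::Node;      # hypothetical stand-in for REST::Neo4p::Node
our @ISA = ('My::Entity');

# $$self is the node ID, which is all the REST path needs.
sub url { my $self = shift; return "/db/data/node/$$self" }

package main;

my $node = My::Node->new(42, { name => 'tumor sample' });
print $$node, "\n";                        # 42
print $node->url, "\n";                    # /db/data/node/42
print $node->get_property('name'), "\n";   # tumor sample
```

Because $$node is the ID itself, building a REST path needs no accessor; the price of the inside-out style is the bookkeeping (here, the DESTROY cleanup) that the base class must do.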
  15. Batch Calls • In certain situations (e.g., database loading) it makes sense to batch: do many things in one API call rather than in many single calls • The REST API provides this functionality • How to make it "natural" in the context of working with objects? – Use Perl prototyping sugar to create a "batch block"
  16. Example: rather than calling the server for every line, you can mix in REST::Neo4p::Batch and then use a batch {} block. [Code example; callout: calls within the block are collected and deferred.]
  17. You can execute more complex logic within the batch block and keep the objects beyond it:
  18. But miracles are not yet implemented: the object here doesn't really exist yet…
  19. How does that work? • The Agent module isolates all bona fide calls – very few kludges to the core object modules required • batch() puts the agent into “batch mode” and executes the wrapped code – the agent stores incoming calls as JSON in a queue • After the wrapped code is executed, batch() switches the agent back to normal mode and has it call the batch endpoint with the queue contents • Batch processes the response and creates objects if requested
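The batch mechanism on slides 15 to 19 can be sketched in pure Perl. Toy::Agent, post_node and _send are hypothetical names, no HTTP is performed, and the queued job format ({method, to, body, id} posted to /db/data/batch) is assumed to follow the Neo4j 2.x batch endpoint; it is not taken from Neo4p's internals.

```perl
use strict;
use warnings;
use JSON::PP ();

package Toy::Agent;    # hypothetical stand-in for the Neo4p HTTP agent

my $json = JSON::PP->new->canonical;

sub new { return bless { batch_mode => 0, queue => [], requests => [] }, shift }

# Every call that would hit the server funnels through a method like this one.
sub post_node {
    my ($self, $props) = @_;
    if ($self->{batch_mode}) {
        my $job_id = @{ $self->{queue} };     # queue position doubles as job id
        push @{ $self->{queue} },
            { method => 'POST', to => '/node', body => $props, id => $job_id };
        return undef;                         # nothing exists on the server yet
    }
    $self->_send('/db/data/node', $props);
    return 1;
}

# Pretend HTTP: record the endpoint and JSON body instead of sending them.
sub _send {
    my ($self, $endpoint, $payload) = @_;
    push @{ $self->{requests} }, [ $endpoint, $json->encode($payload) ];
    return;
}

package main;

my $agent = Toy::Agent->new;

# The (&) prototype lets callers write  batch { ... };  like a built-in.
sub batch (&) {
    my ($code) = @_;
    $agent->{batch_mode} = 1;                 # queue instead of calling
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    $agent->{batch_mode} = 0;                 # always restore normal mode
    my $jobs = $agent->{queue};
    $agent->{queue} = [];
    die $err unless $ok;                      # drop the queue if the block died
    $agent->_send('/db/data/batch', $jobs) if @$jobs;   # one call for everything
    return scalar @$jobs;
}

my $queued = batch {
    $agent->post_node({ name => $_ }) for qw(patient tumor_sample normal_sample);
};
print scalar @{ $agent->{requests} }, " request(s) for $queued jobs\n";   # 1 request(s) for 3 jobs
```

Inside the block post_node returns undef, which is the toy version of slide 18's warning: until the batch call comes back, nothing exists on the server.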
  20. HTTP Agent
  21. Agent • Is transparent – but you can always get at it with REST::Neo4p->agent – the Agent module alone is meant to be useful and independent • Elicits and uses the API self-discovery feature on connect() • Isolates all HTTP requests and responses • Captures and distinguishes API and HTTP errors – emits REST::Neo4p::Exceptions objects • Each instance is a subclass of a "real" user agent: – LWP::UserAgent – Mojo::UserAgent, or – HTTP::Thin
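Slide 21 says each agent instance is a subclass of a "real" user agent. One way to get that effect, sketched here with a hypothetical Toy::Neo4pAgent class and pick_base helper (REST::Neo4p's own mechanism may differ), is to set @ISA at run time to the first user-agent module that loads. HTTP::Tiny is added as an assumed fallback because it ships with Perl; it is not on the slide's list.

```perl
use strict;
use warnings;

package Toy::Neo4pAgent;    # hypothetical; not the real REST::Neo4p::Agent

our @ISA;

# Make this class a subclass of the first user-agent module that loads.
sub pick_base {
    my (@candidates) = @_;
    for my $ua_class (@candidates) {
        (my $file = "$ua_class.pm") =~ s{::}{/}g;
        next unless eval { require $file; 1 };
        @ISA = ($ua_class);                   # inheritance chosen at run time
        return $ua_class;
    }
    die "no usable user agent among: @candidates\n";
}

package main;

# The slide's three options, plus core HTTP::Tiny as an assumed fallback.
my $base  = Toy::Neo4pAgent::pick_base(qw(LWP::UserAgent Mojo::UserAgent HTTP::Thin HTTP::Tiny));
my $agent = Toy::Neo4pAgent->new;           # constructor inherited from $base
print ref($agent), " isa $base\n";
```

The calling code never names the underlying user agent, which is what lets the rest of the stack stay agnostic about which one is installed.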
  22. Working within API Self-Description • Get the list of actions with – $agent->available_actions • And AUTOLOAD will provide (see the POD for args): – $agent->get_<action>() – $agent->put_<action>() – $agent->post_<action>() – $agent->delete_<action>() • Other accessors, e.g. node(), return the appropriate URL for your server
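The AUTOLOAD trick on slide 22 is plain Perl. The sketch below, using a hypothetical Toy::SelfDescribingAgent class, turns get_node(42) into a request line built from an action-to-URL map. The map entries (node and node_index under http://localhost:7474/db/data/) are assumed examples of what a Neo4j 2.x service root advertises, and nothing is actually sent.

```perl
use strict;
use warnings;

package Toy::SelfDescribingAgent;    # hypothetical; illustrates AUTOLOAD only

our $AUTOLOAD;

sub new {
    my ($class, $actions) = @_;
    # $actions: action name => URL, as if discovered from the service root
    return bless { actions => $actions }, $class;
}

sub available_actions { my $self = shift; return sort keys %{ $self->{actions} } }

# get_<action>, put_<action>, post_<action>, delete_<action> are resolved here.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    my ($verb, $action) = $method =~ /^(get|put|post|delete)_(\w+)$/
        or die "Can't locate object method \"$method\"\n";
    my $url = $self->{actions}{$action}
        or die "Server did not advertise an action named '$action'\n";
    # A real agent would issue the HTTP request; the toy returns a description.
    return join ' ', uc $verb, join('/', $url, @_);
}

package main;

my $agent = Toy::SelfDescribingAgent->new({
    node       => 'http://localhost:7474/db/data/node',
    node_index => 'http://localhost:7474/db/data/index/node',
});
print join(',', $agent->available_actions), "\n";   # node,node_index
print $agent->get_node(42), "\n";                    # GET http://localhost:7474/db/data/node/42
```

Since the method names come from whatever the server advertises, a new action on the server needs no new Perl method, only a new map entry.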
  23. Schemas - Use Case: You start out with a set of well-categorized things that have some well-defined relationships. Each thing will be represented as a node; that's fine. But you want to guarantee (to your client, for example) that 1. you can classify every node you add or read unambiguously into a well-defined group (you know everything that’s in there); 2. you never relate two nodes belonging to particular groups in a way that doesn't make sense according to your well-defined relationships (you can find everything that’s in there).
  24. Schema Helps • REST::Neo4p::Schema – accesses the (limited) schema functionality of the Neo4j server – creates indexes – maintains uniqueness of nodes within Label classes • REST::Neo4p::Constrain – an add-in for constraining (or validating), according to flexible specifications: – property values – connections (relationships) based on node properties – relationship types
  25. App-level Constraints
  27. [Code example; callout: "Will throw at Record 5"]
  28. Constrain/Constraint • Multiple modes: – Automatic (throws an exception if a constraint is violated) – Manual (the validation function returns false if a constraint is violated) – Suspended (constraint processing is lifted when desired) • Freeze/thaw constraint specifications (in JSON) for reuse
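A toy sketch of slide 28's three modes, for property constraints only. Toy::Constraint and its check, validate and set_mode methods are hypothetical and do not reproduce the REST::Neo4p::Constrain API; the records are invented for illustration.

```perl
use strict;
use warnings;

package Toy::Constraint;    # hypothetical; not the REST::Neo4p::Constrain API

# Property constraint: each listed key must be present and match its pattern.
sub new {
    my ($class, %spec) = @_;
    return bless { spec => \%spec, mode => 'automatic' }, $class;
}

# Mode is one of: automatic, manual, suspended.
sub set_mode { my ($self, $mode) = @_; $self->{mode} = $mode; return }

sub validate {
    my ($self, $props) = @_;
    for my $key (sort keys %{ $self->{spec} }) {
        my $value = $props->{$key};
        return 0 unless defined $value && $value =~ $self->{spec}{$key};
    }
    return 1;
}

# What a node-creating call would consult before touching the server.
sub check {
    my ($self, $props) = @_;
    return 1 if $self->{mode} eq 'suspended';      # constraint processing lifted
    my $ok = $self->validate($props);
    die "constraint violated\n" if !$ok && $self->{mode} eq 'automatic';
    return $ok;                                     # manual mode: caller decides
}

package main;

my $sample = Toy::Constraint->new(
    type  => qr/^(?:tumor|normal)$/,
    stage => qr/^(?:I|II|III|IV)$/,
);
my @records = (
    { type => 'tumor',  stage => 'II'  },
    { type => 'normal', stage => 'I'   },
    { type => 'blood',  stage => 'III' },    # 'blood' is not an allowed type
);

my $n = 0;
my $threw = !eval {
    for my $r (@records) { $n++; $sample->check($r) }
    1;
};
print "automatic mode threw at record $n\n" if $threw;   # automatic mode threw at record 3

$sample->set_mode('manual');
my @bad = grep { !$sample->check($records[$_]) } 0 .. $#records;   # (2)
```

Automatic mode stops a load at the first bad record, while manual mode lets the caller collect all offenders and decide what to do with them.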
  29. Cypher Queries • REST::Neo4p::Query takes a familiar, DBI-like approach – prepare, execute, fetch – "rows" returned are arrays containing scalars, Node objects, and/or Relationship objects • Simple Perl data structures can be requested instead if desired – if a query returns a path, a Path object (a simple container) is returned
  31. Cypher Queries • Prepare and execute with parameter substitutions [Code comparison labeled "Do This!" and "Not This!"]
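The slide's "Do This! / Not This!" code is not preserved, so here is a hedged pure-Perl illustration of why parameter substitution matters. It assumes Neo4j 2.x's {name} parameter syntax and the legacy POST /db/data/cypher body shape ({query, params}); the hostile input and the property name are made up, and no server is contacted.

```perl
use strict;
use warnings;
use JSON::PP ();

my $name = q{zloty' OR 1=1 //};    # hypothetical hostile input

# Not this: splicing the value into the query text.
my $bad_query = "MATCH (f:foo) WHERE f.name = '$name' RETURN f";

# Do this: fixed query text, value passed separately as a parameter.
my $query  = 'MATCH (f:foo) WHERE f.name = {name} RETURN f';
my $params = { name => $name };

# Body shape assumed for Neo4j 2.x's legacy endpoint: POST /db/data/cypher
my $body = JSON::PP->new->canonical->encode({ query => $query, params => $params });
print "$body\n";
```

Besides closing the injection hole, a constant query text lets the server reuse its cached execution plan for every value of name.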
  32. Cypher Queries • Transactions are supported with a v2.0.1 or later server – started with REST::Neo4p->begin_work() – committed with REST::Neo4p->commit() – canceled with REST::Neo4p->rollback() (here the class acts like the database handle in DBI)
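The DBI-style calling pattern from slide 32, mimicked with a hypothetical in-memory Toy::Neo4p class. The real REST::Neo4p->begin_work, commit and rollback work against the server; this toy only shows the idiom of committing on success and rolling back on any error, with the class itself playing the role of the handle.

```perl
use strict;
use warnings;

package Toy::Neo4p;    # hypothetical stand-in; the class itself plays the handle

my %committed;         # pretend database contents
my $pending;           # writes of the open transaction, or undef

sub begin_work { $pending = {}; return }
sub rollback   { $pending = undef; return }
sub commit {
    return unless $pending;
    @committed{ keys %$pending } = values %$pending;
    $pending = undef;
    return;
}

sub set {
    my ($class, $key, $value) = @_;
    if ($pending) { $pending->{$key} = $value } else { $committed{$key} = $value }
    return;
}
sub get { my ($class, $key) = @_; return $committed{$key} }

package main;

# DBI-style idiom: commit only if every step succeeds, otherwise roll back.
Toy::Neo4p->begin_work;
my $ok = eval {
    Toy::Neo4p->set(patient => 'P-1');
    die "simulated failure\n";
    Toy::Neo4p->commit;
    1;
};
Toy::Neo4p->rollback unless $ok;
my $val = Toy::Neo4p->get('patient');
print defined $val ? "kept\n" : "rolled back\n";   # rolled back
```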
  33. DBI – DBD::Neo4p • Yes, you can really do this:
  34. DBI – DBD::Neo4p • Row returns: choice of full objects or simple Perl structures
  35. Future Directions/Contribution Ideas • Test on a v2.2 server and fix any issues • Make Neo4p closer to an ORM (require explicit push/pull from the backend server) • Sunset v1.0 support – completely touch-free testing within transactions – integrate node labels better • Make batch response parsing more efficient – e.g., don't stream if the response is not huge • Add traversal functionality • Beautify and deodorize
  36. Thanks!