REST::Neo4p - Talk @ DC Perl Mongers

Updated Slides for DC PerlMongers meetup - REST::Neo4p Perl driver for Neo4j graph database

Published in: Software
Notes
  • People are using it!

    1. REST::Neo4p: A Perl Driver for Neo4j
       Mark A. Jensen, SRA International, Inc.
       https://github.com/majensen/rest-neo4p.git
    2. • Perler since 2000
       • CPAN contributor (MAJENSEN) since 2009
       • BioPerl Core Developer
       • Scientific Project Director, The Cancer Genome Atlas Data Coordinating Center
       • @thinkinator, LinkedIn
    3. Motivation
       • TCGA: Biospecimen, Clinical, Genomic Data
         – complex
         – growing
         – evolving technologies
         – evolving policies
         – need for precise accounting
       • Customer suggested Neo4j
         – I wanted to play with it, but in Perl
    4. Feeping Creaturism
       • 2012: Some Perl experiments out there, nothing complete
       • Got excited
       • People started using it
       • Sucked into the open-source attractor (diagram labels: Maintenance, Glory)
    5. Dog Food I
    6. Dog Food II
    7. Woof! Neo4p Classes
    8. Design Goals
       • "OGM": Perl 5 objects backed by the graph
       • User should never have to deal with a REST endpoint*
       • User should never have to deal with a Cypher query†
       • Robust enough for production code
         – System should approach complete coverage of the REST service
         – System should be robust to REST API changes and server backward-compatible (or at least version-aware)
       • Take advantage of the self-describing features of the API
       * Unless she wants to.
       † Unless he wants to.
    9. REST::Neo4p core objects
       • Are Node, Relationship, Index
         – Index objects represent legacy (v1.0) indexes
         – v2.0 “background” indexes handled in Schema
       • Are blessed scalar refs: "Inside-out object" pattern
         – the scalar value is the item ID (or index name)
         – For any object $obj, $$obj (the ID) is exactly what you need for constructing the API calls
       • Are subclasses of Entity
         – Entity does the object table handling, JSON-to-object conversion and HTTP agent calls
         – Isolates most of the kludges necessary to handle the few API inconsistencies that exist(ed)
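
To make the "blessed scalar ref" idea concrete, here is a minimal, self-contained sketch. It is not REST::Neo4p's actual code: the package `Sketch::Entity`, the `%PROPS` table, and the `url` method are all hypothetical names chosen for illustration. The point is only that the object is a reference to its ID, so `$$obj` drops straight into an API path, while everything else lives in a table outside the object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an entity class; not REST::Neo4p's own code.
package Sketch::Entity;

my %PROPS;    # object table: data is kept here, keyed by ID, not inside the object

sub new {
    my ($class, $id, $props) = @_;
    $PROPS{$id} = { %{ $props || {} } };
    return bless \$id, $class;    # the object is just a blessed reference to its ID
}

sub id           { ${ $_[0] } }
sub get_property { $PROPS{ ${ $_[0] } }{ $_[1] } }

# The dereferenced object is the ID, so it can be used directly in a path.
sub url {
    my ($self) = @_;
    return "/db/data/node/$$self";
}

package main;

my $node = Sketch::Entity->new(42, { name => 'Fred' });
print $node->url, "\n";                     # /db/data/node/42
print $node->get_property('name'), "\n";    # Fred
```

Because the object carries nothing but the ID, two objects built from the same ID refer to the same entry in the table, which is what lets a shared base class own all the bookkeeping.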
    10. Auto-accessors
        • You can treat properties as object fields if desired
        • Caveat: this may not make sense for your application (not every node needs to have the same properties, but every object will possess the accessors currently)
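
The caveat is easier to see in a small sketch. The code below is an assumption-laden illustration, not the module's implementation: `Sketch::Node` and its internals are hypothetical. It installs an accessor on the class the first time a property is set, which is why an object that never had that property still answers to the accessor.

```perl
use strict;
use warnings;

package Sketch::Node;    # hypothetical class, for illustration only

my %PROPS;               # property storage, keyed by node ID

sub new {
    my ($class, $id) = @_;
    $PROPS{$id} = {};
    return bless \$id, $class;
}

sub set_property {
    my ($self, $props) = @_;
    for my $name (keys %$props) {
        $PROPS{$$self}{$name} = $props->{$name};
        next if $self->can($name);    # accessor already exists
        no strict 'refs';
        # Installed on the class, so every object of the class gains it.
        *{ ref($self) . "::$name" } = sub { $PROPS{ ${ $_[0] } }{$name} };
    }
    return $self;
}

package main;

my $quark = Sketch::Node->new(1);
my $other = Sketch::Node->new(2);
$quark->set_property({ flavor => 'strange' });

print $quark->flavor, "\n";    # strange

# $other never had a 'flavor', yet the accessor exists and returns undef.
my $state = defined($other->flavor) ? 'defined' : 'undef';
print "$state\n";              # undef
```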
    11. Batch Calls
        • Certain situations (database loading, e.g.) make sense to batch: do many things in one API call rather than many single calls
        • REST API provides this functionality
        • How to make it "natural" in the context of working with objects?
          – Use Perl prototyping sugar to create a "batch block"
    12. Example: Rather than call the server for every line, you can mix in REST::Neo4p::Batch, and then use a batch {} block.
        • Calls within the block are collected and deferred
    13. You can execute more complex logic within the batch block, and keep the objects beyond it.
    14. But miracles are not yet implemented.
        • Object here doesn't really exist yet…
    15. How does that work?
        • Agent module isolates all bona fide calls
          – very few kludges to core object modules req'd
        • batch() puts the agent into “batch mode” and executes wrapped code
          – agent stores incoming calls as JSON in a queue
        • After wrapped code is executed, batch() switches agent back to normal mode and has it call the batch endpoint with the queue contents
        • Batch processes the response and creates objects if requested
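
The mechanism described on this slide can be sketched in a few lines of plain Perl. Everything here is hypothetical and simplified: `Sketch::Agent`, its `post` and `flush` methods, and the request counter stand in for the real agent and the network, and the job hashes are only modeled on the method/to/body/id shape of the Neo4j batch endpoint. The `(&)` prototype is the "sugar" that lets callers write `batch { ... }` without the `sub` keyword.

```perl
use strict;
use warnings;
use JSON::PP;

package Sketch::Agent;    # hypothetical agent; not REST::Neo4p::Agent itself

my $batch_mode = 0;
my @queue;                # deferred calls
our $requests = 0;        # counts simulated round trips to the server

sub batch_mode { $batch_mode = $_[1] }

sub post {
    my ($class, $to, $body) = @_;
    if ($batch_mode) {
        my $id = scalar @queue;
        push @queue, { id => $id, method => 'POST', to => $to, body => $body };
        return '{' . $id . '}';    # placeholder: nothing exists on the server yet
    }
    $requests++;                   # normal mode: one round trip per call
    return 'sent';
}

sub flush {
    my $payload = JSON::PP->new->canonical->encode(\@queue);
    @queue = ();
    $requests++;                   # the whole queue goes out in one round trip
    return $payload;
}

package main;

# The (&) prototype makes "batch { ... }" parse like a built-in block construct.
sub batch (&) {
    my ($code) = @_;
    Sketch::Agent->batch_mode(1);
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    Sketch::Agent->batch_mode(0);  # always switch back, even on failure
    die $err unless $ok;
    return Sketch::Agent->flush;
}

my $payload = batch {
    Sketch::Agent->post('/node', { name => "node_$_" }) for 1 .. 3;
};

print "$payload\n";
print "round trips: $Sketch::Agent::requests\n";    # round trips: 1
```

The placeholder returned in batch mode is also why an object created inside the block "doesn't really exist yet": its real ID is only known once the queued jobs have run on the server.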
    16. Batch Profiling
        • Used Devel::NYTProf, nytprofhtml
          – Flame graph:
            Vertical: unique call stack (call on top is on CPU)
            Horizontal: relative time spent in that call stack configuration
            Color: makes it look like a flame
    17. Batch Profiling
    18. Batch Profiling
        • batch keep: 1.1 of 1.2 s
        • batch discard: 1.0 of 1.1 s
        • no batch: 13.0 of 13.9 s
    19. Batch Profiling
        • Panels: Batch/keep, No batch
    20. HTTP Agent
    21. Agent
        • Is transparent
          – But can always see it with REST::Neo4p->agent
          – Agent module alone meant to be useful and independent
        • Elicits and uses the API self-discovery feature on connect()
        • Isolates all HTTP requests and responses
        • Captures and distinguishes API and HTTP errors
          – emits REST::Neo4p::Exceptions objects
        • [Instance] Is a subclass of a "real" user agent:
          – LWP::UserAgent
          – Mojo::UserAgent, or
          – HTTP::Thin
    22. Working within API Self-Description
        • Get first level of actions
        • Register actions
        • Get ‘data’ level of actions
        • Register more actions
        • Kludge around missing actions
    23. Working within API Self-Description
        • Get the list of actions with
          – $agent->available_actions
        • And AUTOLOAD will provide (see pod for args):
          – $agent->get_<action>()
          – $agent->put_<action>()
          – $agent->post_<action>()
          – $agent->delete_<action>()
        • Other accessors, e.g. node(), return the appropriate URL for your server
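
As a rough picture of how AUTOLOAD can turn a discovered action table into `get_<action>`-style methods, here is a self-contained sketch. It makes several assumptions: `Sketch::Agent` is a hypothetical class, the action table is hard-coded rather than fetched from a server, and the methods return a description of the request instead of sending it. The real argument conventions are in the module's pod.

```perl
use strict;
use warnings;

package Sketch::Agent;    # hypothetical; only mimics the dispatch idea

our $AUTOLOAD;

sub new {
    my ($class, $actions) = @_;
    return bless { actions => { %$actions } }, $class;
}

sub available_actions {
    my ($self) = @_;
    return sort keys %{ $self->{actions} };
}

sub AUTOLOAD {
    my ($self, @args) = @_;
    my ($name) = $AUTOLOAD =~ /::(\w+)$/;
    return if $name eq 'DESTROY';
    my ($verb, $action) = $name =~ /^(get|put|post|delete)_(\w+)$/
        or die "No such method: $name\n";
    my $url = $self->{actions}{$action}
        or die "Server did not advertise action '$action'\n";
    # A real agent would issue the HTTP request here; the sketch only describes it.
    return join ' ', uc($verb), join('/', $url, @args);
}

package main;

# Assumption: a real agent fills this table from the server's discovery response.
my $agent = Sketch::Agent->new({
    node   => 'http://127.0.0.1:7474/db/data/node',
    cypher => 'http://127.0.0.1:7474/db/data/cypher',
    batch  => 'http://127.0.0.1:7474/db/data/batch',
});

print join(', ', $agent->available_actions), "\n";    # batch, cypher, node
print $agent->get_node(42), "\n";    # GET http://127.0.0.1:7474/db/data/node/42
```

Since the methods are derived from whatever the server advertises, a new server action needs no new client code, and a missing one fails loudly at the call site.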
    24. Agent Profiling
        • lwp: 2.5 of 2.7 s
        • mojo: 3.3 of 3.6 s
        • thin: 2.4 of 2.6 s
    25. App-level Constraints
    26. Use Case
        You start out with a set of well-categorized things that have some well-defined relationships. Each thing will be represented as a node; that's fine. But you want to guarantee (to your client, for example) that
        1. You can classify every node you add or read unambiguously into a well-defined group;
        2. You never relate two nodes belonging to particular groups in a way that doesn't make sense according to your well-defined relationships.
    27. Constrain/Constraint
        • Now, v2.0 allows integrated Labels and unique constraints and prevents deletion of connected nodes, but…
        • REST::Neo4p::Constrain: an add-in for constraining (or validating)
          – property values
          – connections (relationships) based on node properties
          – relationship types
          according to flexible specifications
    28. Constrain/Constraint
        • Multiple modes:
          – Automatic (throws exception if constraint violated)
          – Manual (validation function returns false if constraint violated)
          – Suspended (lift constraint processing when desired)
        • Freeze/Thaw (in JSON) constraint specifications for reuse
    29. Open the POD now, HAL.
    30. Cypher Queries
        • REST::Neo4p::Query takes a familiar, DBI-like approach
          – Prepare, execute, fetch
          – "rows" returned are arrays containing scalars, Node objects, and/or Relationship objects
          – If a query returns a path, a Path object (a simple container) is returned
    31. (image only)
    32. Cypher Queries
        • Prepare and execute with parameter substitutions
          – Do This! (parameters)
          – Not This! (values pasted into the query text)
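
The difference between the two approaches shows up clearly in what gets sent. The sketch below does not call REST::Neo4p; it only builds the two alternatives with core JSON::PP. The variable names are illustrative, and the request body with `query` and `params` keys is modeled on the legacy Cypher REST endpoint, so treat the shape as an assumption for other server versions.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->canonical;
my $name = q{Fred' OR '1'='1};    # hostile, or merely awkward, input

# Not this: the value is spliced into the query text and changes its meaning.
my $spliced = "MATCH (n) WHERE n.name = '$name' RETURN n";

# Do this: the query text stays fixed and the value travels separately as data.
my $request = $json->encode({
    query  => 'MATCH (n) WHERE n.name = {name} RETURN n',
    params => { name => $name },
});

print "$spliced\n";
print "$request\n";
```

With parameters the query text is identical from call to call, so besides being safer it also gives the server a chance to reuse its parsed query.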
    33. Cypher Queries
        • Transactions are supported when you have v2.0.1 server or greater
          – started with REST::Neo4p->begin_work()
          – committed with REST::Neo4p->commit()
          – canceled with REST::Neo4p->rollback()
          (here, the class looks like the database handle in DBI, in fact…)
    34. DBI – DBD::Neo4p
        • Yes, you can really do this:
    35. Glory! Maintenance.
    36. Future Directions/Contribution Ideas
        • Get it onto GitHub: https://github.com/majensen/rest-neo4p.git
        • Make batch response parsing more efficient
          – e.g., don't stream if response is not huge
        • Beautify and deodorize
        • Completely touch-free testing
        • Add traversal functionality
        • Could Neo4p play together with DBIx::Class? (i.e., could it be a real OGM?)
    37. Thanks!
        https://github.com/majensen/rest-neo4p.git