APOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures Library

Nov. 5, 2018

  1. APOC Pearls Michael Hunger Developer Relations Engineering, Neo4j Follow @mesirii APOC Unicorns
  2. All Images by TeeTurtle.com & Unstable Unicorns
  3. Power Up
  4. Extending Neo4j (diagram: Applications, Bolt, Neo4j Execution Engine, User Defined Procedure) User Defined Procedures let you write custom code that is: • Written in any JVM language • Deployed to the database • Accessed by applications via Cypher
  5. APOC History • My Unicorn Moment • 3.0 was about to have User Defined Procedures • Add the missing utilities • Grew quickly 50 - 150 - 450 • Active OSS project • Many contributors
  6. Available On • Neo4j Sandbox • Neo4j Desktop • Neo4j Cloud
  7. Install
  8. What's in the Box? • Utilities & Converters • Data Integration • Import / Export • Graph Generation / Refactoring • Transactions / Jobs / TTL
  9. Where can I learn more? • Videos • Documentation • Browser Guide • APOC Training • Neo4j Community Forum • apoc.help()
  10. If you learn one thing: apoc.help("keyword")
  11. APOC Video Series Youtube Playlist
  12. APOC Docs • installation instructions • videos • searchable overview table • detailed explanation • examples neo4j-contrib.github.io/neo4j-apoc-procedures
  13. Browser Guide :play apoc • live examples
  14. The Pearls - That give you Superpowers
  15. Data Integration
  16. Data Integration • Relational / Cassandra • MongoDB, Couchbase, ElasticSearch • JSON, XML, CSV, XLS • Cypher, GraphML • ...
  17. apoc.load.json • load json from web-apis and files • JSON Path • streaming JSON • compressed data https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_load_json
  18. WITH "https://api.stackexchange.com/2.2/questions?pagesize=100..." AS url CALL apoc.load.json(url) YIELD value UNWIND value.items AS q MERGE (question:Question {id:q.question_id}) ON CREATE SET question.title = q.title, question.share_link = q.share_link, question.favorite_count = q.favorite_count MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag)) …
  19. Huge Transactions
  20. Run large scale updates CALL apoc.periodic.iterate( 'MATCH (n:Person) RETURN n', 'SET n.name = n.firstName + " " + n.lastName', {batchSize:10000, parallel:true})
  21. Run large scale updates CALL apoc.periodic.iterate( 'LOAD CSV … AS row', 'MERGE (n:Node {id:row.id}) SET n.name = row.name', {batchSize:10000, concurrency:10})
  22. Utilities
  23. Text Functions - apoc.text.* indexOf, indexesOf split, replace, regexpGroups format capitalize, decapitalize random, lpad, rpad snakeCase, camelCase, upperCase charAt, hexCode base64, md5, sha1, https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_text_functions
  24. Collection Functions - apoc.coll.* sum, avg, min,max,stdev, zip, partition, pairs sort, toSet, contains, split indexOf, .different occurrences, frequencies, flatten disjunct, subtract, union, … set, insert, remove https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.4/docs/overview.adoc#collection-functions
  25. Map Functions - apoc.map.* • .fromNodes, .fromPairs, .fromLists, .fromValues • .merge • .setKey, .removeKey • .clean(map,[keys],[values]) • .groupBy https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.4/docs/overview.adoc#map-functions
  26. JSON - apoc.convert.* .toJson([1,2,3]) .fromJsonList('[1,2,3]') .fromJsonMap( '{"a":42,"b":"foo","c":[1,2,3]}') .toTree([paths],[lowerCaseRels=true]) .getJsonProperty(node,key) .setJsonProperty(node,key,complexValue)
  27. Graph Refactoring
  28. Graph Refactoring - apoc.refactor.* • .cloneNodes • .mergeNodes • .extractNode • .collapseNode • .categorize Relationship Modifications • .to(rel, endNode) • .from(rel, startNode) • .invert(rel) • .setType(rel, 'NEW-TYPE')
  29. apoc.refactor.mergeNodes MATCH (n:Person) WITH n.email AS email, collect(n) as people WHERE size(people) > 1 CALL apoc.refactor.mergeNodes(people) YIELD node RETURN node
  30. apoc.create.addLabels MATCH (n:Movie) CALL apoc.create.addLabels( id(n), [ n.genre ] ) YIELD node REMOVE node.genre RETURN node
  31. Triggers
  32. Triggers CALL apoc.trigger.add( name, statement,{phase:before/after}) • pause/resume/list/remove • Transaction-Event-Handler calls cypher statement • parameters: createdNodes, assignedNodeProperties, deletedNodes,... • utility functions to extract entities/properties from update-records • stores in graph properties
  33. Time to Live
  34. Time To Live TTL enable in config: apoc.ttl.enabled=true Label :TTL apoc.date.expire(In)(node, time, unit) Creates Index on :TTL(ttl)
  35. Time To Live TTL background job (every 60s - configurable) that runs: MATCH (n:TTL) WHERE n.ttl < timestamp() WITH n LIMIT 1000 DETACH DELETE n
  36. Aggregation Functions
  37. Aggregation Function - apoc.agg.* • more efficient variants of collect(x)[a..b] • .nth,.first,.last,.slice • .median(x) • .percentiles(x,[0.5,0.9]) • .product(x) • .statistics() provides a full numeric statistic
  38. Graph Grouping
  39. Graph Grouping MATCH (p:Person) SET p.decade = p.born / 10; MATCH (p1:Person)-->()<--(p2:Person) WITH p1,p2,count(*) as c MERGE (p1)-[r:INTERACTED]-(p2) ON CREATE SET r.count = c; CALL apoc.nodes.group(['Person'],['decade']) YIELD node, relationship RETURN *;
  40. Cypher Procedures
  41. Custom Procedures (WIP) apoc.custom.asProcedure/asFunction (name,statement, columns, params) • Register statements as real procedures & functions • 'custom' namespace prefix • Pass parameters, configure result columns • Stored in graph and distributed across cluster
  42. Custom Procedures (WIP) call apoc.custom.asProcedure('neighbours', 'MATCH (n:Person {name:$name})-->(nb) RETURN nb AS neighbour', [['neighbour','NODE']],[['name','STRING']]); call custom.neighbours('Joe') YIELD neighbour;
  43. Report Issues Contribute!
  44. Ask Questions neo4j.com/slack community.neo4j.com
  45. APOC on GitHub
  46. Join the Workshop tomorrow!
  47. Any Questions?
  48. Best Question gets a box!
  49. Expand Operation
  50. Expand Operations Customized path expansion from start node(s) • Min/max traversals • Limit number of results • Optional (no rows removed if no results) • Choice of BFS/DFS expansion • Custom uniqueness (restrictions on visitations of nodes/rels) • Relationship and label filtering • Supports repeating sequences
  51. Expand Operations apoc.path.expand(startNode(s), relationshipFilter, labelFilter, minLevel, maxLevel) YIELD path • The original, when you don’t need much customization apoc.path.expandConfig(startNode(s), configMap) YIELD path • Most flexible, rich configuration map apoc.path.subgraphNodes(startNode(s), configMap) YIELD node • Only distinct nodes, don't care about paths apoc.path.spanningTree(startNode(s), configMap) YIELD path • Only one distinct path to each node apoc.path.subgraphAll(startNode(s), configMap) YIELD nodes, relationships • Only (collected) distinct nodes (and all rels between them)
  52. Config map values • minLevel: int • maxLevel: int • relationshipFilter • labelFilter • uniqueness: ('RELATIONSHIP_PATH', 'NODE_GLOBAL', 'NODE_PATH', etc) • bfs: boolean • filterStartNode: boolean • limit: int • optional: boolean • endNodes: [nodes] • terminatorNodes: [nodes] • sequence • beginSequenceAtStart: boolean
  53. Relationship Filter • '<ACTED_IN' - Incoming Rel • 'DIRECTED>' - Outgoing Rel • 'REVIEWED' - Any direction • '<ACTED_IN | DIRECTED> | REVIEWED' - Multiple, in varied directions • You can't do that with Cypher -[:ACTED_IN|DIRECTED|REVIEWED]->
  54. Label Filter What is/isn't allowed during expansion, and what is/isn't returned • '-Director' – Blacklist, not allowed in path • '+Person' –Whitelist, only allowed in path (no whitelist = all allowed) • '>Reviewer' – End node, only return these, and continue expansion • '/Actor:Producer' – Terminator node, only return these, stop expansion 'Person|Movie|-Director|>Reviewer|/Actor:Producer' – Combine them
  55. Sequences Repeating sequences of relationships, labels, or both. Uses labelFilter and relationshipFilter, just add commas Or use sequence for both together labelFilter:'Post | -Blocked, Reply, >Admin' relationshipFilter:'NEXT>,<FROM,POSTED>|REPLIED>' sequence:'Post |-Blocked, NEXT>, Reply, <FROM, >Admin, POSTED>| REPLIED>'
  56. End nodes / Terminator nodes What if we already have the nodes that should end the expansion? endNodes – like filter, but takes a collection of nodes (or ids) terminatorNodes – like filter (stop expand), but also takes a collection (whitelistNodes and blacklistNodes too! ) Can be used with labelFilter or sequence, but continue or include must be unanimous
  58. Bolt Connector
  59. Bolt Connector CALL apoc.bolt.execute(url, statement, params, config) YIELD row CALL apoc.bolt.load(url, statement, params, config) YIELD row call apoc.bolt.load("bolt://user:password@localhost:7687"," match(p:Person {name:{name}}) return p", {name:'Michael'}) supports bolt connector parameters returns: scalars, Maps (row), virtual nodes,rels,paths
  60. Connect to Community Graph "bolt://all:readonly@138.197.15.1" and load all Meetup Group
  61. Conversion Functions
  62. Turn "[1,2,3]" into a Cypher List in plain Cypher
  63. Turn JSON List into Cypher List with "[1,2,3]" as str with split(substring(str,1, length(str)-2),",") as numbers return [x IN numbers| toInteger(x)]
  64. JSON Conversion Functions apoc.convert.toJson apoc.convert.fromJsonMap apoc.convert.fromJsonList https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_from_tojson
  65. Conversion Functions apoc.convert.toString,.toBoolean,.toFloat,.toInteger apoc.convert.toMap apoc.convert.toList,.toSet apoc.convert.toNode,.toRelationship
  66. Logging
  67. Logging
  68. Triggers
  69. Gephi Integration
  70. Gephi Integration match path = (:Person)-[:ACTED_IN]->(:Movie) WITH path LIMIT 1000 with collect(path) as paths call apoc.gephi.add(null,'workspace0', paths) yield nodes, relationships, time return nodes, relationships, time incremental send to Gephi, needs Gephi Streaming extension
  71. Graph Refactorings
  72. Refactor the movie graph
  73. Cypher Execution
  74. Cypher Execution apoc.cypher.run(fragment, params) apoc.cypher.doIt(fragment, params) apoc.cypher.runTimeboxed apoc.cypher.runFile(file or url,{config}) apoc.cypher.runSchemaFile(file or url,{config}) apoc.cypher.runMany('cypher;\nstatements;',{params},{config}) apoc.cypher.mapParallel(fragment, params, list-to-parallelize)
  75. Check out the other periodic procs Try apoc.periodic.iterate example
  76. Graph Grouping
  77. Warmup
  78. Warmup • load page-cache • page-skipping • new implementation based on PageCache.* • nodes + rels + rel-groups • properties • string / array properties • index pages
  79. Monitoring
  80. Monitoring • apoc.monitor.ids • apoc.monitor.kernel • apoc.monitor.store • apoc.monitor.tx • apoc.monitor.locks(minWaitTime long)
  81. Conditional Cypher Execution
  82. Conditional Cypher Execution CALL apoc.[do.]when(condition, ifQuery, elseQuery, params) CALL apoc.[do.]case([condition, query, condition, query, …​], elseQuery, params)
  83. Graph Generation
  84. Graph Generation • apoc.generate.er(noNodes, noEdges, 'label', 'type') Erdos-Renyi model (uniform) • apoc.generate.ws(noNodes, degree, beta, 'label', 'type') Watts-Strogatz model (clusters) • apoc.generate.ba(noNodes, edgesPerNode, 'label', 'type') Barabasi-Albert model (preferential attachment) • apoc.generate.complete(noNodes, 'label', 'type') • apoc.generate.simple([degrees], 'label', 'type')
  85. Locking
  86. Locking call apoc.lock.nodes([nodes]) call apoc.lock.rels([relationships]) call apoc.lock.all([nodes],[relationships])
  87. JSON
  88. Export
  89. Export apoc.export.csv .all / .data / .query apoc.export.cypher apoc.export.graphml leaving off filename does stream cypher to client
  90. Data Creation
  91. Data Creation CALL apoc.create.node(['Label'], {key:value,…​}) CALL apoc.create.nodes(['Label'], [{key:value,…​}]) CALL apoc.create.addLabels, .removeLabels CALL apoc.create.setProperty CALL apoc.create.setProperties CALL apoc.create.relationship(from,'TYPE',{key:value,…​}, to) CALL apoc.nodes.link([nodes],'REL_TYPE')
  92. Virtual Entities
  93. Virtual Entities Function AND Procedure apoc.create.vNode(['Label'], {key:value,…}) YIELD node apoc.create.vRelationship(from,'TYPE',{key:value,…}, to) apoc.create.vPattern({_labels:['Label'],key:value},'TYPE', {key:value,…}, {_labels:['LabelB'],key:value})
  94. Try apoc.date.* with datetime() text, coll, map, convert funcs
  95. And many more!
  96. Latest Releases Summer Release 3.4.0.2 (Aug 8) Spring Release 3.4.0.1 (May 16) Winter Release 3.3.0.2 (Feb 23)
  97. TASK Aggregation Functions
  98. Latest Additions • apoc.diff graph • new text similarity functions • CSV loader based on neo4j-import format • apoc.load.xls • apoc.nodes.group • Accessor functions for (virtual) entities • S3 Support • HDFS Support • apoc.index.addNodeMap • apoc.path.create • apoc.path.slice • apoc.path.combine • apoc.text.code(codepoint) • stream apoc.export.cypher • apoc.coll.combinations(), apoc.coll.frequencies()
  99. TASK Which of these are you interested in? Ask / Try
  100. Procedures / Functions from Cypher CALL apoc.custom.asProcedure('answer','RETURN 42 as answer'); CALL custom.answer(); works also with parameters, and return columns declarations CALL apoc.custom.asFunction('answer','RETURN $input','long', [['input','number']]); RETURN custom.answer(42) as answer;
  101. Neo4j Developer Surface (diagram: Native Language Drivers, BOLT, User Defined Procedure) • 2000-2010 0.x Embedded Java API • 2010-2014 1.x REST • 2014-2015 2.x Cypher over HTTP • 2016 3.0.x Bolt, Official Language Drivers, User Defined Procedures • 2016 3.1.x User Defined Functions • 2017 3.2.x User Defined Aggregation Functions
  102. Procedures Functions Aggregate Functions
  103. Can be written in any JVM language
  104. User Defined Procedures
  105. Callable Standalone and in Cypher Statements
  106. CALL example.search('User','name:Brook*')
  107. How to build them Developer Manual
  108. Build a procedure or function you'd like start with the template repo github.com/neo4j-examples/neo4j-procedure-template
  109. User Defined Procedures User-defined procedures are ● @Procedure annotated, named Java Methods ○ default name: package + method ● take @Name'ed parameters (3.1. default values) ● return a Stream of value objects ● fields are turned into columns ● can use @Context injected GraphDatabaseService etc ● run within Transaction
  110. public class FullTextIndex { @Context public GraphDatabaseService db; @Procedure( name = "example.search", mode = Procedure.Mode.READ ) public Stream<SearchHit> search( @Name("index") String index, @Name("query") String query ) { if( !db.index().existsForNodes( index )) { return Stream.empty(); } return db.index().forNodes( index ).query( query ).stream() .map( SearchHit::new ); } public static class SearchHit { public final Node node; SearchHit(Node node) { this.node = node; } } }
  111. try ( Driver driver = GraphDatabase.driver( "bolt://localhost", Config.build().toConfig() ) ) { try ( Session session = driver.session() ) { String call = "CALL example.search('User',$query)"; Map<String,Object> params = singletonMap( "query", "name:Brook*"); StatementResult result = session.run( call, params); while ( result.hasNext() ) { // process results } } } Deploy & Register in Neo4j Server via neo4j-harness Call & test via neo4j-java-driver
  112. Deploying User Defined Procedures Build or download (shadow) jar ● Drop jar-file into $NEO4J_HOME/plugins ● Restart server ● Procedure should be available ● Otherwise check neo4j.log / debug.log
  113. User Defined Functions
  114. Usable in any Cypher expression or lightweight computation
  115. RETURN example.join(['Hello', 'World'],' ') => "Hello World"
  116. public class Join { @UserFunction @Description("example.join(['s1','s2',...], delimiter) - join the given strings with the given delimiter.") public String join( @Name("strings") List<String> strings, @Name(value = "delimiter", defaultValue = ",") String delimiter ) { if ( strings == null || delimiter == null ) { return null; } return String.join( delimiter, strings ); } }
  119. try ( Driver driver = GraphDatabase.driver( "bolt://localhost", Config.build().toConfig() ) ) { try ( Session session = driver.session() ) { String query = "RETURN example.join(['Hello', 'World']) AS result"; String result = session.run( query ) .single().get( "result" ).asString(); } }
  120. User Defined Aggregation Functions
  121. Custom, efficient aggregations for Data Science and BI
  122. Aggregation Function In APOC • more efficient variants of collect(x)[a..b] • apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice • apoc.agg.median(x) • apoc.agg.percentiles(x,[0.5,0.9]) • apoc.agg.product(x) • apoc.agg.statistics() provides a full numeric statistic
  123. UNWIND ['abc', 'abcd', 'ab'] AS string RETURN example.longestString(string) => 'abcd'
  124. public class LongestString { @UserAggregationFunction @Description( "aggregates the longest string found" ) public LongStringAggregator longestString() { return new LongStringAggregator(); } public static class LongStringAggregator { private int longest; private String longestString; @UserAggregationUpdate public void findLongest( @Name( "string" ) String string ) { if ( string != null && string.length() > longest) { longest = string.length(); longestString = string; } } @UserAggregationResult public String result() { return longestString; } } }
  128. try ( Driver driver = GraphDatabase.driver( "bolt://localhost", Config.build().toConfig() ) ) { try ( Session session = driver.session() ) { String query = "UNWIND ['abc', 'abcd', 'ab'] AS string " + "RETURN example.longestString(string) AS result"; String result = session.run(query).single().get("result").asString(); } }
  129. One Question / Comment from each!
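
The LongestString slide splits a user-defined aggregation function into an update method and a result method. As a rough illustration of that lifecycle, here is a plain-Java sketch with the Neo4j annotations stripped out so it runs on its own. The `LongestStringSketch` class and its `aggregate` driver are hypothetical stand-ins for what the database runtime does, not part of Neo4j or APOC, and the once-per-row / once-at-the-end calling pattern is stated as an assumption about how the runtime drives the aggregator.

```java
import java.util.List;

// Plain-Java stand-in for the aggregation lifecycle; no Neo4j classes are used.
class LongestStringSketch {

    // Mirrors the aggregator from the slide, minus the Neo4j annotations.
    static class LongStringAggregator {
        private int longest;          // length of the longest string seen so far
        private String longestString; // stays null until a non-empty string arrives

        // Plays the role of the @UserAggregationUpdate method:
        // assumed to be invoked once per input row.
        void findLongest(String string) {
            if (string != null && string.length() > longest) {
                longest = string.length();
                longestString = string;
            }
        }

        // Plays the role of the @UserAggregationResult method:
        // assumed to be invoked once, after all rows have been seen.
        String result() {
            return longestString;
        }
    }

    // Hypothetical driver standing in for the database runtime.
    static String aggregate(List<String> rows) {
        LongStringAggregator aggregator = new LongStringAggregator();
        for (String row : rows) {
            aggregator.findLongest(row);
        }
        return aggregator.result();
    }

    public static void main(String[] args) {
        // Same input as the UNWIND example on the slides.
        System.out.println(aggregate(List.of("abc", "abcd", "ab"))); // prints abcd
    }
}
```

Because the comparison is strictly greater-than, ties keep the first string seen, and an empty input yields null.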