APOC has become the de facto standard utility library for Neo4j. In this talk, I will demonstrate some of the lesser-known but very useful components of APOC that will save you a lot of work. You will also learn how to combine individual functions into powerful constructs to achieve impressive feats.
This will be a fast-paced demo/live-coding talk.
Video: https://neo4j.com/graphconnect-2018/session/neo4j-utility-library-apoc-pearls
Unicorn images by TeeTurtle.com (Unstable Unicorns is a fun game & cool t-shirts)
4. Extending Neo4j
[Diagram: applications connect via the Bolt protocol to the Neo4j execution engine, which invokes user-defined procedures]
User Defined Procedures let you write custom code that is:
• Written in any JVM language
• Deployed to the Database
• Accessed by applications via Cypher
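To make the last point concrete: a deployed procedure is invoked from Cypher with CALL. A minimal sketch using APOC's built-in help procedure:
CALL apoc.help('path')        // list APOC procedures/functions whose name matches 'path'
YIELD name, text
RETURN name, text
LIMIT 5;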
5. APOC History
• My Unicorn Moment
• 3.0 was about to have User Defined Procedures
• Add the missing utilities
• Grew quickly 50 - 150 - 450
• Active OSS project
• Many contributors
41. Graph Grouping
MATCH (p:Person) SET p.decade = p.born / 10;
MATCH (p1:Person)-->()<--(p2:Person)
WITH p1, p2, count(*) AS c
MERGE (p1)-[r:INTERACTED]-(p2)
ON CREATE SET r.count = c;
CALL apoc.nodes.group(['Person'],['decade'])
YIELD node, relationship RETURN *;
52. Expand Operations
Customized path expansion from start node(s)
• Min/max traversals
• Limit number of results
• Optional (no rows removed if no results)
• Choice of BFS/DFS expansion
• Custom uniqueness (restrictions on visitations of nodes/rels)
• Relationship and label filtering
• Supports repeating sequences
53. Expand Operations
apoc.path.expand(startNode(s), relationshipFilter, labelFilter, minLevel, maxLevel) YIELD path
• The original, when you don’t need much customization
apoc.path.expandConfig(startNode(s), configMap) YIELD path
• Most flexible, rich configuration map
apoc.path.subgraphNodes(startNode(s), configMap) YIELD node
• Only distinct nodes, don't care about paths
apoc.path.spanningTree(startNode(s), configMap) YIELD path
• Only one distinct path to each node
apoc.path.subgraphAll(startNode(s), configMap) YIELD nodes, relationships
• Only (collected) distinct nodes (and all rels between them)
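To make the config map concrete, here is a minimal sketch of apoc.path.expandConfig against a movie-style graph (the start node, property values, and filters are illustrative assumptions, not from the talk):
MATCH (p:Person {name: 'Tom Hanks'})
CALL apoc.path.expandConfig(p, {
  relationshipFilter: 'ACTED_IN>|<DIRECTED',
  labelFilter: '+Person|+Movie',
  minLevel: 1,
  maxLevel: 3,
  limit: 25
}) YIELD path
RETURN path;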
55. Relationship Filter
• '<ACTED_IN' - Incoming Rel
• 'DIRECTED>' - Outgoing Rel
• 'REVIEWED' - Any direction
• '<ACTED_IN | DIRECTED> | REVIEWED' - Multiple, in varied directions
• You can't express that in plain Cypher: -[:ACTED_IN|DIRECTED|REVIEWED]-> forces a single direction for all relationship types
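The same mixed-direction idea with the positional apoc.path.expand signature from slide 53 (node and filter values are again just illustrative):
MATCH (p:Person {name: 'Keanu Reeves'})
CALL apoc.path.expand(p, '<ACTED_IN|DIRECTED>|REVIEWED', '+Person|+Movie', 1, 2) YIELD path
RETURN path LIMIT 10;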
56. Label Filter
What is/isn't allowed during expansion, and what is/isn't returned
• '-Director' – Blacklist, not allowed in path
• '+Person' – Whitelist, only allowed in path (no whitelist = all allowed)
• '>Reviewer' – End node, only return these, and continue expansion
• '/Actor:Producer' – Terminator node, only return these, stop expansion
'Person|Movie|-Director|>Reviewer|/Actor:Producer' – Combine them
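A short sketch of a blacklist plus end-node label filter using apoc.path.subgraphNodes (the labels follow the slide; the start node is an assumption):
MATCH (m:Movie {title: 'The Matrix'})
CALL apoc.path.subgraphNodes(m, {
  labelFilter: '-Director|>Reviewer',
  maxLevel: 3
}) YIELD node
RETURN node;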
57. Sequences
Repeating sequences of relationships, labels, or both.
Uses labelFilter and relationshipFilter, just add commas
Or use sequence for both together
labelFilter:'Post | -Blocked, Reply, >Admin'
relationshipFilter:'NEXT>,<FROM,POSTED>|REPLIED>'
sequence:'Post|-Blocked, NEXT>, Reply, <FROM, >Admin, POSTED>|REPLIED>'
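Sequences go into the same config map. A sketch over the hypothetical forum graph from the slide (Post, Reply, Admin labels; the start node is an assumption):
MATCH (start:Post {id: 123})
CALL apoc.path.expandConfig(start, {
  sequence: 'Post|-Blocked, NEXT>, Reply, <FROM, >Admin, POSTED>|REPLIED>',
  maxLevel: 12
}) YIELD path
RETURN path;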
58. End nodes / Terminator nodes
What if we already have the nodes that should end the expansion?
endNodes – like filter, but takes a collection of nodes (or ids)
terminatorNodes – like filter (stop expand), but also takes a collection
(whitelistNodes and blacklistNodes too!)
Can be used with labelFilter or sequence, but continue or include must be unanimous
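A sketch of passing a collected node list as terminatorNodes (the graph and the released-year predicate are assumptions):
MATCH (p:Person {name: 'Tom Hanks'})
MATCH (m:Movie) WHERE m.released >= 2000
WITH p, collect(m) AS stops
CALL apoc.path.subgraphNodes(p, {terminatorNodes: stops, maxLevel: 4}) YIELD node
RETURN node;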
65. Turn JSON List into Cypher List
with "[1,2,3]" as str
with split(substring(str,1, length(str)-2),",") as numbers
return [x IN numbers| toInteger(x)]
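APOC shortens this to a single call with one of its conversion functions:
RETURN apoc.convert.fromJsonList('[1,2,3]') AS numbers;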
72. Gephi Integration
MATCH path = (:Person)-[:ACTED_IN]->(:Movie)
WITH path LIMIT 1000
WITH collect(path) AS paths
CALL apoc.gephi.add(null, 'workspace0', paths) YIELD nodes, relationships, time
RETURN nodes, relationships, time;
Incremental send to Gephi; needs the Gephi Streaming extension.
102. Procedures / Functions from Cypher
CALL apoc.custom.asProcedure('answer','RETURN 42 as answer');
CALL custom.answer();
Also works with parameters and return-column declarations:
CALL apoc.custom.asFunction('answer', 'RETURN $input', 'long', [['input','number']]);
RETURN custom.answer(42) as answer;
103. Neo4j Developer Surface
[Diagram: native language drivers speak Bolt to the server, where user-defined procedures run]
2000-2010 0.x Embedded Java API
2010-2014 1.x REST
2014-2015 2.x Cypher over HTTP
2016 3.0.x Bolt, Official Language Drivers, User Defined Procedures
2016 3.1.x User Defined Functions
2017 3.2.x User Defined Aggregation Functions
110. Build a procedure or function you'd like
Start with the template repo: github.com/neo4j-examples/neo4j-procedure-template
111. User Defined Procedures
User-defined procedures are
● @Procedure-annotated, named Java methods
○ default name: package + method name
● take @Name'ed parameters (default values since 3.1)
● return a Stream of value objects
● the fields of those value objects become result columns
● can use an @Context-injected GraphDatabaseService etc.
● run within a transaction
112. public class FullTextIndex {
    @Context
    public GraphDatabaseService db;

    @Procedure( name = "example.search", mode = Procedure.Mode.READ )
    public Stream<SearchHit> search( @Name("index") String index,
                                     @Name("query") String query ) {
        if ( !db.index().existsForNodes( index ) ) {
            return Stream.empty();
        }
        return db.index().forNodes( index ).query( query ).stream()
                 .map( SearchHit::new );
    }

    public static class SearchHit {
        public final Node node;
        SearchHit( Node node ) { this.node = node; }
    }
}
113. try ( Driver driver = GraphDatabase.driver( "bolt://localhost",
                                                 Config.build().toConfig() ) ) {
    try ( Session session = driver.session() ) {
        String call = "CALL example.search('User', $query)";
        Map<String,Object> params = singletonMap( "query", "name:Brook*" );
        StatementResult result = session.run( call, params );
        while ( result.hasNext() ) {
            Record record = result.next(); // process each result row
        }
    }
}
Deploy & Register in Neo4j Server via neo4j-harness
Call & test via neo4j-java-driver
114. Deploying User Defined Procedures
Build or download (shadow) jar
● Drop jar-file into $NEO4J_HOME/plugins
● Restart server
● Procedure should be available
● Otherwise check neo4j.log / debug.log
124. Aggregation Functions in APOC
• more efficient variants of collect(x)[a..b]
• apoc.agg.nth, apoc.agg.first, apoc.agg.last, apoc.agg.slice
• apoc.agg.median(x)
• apoc.agg.percentiles(x,[0.5,0.9])
• apoc.agg.product(x)
• apoc.agg.statistics(x) provides a full set of numeric statistics in one pass
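These drop straight into a RETURN or WITH clause like any other aggregation. A minimal sketch, assuming a Person.age property:
MATCH (p:Person)
RETURN apoc.agg.median(p.age) AS medianAge,
       apoc.agg.percentiles(p.age, [0.5, 0.9]) AS agePercentiles,
       apoc.agg.statistics(p.age) AS ageStats;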