Solr 6.0 Graph Query Overview

Kevin Watters KMW Technology
kwatters@kmwllc.com
http://www.kmwllc.com/
03/29/2016

KMW Technology Overview
• Boston-based software consulting and professional services organization.
• Founded in 2010.
• Seven consultants with deep industry experience.
• Boutique firm specializing in Search and Big Data technologies.
• Custom Connectors, Pipelines, Search, Analytics, and UI development.

Search vs. Join vs. Graph
• Which query should I use?
• Search is for flat data with no relationships.
◦ Data is often denormalized, so updates can require re-indexing large amounts of data.
• Join is for one level of relationships.
◦ Data is normalized, but when more than two tables are involved, join queries must be nested.
• Graph is for relationships of arbitrary depth.
◦ Data can be completely normalized, and an arbitrary number of tables can be joined together.
• A one-level hop on a graph is roughly equivalent to a join query.
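To make the hop/join equivalence concrete, here is a minimal Python sketch over an in-memory list of documents. The field names `node_id` and `edge_ids` and the order/customer/region records are hypothetical, and the functions only mimic the graph query's from/to convention (the edge ids of the current documents are matched against the node ids of other documents); this is not Solr code.

```python
# Hypothetical in-memory "index": each document has a node id ("node_id")
# and a multi-valued list of edge ids ("edge_ids") pointing at other nodes.
DOCS = [
    {"id": "order_1", "node_id": "o1", "edge_ids": ["c1"]},   # order -> customer
    {"id": "cust_1", "node_id": "c1", "edge_ids": ["r1"]},    # customer -> region
    {"id": "region_1", "node_id": "r1", "edge_ids": []},      # no outgoing edges
]


def hop(docs, current):
    """One hop (roughly a join): collect the edge ids of the current docs and
    return every doc whose node id matches one of them."""
    targets = {edge for d in current for edge in d["edge_ids"]}
    return [d for d in docs if d["node_id"] in targets]


def graph(docs, roots, max_depth):
    """Repeat the hop up to max_depth times, keeping every doc reached."""
    reached = {d["id"] for d in roots}
    frontier = list(roots)
    for _ in range(max_depth):
        frontier = [d for d in hop(docs, frontier) if d["id"] not in reached]
        if not frontier:
            break
        reached.update(d["id"] for d in frontier)
    return reached


roots = [DOCS[0]]
print([d["id"] for d in hop(DOCS, roots)])             # one join: ['cust_1']
print([d["id"] for d in hop(DOCS, hop(DOCS, roots))])  # nested join: ['region_1']
print(sorted(graph(DOCS, roots, max_depth=2)))         # ['cust_1', 'order_1', 'region_1']
```

Each additional level of relationship costs one more nested `hop` call when written as joins, while `graph` only needs a larger `max_depth`.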

What is a Graph?
A generic representation of all data models.
“One data model to rule them all”!
G = <V, E>: a set of vertices and a set of edges.
• Vertices/Nodes
◦ Can have properties as key-value pairs.
• Edges
◦ Can have properties as key-value pairs.

Graph Traversal
There are many graph traversal/exploration algorithms: DFS, BFS, A*, alpha-beta, etc.
The Solr graph query implements breadth-first search (BFS): each step, known as a “hop”, explores all current edges at once and expands the “frontier” of the graph.
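The hop-by-hop frontier expansion can be sketched in a few lines of Python. The adjacency map and node names are made up, and this is plain BFS over an in-memory dict, not a description of how Solr evaluates the query against the index.

```python
# Hypothetical adjacency: node -> nodes its edges point to.
EDGES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["A"],   # cycle back to the root
    "E": [],
}


def bfs_frontiers(edges, roots):
    """Return the frontier reached at each hop, starting with the roots."""
    visited = set(roots)
    frontier = list(roots)
    levels = [frontier]
    while frontier:
        next_frontier = []
        for node in frontier:                 # every edge of the frontier in one step
            for target in edges.get(node, []):
                if target not in visited:     # skip nodes seen at an earlier hop
                    visited.add(target)
                    next_frontier.append(target)
        if next_frontier:
            levels.append(next_frontier)
        frontier = next_frontier
    return levels


print(bfs_frontiers(EDGES, ["A"]))  # [['A'], ['B', 'C'], ['D', 'E']]
```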

Key Features and Design Goals
“Graph is a filter on top of your data”
-someone
• Designed for large scale: large numbers of edges and very deep traversals.
• Limited memory usage for traversal.
• Cycle detection for “free”.
• Highly cacheable.
• Supports multiValued fields for nodes and/or edges.
• Supports filters during the traversal.
• Follow every edge! No edge left behind!
• Works with facets and facet queries!
 Works with Facets & Facet Queries!

A Word about Memory Usage
• One bit set to rule them all!
• The BitSet provides cycle detection implicitly. (Have I been here before?)
• The BitSet has one bit per document in the index.
• A 100-million-document index uses only about 12 MB per query (100,000,000 bits / 8 = 12.5 MB), the same size as one filter cache entry!
• Additional bitsets (leaf node and root node bitsets) may be used during query execution, depending on the query parameters.
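The memory figure and the “Have I been here before?” check can both be sketched with a small bit set. This Python class backed by a `bytearray` is only a stand-in for the Java bit set Solr uses internally, and the integer doc ids and edges are made up.

```python
class DocBitSet:
    """One bit per document id, backed by a bytearray (stand-in for a Java bit set)."""

    def __init__(self, num_docs):
        self.bits = bytearray((num_docs + 7) // 8)   # round up to whole bytes

    def get(self, doc):
        return bool(self.bits[doc >> 3] & (1 << (doc & 7)))

    def set(self, doc):
        self.bits[doc >> 3] |= 1 << (doc & 7)


def traverse(edges, root, num_docs):
    """BFS over integer doc ids; the visited bit set doubles as cycle detection."""
    visited = DocBitSet(num_docs)
    visited.set(root)
    order, frontier = [root], [root]
    while frontier:
        next_frontier = []
        for doc in frontier:
            for target in edges.get(doc, []):
                if not visited.get(target):      # "Have I been here before?"
                    visited.set(target)
                    next_frontier.append(target)
                    order.append(target)
        frontier = next_frontier
    return order


# 100,000,000 bits / 8 = 12,500,000 bytes, i.e. about 12 MB.
print(len(DocBitSet(100_000_000).bits))                        # 12500000
print(traverse({0: [1, 2], 1: [2], 2: [0, 3]}, 0, num_docs=4))  # [0, 1, 2, 3]
```

The edge 2 -> 0 closes a cycle, but the traversal still terminates because doc 0's bit is already set; no separate visited list is needed.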

Graph Query Parser Syntax
• from: field containing the node id.
• to: field containing the edge id(s).
• maxDepth (default -1): the number of hops to traverse from the root of the graph. -1 means traverse until all edges and documents have been collected. maxDepth=1 behaves similarly to a join.
• traversalFilter (default null): an arbitrary query string to apply at each hop of the traversal.
• returnRoot (default true): true|false, whether the documents matching the root query should be returned.
• leafNodesOnly (default false): true|false, whether to return only documents in the result set that have no value in the “to” field.
• useAutn (default true): a performance trade-off based on use case. Mileage may vary.
Uses Solr’s query parser plugin and “local params” syntax:
{!graph param="value" … }
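The parameters above can be approximated over in-memory documents to see how they interact. This is a hypothetical Python emulation written from the parameter descriptions, not Solr's implementation; edge cases the slide does not cover (for example whether traversalFilter also applies to the roots, or how returnRoot combines with leafNodesOnly) are guesses and may differ in Solr. Field names and documents are made up.

```python
def emulate_graph_query(docs, root_ids, from_field, to_field, max_depth=-1,
                        traversal_filter=None, return_root=True, leaf_nodes_only=False):
    """Approximate the graph query parameters over a list of dicts.
    Every field is modelled as a list of values (multiValued)."""
    by_id = {d["id"]: d for d in docs}
    roots = set(root_ids)
    collected = set(roots)            # plays the role of the visited bit set
    frontier = set(roots)
    depth = 0
    while frontier and (max_depth == -1 or depth < max_depth):
        # Values in the "to" field of the frontier are the node ids to visit next.
        wanted = {v for doc_id in frontier for v in by_id[doc_id].get(to_field, [])}
        next_frontier = set()
        for d in docs:
            if d["id"] in collected:
                continue              # already visited: cycle detection
            if not wanted.intersection(d.get(from_field, [])):
                continue              # no edge points at this document
            if traversal_filter is None or traversal_filter(d):
                next_frontier.add(d["id"])
        collected |= next_frontier
        frontier = next_frontier
        depth += 1
    result = collected if return_root else collected - roots
    if leaf_nodes_only:               # keep only docs with no value in the "to" field
        result = {i for i in result if not by_id[i].get(to_field)}
    return result


DOCS = [
    {"id": "A", "node_id": ["a"], "edge_ids": ["b", "c"]},
    {"id": "B", "node_id": ["b"], "edge_ids": ["d"]},
    {"id": "C", "node_id": ["c"], "edge_ids": []},
    {"id": "D", "node_id": ["d"], "edge_ids": ["a"]},   # cycle back to A
]
print(sorted(emulate_graph_query(DOCS, ["A"], "node_id", "edge_ids")))  # ['A', 'B', 'C', 'D']
```

Passing `max_depth=1` stops after the first hop (`{'A', 'B', 'C'}`), much like a join, and a traversal filter that rejects `B` also cuts off `D`, since `D` is only reachable through `B`.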

Princeton WordNet
Princeton WordNet has an ontology for many of the words in the English language. Its relationships form hierarchies that link each word to more general and more specific classes of words.
• https://wordnet.princeton.edu/
• Words have a “sense”, or meaning.
• A hypernym is a more general related word.
• A hyponym is a more specific related word.
◦ Jaguar is a type of Cat (Cat is a hypernym of Jaguar).
◦ Large Cat is a type of Animal.
• Intersections of this hierarchy can answer questions such as “Is a jaguar an animal?”

WordNet Hypernym Traversal
Start from the word sense “jaguar” and traverse up the hypernym graph 9 levels:
+{!graph from="synset_id" to="hypernym_id" maxDepth=9}sense_lemma:jaguar
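When queries like this are generated by application code, a small string builder keeps the local-params syntax consistent. `build_graph_query` below is a hypothetical helper, not part of any Solr client library; it quotes string values, writes integers and booleans bare, and does not escape values that themselves contain double quotes.

```python
def build_graph_query(from_field, to_field, root_query, **params):
    """Assemble a {!graph ...} local-params query string.

    String values are double-quoted; ints and booleans are written bare.
    Values containing double quotes are not escaped here."""
    parts = [f'from="{from_field}"', f'to="{to_field}"']
    for name, value in params.items():
        if isinstance(value, bool):        # check bool first: bool is an int subclass
            parts.append(f"{name}={'true' if value else 'false'}")
        elif isinstance(value, int):
            parts.append(f"{name}={value}")
        else:
            parts.append(f'{name}="{value}"')
    return "{!graph " + " ".join(parts) + "}" + root_query


q = "+" + build_graph_query("synset_id", "hypernym_id", "sense_lemma:jaguar", maxDepth=9)
print(q)  # +{!graph from="synset_id" to="hypernym_id" maxDepth=9}sense_lemma:jaguar
```

The resulting string still has to be URL-encoded (for example with `urllib.parse.urlencode`) before it is sent as a `q` or `fq` request parameter.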

WordNet Graph Intersections
Is a jaguar an animal? Query for an intersection between the two graphs. If a graph intersection exists, the answer is yes!
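The intersection test can be sketched with plain sets. The synset ids below are made up, and the chain from “big_cat” to “animal” collapses several levels of the real WordNet hierarchy. Here the second graph is simply the “animal” sense itself (a zero-hop graph), so the answer is yes whenever “animal” appears in jaguar's hypernym graph.

```python
# Made-up synset ids; the real WordNet hierarchy has more intermediate levels.
HYPERNYMS = {
    "jaguar#cat": ["big_cat"],
    "big_cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "animal": ["organism"],
    "jaguar#car": ["car"],
    "car": ["motor_vehicle"],
}
SENSES = {"jaguar": ["jaguar#cat", "jaguar#car"], "animal": ["animal"]}


def hypernym_graph(lemma, max_depth=9):
    """All synsets reachable from the senses of `lemma` within max_depth hops."""
    frontier = set(SENSES[lemma])
    reached = set(frontier)
    for _ in range(max_depth):
        frontier = {h for s in frontier for h in HYPERNYMS.get(s, [])} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def is_a(lemma, ancestor, max_depth=9):
    """Yes if the two graphs intersect."""
    return bool(hypernym_graph(lemma, max_depth) & set(SENSES[ancestor]))


print(is_a("jaguar", "animal"))               # True
print(is_a("jaguar", "animal", max_depth=3))  # False: "animal" is 4 hops up
```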

OpenCV, Video Recognition
• Imagine indexing each frame of video from security cameras, passing each frame through OpenCV for object recognition and face recognition.
• Each frame record stores its own frame number and the frame number of the previous frame.
• Search for object/face “A” detected, followed by object/face “B” detected, across all of your video streams.
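A minimal Python sketch of “A followed by B”: start from frames where B was detected and walk the previous-frame links back, with the hop limit acting as a time window, much as maxDepth would. The field names `id`, `prev` and `objects` and the frame data are hypothetical. Because each frame has exactly one previous frame, every BFS frontier here holds a single frame.

```python
# Hypothetical frame records: "id" is unique across cameras, "prev" holds the
# id of the previous frame, "objects" lists what recognition found.
FRAMES = [
    {"id": "cam1:1", "prev": None, "objects": ["A"]},
    {"id": "cam1:2", "prev": "cam1:1", "objects": []},
    {"id": "cam1:3", "prev": "cam1:2", "objects": ["B"]},
    {"id": "cam2:1", "prev": None, "objects": ["B"]},
    {"id": "cam2:2", "prev": "cam2:1", "objects": ["A"]},   # B then A: wrong order
]


def a_followed_by_b(frames, a, b, max_depth):
    """Frames showing `b` with a frame showing `a` at most max_depth frames earlier."""
    by_id = {f["id"]: f for f in frames}
    hits = []
    for frame in frames:
        if b not in frame["objects"]:
            continue                      # root query: frames where B was detected
        current = frame
        for _ in range(max_depth):        # each hop follows the "prev" edge
            current = by_id.get(current["prev"])
            if current is None:
                break
            if a in current["objects"]:
                hits.append(frame["id"])
                break
    return hits


print(a_followed_by_b(FRAMES, "A", "B", max_depth=5))  # ['cam1:3']
print(a_followed_by_b(FRAMES, "A", "B", max_depth=1))  # []
```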

Users, Items and Actions
• Model your browsing/purchase history as:
◦ Users (have an ID)
◦ Items (have an ID, metadata, category, etc.)
◦ Actions (link between users and items, such as rating, purchase, like/dislike)
User -> Action -> Item -> Action -> User …
Use graph + maxDepth to get from a user to an item: maxDepth=2 gets from a user to an item, and maxDepth=4 gets from one user to a new set of users, and on and on.
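The depth arithmetic can be checked with a toy graph. In this hypothetical Python sketch, node ids carry a type prefix, users point at their actions, actions at the item acted on (or the user who acted), and items at the actions taken on them; a real index might model the edge directions differently.

```python
# Hypothetical graph with type-prefixed node ids.
EDGES = {
    "user:alice": ["act:1"],
    "act:1": ["item:book"],
    "item:book": ["act:1", "act:2"],
    "act:2": ["user:bob"],
    "user:bob": ["act:2", "act:3"],
    "act:3": ["item:lamp"],
}


def reach(edges, root, max_depth):
    """Map each reachable node to the hop at which BFS first reached it."""
    depth_of = {root: 0}
    frontier = {root}
    for depth in range(1, max_depth + 1):
        frontier = {t for n in frontier for t in edges.get(n, [])} - set(depth_of)
        if not frontier:
            break
        for t in frontier:
            depth_of[t] = depth
    return depth_of


def of_type(nodes, prefix, exclude):
    """Keep node ids with the given type prefix, minus the starting node."""
    return sorted(n for n in nodes if n.startswith(prefix) and n != exclude)


print(of_type(reach(EDGES, "user:alice", 2), "item:", "user:alice"))  # ['item:book']
print(of_type(reach(EDGES, "user:alice", 4), "user:", "user:alice"))  # ['user:bob']
```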

Actions occur over time
• These events can’t easily be aggregated or flattened onto a single record.
• Model this as a “person” record with a set of “action” records.
• Each action record has the id of the “previous” action.
• Search for an action, graph-traverse based on the person id to another action, then finally to the person record.

Find similar users
• Traverse from a user (or set of users) through their actions to the items they like, then to other users who like those items (similar users), and out to the items those users like.
• Now exclude the original starting set with “returnRoot=false”.
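The effect of returnRoot=false is easy to see with sets. In this hypothetical Python sketch the action records are collapsed into a user-to-liked-items map, and `recommend` (an illustrative helper, not a Solr feature) follows the last hop out to the similar users' items while dropping items the starting users already like.

```python
# Hypothetical "likes" collapsed from action records: user -> items liked.
LIKES = {
    "alice": {"book", "lamp"},
    "bob": {"book"},
    "carol": {"lamp", "mug"},
    "dave": {"kettle"},
}


def similar_users(likes, roots, return_root=False):
    """User -> items -> users; return_root=False drops the starting set."""
    items = set().union(*(likes[u] for u in roots))
    users = {u for u, liked in likes.items() if liked & items}
    return users if return_root else users - set(roots)


def recommend(likes, roots):
    """Out to the items similar users like, minus what the roots already like."""
    seen = set().union(*(likes[u] for u in roots))
    others = similar_users(likes, roots)
    return set().union(*(likes[u] for u in others)) - seen


print(sorted(similar_users(LIKES, {"alice"})))                    # ['bob', 'carol']
print(sorted(similar_users(LIKES, {"alice"}, return_root=True)))  # ['alice', 'bob', 'carol']
print(sorted(recommend(LIKES, {"alice"})))                        # ['mug']
```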

Graph Query For Security
• Graph queries are elegant and simple to use for traversing security hierarchies such as LDAP and Active Directory (AD).
• They also suit custom security models that are hierarchical or folder-based in nature.

Example Company with Security Model

Document/Security Model within the Solr Index

Security Query
• A single security query term traverses the entire graph:
{!graph from="node_id" to="edge_ids" leafNodesOnly="true"}id:user_1
• The graph query is applied as a filter query (fq) on the request; the normal query is used for searching against the documents.
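The filtering pattern can be sketched in Python. The security graph is hypothetical (the slide's diagram did not survive), and it assumes each node's edges point from users to groups, folders and finally documents, so the documents a user may see are the leaf nodes; a real model might point the edges the other way. `search` stands in for applying the graph query as a filter query to ordinary results.

```python
# Hypothetical security graph: each node lists the nodes it grants access to.
SECURITY = {
    "user_1": ["group_eng", "group_all"],
    "group_eng": ["folder_specs"],
    "group_all": ["doc_3"],
    "folder_specs": ["doc_1", "doc_2"],
    "doc_1": [], "doc_2": [], "doc_3": [], "doc_4": [],
}


def accessible_leaves(graph, user):
    """Traverse from the user and keep only leaf nodes (no outgoing edges)."""
    reached, frontier = {user}, {user}
    while frontier:
        frontier = {t for n in frontier for t in graph[n]} - reached
        reached |= frontier
    return {n for n in reached if not graph[n]}


def search(results, user):
    """Apply the traversal as a filter on ordinary search results."""
    allowed = accessible_leaves(SECURITY, user)
    return [doc for doc in results if doc in allowed]


print(sorted(accessible_leaves(SECURITY, "user_1")))  # ['doc_1', 'doc_2', 'doc_3']
print(search(["doc_1", "doc_4", "doc_3"], "user_1"))  # ['doc_1', 'doc_3']
```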

FoaF
• Friend of a Friend of a Friend of a Friend…
• Two ways to model this in the index:
• A multi-valued “friendid” field that points to other person records.
◦ More efficient, faster search.
◦ Traversal can be filtered based on metadata on the person record.
• A single-valued field on a separate document that represents the link/edge between two person records.
◦ More flexible, but slower search.
◦ Edges can be filtered with metadata on the edge record.
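The two models can be compared in a short Python sketch with made-up people and friendships. One difference the sketch hides: in the index, a friendship hop in the edge-document model costs two graph hops (person to edge document to person), which is part of why it is slower. Here both models share one BFS and differ only in how neighbors are found and what a filter can see.

```python
# Model 1 (hypothetical): person records with a multi-valued "friendid" field.
PEOPLE = {
    "ann": {"city": "Boston", "friendid": ["bob", "cat"]},
    "bob": {"city": "Boston", "friendid": ["dan"]},
    "cat": {"city": "Denver", "friendid": ["eve"]},
    "dan": {"city": "Boston", "friendid": []},
    "eve": {"city": "Boston", "friendid": []},
}
# Model 2 (hypothetical): one document per friendship edge, with edge metadata.
EDGES = [
    {"src": "ann", "dst": "bob", "since": 2015},
    {"src": "ann", "dst": "cat", "since": 2009},
    {"src": "bob", "dst": "dan", "since": 2012},
    {"src": "cat", "dst": "eve", "since": 2014},
]


def bfs(root, neighbors, max_depth):
    """Everyone within max_depth hops, excluding the root."""
    reached, frontier = {root}, {root}
    for _ in range(max_depth):
        frontier = {n for p in frontier for n in neighbors(p)} - reached
        reached |= frontier
    return reached - {root}


def via_person_records(root, depth, person_filter=lambda person: True):
    # Filters can only look at the person record being traversed to.
    return bfs(root, lambda p: [f for f in PEOPLE[p]["friendid"] if person_filter(PEOPLE[f])], depth)


def via_edge_docs(root, depth, edge_filter=lambda edge: True):
    # Each hop scans edge documents, so filters can look at edge metadata.
    return bfs(root, lambda p: [e["dst"] for e in EDGES if e["src"] == p and edge_filter(e)], depth)


print(sorted(via_person_records("ann", 2)))                                   # ['bob', 'cat', 'dan', 'eve']
print(sorted(via_edge_docs("ann", 2)))                                        # ['bob', 'cat', 'dan', 'eve']
print(sorted(via_person_records("ann", 2, lambda p: p["city"] == "Boston")))  # ['bob', 'dan']
print(sorted(via_edge_docs("ann", 2, lambda e: e["since"] >= 2010)))          # ['bob', 'dan']
```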

Graph Analytics via Faceting
What do my friends’ friends who live in Boston like?
• Use a graph query to identify the dataset: the matching person records.
• Use facets to generate analytics on the result set based on the values in the person records’ “like” field.
• Use drill-down to understand the characteristics of different demographics/cohorts.
• Get counts at various levels by using graph queries with different maxDepth values as facet queries.
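A minimal Python sketch of the faceting idea, using made-up person records. `Counter` stands in for a field facet on “like”, and the per-depth sizes correspond to what graph queries with different maxDepth values would report as facet query counts. Taking “friends’ friends” as the depth-2 set minus the depth-1 set is an assumption about how the question is meant.

```python
from collections import Counter

# Hypothetical person records with a multi-valued "like" field.
PEOPLE = {
    "me":  {"city": "Boston", "like": [], "friendid": ["ann", "bob"]},
    "ann": {"city": "Boston", "like": ["jazz"], "friendid": ["cal", "dee"]},
    "bob": {"city": "Denver", "like": ["hiking"], "friendid": ["eli"]},
    "cal": {"city": "Boston", "like": ["jazz", "sailing"], "friendid": []},
    "dee": {"city": "Boston", "like": ["sailing"], "friendid": []},
    "eli": {"city": "Denver", "like": ["hiking"], "friendid": []},
}


def within(root, max_depth):
    """People reachable from root in at most max_depth hops (root excluded)."""
    reached, frontier = {root}, {root}
    for _ in range(max_depth):
        frontier = {f for p in frontier for f in PEOPLE[p]["friendid"]} - reached
        reached |= frontier
    return reached - {root}


def facet(ids, field):
    """Count field values over a result set, like a field facet."""
    return Counter(value for i in ids for value in PEOPLE[i][field])


friends_of_friends = within("me", 2) - within("me", 1)
in_boston = {p for p in friends_of_friends if PEOPLE[p]["city"] == "Boston"}
print(facet(in_boston, "like"))                    # Counter({'sailing': 2, 'jazz': 1})
print({d: len(within("me", d)) for d in (1, 2)})   # {1: 2, 2: 5}
```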

What next?
• Edge weights & relevancy
◦ Based on TF-IDF or BM25?
◦ Based on numerical field values (min/max/sum/avg weight application)?
• Minimum distance computation
• Better support for D3.js and other visualization tools
• Driving directions?
• Distributed traversal via a Kafka frontier query broker
• Spark RDD support? GraphX?
• minDepth parameter? Only return records that are at least N hops away?

Additional Detail
• Graph query Solr ticket: https://issues.apache.org/jira/browse/SOLR-7543
Questions?
info@kmwllc.com
http://www.kmwllc.com/
