Lucene-powered graph exploration with Solr and Elasticsearch

Graphs, graphs everywhere
Zbyszko Papierski, Senior Dev@JIRA Cloud,
T:@ZPapierski
E: zbyszko.papierski@gmail.com
Lucene powered relation exploration

Agenda
1. Introduction to Lucene and friends
2. Evolution of data analysis by Solr and Elasticsearch
3. Graph capabilities of Elasticsearch (briefly)
4. Solr - QueryParserPlugin
5. Solr - Streaming Expressions
6. Examples

1. Create a collection
2. Put schema
3. Run feeder

Lucene
Provides a mechanism for fast searching of text data - both full-text
search (analyzed data) and exact match (non-analyzed, or docValues)

Step one - indexing
{kitty|kitten|cat|cats|kittens|pussy} —> cat
{is} —>
{GORGEOUS!!!} —> gorgeous, pretty, nice, etc.
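The indexing step above can be sketched as a toy analyzer plus an inverted index. This is only an illustration in plain Python, not Lucene's analysis chain; the `SYNONYMS`, `STOPWORDS` and `EXPANSIONS` tables and both function names are hypothetical and simply mirror the slide.

```python
import re

# Hypothetical lookup tables mirroring the slide; real analyzers are configurable.
SYNONYMS = {"kitty": "cat", "kitten": "cat", "cats": "cat", "kittens": "cat"}
STOPWORDS = {"is"}
EXPANSIONS = {"gorgeous": ["gorgeous", "pretty", "nice"]}


def analyze(text):
    """Lowercase, strip punctuation, drop stopwords, normalize and expand terms."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        token = SYNONYMS.get(token, token)
        terms.extend(EXPANSIONS.get(token, [token]))
    return terms


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


if __name__ == "__main__":
    print(analyze("Kitty is GORGEOUS!!!"))
    print(build_index({1: "Kitty is GORGEOUS!!!", 2: "Kittens"}))
```

Searching then means running the query text through the same `analyze` function and looking up each resulting term in the index.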

Step two - searching
{very} —> very
{nice} —> nice
{kitty} —> cat
{nice, cat, …} {very, ugly, cat, …}
{very, nice, dog, …}
{very, nice, bear, …}

Step three - scoring
{very} —> very
{nice} —> nice
{kitty} —> cat
{nice, cat, …} {very, ugly, cat, …}
{very, nice, dog, …}
{very, nice, bear, …}

Winner!
The document matching nice and cat scores higher than the ones matching
very and nice, or very and cat, because cat is rarer than very.
This is only an example, all cats are nice…
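The "rarer terms weigh more" idea can be sketched with a bare inverse document frequency sum. This is a simplification, not Lucene's actual similarity (it ignores term frequency and length normalization), and document `d5` is a hypothetical addition so that "very" is clearly the most common term.

```python
import math

# Analyzed documents from the slide, plus one hypothetical extra document (d5).
docs = {
    "d1": ["nice", "cat"],
    "d2": ["very", "ugly", "cat"],
    "d3": ["very", "nice", "dog"],
    "d4": ["very", "nice", "bear"],
    "d5": ["very", "big", "dog"],
}


def idf(term, docs):
    """Inverse document frequency: the rarer the term, the higher the weight."""
    df = sum(1 for terms in docs.values() if term in terms)
    return math.log(len(docs) / df) if df else 0.0


def score(query_terms, doc_terms, docs):
    """Sum the IDF of every query term that occurs in the document."""
    return sum(idf(t, docs) for t in query_terms if t in doc_terms)


def rank(query_terms, docs):
    """Return (doc_id, score) pairs, best match first."""
    scored = [(doc_id, score(query_terms, terms, docs))
              for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(["very", "nice", "cat"], docs)

if __name__ == "__main__":
    for doc_id, value in ranking:
        print(doc_id, round(value, 3))
```

With this toy corpus the document containing nice and cat ranks first, followed by the one containing very and cat.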

Solr
Older, works more closely with Lucene

Elasticsearch
Newer, but with more toys

Waiter, there is a graph in my full-text search engine!
are relations

• full text searching
• faceting/aggregation
• statistical
• relationship exploration
How did we get here?

1. Your standard, full-text search
2. TF-IDF-ish relationship sorting
3. It’s already there

It’s still your standard Lucene index

Elasticsearch+Kibana
Plugin for Elasticsearch and Kibana - Graph
• Available from Elasticsearch 2.3
• REST API - /_graph/explore
• visualization for Kibana
• Part of Elastic's commercial offering (named X-Pack from 5.0)
picture from: https://www.elastic.co/guide/en/graph/current/graph-introduction.html
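A request to the explore endpoint might be assembled as below. This is a sketch that only builds the path and JSON body without sending anything; the index name `emails` and the field names are hypothetical, and the body layout (`query`, `vertices`, `connections`) follows the Graph API as I understand it, so check it against the documentation for your version.

```python
import json


def explore_request(index, seed_field, seed_text, vertex_field, connection_field):
    """Build the path and body for a Graph explore call (nothing is sent)."""
    path = f"/{index}/_graph/explore"
    body = {
        # Seed query: selects the documents the exploration starts from.
        "query": {"match": {seed_field: seed_text}},
        # Terms of this field become the first set of vertices.
        "vertices": [{"field": vertex_field}],
        # Terms of this field are the vertices connected to the first set.
        "connections": {"vertices": [{"field": connection_field}]},
    }
    return path, body


# Hypothetical index and field names, for illustration only.
path, body = explore_request("emails", "body", "lucene", "sender", "recipient")

if __name__ == "__main__":
    print("POST", path)
    print(json.dumps(body, indent=2))
```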

Solr
built-in GraphQueryParser
• Available from Solr 6.0
• experimental feature
• currently works only for single-node, single-core applications (due to change)
• no 1st party visualization
• does not track edges of the traversal
picture from: http://solr.pl/2016/04/25/wizualizacja-grafow-przy-pomocy-solr-6/

Solr
built-in Streaming Expressions
• Available from Solr 5.5
• experimental feature
• no 1st party visualization
• does track edges of the traversal and level
picture from: http://solr.pl/2016/04/25/wizualizacja-grafow-przy-pomocy-solr-6/

fq={!graph from=email to=friends maxDepth=2}email:"susan.gardner@example.com"

Params
traversalFilter
Filter query used to filter out incoming nodes on each iteration

Params
returnRoot
Whether the root set of documents (found by the initial query) should be returned. Default: true

Params
returnOnlyLeaf
Whether only leaf documents should be returned. Default: false
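The parameters above can be combined into a single filter query. The helper below is a hypothetical convenience function (not part of Solr or any client library) that only builds the `{!graph ...}` string; sending it to Solr is left out.

```python
from urllib.parse import urlencode


def graph_filter(start_query, from_field, to_field, max_depth=None,
                 traversal_filter=None, return_root=True, return_only_leaf=False):
    """Build a {!graph ...} filter query; defaults are left out of the string."""
    params = [f"from={from_field}", f"to={to_field}"]
    if max_depth is not None:
        params.append(f"maxDepth={max_depth}")
    if traversal_filter is not None:
        params.append(f'traversalFilter="{traversal_filter}"')
    if not return_root:          # the default is true
        params.append("returnRoot=false")
    if return_only_leaf:         # the default is false
        params.append("returnOnlyLeaf=true")
    return "{!graph " + " ".join(params) + "}" + start_query


fq = graph_filter('email:"susan.gardner@example.com"', "email", "friends",
                  max_depth=2)

if __name__ == "__main__":
    print(fq)
    # URL-encoded form, ready to append to a select request.
    print(urlencode({"q": "*:*", "fq": fq}))
```

Called with only `maxDepth=2`, it reproduces the example query shown earlier.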

Streaming Expressions
• New alternative way of creating and processing queries
• allow chaining functions
• also experimental
• graph functions - shortestPath, gatherNodes, scoreNodes

shortestPath
• one of the source functions, i.e. functions producing a tuple stream
• returns the shortest path between two given nodes using an iterative breadth-first search of the graph
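The idea of an iterative breadth-first search can be sketched on an in-memory graph. This is a toy illustration of the technique, not Solr's implementation; the adjacency dictionary and the names in it are made up.

```python
from collections import deque


def shortest_path(edges, start, goal, max_depth=None):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Do not expand paths that already used up the allowed number of hops.
        if max_depth is not None and len(path) - 1 >= max_depth:
            continue
        for neighbour in edges.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical "who mailed whom" graph.
edges = {"ann": ["bob", "cid"], "bob": ["dan"], "cid": ["dan"], "dan": ["eve"]}

if __name__ == "__main__":
    print(shortest_path(edges, "ann", "eve"))
    print(shortest_path(edges, "ann", "eve", max_depth=2))
```

Because the queue is processed level by level, the first path that reaches the goal is guaranteed to be a shortest one.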

shortestPath - params
• collection - collection in which to perform the search
• from - starting node
• to - ending node
• edge - definition of the edge, in the format <from_field>=<to_field>
• fq - filter query limiting which nodes are taken into account
• maxDepth - maximum depth of the traversal
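Put together, these parameters form an expression string. The helper below is a hypothetical sketch that only formats the string; the collection name, addresses and edge fields are made-up examples.

```python
def shortest_path_expr(collection, from_node, to_node, edge, fq=None, max_depth=None):
    """Format a shortestPath streaming expression from its parameters."""
    args = [collection,
            f'from="{from_node}"',
            f'to="{to_node}"',
            f'edge="{edge}"']
    if fq is not None:
        args.append(f'fq="{fq}"')
    if max_depth is not None:
        args.append(f'maxDepth="{max_depth}"')
    return "shortestPath(" + ", ".join(args) + ")"


# Hypothetical collection and field names.
expr = shortest_path_expr("emails", "ann@example.com", "eve@example.com",
                          edge="from_address=to_address", max_depth=4)

if __name__ == "__main__":
    print(expr)
```

The resulting string would typically be passed as the `expr` parameter of the collection's stream handler.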

gatherNodes
• transforms an input document stream into a stream of documents
reachable through graph traversal
• can return edges
• allows nesting functions
• works for multi-collection streams, regardless of the number of cluster nodes
• is also a source function
• currently does not support multivalued fields

gatherNodes - params
• collection - collection on which the function will be performed
• walk - defines the starting nodes and the field, e.g. "zpapierski@atlassian.com->from"
• gather - defines which fields are gathered
• scatter - can take one or both of the values:
  • leaves - emits only leaf nodes (outer-most ones)
  • branches - emits nodes leading up to leaves (root node is a branch)
• fq - filter query that filters out nodes
• maxDocFreq - every node appearing in more documents than this number is filtered out
Aggregations, cross-collection gathering and combining with other streaming expressions
are possible
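A gatherNodes call, including a nested one, can be written out as a string. The helper is a hypothetical formatter, the collection and field names are examples, and the `node->from` form used in the outer walk is how I understand nested traversals are expressed, so treat it as an assumption.

```python
def gather_nodes_expr(collection, walk, gather, inner=None, scatter=None,
                      fq=None, max_doc_freq=None):
    """Format a gatherNodes expression; `inner` is an optional nested expression."""
    args = [collection]
    if inner is not None:
        args.append(inner)  # the inner stream supplies the starting nodes
    args += [f'walk="{walk}"', f'gather="{gather}"']
    if scatter is not None:
        args.append(f'scatter="{", ".join(scatter)}"')
    if fq is not None:
        args.append(f'fq="{fq}"')
    if max_doc_freq is not None:
        args.append(f'maxDocFreq="{max_doc_freq}"')
    return "gatherNodes(" + ", ".join(args) + ")"


# First hop: everyone the starting address wrote to.
first_hop = gather_nodes_expr("emails", "zpapierski@atlassian.com->from", "to")
# Second hop: nested call walking from the nodes gathered by the first hop.
second_hop = gather_nodes_expr("emails", "node->from", "to", inner=first_hop,
                               scatter=["branches", "leaves"])

if __name__ == "__main__":
    print(first_hop)
    print(second_hop)
```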

scoreNodes
• Function used only with the output of gatherNodes
• Scores document relevancy, using the TF-IDF formula
• TF - how often the document appeared during graph traversal
• IDF is fetched from the document's original collection
• Adds an additional field, nodeScore, to the output stream
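The scoring idea can be sketched as follows. The exact formula Solr applies is not given here, so the smoothed logarithmic IDF below is an assumption, and the node names and counts are invented.

```python
import math


def score_nodes(traversal_counts, doc_freqs, num_docs):
    """Attach a TF-IDF style nodeScore to every gathered node, best first."""
    scored = []
    for node, count in traversal_counts.items():
        # TF: how often the node was reached during the traversal.
        # IDF: how rare the node is in its original collection (assumed formula).
        idf = math.log((num_docs + 1) / (doc_freqs[node] + 1))
        scored.append({"node": node, "count": count, "nodeScore": count * idf})
    return sorted(scored, key=lambda item: item["nodeScore"], reverse=True)


# Hypothetical traversal result: both nodes were reached three times,
# but productB is far rarer in the collection as a whole.
result = score_nodes({"productA": 3, "productB": 3},
                     {"productA": 1000, "productB": 10},
                     num_docs=10000)

if __name__ == "__main__":
    for item in result:
        print(item["node"], round(item["nodeScore"], 3))
```

Two nodes reached equally often are separated by rarity: the one that is uncommon overall is the more significant relation.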
