Natural Language Processing with Graph Databases and Neo4j

Natural Language Processing
With Graph Databases
DataDay Texas
January 2016
William Lyon
@lyonwj

About
William Lyon
Software Developer @Neo4j
william.lyon@neo4j.com
@lyonwj
lyonwj.com

Agenda
• Brief intro to graph databases / Neo4j
• Representing text as a graph
• NLP tasks
• Mining word associations
• Graph based summarization and keyword extraction
• Content recommendation

Agenda
Survey of NLP methods with graphs

Intro to Graph Databases / Neo4j

Neo4j
Graph Database
• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language

The Whiteboard Model Is the Physical Model

Relational Versus Graph Models
[Figure: the same data in two models. Relational model: Person, Person-Friend, and Friend tables. Graph model: ANDREAS, TOBIAS, MICA, and DELIA as nodes connected by KNOWS relationships.]

Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LIVES WITH, DRIVES, and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]
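The components above can be sketched as a small in-memory data structure. This is only an illustration of the property graph model in Python, not how Neo4j stores data; the class names, the `outgoing` helper, and the choice of Dan as the driver are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value properties


@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    type: str                                        # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships can carry properties too


dan = Node({"Person"}, {"name": "Dan", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "DRIVES", car),  # assumed: which person drives is not recoverable from the slide
]


def outgoing(node, rel_type):
    """Return the nodes reached from `node` over relationships of `rel_type`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


print([n.properties["name"] for n in outgoing(dan, "LOVES")])
```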

Cypher: Graph Query Language
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Diagram: two nodes, Dan and Ann, joined by a LOVES relationship; each node is annotated with its label (Person) and its property (name).]

“So what does this have to do with NLP?”
“Am I in the wrong talk?”
“I thought this was going to be about text processing….”

Natural Language Processing With Graphs
Uncovering meaning from text using a graph data model.

Representing Text As A Graph
“Nearly all text processing starts
by transforming text into vectors.”
- Matt Biddulph
www.hackdiary.com

Representing text as a graph
Text Adjacency Graph

Iterate with counter variable i,
from 0 to number of words - 2

Get or create node for
words at index i and i+1
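The two steps above can be sketched in plain Python, with a nested dictionary standing in for the graph database. The function name `build_adjacency_graph`, the regular-expression tokenizer, and the sample text are assumptions for illustration; the talk itself stores the words as nodes in Neo4j.

```python
import re


def build_adjacency_graph(text):
    """Map each word to the words that directly follow it, with counts."""
    words = re.findall(r"[a-z']+", text.lower())
    graph = {}
    # Iterate with counter variable i, from 0 to number of words - 2.
    for i in range(len(words) - 1):
        left, right = words[i], words[i + 1]
        # Get or create a node for the words at index i and i+1.
        graph.setdefault(left, {})
        graph.setdefault(right, {})
        # Create the NEXT-style edge, or increment its count if it exists.
        graph[left][right] = graph[left].get(right, 0) + 1
    return graph


graph = build_adjacency_graph(
    "My cat eats fish on Saturday. My dog eats turkey on Tuesday."
)
print(graph["eats"])
```

Note that this sketch ignores sentence boundaries, so the last word of one sentence is linked to the first word of the next.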

Representing A Text Corpus As A Graph

Query: word pair frequencies (collocation)
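As a rough equivalent of that query outside the database, adjacent word pairs can be counted directly. The sketch below assumes a tiny hypothetical corpus and a simple regular-expression tokenizer; `word_pair_counts` is an illustrative name, not part of any library.

```python
import re
from collections import Counter


def word_pair_counts(sentences):
    """Count how often each ordered pair of adjacent words occurs."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        counts.update(zip(words, words[1:]))  # pairs (w[i], w[i+1])
    return counts


corpus = [
    "The weather is cold today",
    "Cold weather is coming",
    "I like cold weather",
]
pairs = word_pair_counts(corpus)
for (first, second), count in pairs.most_common(3):
    print(first, second, count)
```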

Word Associations
• Paradigmatic
  • words that can be substituted for each other
  • “Monday” <-> “Thursday”
  • “cat” <-> “dog”
• Syntagmatic
  • words that can be combined with each other
  • “cold”, “weather”
  • collocations

Computing Paradigmatic Similarity
1. Represent each word by its context
2. Compute context similarity
3. Words with high context similarity likely have a paradigmatic relation
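The three steps can be sketched as follows. This is a minimal illustration that assumes a word's context is the set of words seen immediately to its left and immediately to its right, and that context similarity is the sum of the Jaccard indexes of those two sets; both choices, the function names, and the sample sentences are assumptions made for the example.

```python
import re
from collections import defaultdict


def contexts(sentences):
    """Step 1: represent each word by its left and right neighbor sets."""
    left, right = defaultdict(set), defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        for prev, nxt in zip(words, words[1:]):
            right[prev].add(nxt)
            left[nxt].add(prev)
    return left, right


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paradigmatic_similarity(w1, w2, left, right):
    """Step 2: compare the left contexts and the right contexts."""
    return jaccard(left[w1], left[w2]) + jaccard(right[w1], right[w2])


corpus = ["My cat eats fish on Saturday", "My dog eats turkey on Tuesday"]
left, right = contexts(corpus)
# Step 3: pairs with a high score are candidates for a paradigmatic relation.
print(paradigmatic_similarity("cat", "dog", left, right))
print(paradigmatic_similarity("cat", "fish", left, right))
```

With these two sentences, “cat” and “dog” share both neighbors (“my” on the left, “eats” on the right), while “cat” and “fish” share none.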

Paradigmatic Similarity

Left1 Right1

www.lyonwj.com/2015/06/16/nlp-with-neo4j/

3. Find words with high context similarity
CEEAUS corpus: http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37

Example
http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/
https://github.com/johnymontana/nlp-graph-notebooks
https://class.coursera.org/textanalytics-001

Graph Based Summarization and Keyword Extraction

Keyword Extraction
image credit: https://en.wikipedia.org/wiki/PageRank
https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
https://github.com/summanlp/textrank
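The links above point to PageRank and TextRank. A simplified sketch of the idea is to link words that occur next to each other and then run a PageRank-style iteration over that graph, so that words connected to many other words score highest. The version below is an assumption-laden illustration: it uses an unweighted, undirected graph, a fixed number of iterations, and an optional stopword set in place of the part-of-speech filtering described in the TextRank paper. It is not the API of the summanlp/textrank package.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50,
                      top_n=5, stopwords=frozenset()):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in stopwords]
    # Link every word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Each word shares its score equally among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


text = "graph database graph model graph query graph algorithm"
print(textrank_keywords(text, top_n=1))
```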

• Opinion mining
• Summarize major opinions
• Concise and readable
• Major complaints / compliments

http://kavita-ganesan.com/opinosis
1. Graph based representation of review corpus
2. Find and score candidate summaries
3. Select top scoring candidates as summary

Opinion Mining - Example
• Best Buy API
• Product reviews by SKU

Find highest ranked paths of 2-5 words
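A much simplified sketch of this step: build a word graph from the reviews, enumerate every path of 2 to 5 distinct words, and rank the paths. The scoring here (the sum of edge counts along the path, which favors longer paths) is an assumption chosen for brevity and is not the Opinosis scoring function; the function names and sample reviews are also hypothetical.

```python
import re
from collections import Counter


def build_word_graph(sentences):
    """Count how often each word is directly followed by another."""
    edges = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        edges.update(zip(words, words[1:]))
    return edges


def ranked_paths(edges, min_len=2, max_len=5, top_n=3):
    """Return the top-scoring (score, phrase) paths of min_len..max_len words."""
    successors = {}
    for (a, b), count in edges.items():
        successors.setdefault(a, []).append((b, count))
    results = []

    def extend(path, score):
        if len(path) >= min_len:
            results.append((score, " ".join(path)))
        if len(path) == max_len:
            return
        for nxt, count in successors.get(path[-1], []):
            if nxt not in path:  # do not revisit a word
                extend(path + [nxt], score + count)

    for start in successors:
        extend([start], 0)
    # Highest score first; ties broken alphabetically for stable output.
    results.sort(key=lambda item: (-item[0], item[1]))
    return results[:top_n]


reviews = [
    "Easy to read in sunlight",
    "The screen is easy to read",
    "Easy to read outdoors",
]
print(ranked_paths(build_word_graph(reviews), top_n=1))
```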

Opinion Mining - Demo
“Easy to read in sunlight”
“Comfortable great sound quality”
“I love this washer”

“Bought this smart TV for the price”
“Easy to use this vacuum”

• iPython notebook

Content recommendation
“Networks give structure to the conversation while content mining gives meaning.”
- Preriit Souda
http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/

Using Data Relationships for Recommendations
Content-based filtering
Recommend items based on what users have liked in the past
Collaborative filtering
Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Figure: Person and Movie nodes with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92)]
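Collaborative filtering as described above can be sketched with ratings held in a dictionary: compute a similarity between people from the movies they have both rated, then score unseen movies by the ratings of similar people. Cosine similarity, the function names, and the sample ratings are all assumptions for illustration; in the graph these values would correspond to the RATED and SIMILARITY relationships.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both people have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Score movies the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-10 scale.
ratings = {
    "ann": {"Alien": 9, "Heat": 2},
    "bob": {"Alien": 8, "Heat": 3, "Aliens": 9},
    "cat": {"Alien": 2, "Heat": 9, "Casino": 8},
}
print(recommend("ann", ratings))
```

Here “ann” rates movies much like “bob” does, so the movie only “bob” has rated ranks first.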

The article graph - data model

Building the article graph
• Articles users have shared
• Extract keywords using the newspaper3k Python library
• Insert in the graph
• Scrape additional articles

What are the keywords of the articles I liked?
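One way to answer that question, and to turn the answer into content-based recommendations, is to count the keywords across liked articles and then score other articles by how many of those keywords they share. The sketch below uses hard-coded keyword sets as stand-ins for what a keyword extractor would produce; the article identifiers, keywords, and function names are hypothetical.

```python
from collections import Counter


def keyword_profile(liked_articles, article_keywords):
    """Count how often each keyword appears across the liked articles."""
    profile = Counter()
    for article in liked_articles:
        profile.update(article_keywords[article])
    return profile


def recommend_articles(liked_articles, article_keywords, top_n=3):
    """Rank unseen articles by overlap with the keyword profile."""
    profile = keyword_profile(liked_articles, article_keywords)
    scores = {}
    for article, keywords in article_keywords.items():
        if article in liked_articles:
            continue
        score = sum(profile[k] for k in keywords)
        if score > 0:
            scores[article] = score
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Hypothetical articles and their extracted keywords.
article_keywords = {
    "a1": {"graph", "neo4j", "nlp"},
    "a2": {"graph", "cypher"},
    "a3": {"nlp", "graph", "text"},
    "a4": {"cooking", "pasta"},
}
liked = ["a1", "a2"]
print(keyword_profile(liked, article_keywords).most_common(1))
print(recommend_articles(liked, article_keywords))
```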

Summary
• Property graph model
• Represent text as a graph
• Word associations
• Opinion mining

Resources
• http://kavita-ganesan.com/opinosis
• http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
• https://github.com/johnymontana/nlp-graph-notebooks

Opinion Mining
• “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions”
  - Kavita Ganesan, ChengXiang Zhai, Jiawei Han, University of Illinois at Urbana-Champaign
• “Multi-sentence compression: Finding shortest paths in word graphs”
  - Katja Filippova. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010). Beijing, China, Aug 23-27, 2010.
