Natural Language Processing
With Graph Databases
DataDay Texas
January 2016
William Lyon
@lyonwj

Natural Language Processing with Graph Databases and Neo4j


Originally presented at DataDay Texas in Austin, this presentation shows how a graph database such as Neo4j can be used for common natural language processing tasks: building a word adjacency graph, mining word associations, summarization and keyword extraction, and content recommendation.


  1. Natural Language Processing With Graph Databases. DataDay Texas, January 2016. William Lyon, @lyonwj
  2. About: William Lyon, Software Developer @Neo4j. william.lyon@neo4j.com, @lyonwj, lyonwj.com
  3. Agenda
     • Brief intro to graph databases / Neo4j
     • Representing text as a graph
     • NLP tasks
     • Mining word associations
     • Graph based summarization and keyword extraction
     • Content recommendation
  4. Agenda (same items as slide 3), with the note: Survey of NLP methods with graphs
  5. Intro to Graph Databases / Neo4j
  6. Charts
  7. Charts Graphs
  8. Neo4j Graph Database
     • Property graph data model
     • Nodes and relationships
     • Native graph processing
     • Cypher query language
  9. The Whiteboard Model Is the Physical Model
  10. Relational Versus Graph Models
      [Figure: a relational model (Person, Person-Friend and Friend tables) next to a graph model in which ANDREAS, TOBIAS, MICA and DELIA are connected by KNOWS relationships.]
  11. Property Graph Model Components
      Nodes
      • The objects in the graph
      • Can have name-value properties
      • Can be labeled
      Relationships
      • Relate nodes by type and direction
      • Can have name-value properties
      [Diagram: a PERSON node (name: “Dan”, born: May 29, 1970, twitter: “@dan”), a PERSON node (name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), connected by LOVES, LOVES, LIVES WITH, OWNS and DRIVES relationships; one relationship carries the property since: Jan 10, 2011.]
  12. Cypher: Graph Query Language
      CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
      [Diagram: node Dan connected to node Ann by a LOVES relationship; annotations mark each NODE, its LABEL (Person) and its PROPERTY (name).]
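The property graph components from slides 11 and 12 can be mirrored in a few lines of Python. This is only an illustrative in-memory sketch, not how Neo4j stores data; the `Node`, `Relationship` and `describe` names are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with labels and name-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed relationship that can also carry properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


def describe(rel):
    """Render a relationship in a Cypher-like arrow notation."""
    start_name = rel.start.properties["name"]
    end_name = rel.end.properties["name"]
    return f"({start_name})-[:{rel.rel_type}]->({end_name})"


# In-memory counterpart of the CREATE statement on slide 12.
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
loves = Relationship(dan, "LOVES", ann)

print(describe(loves))
```

The relationship holds a direction (start, end) and a type, which is what separates the property graph model from a plain adjacency list.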
  13. “So what does this have to do with NLP?” “Am I in the wrong talk?” “I thought this was going to be about text processing…”
  14. Natural Language Processing With Graphs
  15. Natural Language Processing With Graphs: uncovering meaning from text using a graph data model.
  16. Representing Text As A Graph. “Nearly all text processing starts by transforming text into vectors.” (Matt Biddulph, www.hackdiary.com)
  17. Representing text as a graph: Text Adjacency Graph
  18. Representing text as a graph: Text Adjacency Graph
  19. My cat eats fish on Saturday.
  20. Convert to array of words
  21. Iterate with counter variable i, from 0 to number of words - 2
  22. Get or create node for words at index i and i+1
  23. Create :NEXT relationship
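The steps on slides 20 to 23 can be sketched in plain Python, with a dictionary standing in for the database. This is an assumption-laden illustration rather than the talk's own code: the `tokenize` and `build_adjacency_graph` helpers are hypothetical, and lowercasing plus punctuation stripping is one tokenization choice among many.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into an array of words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_adjacency_graph(text, graph=None):
    """Add one NEXT edge per adjacent word pair; graph maps word -> set of next words."""
    graph = defaultdict(set) if graph is None else graph
    words = tokenize(text)
    # i runs from 0 to len(words) - 2, so i + 1 is always a valid index.
    for i in range(len(words) - 1):
        graph[words[i]].add(words[i + 1])  # get or create the first node and link it
        graph[words[i + 1]]                # touching the key creates the second node
    return graph


graph = build_adjacency_graph("My cat eats fish on Saturday.")
print(sorted(graph["cat"]))
```

In a graph database the "get or create" step is what keeps each word as a single node no matter how often it occurs; the `defaultdict` plays that role here.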
  24. Representing A Text Corpus As A Graph
  25. Add followship frequency
  26. Add word counts
  27. Query: word frequency
  28. Query: word pair frequencies (collocation)
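A rough Python equivalent of slides 25 to 28 keeps two counters: one for word counts and one for followship (adjacent pair) counts. The two-sentence corpus is invented for illustration (only the first sentence comes from the slides), and `corpus_counts` is a hypothetical helper name.

```python
import re
from collections import Counter


def corpus_counts(sentences):
    """Return (word counts, NEXT-pair counts) for a list of sentences."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # adjacent pairs only
    return word_counts, pair_counts


# The second sentence is made up so that some words and pairs repeat.
corpus = [
    "My cat eats fish on Saturday.",
    "My dog eats turkey on Tuesday.",
]
word_counts, pair_counts = corpus_counts(corpus)

print(word_counts.most_common(3))                  # word frequency query
print(pair_counts[("my", "cat")], pair_counts[("eats", "fish")])  # pair frequency query
```

In the graph version these numbers would live as a property on each word node and on each :NEXT relationship, so both queries become simple lookups.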
  29. NLP Tasks
  30. Mining Word Associations
  31. Word Associations
      • Paradigmatic: words that can be substituted
        • “Monday” <-> “Thursday”
        • “cat” <-> “dog”
      • Syntagmatic: words that can be combined with each other
        • “cold”, “weather”
        • collocations
  32. Computing Paradigmatic Similarity
      1. Represent each word by its context
      2. Compute context similarity
      3. Words with high context similarity likely have a paradigmatic relation
  33. Paradigmatic Similarity: 1. Represent each word by its context
  34. Paradigmatic Similarity: 1. Represent each word by its context
  35. Paradigmatic Similarity: 1. Represent each word by its context (Left1, Right1)
  36. Paradigmatic Similarity: 2. Compute context similarity
  37. Paradigmatic Similarity: 2. Compute context similarity
  38. Paradigmatic Similarity: 2. Compute context similarity. www.lyonwj.com/2015/06/16/nlp-with-neo4j/
  39. Paradigmatic Similarity: 3. Find words with high context similarity. CEEAUS corpus: http://earthlab.uoi.gr/theste/index.php/theste/article/viewFile/55/37
  40. Paradigmatic Similarity: Example
      • http://www.lyonwj.com/2015/06/16/nlp-with-neo4j/
      • https://github.com/johnymontana/nlp-graph-notebooks
      • https://class.coursera.org/textanalytics-001
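The three steps of paradigmatic similarity can be sketched as follows. The slides name the Left1 and Right1 contexts but the transcript does not say which similarity measure is used, so the Jaccard index and the simple averaging of the two sides are assumptions made here for illustration; the corpus and function names are likewise hypothetical.

```python
import re
from collections import defaultdict


def contexts(sentences):
    """Step 1: map each word to its Left1 and Right1 context sets."""
    left, right = defaultdict(set), defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for prev, nxt in zip(words, words[1:]):
            right[prev].add(nxt)  # nxt appears immediately to the right of prev
            left[nxt].add(prev)   # prev appears immediately to the left of nxt
    return left, right


def jaccard(a, b):
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paradigmatic_similarity(w1, w2, left, right):
    """Step 2: average the Jaccard similarity of the Left1 and Right1 contexts."""
    return (jaccard(left[w1], left[w2]) + jaccard(right[w1], right[w2])) / 2


# The second sentence is invented so that "cat" and "dog" share contexts.
corpus = [
    "My cat eats fish on Saturday.",
    "My dog eats turkey on Tuesday.",
]
left, right = contexts(corpus)

# Step 3: pairs with high scores are candidates for a paradigmatic relation.
print(paradigmatic_similarity("cat", "dog", left, right))
print(paradigmatic_similarity("cat", "fish", left, right))
```

With this toy corpus, "cat" and "dog" have identical left and right neighbors, while "cat" and "fish" share none, which is the contrast the technique relies on.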
  41. Graph Based Summarization and Keyword Extraction
  42. Keyword Extraction
      • https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
      • https://github.com/summanlp/textrank
      • Image credit: https://en.wikipedia.org/wiki/PageRank
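Slide 42 points to PageRank and the TextRank paper for keyword extraction. The sketch below shows the general idea only: it runs a PageRank-style iteration over an undirected word co-occurrence graph. It is not the summanlp/textrank implementation, it skips the part-of-speech filtering that a fuller keyword extractor would apply, and the `window`, `damping` and `iterations` parameters as well as the sample text are assumptions.

```python
import re
from collections import defaultdict


def keyword_scores(text, window=2, damping=0.85, iterations=50):
    """Score words with a PageRank-style iteration over a co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the words that follow it inside the window.
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


# Made-up text in which "cat" co-occurs with the most distinct words.
scores = keyword_scores("cat eats fish cat likes fish cat chases mice")
top_keywords = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_keywords)
```

Words that sit next to many other well-connected words accumulate the highest scores, which is why the hub word of the sample text ranks first.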
  43. Summarization: Opinion mining
  44. Opinion mining
      • Summarize major opinions
      • Concise and readable
      • Major complaints / compliments
  45. Opinosis (http://kavita-ganesan.com/opinosis)
      1. Graph based representation of review corpus
      2. Find and score candidate summaries
      3. Select top scoring candidates as summary
  46. Opinion Mining - Example
      • Best Buy API
      • Product reviews by SKU
  47. Opinion Mining - Example
  48. Opinion Mining - Example
  49. Opinion Mining - Example
      1. Graph based representation of review corpus
      2. Find and score candidate summaries
      3. Select top scoring candidates as summary
  50. Opinion Mining - Example: find highest ranked paths of 2-5 words
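The three-step outline (graph representation, candidate scoring, selection) and the "paths of 2-5 words" query can be approximated in Python. The scoring used here, the mean followship count along a path with longer paths winning ties, is a simple stand-in chosen for this sketch and is not the Opinosis scoring function; the reviews and function names are invented.

```python
import re
from collections import Counter, defaultdict


def build_review_graph(reviews):
    """Step 1: count how often each word directly follows another across reviews."""
    pair_counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z0-9']+", review.lower())
        pair_counts.update(zip(words, words[1:]))
    successors = defaultdict(list)
    for first, second in pair_counts:
        successors[first].append(second)
    return pair_counts, successors


def ranked_paths(reviews, min_words=2, max_words=5):
    """Steps 2 and 3: score every simple path of min_words..max_words words."""
    pair_counts, successors = build_review_graph(reviews)
    results = []

    def extend(path, total):
        if len(path) >= min_words:
            mean = total / (len(path) - 1)  # mean followship count along the path
            results.append(((mean, len(path)), " ".join(path)))
        if len(path) == max_words:
            return
        for nxt in successors.get(path[-1], []):
            if nxt not in path:  # keep paths simple: no repeated words
                extend(path + [nxt], total + pair_counts[(path[-1], nxt)])

    for start in list(successors):
        extend([start], 0)
    return sorted(results, reverse=True)  # best score first


# Hypothetical reviews that repeat one opinion in different words.
reviews = [
    "Easy to read in sunlight",
    "Screen is easy to read",
    "Very easy to read outdoors",
]
best = ranked_paths(reviews)[0]
print(best)
```

Because the phrase "easy to read" is the only stretch shared by all three reviews, it has the highest mean edge count and surfaces as the top candidate summary.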
  51. Opinion Mining - Demo
      • “Easy to read in sunlight”
      • “Comfortable great sound quality”
      • “I love this washer”
  52. Opinion Mining - Demo
      • “Bought this smart TV for the price”
      • “Easy to use this vacuum”
  53. Opinion Mining - Demo
      • IPython notebook: https://github.com/johnymontana/nlp-graph-notebooks
  54. Content Recommendation
  55. Content recommendation. “Networks give structure to the conversation while content mining gives meaning.” (Preriit Souda, http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/)
  56. Using Data Relationships for Recommendations
      • Content-based filtering: recommend items based on what users have liked in the past
      • Collaborative filtering: predict what users like based on the similarity of their behaviors, activities and preferences to others
      [Diagram: Person nodes and a Movie node, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92).]
  57. Using Data Relationships for Recommendations
      • Content-based filtering: recommend items based on what users have liked in the past
      [Diagram: Person nodes and a Movie node, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92).]
  58. The article graph - data model
  59. Building the article graph
      • Articles users have shared
      • Extract keywords using the newspaper3k Python library
      • Insert in the graph
      • Scrape additional articles
      https://github.com/johnymontana/nlp-graph-notebooks
  60. The article graph - example
  61. What are the keywords of the articles I liked?
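Content-based filtering over the article graph can be sketched as a keyword-overlap ranking. The example assumes keywords have already been extracted for each article (the slides use newspaper3k for that step, which is not called here); the article titles, keyword sets and the `recommend` function are all hypothetical.

```python
def recommend(liked, articles, top_n=3):
    """Rank unseen articles by how many keywords they share with liked articles."""
    liked_keywords = set()
    for title in liked:
        liked_keywords |= articles[title]  # "keywords of the articles I liked"
    scored = [
        (len(keywords & liked_keywords), title)
        for title, keywords in articles.items()
        if title not in liked and keywords & liked_keywords
    ]
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical article -> keywords data standing in for the article graph.
articles = {
    "Intro to Neo4j": {"graph", "database", "cypher"},
    "Text mining basics": {"text", "nlp", "keywords"},
    "Graphs for NLP": {"graph", "nlp", "text"},
    "Cooking with cast iron": {"cooking", "skillet"},
}

print(recommend({"Intro to Neo4j"}, articles))
```

In a graph database the same question becomes a two-hop traversal: from the user to liked articles, to their keywords, and back out to other articles that share those keywords.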
  62. Summary
      • Property graph model
      • Represent text as a graph
      • Word associations
      • Opinion mining
      • Content recommendation
  63. Resources
  64. graphdatabases.com
  65. Resources
      • http://kavita-ganesan.com/opinosis
      • http://jexp.de/blog/2015/01/natural-language-analytics-made-simple-and-visual-with-neo4j/
      • https://github.com/johnymontana/nlp-graph-notebooks
  66. Opinion Mining
      • Kavita Ganesan, ChengXiang Zhai, Jiawei Han (University of Illinois at Urbana-Champaign). “Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions.”
      • Katja Filippova. “Multi-sentence compression: Finding shortest paths in word graphs.” Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Beijing, China, Aug 23-27, 2010.
