Word Puzzles with Neo4j and Py2neo
Presented by Grant Paton-Simpson
Overview
● Brief look at graph databases & Neo4j
● Introduction to word transformation game
● Getting suitable words
● Adding words and relationships into Neo4j
● Querying graph data to generate puzzles
Graph Databases – a NoSQL option
http://neo4j.com/books/graph-databases/
NoSQL – when is it a good fit?
● SQL has its origins in the 1970s and may not be fresh and shiny any more, but ...
● … we shouldn't choose NoSQL for reasons of fashion.
● Venerable SQL is often a better choice for standard hierarchies, e.g. countries that have cities that have suburbs, etc.
https://twitter.com/edd/status/400190499585544192
Graph Databases
● Graph databases are much, much better for related data with:
– lots of different links between the same nodes
– different numbers of links between nodes, e.g. 3 hops to one peer and 7 hops to another
– lots of peer-to-peer links
Substantial Benefits
● Massive performance benefits (growing exponentially as the number of links grows)
● Structural harmony
– between the structure of the data and the structure of data storage (what you draw on the whiteboard might look very similar to how your data is actually structured)
– between the questions asked of the data and the query language used to answer them
Word transformations
● Start with one word and get to the other by single-letter transformations, one word at a time
● E.g. starting with “stores”, get to “slaked”
– BTW there are 96 alternative ways in 5 moves or fewer
stores → stored → stared → staked → slaked
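The rule of the game fits in a few lines of Python. This is a minimal sketch (the function names are hypothetical, not taken from the talk) that checks a proposed chain: every word must be in the vocabulary and each step must change exactly one letter in place, with no letters added or removed.

```python
def differs_by_one_letter(a, b):
    """True if a and b have the same length and differ at exactly one position."""
    if len(a) != len(b):
        return False
    return sum(1 for x, y in zip(a, b) if x != y) == 1


def is_valid_chain(chain, vocabulary):
    """Check a proposed solution: known words only, one letter changed per step."""
    if not all(word in vocabulary for word in chain):
        return False
    return all(differs_by_one_letter(a, b) for a, b in zip(chain, chain[1:]))


vocab = {"stores", "stored", "stared", "staked", "slaked"}
print(is_valid_chain(["stores", "stored", "stared", "staked", "slaked"], vocab))  # True
print(is_valid_chain(["stores", "stared"], vocab))  # False: two letters change at once
```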
Puzzle taster
Get from 'sloven' to 'closed' in no more than 5 steps (there are 10 unique solutions)
sloven → ? → closed
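A puzzle like this is a bounded path search: list every route from the start word to the end word that uses no more than N moves and never revisits a word. The sketch below does this with a plain depth-first search over a small, hypothetical in-memory graph of four-letter words (not the talk's data, so it does not reproduce the 10 solutions for sloven/closed); in the talk the equivalent question is put to Neo4j instead.

```python
def find_solutions(graph, start, end, max_steps):
    """Return every path from start to end using at most max_steps moves.

    Paths never revisit a word, so each returned path is a distinct solution.
    """
    solutions = []

    def extend(path):
        current = path[-1]
        if current == end:
            solutions.append(list(path))
            return
        if len(path) - 1 == max_steps:  # moves used so far
            return
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in path:  # no looping back to an earlier word
                path.append(neighbour)
                extend(path)
                path.pop()

    extend([start])
    return solutions


# Hypothetical toy data: each pair differs by exactly one letter.
edges = [("cold", "cord"), ("cord", "card"), ("card", "ward"), ("ward", "warm"),
         ("cord", "word"), ("word", "ward"), ("word", "worm"), ("worm", "warm")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(len(find_solutions(graph, "cold", "warm", 4)))  # 3
```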
Getting a simple word list
● How hard could it be?
● Lesson #1 – Scrabble lists and similar are useless; I only wanted lists of standard words, otherwise the puzzles get too hard
● Lesson #2 – you have to decide what to do about taboo/profane words
● Lesson #3 – the number of words affects the number of ONE_LETTER_DIFF relationships a lot
● Lesson #4 – clever optimisation isn't needed if you restrict yourself to ordinary words
SCOWL (Spell Checker Oriented Word Lists) http://wordlist.aspell.net/
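SCOWL splits its lists by size, so picking a smaller size is one way to stay with ordinary words. A minimal loading sketch, assuming a plain-text file with one word per line and a caller-supplied taboo set (the function names, the Latin-1 default encoding and the sample taboo word are assumptions; check the encoding against the files you download):

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def filter_candidates(lines: Iterable[str], length: int, taboo: set[str]) -> set[str]:
    """Keep words of the requested length that are not on the taboo list."""
    words = set()
    for line in lines:
        word = line.strip()
        if len(word) == length and word.lower() not in taboo:
            words.add(word)
    return words


def load_words(path: str | Path, length: int, taboo: set[str],
               encoding: str = "latin-1") -> set[str]:
    """Read a one-word-per-line list file (the encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return filter_candidates(handle, length, taboo)


sample = ["stores\n", "stored\n", "cat\n", "damned\n"]
print(sorted(filter_candidates(sample, 6, {"damned"})))  # ['stored', 'stores']
```

Keeping puzzles to one word length also keeps the word count, and therefore the relationship count, down.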
Filtering words
● Needed to turn é into e
● Needed to eliminate possessives, e.g. cat's (as in the phrase “the cat's whiskers”)
● Needed to leave out capitalised words
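The three filters can be sketched with the standard library alone. Unicode decomposition via `unicodedata` is one way (an assumption, not necessarily the talk's approach) to turn é into e; the function names are hypothetical.

```python
import unicodedata


def strip_accents(word):
    """Turn accented letters into plain ones, e.g. 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_words(raw_words):
    """Apply the three filters and return a sorted, de-duplicated list."""
    cleaned = set()
    for word in raw_words:
        word = strip_accents(word.strip())
        if not word:
            continue
        if word.endswith("'s"):  # possessives such as "cat's"
            continue
        if word[0].isupper():  # capitalised words (names, places, ...)
            continue
        cleaned.add(word)
    return sorted(cleaned)


print(clean_words(["café", "cat's", "London", "stores", "cafe"]))  # ['cafe', 'stores']
```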
For each word, identifying words that differ by one letter only
Disclaimer: the code worked, but some super-smart optimisations involving n-dimensional space or something are probably possible
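The talk's own neighbour-finding code isn't reproduced here. One common technique (swapped in for illustration, not necessarily what the presenter did) avoids comparing every word with every other word: file each word under every pattern made by blanking one position, e.g. 'cold' under '_old', 'c_ld', 'co_d' and 'col_'. Words that share a pattern differ only at the blanked slot. The function name and sample words are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_letter_pairs(words):
    """Return each pair (a, b), with a < b, of words differing by exactly one letter."""
    buckets = defaultdict(list)
    for word in set(words):  # de-duplicate so a word never pairs with itself
        for i in range(len(word)):
            pattern = word[:i] + "_" + word[i + 1:]
            buckets[pattern].append(word)
    pairs = set()
    for bucket in buckets.values():
        # every word in a bucket differs from the others only at the blanked slot
        pairs.update(combinations(sorted(bucket), 2))
    return pairs


words = ["cold", "cord", "card", "ward", "warm", "word", "worm"]
print(len(one_letter_pairs(words)))  # 8
```

A bucket holding k words yields k(k-1)/2 pairs, which is one reason the relationship count rises quickly as the word list grows.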
Adding data to Neo4j
● Create nodes and relationships
● Lots of room for optimisations
● Only need to build the database once, so the 15-minute build is not worth speeding up
● My Neo4j and Py2neo skills are beginner level, but I was able to solve my problem
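A minimal loading sketch, assuming a newer py2neo release (4.x or later) whose `Graph.run(cypher, **parameters)` executes a Cypher statement, and a Neo4j version that accepts `$param` syntax; the 2.0 release listed under Resources has a different API. Sending words and pairs in `UNWIND` batches is one of the optimisations mentioned above, not necessarily what the talk did. The `name` property is hypothetical; `Word` and `ONE_OFF` follow the Cypher slide. An index on `:Word(name)` should also speed up the `MATCH` lookups.

```python
WORD_QUERY = """
UNWIND $words AS word
MERGE (:Word {name: word})
"""

# Each pair is stored once, in one direction; queries can match it undirected.
PAIR_QUERY = """
UNWIND $pairs AS pair
MATCH (a:Word {name: pair[0]}), (b:Word {name: pair[1]})
MERGE (a)-[:ONE_OFF]->(b)
"""


def batches(items, size):
    """Yield successive chunks of items so each statement stays small."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_graph(graph, words, pairs, batch_size=1000):
    """Create Word nodes first, then ONE_OFF relationships, in batches.

    `graph` is assumed to be a py2neo Graph (or anything with a compatible
    run(cypher, **parameters) method).
    """
    for chunk in batches(sorted(words), batch_size):
        graph.run(WORD_QUERY, words=chunk)
    for chunk in batches(sorted(pairs), batch_size):
        graph.run(PAIR_QUERY, pairs=[list(p) for p in chunk])


# Usage (assumed connection details):
# from py2neo import Graph
# graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
# load_graph(graph, words, pairs)
```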
Py2neo and Cypher
Cypher Syntax as ASCII Art (Really!)
[Diagram: two Word nodes joined by a ONE_OFF relationship]
(:Word)-[:ONE_OFF]->(:Word)
How cool is this?
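The same ASCII-art pattern extends to the puzzle question itself: a variable-length pattern such as `-[:ONE_OFF*1..5]-` asks for chains of one to five moves. This sketch assumes a py2neo `Graph` whose `run` returns records that can be indexed by column name, and a hypothetical `name` property on `Word` nodes. Cypher does not accept a parameter for the hop limit, so the integer limit is formatted into the query text.

```python
PATH_QUERY = """
MATCH p = (s:Word {name: $from_word})-[:ONE_OFF*1..%d]-(t:Word {name: $to_word})
RETURN [n IN nodes(p) | n.name] AS words
"""


def solutions_from_neo4j(graph, from_word, to_word, max_steps=5):
    """Return every distinct chain from from_word to to_word in at most max_steps moves."""
    query = PATH_QUERY % int(max_steps)  # int() keeps arbitrary text out of the query
    solutions = set()
    for record in graph.run(query, from_word=from_word, to_word=to_word):
        chain = tuple(record["words"])
        # Cypher won't reuse a relationship within one path, but it can revisit
        # a node via a different relationship, so drop chains that loop back.
        if len(set(chain)) == len(chain):
            solutions.add(chain)
    return sorted(solutions)


# Usage (assumed connection details):
# from py2neo import Graph
# graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
# for chain in solutions_from_neo4j(graph, "sloven", "closed"):
#     print(" -> ".join(chain))
```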
Example Output
Matching chart
Live Demo – Suggestions for Start Word
“sloven” to “closed” solution(s)
Resources
● Neo4j
– http://neo4j.com/books/graph-databases/
– http://neo4j.com/graphacademy/
– http://graphgist.neo4j.com/#!/gists
– https://www.youtube.com/channel/UCvze3hU6OZBkB1vkhH2lH9Q
● Py2neo
– http://py2neo.org/2.0/
● SCOWL
– http://wordlist.aspell.net/
About Catalyst
