Natural Language
Processing with Neo4j
Kenny Bastani
@kennybastani
This is a hobby of mine
I’m passionate about it
It’s always a work in progress
I do it for fun
Machine Learning Focuses
• Text mining
• Natural Language Processing
• Automatic summarization
• Graph databases
• Commitment to unsupervised learning.
Why NLP and Graphs?
I wanted a better way to learn with less
effort
I wanted something a little more
zippy.
I’m mostly self-taught, so I wanted
something that made self-learning
easier for others.
The Idea
[Diagram: Articles, Sentences, and Phrases linked by “Contain” and “Found in” relationships.]
Importance of NLP
• I’m inspired by the idea of machines learning from experience.
• NLP is important for finding valuable information in noisy unstructured text.
• I’m a Developer Evangelist for Neo4j, so I’m kind of a fan of graph databases.
Algorithms can learn
As long as they can store information and retrieve it quickly
enough for it to be of any use.
Learning requires storage
To learn, storage is required.
For NLP, storage is sometimes a
second-class citizen.
Much focus is on the algorithm first,
then storage second.
But really, it’s storage and retrieval
of big data that is the problem.
Machine learning
Machine learning isn’t magic or hard to understand. It’s real stuff.
We know how to do it.
It’s easily articulated.
ML algorithms solve big computational problems today.
It’s based on the idea of machines learning from prior experiences
as data.
Formulate a Hypothesis
When you analyze data, the
outcome is usually a hypothesis.
A hypothesis is a conclusion based
on limited data.
There are always more pieces
needed to solve the puzzle.
Build on Past Experience
By experience, I mean DATA.
Machine Learning techniques are
entirely based on collection and
analysis of recorded data.
So storage is really important if you
want to do machine learning
successfully.
You cannot play baseball without
your brain. Don’t try it.
The Problem with AI
The problem with AI is that it seems like
magic.
Some people say strong AI is possible.
Others deny that it is possible.
It is a central theme in many fictional
fantasy films and book genres.
It’s in Greek mythology.
Is AI Misunderstood?
Researchers admit to not fully
understanding how intelligence
works in the human brain.
We generally understand how it
works, but there is no consensus on
how to recreate it in machines.
AI is really just the act of perceiving
an environment and maximizing
chances of success.
You get the point.
• Now why is a Graph Database useful for unsupervised machine learning?
• Let’s consider the problem I stated earlier.
• I wanted to build a better way to summarize and learn from Wikipedia’s combined knowledge.
Unsupervised Learning on
Wikipedia
[Diagram: Articles, Sentences, and Phrases linked by “Contain” and “Found in” relationships.]
How do you learn about
learning?
I started by observing myself learning from reading
Wikipedia articles.
I searched for an interesting term on Google.
I read through the article’s text word by word.
The Learning Algorithm
As I read the article’s text, I would sometimes come
across a phrase or term I had not seen before.
Before continuing reading I would open up a new tab
and search for the unrecognized phrase.
It was a well-defined recursive algorithm.
I would drill down n times on unrecognized article
terms until returning to the original article text.
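A minimal sketch of that drill-down, assuming a hypothetical in-memory `TOY_CORPUS` in place of real Wikipedia lookups. The names `drill_down`, `known`, and `max_depth` are illustrative and not taken from the talk.

```python
from typing import Dict, List, Set


def drill_down(term: str, corpus: Dict[str, List[str]], known: Set[str],
               max_depth: int, depth: int = 0) -> List[str]:
    """Return the order in which articles are opened, depth-first."""
    # Stop on terms already learned, or when we have drilled down too far.
    if term in known or depth > max_depth:
        return []
    known.add(term)
    opened = [term]
    for mentioned in corpus.get(term, []):
        # An unrecognized term interrupts reading: look it up before continuing.
        opened.extend(drill_down(mentioned, corpus, known, max_depth, depth + 1))
    return opened


# Hypothetical data: article title -> terms mentioned in its text.
TOY_CORPUS = {
    "Sweden": ["King of Sweden", "Stockholm"],
    "King of Sweden": ["Monarchy", "Sweden"],
    "Stockholm": [],
    "Monarchy": [],
}

reading_order = drill_down("Sweden", TOY_CORPUS, set(), max_depth=3)
print(reading_order)
```

The `known` set is what keeps the recursion from looping when two articles mention each other, as "Sweden" and "King of Sweden" do here.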
A Self-Learning Algorithm
In the computer’s world, this process
would result in an ontology of labeled
data.
Which looks a lot like a graph.
But how would I store the results?
If only there were a database for that…
Neo4j is a graph database
…and graphs are everywhere!
[Diagram: Simple Clustering Model. Article, Sentence, and Phrase nodes linked by “Contains” and “Found in” relationships.]
Summarizing Article Text
What about the NLP stuff?
This is how I did it.
The seed article

You start with a seed article: the first article text
fed to the learning algorithm.
Fetch text from Wikipedia

Get the unstructured text and metadata from
Wikipedia.
Sliding text window
I formulated dynamic RegEx templates and treated
each one as a hypothesis.
The RegEx template would slide word by word through
the text, searching for unrecognized phrases
(n known word matches + 1 wildcard word match)
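One way the sliding template could look, as a sketch only. It assumes words are `\w+` tokens separated by whitespace, so a template never matches across punctuation. The helper names `build_template` and `slide` are hypothetical.

```python
import re
from typing import Iterator, List, Tuple

WORD = r"\w+"


def build_template(known_words: List[str], wildcards: int = 1) -> re.Pattern:
    """n literal (known) words followed by `wildcards` wildcard words."""
    parts = [re.escape(word) for word in known_words]
    parts += [f"({WORD})"] * wildcards
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def slide(text: str, n: int = 1) -> Iterator[Tuple[re.Pattern, str]]:
    """Yield (template, candidate phrase) for each window position."""
    tokens = list(re.finditer(WORD, text))
    for i in range(len(tokens) - n):
        known = [token.group(0) for token in tokens[i:i + n]]
        template = build_template(known)
        # Anchor the template at the current window position.
        hit = template.match(text, tokens[i].start())
        if hit:  # no hit when punctuation separates the words
            yield template, hit.group(0)


candidates = [phrase for _, phrase in slide("The King of Sweden. The King waved.", n=2)]
print(candidates)
```

Each yielded template is the "hypothesis": it can be reused unchanged to search the rest of the article for the same known words followed by any word.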
Looking for redundant phrases
As each unrecognized phrase is encountered, the
dynamic RegEx is then matched against the entire
article’s text.
The algorithm looks for more than 2 identical phrases
within the article’s text.
It appends a 3rd wildcard word match to the template
and then rescans the text for redundant phrases until
none are found.
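The rescanning loop can be sketched as follows. This is an illustration under assumptions rather than the talk's implementation: the threshold `min_count` defaults to 2 (the slides mention both "more than 2" and "2 appearances"), matching is case-insensitive, and the function name `longest_repeated_phrase` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


def longest_repeated_phrase(text: str, seed: List[str],
                            min_count: int = 2) -> Optional[str]:
    """Grow `seed` one wildcard word at a time while the result still repeats."""
    phrase = [word.lower() for word in seed]
    best = None
    while True:
        # Template: the words accepted so far plus one wildcard word.
        parts = [re.escape(word) for word in phrase] + [r"(\w+)"]
        pattern = r"\b" + r"\s+".join(parts) + r"\b"
        extensions = Counter(
            match.group(1).lower()
            for match in re.finditer(pattern, text, re.IGNORECASE)
        )
        if not extensions:
            return best
        word, count = extensions.most_common(1)[0]
        if count < min_count:
            return best  # no redundant extension left: stop rescanning
        phrase.append(word)
        best = " ".join(phrase)


ARTICLE = "The King of Sweden lives in Stockholm. People like the King of Sweden."
repeated = longest_repeated_phrase(ARTICLE, ["the", "king"])
print(repeated)
```

The loop stops as soon as the most common one-word extension occurs fewer than `min_count` times, which is the "until none are found" condition.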
Identify Redundancy of Text
This recursive matching process within the local article’s
text finds duplicate phrases of variable length.
“The King of Sweden” has 2 appearances in an article,
so it must be important to the topic of Sweden.
Better go search for an article stub on “The King of
Sweden”.
Graph Storage and Retrieval
Every time a phrase that doesn’t exist as a node in
Neo4j is encountered, it becomes a target of
investigation, kind of like a hypothesis.
Each sentence that contains the extracted phrase is also
added to Neo4j as a content node.
Relationships are added between nodes to show how
they are semantically related.
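As a rough sketch of the storage step, the snippet below only builds Cypher statements and their parameters and does not connect to a database. The labels `Article`, `Sentence`, `Phrase` and the relationship types `CONTAINS` and `FOUND_IN` are assumptions read off the diagrams. The property names, the `$param` syntax, and the naive sentence splitter are also assumptions.

```python
import re
from typing import Dict, List, Tuple

# MERGE creates a node or relationship only if it does not already exist.
STORE_PHRASE = """
MERGE (a:Article {title: $title})
MERGE (s:Sentence {text: $sentence})
MERGE (p:Phrase {text: $phrase})
MERGE (a)-[:CONTAINS]->(s)
MERGE (p)-[:FOUND_IN]->(s)
""".strip()


def statements_for_phrase(title: str, text: str,
                          phrase: str) -> List[Tuple[str, Dict[str, str]]]:
    """One (query, parameters) pair per sentence that contains the phrase."""
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        (STORE_PHRASE, {"title": title, "sentence": sentence, "phrase": phrase})
        for sentence in sentences
        if phrase.lower() in sentence.lower()  # simple substring check
    ]


batch = statements_for_phrase(
    "Sweden",
    "The King of Sweden lives in Stockholm. Sweden is in Europe.",
    "King of Sweden",
)
print(len(batch), batch[0][1]["sentence"])
```

Each pair could then be sent to Neo4j with whatever client is in use. Because every clause is a `MERGE`, re-running the same batch should not create duplicate nodes.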
Phrase inheritance

Phrases can be found within other phrases, denoting a
grammatical inheritance hierarchy mapped to a variety
of content nodes and articles.
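To illustrate the idea, the sketch below derives "found in" edges between phrases by checking whether one phrase's words appear contiguously inside a longer phrase. The function names are hypothetical. Treating containment as a contiguous, case-insensitive word match is an assumption.

```python
from typing import List, Set, Tuple


def contains_words(longer: List[str], shorter: List[str]) -> bool:
    """True if `shorter` occurs as a contiguous run of words in `longer`."""
    span = len(shorter)
    return any(longer[i:i + span] == shorter
               for i in range(len(longer) - span + 1))


def inheritance_edges(phrases: List[str]) -> Set[Tuple[str, str]]:
    """Return (shorter phrase, longer phrase) pairs: shorter is found in longer."""
    tokenized = {phrase: phrase.lower().split() for phrase in phrases}
    return {
        (short, long_)
        for short, short_words in tokenized.items()
        for long_, long_words in tokenized.items()
        if len(short_words) < len(long_words)
        and contains_words(long_words, short_words)
    }


edges = inheritance_edges(["X", "X Y", "X Y Z"])
print(sorted(edges))
```

With the phrases "X", "X Y", and "X Y Z", every shorter phrase ends up linked to each longer phrase that contains it.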
Phrase Inheritance Graph Data Model
[Diagram: Article nodes and Sentence nodes (“X MEN.”, “X Y Z.”) linked by “Contains”; Phrase nodes “X”, “X Y”, and “X Y Z” linked by “Found in” relationships.]
Graphs are everywhere.

Questions?
Thanks for coming to my talk!
Please look me up on Twitter and LinkedIn!
Twitter: http://www.twitter.com/kennybastani
LinkedIn: http://www.linkedin.com/in/kennybastani

Choose our Linux Web Hosting for a seamless and successful online presenceChoose our Linux Web Hosting for a seamless and successful online presence
Choose our Linux Web Hosting for a seamless and successful online presence
 
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and OllamaTirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
CiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.pptCiscoIconsLibrary cours de réseau VLAN.ppt
CiscoIconsLibrary cours de réseau VLAN.ppt
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
Litestack talk at Brighton 2024 (Unleashing the power of SQLite for Ruby apps)
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
 
WhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring AppsWhatsApp Spy Online Trackers and Monitoring Apps
WhatsApp Spy Online Trackers and Monitoring Apps
 

Natural Language Processing with Neo4j

  • 1. Natural Language Processing with Neo4j Kenny Bastani @kennybastani
  • 2. This is a hobby of mine I’m passionate about it It’s always a work in progress I do it for fun
  • 3. Machine Learning Focuses • Text mining • Natural Language Processing • Automatic summarization • Graph databases • Commitment to unsupervised learning.
  • 4. Why NLP and Graphs?
  • 5. I wanted a better way to learn with less effort I wanted something a little more zippy. I’m mostly self-taught, so I wanted something that made self-learning easier for others.
  • 7. Importance of NLP • I’m inspired by the idea of machines learning from experience. • NLP is important for finding valuable information in noisy unstructured text. • I’m a Developer Evangelist for Neo4j, so I’m kind of a fan of graph databases.
  • 8. Algorithms can learn As long as it can store information and retrieve it in enough time for it to be of any use.
  • 9. Learning requires storage To learn, storage is required. For NLP, storage is sometimes a second class citizen. Much focus is on the algorithm first, then storage second. But really, it’s storage and retrieval of big data that is the problem.
  • 10. Machine learning Machine learning isn’t magic or hard to understand. It’s real stuff. We know how to do it. It’s easily articulated. ML algorithms solve big computational problems today. It’s based on the idea of machines learning from prior experiences as data.
  • 11. Formulate a Hypothesis When you analyze data, the outcome is usually a hypothesis. A hypothesis is a conclusion based on limited data. There are always more pieces needed to solve the puzzle.
  • 12. Build on Past Experience By experience, I mean DATA. Machine Learning techniques are entirely based on collection and analysis of recorded data. So storage is really important if you want to do machine learning successfully. You cannot play baseball without your brain. Don’t try it.
  • 13. The Problem with AI The problem with AI is that it seems like magic. Some people say strong AI is possible. There are some people that deny that it is possible. It is a central theme in many fictional fantasy films and book genres. It’s in Greek mythology.
  • 14. Is AI Misunderstood? Researchers admit to not fully understanding how intelligence works in the human brain. We generally understand how it works, but there is no consensus on how to recreate it in machines. AI is really just the act of perceiving an environment and maximizing chances of success.
  • 15. You get the point. • Now why is a Graph Database useful for unsupervised machine learning? • Let’s consider the problem I stated earlier. • I wanted to build a better way to summarize and learn from Wikipedia’s combined knowledge.
  • 17. How do you learn about learning? I started by observing myself learning from reading Wikipedia articles. I searched for an interesting term on Google. I read through the article’s text word by word.
  • 18. The Learning Algorithm As I read the article’s text, I would sometimes come across a phrase or term I had not seen before. Before continuing reading I would open up a new tab and search for the unrecognized phrase. It was a well defined recursive algorithm. I would drill down n-times on unrecognized article terms until returning to the original article text.
  • 19. A Self-Learning Algorithm In the computer’s world, this process would result in an ontology of labeled data, which looks a lot like a graph. But how would I store the results? If only there were a database for that...
  • 20. Neo4j is a graph database …and graphs are everywhere!
  • 25. What about the NLP stuff? This is how I did it.
  • 26. The seed article You start with a seed article which is the first article text to start the learning algorithm with.
  • 27. Fetch text from Wikipedia Get the unstructured text and metadata from Wikipedia.
  • 28. Sliding text window I formulated dynamic RegEx templates and treated each one as a hypothesis. The RegEx template would slide word by word through the text, searching for unrecognized phrases (n known word matches + 1 wildcard word match).
  • 29. Looking for redundant phrases As each unrecognized phrase is encountered, the dynamic RegEx is then matched against the entire article’s text. The algorithm looks for at least 2 identical phrases within the article’s text. It appends a 3rd wildcard word match to the template and then rescans the text for redundant phrases until none are found.
  • 30. Identify Redundancy of Text This recursive matching process within the local article’s text finds duplicate phrases of variable length. “The King of Sweden” has 2 appearances in an article, so that must be important to the topic of Sweden. Better go search for an article stub on “The King of Sweden”.
  • 31. Graph Storage and Retrieval Every time a phrase that doesn’t exist as a node in Neo4j is encountered, it becomes a target of investigation, kind of like a hypothesis. Each sentence that contains the extracted phrase is also added to Neo4j as a content node. Relationships are added between nodes, showing semantic relationship.
  • 32. Phrase inheritance Phrases can be found within other phrases, denoting a grammatical inheritance hierarchy mapped to a variety of content nodes and articles.
  • 33. Phrase Inheritance Graph Data Model [Diagram: Article nodes are linked by “Contains” to Sentence nodes (“X MEN.”, “X Y Z.”); Phrase nodes (“X”, “X Y”, “X Y Z”) are linked by “Found in” relationships.]
  • 36. Thanks for coming to my talk! Please look me up on Twitter and LinkedIn! Twitter: http://www.twitter.com/kennybastani LinkedIn: http://www.linkedin.com/in/kennybastani
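
The phrase-detection steps described in slides 28 to 30 can be sketched in Python. This is a minimal illustration, not the speaker's implementation: it counts word n-grams with a `Counter` instead of sliding dynamic RegEx templates, it ignores sentence boundaries, and the function name `repeated_phrases` and the `min_count` parameter are hypothetical names chosen for this example.

```python
import re
from collections import Counter


def repeated_phrases(text, min_count=2):
    """Return {phrase: count} for every n-gram (n >= 2) that repeats in text.

    Assumption: a phrase is "redundant" when it appears at least
    min_count times, matching the "King of Sweden" example (2 appearances).
    """
    # Lowercase and split into word tokens; punctuation is discarded.
    words = re.findall(r"[a-z0-9']+", text.lower())
    found = {}
    n = 2  # start with bi-grams, then grow the window one word at a time
    while True:
        counts = Counter(
            tuple(words[i:i + n]) for i in range(len(words) - n + 1)
        )
        repeated = {
            " ".join(gram): count
            for gram, count in counts.items()
            if count >= min_count
        }
        if not repeated:
            break  # no duplicate n-grams of this length, so stop growing
        found.update(repeated)
        n += 1
    return found


if __name__ == "__main__":
    sample = (
        "The King of Sweden lives in Stockholm. "
        "Many people admire the King of Sweden."
    )
    for phrase, count in sorted(repeated_phrases(sample).items()):
        print(count, phrase)
```

Growing the window only while duplicates remain mirrors the "rescan until none are found" loop from the slides; a real pipeline would probably also filter out very common word pairs before treating a phrase as a topic worth investigating.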

Editor's Notes

  1. Introduction My name is.., I work for.., My job is.. Today I want to talk to you about NLP with Neo4j I’m from California, I live in the SF Bay Area.
  2. These are the core ideas behind my research on NLP
  3. My story about making a better search engine on top of Wikipedia. The problem was understanding unstructured text. I wanted to solve that problem. Wikipedia has so much valuable knowledge. Analyzing it on your own document by document would take a lifetime.
  4. This process yields this basic graph structure.
  5. Why I am here I am infatuated with the idea of machine learning
  6. Anything can learn. Anything can learn that can store information. To back reference. To assimilate knowledge about past experience.
  7. The store part of learning is crucial
  8. Machine learning is real. It isn’t magic. It is profoundly real, interesting, and simple. It is simple to articulate. It is the ability of machines to learn from prior experiences.
  9. Machine learning algorithms make a hypothesis based on studying data and predicting something meaningful.
  10. When I say experience. I mean DATA. Machine learning is based on collecting DATA.
  11. Problem with AI is that it has a lot in common with magic. A lot of people say it exists, and a lot of people say it doesn’t. There are groups, cults, movies, books, and endless fantasy stories that are based around AI. It’s a central theme in some ancient Greek mythology. It’s a wrapper term for loads of stuff.
  12. Because we don’t really understand how intelligence works at the human level. Or at least there is no easy way to describe it. Generally it is the act of perceiving an environment and then acting to maximize chances of success.
  13. So I wanted to build a better search engine for Wikipedia. So naturally I started by using Wikipedia to learn more about NLP and machine learning.
  14. This process yields this basic graph structure.
  15. I recorded my process. I observed myself. I would search for a term. I would read through the text and when I came to a term I didn’t recognize, I would open up a new tab from the hyperlink of the term and then repeat the process until I made my way back up to the original topic I searched for.
  16. I recorded my process. I observed myself. I would search for a term. I would read through the text and when I came to a term I didn’t recognize, I would open up a new tab from the hyperlink of the term and then repeat the process until I made my way back up to the original topic I searched for.
  17. So I put together a diagram of my learning process as a recursive algorithm. Through that process I built a prototype. But it had no database!
  18. The result of the algorithm was a graph. I needed to store that data as a graph. Naturally I found my way to Neo4j, which is a graph database.
  19. Simple graph data model. Many different articles contain many different phrases, extracted from many sentences, which were extracted from the article.
  20. Visualizing the result in Gephi
  21. Here is what the database looked like at 200k nodes and 1 million relationships when visualized in Gephi
  22. Now with Cypher (Neo4j’s query language) I could traverse these nodes to do automatic summarization of Wikipedia text.
  23. How the algorithm works
  24. You start with a seed article’s name. Which sits in a queue waiting to be processed by one of the application’s worker roles. (Using Windows Azure Service Bus)
  25. The article’s text and metadata are fetched from Wikipedia’s open search API.
  26. The text is then analyzed using a sliding window of RegEx. Each word has a look behind and a look ahead.
  27. As each word is read, the bi-gram (2 word phrase) is matched on the entire text, looking ahead or behind of the current position.
  28. If there is more than one match within the text being analyzed, then the multiple bi-grams turn into tri-grams by looking ahead one word for each match.
  29. This process is repeated until the text returns no duplicate n-grams. At this point, any n-gram that has more than one match within the text of the article is stored in the Neo4j database as a phrase that is contained within the article’s node. Each sentence that contained at least one of the n-grams is also added to the database, with relationships pointing to both the article node and the phrase node that is contained within it.
  30. Furthermore, each phrase node can have an ancestry, because each phrase can be a derivative of some other phrase.
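
The storage step in notes 29 and 30 can be sketched as a function that prepares Cypher `MERGE` statements for an article, its sentences, and its phrases. This is a hedged illustration only: the labels `Article`, `Sentence`, and `Phrase`, the relationship types `CONTAINS` and `FOUND_IN`, their directions, the property names, and the function `build_statements` are all assumptions for this example, since the talk does not give the actual schema. The code only builds query strings and parameters; it does not connect to a database.

```python
import re

# Assumed schema: (Article)-[:CONTAINS]->(Sentence)<-[:FOUND_IN]-(Phrase)
SENTENCE_QUERY = (
    "MERGE (a:Article {name: $article}) "
    "MERGE (s:Sentence {text: $sentence}) "
    "MERGE (p:Phrase {text: $phrase}) "
    "MERGE (a)-[:CONTAINS]->(s) "
    "MERGE (p)-[:FOUND_IN]->(s)"
)

# Assumed phrase ancestry: a shorter phrase is FOUND_IN a longer one.
INHERIT_QUERY = (
    "MERGE (c:Phrase {text: $child}) "
    "MERGE (l:Phrase {text: $parent}) "
    "MERGE (c)-[:FOUND_IN]->(l)"
)


def _normalize(text):
    """Lowercase the text and join its word tokens with single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def _contains(haystack, phrase):
    """True when phrase occurs in haystack on whole-word boundaries."""
    return f" {phrase} " in f" {haystack} "


def build_statements(article, sentences, phrases):
    """Return a list of (cypher_query, parameters) pairs."""
    normalized = sorted({_normalize(p) for p in phrases})
    statements = []
    # Link each sentence to the article and to every phrase it contains.
    for sentence in sentences:
        words = _normalize(sentence)
        for phrase in normalized:
            if _contains(words, phrase):
                params = {
                    "article": article,
                    "sentence": sentence,
                    "phrase": phrase,
                }
                statements.append((SENTENCE_QUERY, params))
    # Link each phrase to any longer phrase that contains it.
    for child in normalized:
        for parent in normalized:
            if child != parent and _contains(parent, child):
                statements.append(
                    (INHERIT_QUERY, {"child": child, "parent": parent})
                )
    return statements
```

Each returned pair could then be passed to a driver call such as `session.run(query, params)` in the official Neo4j Python driver. Using `MERGE` rather than `CREATE` keeps repeated runs from duplicating nodes, which suits a crawler that may meet the same phrase in many articles.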