Document Classification with Neo4j 
(graphs)-[:are]->(everywhere) 
© All Rights Reserved 2014 | Neo Technology, Inc. 
@kennybastani 
Neo4j Developer Evangelist
Agenda 
• Introduction to Neo4j 
• Introduction to Graph-based Document Classification 
• Graph-based Hierarchical Pattern Recognition 
• Generating a Vector Space Model for Recommendations 
• Graphify for Neo4j 
• U.S. Presidential Speech Transcript Analysis 
Introduction to Neo4j 
The Property Graph Data Model 
Nodes:
• name: John, age: 27
• name: Sally, age: 32
• Book, title: Graph Databases, authors: Ian Robinson, Jim Webber
Relationships:
• FRIEND_OF, since: 01/09/2013 (shown twice)
• HAS_READ, on: 2/03/2013, rating: 5
• HAS_READ, on: 02/09/2013, rating: 4
The Relational Table Model 
Customers Customer_Accounts Accounts 
The Neo4j Browser 
Neo4j Browser - finding help 
http://localhost:7474/ 
Execute Cypher, Visualize 
Introduction to Document Classification 
Document Classification 
Automatically assign a document to one or more classes 
Documents may be classified according to their subjects or 
according to other attributes 
Automatically classify unlabeled documents to a set of relevant 
classes using labeled training data 
Example Use Cases for Document Classification
Sentiment Analysis for Movie Reviews 
Scenario: A movie website allows users to submit reviews describing what they 
either liked or disliked about a particular movie. 
Problem: The user reviews are unstructured text. 
How do I automatically generate a score indicating whether the review was 
positive or negative? 
Solution: Train a natural language parsing model on a dataset of previous
reviews that have been labeled as either positive or negative.
Recommend Relevant Tags 
Scenario: A Q/A website allows users to submit questions and receive answers 
from other users. 
Problem: Users sometimes do not know which tags to apply to their questions in
order to increase discoverability for receiving answers.
Solution: Automatically recommend the most relevant tags for a question by
classifying its text with a model trained on previous questions.
Recommend Similar Articles 
Scenario: A news website provides hundreds of new articles a day to users on a 
broad range of topics. 
Problem: The site needs to increase user engagement and time spent on the site. 
Solution: Train natural language parsing models for daily articles in order to 
provide recommendations for highly relevant articles at the bottom of each page. 
How Automated Document Classification Works 
Step 1: Create a Training Dataset
Supervised Learning
Assign a set of labels that describes the document’s text
Figure: documents connected to labels X, Y, and Z.
Step 2: Train a Natural Language Parsing Model
Deep Learning
Deep feature representations are selected and learned using an evolutionary algorithm
State machines represent predicates that evaluate to 0 or 1 for a text match
State machines map to classes of document labels that matched text during training
Figure: a hierarchy of state machines (p = State Machine) mapped to classes X, Y, and Z.
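As a rough illustration of the description of state machines above, the sketch below treats each feature as a predicate that returns 0 or 1 for a text match and maps it to the classes whose training text it matched. Using regular expressions as the state machines, and the names `Feature`, `train`, and `match_classes`, are assumptions made for illustration; this is not Graphify's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A hypothetical feature: a pattern acting as a 0/1 predicate on text."""
    pattern: str
    classes: set = field(default_factory=set)  # labels seen with this feature in training

    def matches(self, text: str) -> int:
        # A regular expression is one way to realise a finite state machine.
        return 1 if re.search(self.pattern, text, re.IGNORECASE) else 0


def train(features, labeled_docs):
    """Attach each document's labels to every feature that matches its text."""
    for text, labels in labeled_docs:
        for feature in features:
            if feature.matches(text):
                feature.classes.update(labels)


def match_classes(features, text):
    """Return the classes reached through features that match the text."""
    found = set()
    for feature in features:
        if feature.matches(text):
            found |= feature.classes
    return found


features = [Feature(r"\bgraph\s+database\b"), Feature(r"\btax\s+cut\b")]
train(features, [
    ("Neo4j is a graph database", {"X"}),
    ("The tax cut was debated", {"Y", "Z"}),
])
print(match_classes(features, "Why use a graph database?"))  # {'X'}
```

The patterns here are fixed by hand; in the talk, features are learned during training rather than written manually.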
Step 3: Classify Unlabeled Documents
The natural language parsing model is used to classify other unlabeled documents
Figure: an unlabeled document compared with classes X, Y, and Z using cos(θ), with scores 0.99, 0.67, and 0.01.
Hierarchical Pattern Recognition (HPR)
What is Hierarchical Pattern Recognition (HPR)? 
HPR is a graph-based deep learning algorithm I
created that learns deep feature representations in
linear time.
I created the algorithm to do graph-based traversals
using a hierarchy of finite state machines (FSM).
Designed for scalable performance in P time.
Influences & Inspirations 
Ray Kurzweil (Pattern Recognition Theory of Mind)
+ Jeff Hawkins (Hierarchical Temporal Memory)
= Hierarchical Pattern Recognition
How does feature extraction work? 
“Deep” feature representations are learned and associated
with the labels of the documents in which each feature
was discovered.
The feature hierarchy is translated into a Vector Space Model
for classification on feature vectors generated from unlabeled
text.
HPR uses a probabilistic model in combination with an
evolutionary algorithm to generate hierarchies of deep feature
representations.
Figure: Hierarchical Pattern Recognition, a hierarchy of state machines (p) above classes X, Y, and Z.
Graph-based feature learning 
Learning new features from 
matches on training data 
Cost Function for the Generations of Features 
Reproduction occurs after a threshold of matches has been 
exceeded for a feature. 
After replication the cost function is applied to increase that 
threshold every time the feature reproduces. 
The cost function uses two values: the current threshold on the feature node,
and the minimum threshold, which I chose as 5 for new features.
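The reproduction rule can be sketched as follows. The actual cost function is not reproduced in this transcript, so `cost_function` is a caller-supplied placeholder, and `FeatureNode`, `record_match`, and the doubling rule in the usage lines are hypothetical. Only the minimum threshold of 5 comes from the talk.

```python
from dataclasses import dataclass

MIN_THRESHOLD = 5  # minimum threshold for new features, as stated in the talk


@dataclass
class FeatureNode:
    """Hypothetical stand-in for a feature node in the graph."""
    threshold: int = MIN_THRESHOLD
    matches: int = 0
    generations: int = 0  # how many times this feature has reproduced


def record_match(node, cost_function):
    """Count one match; reproduce and raise the threshold once it is exceeded.

    Returns True when the feature reproduces on this match.
    """
    node.matches += 1
    if node.matches > node.threshold:
        node.generations += 1
        # The real cost function is not given here; the caller supplies one.
        node.threshold = cost_function(node.threshold, MIN_THRESHOLD)
        return True
    return False


def doubling(current, minimum):
    # Placeholder only, NOT the cost function from the talk.
    return max(minimum, current * 2)


node = FeatureNode()
reproduced = [record_match(node, doubling) for _ in range(6)]
print(reproduced, node.threshold)
```

With the placeholder rule, the sixth match exceeds the threshold of 5, the feature reproduces once, and its threshold rises so that later generations need more matches.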
Vector Space Model 
Generating Feature Vectors 
The natural language parsing model created during training can be 
turned into a global feature index. 
This global feature index is a list of Neo4j internal IDs for every feature 
in the hierarchy. 
Using that global feature index, a multi-dimensional vector space is
created with one dimension for each feature in the hierarchy.
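A minimal sketch of that idea: the global feature index is an ordered list of feature IDs, and a document's vector has one slot per ID. The IDs below are made up, and storing match counts as the vector values is an assumption; the slides do not say whether counts or 0/1 flags are used.

```python
def build_feature_vector(global_feature_index, matched_feature_ids):
    """Map a document's matched feature IDs onto the global feature index.

    global_feature_index: ordered list of feature IDs (one dimension each).
    matched_feature_ids: IDs of features that matched the document; repeats count.
    """
    position = {feature_id: i for i, feature_id in enumerate(global_feature_index)}
    vector = [0] * len(global_feature_index)
    for feature_id in matched_feature_ids:
        if feature_id in position:  # ignore IDs missing from the index
            vector[position[feature_id]] += 1
    return vector


# Hypothetical internal IDs for four features in the hierarchy.
global_feature_index = [12, 17, 23, 42]
print(build_feature_vector(global_feature_index, [17, 42, 17]))  # [0, 2, 0, 1]
```

Because every document is projected onto the same index, any two vectors have the same length and can be compared directly.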
Relevance Rankings 
“Relevance rankings of documents in a keyword search can be 
calculated, using the assumptions of document similarities theory, by 
comparing the deviation of angles between each document vector and 
the original query vector where the query is represented as the same 
kind of vector as the documents.” - Wikipedia 
Vector-based Cosine Similarity Measure 
In practice, it is easier to calculate the cosine of the angle between the 
vectors, instead of the angle itself: 
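The standard cosine formula, cos(θ) = (a · b) / (‖a‖ ‖b‖), can be written in a few lines. This is a plain-Python sketch rather than Graphify's code; returning 0.0 for a zero-length vector is an assumption made to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0, 1], [1, 0, 1]))  # close to 1.0 (same direction)
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # 0.0 (orthogonal)
```

Feature vectors built from match counts have no negative components, so in this setting the score falls between 0 and 1.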
Cosine Similarity & Vector Space Model 
Vector-based Cosine Similarity Measure 
“The resulting similarity ranges from -1 meaning exactly opposite, to 1 
meaning exactly the same, with 0 usually indicating independence, 
and in-between values indicating intermediate similarity or 
dissimilarity.” 
via Wikipedia 
Graphify for Neo4j 
Graphify is a Neo4j unmanaged extension used for 
document and text classification using graph-based 
hierarchical pattern recognition. 
https://github.com/kbastani/graphify 
Example Project 
Head over to the GitHub project page and clone it to your 
local machine. 
Follow the directions listed in the README.md to install the 
extension. 
Navigate to the /examples directory of the project. 
Run: 
examples/graphify-examples-author/src/java/org/neo4j/nlp/examples/author/main.java 
U.S. Presidential Speech Transcript Analysis
Identify the Political Affiliation of a Presidential Speech 
This example ingests a set of presidential speech transcripts, each labeled
with the author of that speech, during the training phase. After building
the training models, unlabeled presidential speeches are classified in
the test phase.
The Presidents 
• Ronald Reagan 
• labels: liberal, republican, ronald-reagan 
• George H.W. Bush 
• labels: conservative, republican, bush41 
• Bill Clinton 
• labels: liberal, democrat, bill-clinton 
• George W. Bush 
• labels: conservative, republican, bush43 
• Barack Obama 
• labels: liberal, democrat, barack-obama 
Training 
Each of the presidents in the example has 6 speeches to analyze.
4 of the speeches are used to build a natural language parsing model. 
2 of the speeches are used to test the validity of that model. 
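The split described above could look like this in code. The speech file names are placeholders, and taking the first four speeches for training is an assumption; the slides do not say how the speeches were divided.

```python
def split_speeches(speeches, train_count=4):
    """Split one president's speeches into training and test sets."""
    return speeches[:train_count], speeches[train_count:]


# Placeholder file names; the real example ships its own transcripts.
speeches = {
    "ronald-reagan": [f"reagan-{i}.txt" for i in range(1, 7)],
    "barack-obama": [f"obama-{i}.txt" for i in range(1, 7)],
}

for president, files in speeches.items():
    train, test = split_speeches(files)
    print(president, len(train), "training,", len(test), "test")
```

Keeping the two test speeches out of training is what makes the later classification results a check on the model rather than a replay of what it has already seen.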
Get Similar Labels/Classes 
Ronald Reagan
Class Similarity
republican 0.7182046285385341
liberal 0.644281223102398
democrat 0.4854114595950056
conservative 0.4133639188595147
bill-clinton 0.4057969121945167
barack-obama 0.323947855372623
bush41 0.3222644898334092
bush43 0.3161309849153592
George H.W. Bush
Class Similarity
conservative 0.7032274806766954
republican 0.6047256274615608
liberal 0.4439742461594541
democrat 0.39114918238853674
bill-clinton 0.3234223107986785
ronald-reagan 0.3222644898334092
barack-obama 0.2929260544514002
bush43 0.29106733975087984
Bill Clinton
Class Similarity
democrat 0.8375678825642422
liberal 0.7847858060182163
republican 0.5561860529059708
conservative 0.45365774896422445
barack-obama 0.4507676679770066
ronald-reagan 0.4057969121945167
bush43 0.365042482383354
bush41 0.3234223107986785
George W. Bush
Class Similarity
conservative 0.820636570272315
republican 0.7056890956512284
liberal 0.5075788396061254
democrat 0.4505424322086937
bill-clinton 0.365042482383354
barack-obama 0.33801949243378965
ronald-reagan 0.3161309849153592
bush41 0.29106733975087984
Barack Obama
Class Similarity
democrat 0.7668017370739147
liberal 0.7184792203867296
republican 0.4847680475425114
bill-clinton 0.4507676679770066
conservative 0.4149264161292232
bush43 0.33801949243378965
ronald-reagan 0.323947855372623
bush41 0.2929260544514002
Get involved in the Neo4j community 
http://stackoverflow.com/questions/tagged/neo4j 
http://groups.google.com/group/neo4j 
https://github.com/neo4j/neo4j/issues 
http://neo4j.meetup.com/ 
(Thank You)
Get in touch
Twitter www.twitter.com/kennybastani
LinkedIn www.linkedin.com/in/kennybastani
GitHub www.github.com/kbastani

More Related Content

What's hot

Ontologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinOntologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlin
Simon Jupp
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
Introduction to Policy Lab (Jan 2021)
Introduction to Policy Lab (Jan 2021)Introduction to Policy Lab (Jan 2021)
Introduction to Policy Lab (Jan 2021)
Policy Lab
 
How to choose publishable research topic
How to choose publishable research topicHow to choose publishable research topic
How to choose publishable research topic
Hasanain Ghazi
 
Pinecone Vector Database.pdf
Pinecone Vector Database.pdfPinecone Vector Database.pdf
Pinecone Vector Database.pdf
Aniruddha Chakrabarti
 
Supply Chain Twin Demo - Companion Deck
Supply Chain Twin Demo - Companion DeckSupply Chain Twin Demo - Companion Deck
Supply Chain Twin Demo - Companion Deck
Neo4j
 
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & TomorrowAmsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Neo4j
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information Retrieval
Vipul Munot
 
Introduction to Neo4j and .Net
Introduction to Neo4j and .NetIntroduction to Neo4j and .Net
Introduction to Neo4j and .Net
Neo4j
 

What's hot (9)

Ontologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinOntologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlin
 
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 4 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Introduction to Policy Lab (Jan 2021)
Introduction to Policy Lab (Jan 2021)Introduction to Policy Lab (Jan 2021)
Introduction to Policy Lab (Jan 2021)
 
How to choose publishable research topic
How to choose publishable research topicHow to choose publishable research topic
How to choose publishable research topic
 
Pinecone Vector Database.pdf
Pinecone Vector Database.pdfPinecone Vector Database.pdf
Pinecone Vector Database.pdf
 
Supply Chain Twin Demo - Companion Deck
Supply Chain Twin Demo - Companion DeckSupply Chain Twin Demo - Companion Deck
Supply Chain Twin Demo - Companion Deck
 
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & TomorrowAmsterdam - The Neo4j Graph Data Platform Today & Tomorrow
Amsterdam - The Neo4j Graph Data Platform Today & Tomorrow
 
Search: Probabilistic Information Retrieval
Search: Probabilistic Information RetrievalSearch: Probabilistic Information Retrieval
Search: Probabilistic Information Retrieval
 
Introduction to Neo4j and .Net
Introduction to Neo4j and .NetIntroduction to Neo4j and .Net
Introduction to Neo4j and .Net
 

Viewers also liked

Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
Kenny Bastani
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
William Lyon
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
Kenny Bastani
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics Platform
Kenny Bastani
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
Kenny Bastani
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
Kenny Bastani
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
Max De Marzi
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
Max De Marzi
 
Neo4J Open Source Graph Database
Neo4J Open Source Graph DatabaseNeo4J Open Source Graph Database
Neo4J Open Source Graph Database
Mark Maslyn
 
20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup
Rik Van Bruggen
 
Dnc Day 4 – Obama Speech
Dnc Day 4 – Obama SpeechDnc Day 4 – Obama Speech
Dnc Day 4 – Obama Speechmkursh
 
The impact of language planning, terminology planning, and arabicization, on ...
The impact of language planning, terminology planning, and arabicization, on ...The impact of language planning, terminology planning, and arabicization, on ...
The impact of language planning, terminology planning, and arabicization, on ...
Alexander Decker
 
Meryl streep took a stand against donald trump
Meryl streep took a stand against donald trumpMeryl streep took a stand against donald trump
Meryl streep took a stand against donald trump
Susana Gallardo
 
AP Invoice Processing for JD Edwards_Bottomline Technologies
AP Invoice Processing for JD Edwards_Bottomline TechnologiesAP Invoice Processing for JD Edwards_Bottomline Technologies
AP Invoice Processing for JD Edwards_Bottomline Technologies
Bottomline Technologies
 
Document Classification In PHP
Document Classification In PHPDocument Classification In PHP
Document Classification In PHP
Ian Barber
 
The war on terrorism
The war on terrorismThe war on terrorism
The war on terrorism
alcatdubois
 
M893 & m894 seahawks contest
M893 & m894 seahawks contestM893 & m894 seahawks contest
M893 & m894 seahawks contestdthielen1
 
Adivina de _quienes_son_las_siguientes_cansiones[1]
Adivina de _quienes_son_las_siguientes_cansiones[1]Adivina de _quienes_son_las_siguientes_cansiones[1]
Adivina de _quienes_son_las_siguientes_cansiones[1]
turnedspon8520
 

Viewers also liked (20)

Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics Platform
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Neo4J Open Source Graph Database
Neo4J Open Source Graph DatabaseNeo4J Open Source Graph Database
Neo4J Open Source Graph Database
 
20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup
 
Dnc Day 4 – Obama Speech
Dnc Day 4 – Obama SpeechDnc Day 4 – Obama Speech
Dnc Day 4 – Obama Speech
 
The impact of language planning, terminology planning, and arabicization, on ...
The impact of language planning, terminology planning, and arabicization, on ...The impact of language planning, terminology planning, and arabicization, on ...
The impact of language planning, terminology planning, and arabicization, on ...
 
Meryl streep took a stand against donald trump
Meryl streep took a stand against donald trumpMeryl streep took a stand against donald trump
Meryl streep took a stand against donald trump
 
AP Invoice Processing for JD Edwards_Bottomline Technologies
AP Invoice Processing for JD Edwards_Bottomline TechnologiesAP Invoice Processing for JD Edwards_Bottomline Technologies
AP Invoice Processing for JD Edwards_Bottomline Technologies
 
Document Classification In PHP
Document Classification In PHPDocument Classification In PHP
Document Classification In PHP
 
The war on terrorism
The war on terrorismThe war on terrorism
The war on terrorism
 
M893 & m894 seahawks contest
M893 & m894 seahawks contestM893 & m894 seahawks contest
M893 & m894 seahawks contest
 
Visual Resume
Visual ResumeVisual Resume
Visual Resume
 
Adivina de _quienes_son_las_siguientes_cansiones[1]
Adivina de _quienes_son_las_siguientes_cansiones[1]Adivina de _quienes_son_las_siguientes_cansiones[1]
Adivina de _quienes_son_las_siguientes_cansiones[1]
 

Similar to Document Classification with Neo4j

History Of C Essay
History Of C EssayHistory Of C Essay
History Of C Essay
Melissa Williams
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Tao Xie
 
xAPI: The Landscape
xAPI: The LandscapexAPI: The Landscape
xAPI: The Landscape
Megan Bowe
 
Software system design sample
Software system design sampleSoftware system design sample
Software system design sample
Norman K Ma
 
Data science workshop
Data science workshopData science workshop
Data science workshop
Hortonworks
 
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item RecommendationAn Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
Enrico Palumbo
 
C# programming : Chapter One
C# programming : Chapter OneC# programming : Chapter One
C# programming : Chapter One
Khairi Aiman
 
See to believe: capturing insights using contextual inquiry
See to believe: capturing insights using contextual inquirySee to believe: capturing insights using contextual inquiry
See to believe: capturing insights using contextual inquiry
Deirdre Costello
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
Paris Open Source Summit
 
Sudipta mukherjee 2016_2017
Sudipta mukherjee 2016_2017Sudipta mukherjee 2016_2017
Sudipta mukherjee 2016_2017
Sudipta Mukherjee
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
Maruti Gollapudi
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
Tao Xie
 
Software craftsmanship - Imperative or Hype
Software craftsmanship - Imperative or HypeSoftware craftsmanship - Imperative or Hype
Software craftsmanship - Imperative or Hype
SUGSA
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
Neo4j
 
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
NUS Institute of Applied Learning Sciences and Educational Technology
 
Sudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdfSudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta Mukherjee
 
Final presentation
Final presentationFinal presentation
Final presentation
Nitish Upreti
 
Building Large Sustainable Apps
Building Large Sustainable AppsBuilding Large Sustainable Apps
Building Large Sustainable Apps
Buğra Oral
 

Similar to Document Classification with Neo4j (20)

History Of C Essay
History Of C EssayHistory Of C Essay
History Of C Essay
 
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
 
xAPI: The Landscape
xAPI: The LandscapexAPI: The Landscape
xAPI: The Landscape
 
Software system design sample
Software system design sampleSoftware system design sample
Software system design sample
 
Data science workshop
Data science workshopData science workshop
Data science workshop
 
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item RecommendationAn Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
An Empirical Comparison of Knowledge Graph Embeddings for Item Recommendation
 
C# programming : Chapter One
C# programming : Chapter OneC# programming : Chapter One
C# programming : Chapter One
 
See to believe: capturing insights using contextual inquiry
See to believe: capturing insights using contextual inquirySee to believe: capturing insights using contextual inquiry
See to believe: capturing insights using contextual inquiry
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
Sudipta mukherjee 2016_2017
Sudipta mukherjee 2016_2017Sudipta mukherjee 2016_2017
Sudipta mukherjee 2016_2017
 
Sudipta_Mukherjee_2016_2017
Sudipta_Mukherjee_2016_2017Sudipta_Mukherjee_2016_2017
Sudipta_Mukherjee_2016_2017
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
 
Software craftsmanship - Imperative or Hype
Software craftsmanship - Imperative or HypeSoftware craftsmanship - Imperative or Hype
Software craftsmanship - Imperative or Hype
 
Knowledge Graphs and Generative AI
Knowledge Graphs and Generative AIKnowledge Graphs and Generative AI
Knowledge Graphs and Generative AI
 
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
Xiangen Hu - WESST - AutoTutor, an implementation of Conversation-Based Intel...
 
Sudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdfSudipta_Mukherjee_Resume-Nov_2022.pdf
Sudipta_Mukherjee_Resume-Nov_2022.pdf
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Building Large Sustainable Apps
Building Large Sustainable AppsBuilding Large Sustainable Apps
Building Large Sustainable Apps
 

More from Kenny Bastani

In the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at MicroservicesIn the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at Microservices
Kenny Bastani
 
Building Cloud Native Architectures with Spring
Building Cloud Native Architectures with SpringBuilding Cloud Native Architectures with Spring
Building Cloud Native Architectures with Spring
Kenny Bastani
 
Extending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud FoundryExtending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud Foundry
Kenny Bastani
 
Back your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud FoundryBack your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud Foundry
Kenny Bastani
 
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing MicroservicesUsing Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
Kenny Bastani
 
Cloud Native Java Microservices
Cloud Native Java MicroservicesCloud Native Java Microservices
Cloud Native Java Microservices
Kenny Bastani
 
Building REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring CloudBuilding REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring Cloud
Kenny Bastani
 
Neo4j Graph Data Modeling
Neo4j Graph Data ModelingNeo4j Graph Data Modeling
Neo4j Graph Data Modeling
Kenny Bastani
 
Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0
Kenny Bastani
 




Document Classification with Neo4j

  • 1. Document Classification with Neo4j (graphs)-[:are]->(everywhere) © All Rights Reserved 2014 | Neo Technology, Inc. @kennybastani Neo4j Developer Evangelist
  • 2. Agenda
      • Introduction to Neo4j
      • Introduction to Graph-based Document Classification
      • Graph-based Hierarchical Pattern Recognition
      • Generating a Vector Space Model for Recommendations
      • Graphify for Neo4j
      • U.S. Presidential Speech Transcript Analysis
  • 3. Introduction to Neo4j
  • 4. The Property Graph Data Model
  • 5. John, Sally, Graph Databases (Book)
  • 6.
      • name: John, age: 27
      • name: Sally, age: 32
      • FRIEND_OF (since: 01/09/2013)
      • title: Graph Databases, authors: Ian Robinson, Jim Webber
      • HAS_READ (on: 2/03/2013, rating: 5)
      • HAS_READ (on: 02/09/2013, rating: 4)
      • FRIEND_OF (since: 01/09/2013)
  • 7. The Relational Table Model
  • 8. Customers, Customer_Accounts, Accounts
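The two data models on slides 4 to 8 can be contrasted in a few lines of Python. This is only a sketch and not Neo4j code. Which person owns which HAS_READ relationship is an assumption (the transcript does not say), and the SQL column names, account labels, and the three accounts for Alice (taken from the editor's notes) are illustrative.

```python
import sqlite3

# Property graph: nodes and relationships both carry key-value properties.
nodes = {
    "john": {"name": "John", "age": 27},
    "sally": {"name": "Sally", "age": 32},
    "book": {"title": "Graph Databases",
             "authors": ["Ian Robinson", "Jim Webber"]},
}

# Each relationship is directed, typed, and connects exactly two nodes.
# ASSUMPTION: the assignment of the two HAS_READ relationships to John
# and Sally is a guess; the slide text does not preserve it.
relationships = [
    ("john", "FRIEND_OF", "sally", {"since": "01/09/2013"}),
    ("sally", "FRIEND_OF", "john", {"since": "01/09/2013"}),
    ("john", "HAS_READ", "book", {"on": "2/03/2013", "rating": 5}),
    ("sally", "HAS_READ", "book", {"on": "02/09/2013", "rating": 4}),
]

def outgoing(start, rel_type):
    """Follow relationships of one type that leave the start node."""
    return [(end, props) for s, t, end, props in relationships
            if s == start and t == rel_type]

readers = [s for s, t, end, _ in relationships
           if t == "HAS_READ" and end == "book"]

# Relational model: the customer-to-account link needs a third (join) table.
# Column names and account labels are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Accounts (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE Customer_Accounts (customer_id INTEGER, account_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Alice');
    INSERT INTO Accounts VALUES (10, 'checking'), (11, 'savings'), (12, 'credit');
    INSERT INTO Customer_Accounts VALUES (1, 10), (1, 11), (1, 12);
""")
alice_accounts = [row[0] for row in db.execute("""
    SELECT a.label FROM Customers c
    JOIN Customer_Accounts ca ON ca.customer_id = c.id
    JOIN Accounts a ON a.id = ca.account_id
    WHERE c.name = 'Alice' ORDER BY a.id
""")]
```

In the graph version the connection is itself a record with its own properties, while the relational version answers the same kind of question with two joins through the mapping table.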
  • 9. The Neo4j Browser
  • 10. Neo4j Browser - finding help: http://localhost:7474/
  • 11. Execute Cypher, Visualize
  • 12. Introduction to Document Classification
  • 13. Document Classification
      • Automatically assign a document to one or more classes
      • Documents may be classified according to their subjects or according to other attributes
      • Automatically classify unlabeled documents to a set of relevant classes using labeled training data
  • 14. Example Use Cases for Document Classification
  • 15. Sentiment Analysis for Movie Reviews
      Scenario: A movie website allows users to submit reviews describing what they liked or disliked about a particular movie.
      Problem: The user reviews are unstructured text. How do I automatically generate a score indicating whether the review was positive or negative?
      Solution: Train a natural language parsing model on a dataset of previous reviews that have been labeled as either positive or negative.
  • 16. Recommend Relevant Tags
      Scenario: A Q/A website allows users to submit questions and receive answers from other users.
      Problem: Users sometimes do not know which tags to apply to their questions to make them more discoverable and so more likely to receive answers.
      Solution: Automatically recommend the most relevant tags for a question by classifying its text with a model trained on previous questions.
  • 17. Recommend Similar Articles
      Scenario: A news website provides hundreds of new articles a day to users on a broad range of topics.
      Problem: The site needs to increase user engagement and time spent on the site.
      Solution: Train natural language parsing models for daily articles in order to provide recommendations for highly relevant articles at the bottom of each page.
  • 18. How Automated Document Classification Works
  • 19. Step 1: Create a Training Dataset (Supervised Learning)
      Assign a set of labels that describes the document’s text.
      [Diagram: documents connected to labels X, Y, and Z]
  • 20. Step 2: Train a Natural Language Parsing Model (Deep Learning)
      • Deep feature representations are selected and learned using an evolutionary algorithm
      • State machines represent predicates that evaluate to 0 or 1 for a text match
      • State machines map to classes of document labels that matched text during training
      [Diagram: p = state machine; a hierarchy of state machines p connected to classes X, Y, and Z]
  • 21. Step 3: Classify Unlabeled Documents
      The natural language parsing model is used to classify other unlabeled documents.
      [Diagram: an unlabeled document compared by cos(θ) against classes X, Y, and Z, with example scores 0.99, 0.67, and 0.01]
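Steps 1 and 2 can be illustrated with a small Python sketch. It assumes that a feature is a regular expression and that the patterns are written by hand; the slides say the real features are learned by an evolutionary algorithm, so this shows only the "predicate that evaluates to 0 or 1" idea and the mapping from features to classes. The training texts and patterns are made up.

```python
import re
from collections import Counter, defaultdict

# Step 1: a tiny labeled training set (hypothetical example data).
training = [
    ("I loved this movie, the acting was great", ["positive"]),
    ("A great story and I loved the ending", ["positive"]),
    ("I hated this movie, the plot was boring", ["negative"]),
]

# Step 2: features as predicates. ASSUMPTION: hand-written regular
# expressions stand in for the learned state machines of the slides.
features = [r"\bloved\b", r"\bgreat\b", r"\bhated\b", r"\bthis movie\b"]

def match(pattern, text):
    """Predicate that evaluates to 1 on a text match and 0 otherwise."""
    return 1 if re.search(pattern, text, re.IGNORECASE) else 0

# Map each feature to the classes of the documents it matched in training.
feature_classes = defaultdict(Counter)
for text, labels in training:
    for pattern in features:
        if match(pattern, text):
            feature_classes[pattern].update(labels)

def feature_vector(text):
    """Binary vector with one entry per feature, in the order of `features`."""
    return [match(pattern, text) for pattern in features]
```

A feature that matched both positive and negative reviews ends up associated with both classes, which is why Step 3 compares whole vectors instead of relying on any single match.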
  • 22. Hierarchical Pattern Recognition (HPR)
  • 23. What is Hierarchical Pattern Recognition (HPR)?
      HPR is a graph-based deep learning algorithm I created that learns deep feature representations in linear time. It performs graph-based traversals using a hierarchy of finite state machines (FSMs). Designed for scalable performance in P time.
  • 24. Influences & Inspirations
      Ray Kurzweil (Pattern Recognition Theory of Mind) + Jeff Hawkins (Hierarchical Temporal Memory) = Hierarchical Pattern Recognition
  • 25. How does feature extraction work?
      HPR uses a probabilistic model in combination with an evolutionary algorithm to generate hierarchies of deep feature representations.
      “Deep” feature representations are learned and associated with labels that are mapped to the documents in which the feature was discovered.
      The feature hierarchy is translated into a Vector Space Model for classification on feature vectors generated from unlabeled text.
  • 26. Graph-based feature learning
  • 27. Learning new features from matches on training data
  • 28. Cost Function for the Generations of Features
      Reproduction occurs after a threshold of matches has been exceeded for a feature. After replication, the cost function is applied to increase that threshold every time the feature reproduces.
      Cost function: [equation not preserved in this transcript]. Its two terms are the current threshold on the feature node and the minimum threshold, which I chose as 5 for new features.
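The reproduction rule on slide 28 can be sketched as follows. The actual cost function is not recoverable from the transcript, so `placeholder_cost` below is a stand-in chosen purely for illustration and should not be read as the formula from the slide. The class name, field names, and the choice to keep a cumulative match count are likewise assumptions.

```python
from dataclasses import dataclass

MIN_THRESHOLD = 5  # minimum threshold for new features, per the slide

@dataclass
class FeatureNode:
    """Hypothetical feature node; field names are illustrative."""
    pattern: str
    matches: int = 0
    threshold: int = MIN_THRESHOLD
    generations: int = 0

def record_match(node, cost):
    """Count one match and reproduce once matches exceed the threshold.

    `cost` maps (current threshold, minimum threshold) to the new threshold.
    Returns True when the feature reproduced on this match.
    ASSUMPTION: the match count is cumulative and is never reset.
    """
    node.matches += 1
    if node.matches > node.threshold:
        node.generations += 1
        node.threshold = cost(node.threshold, MIN_THRESHOLD)
        return True
    return False

def placeholder_cost(current, minimum):
    """Stand-in cost function, NOT the one from the slide."""
    return current + minimum

node = FeatureNode(pattern="example")
events = [record_match(node, placeholder_cost) for _ in range(12)]
```

Passing the cost function in as a parameter keeps the threshold logic separate from the formula, so the real function can be swapped in without touching the rest.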
  • 30. Vector Space Model
  • 31. Generating Feature Vectors
      The natural language parsing model created during training can be turned into a global feature index. This global feature index is a list of Neo4j internal IDs for every feature in the hierarchy.
      Using that global feature index, a multi-dimensional vector space is created with a length equal to the number of features in the hierarchy.
  • 32. Relevance Rankings
      “Relevance rankings of documents in a keyword search can be calculated, using the assumptions of document similarities theory, by comparing the deviation of angles between each document vector and the original query vector where the query is represented as the same kind of vector as the documents.” - Wikipedia
  • 33. Vector-based Cosine Similarity Measure
      In practice, it is easier to calculate the cosine of the angle between the vectors, instead of the angle itself:
      cos(θ) = (d · q) / (‖d‖ ‖q‖), where d is a document vector and q is the query vector.
  • 34. Cosine Similarity & Vector Space Model
  • 35. Vector-based Cosine Similarity Measure
      “The resulting similarity ranges from -1 meaning exactly opposite, to 1 meaning exactly the same, with 0 usually indicating independence, and in-between values indicating intermediate similarity or dissimilarity.” - via Wikipedia
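The vector space steps on slides 31 to 35 can be sketched in plain Python. The feature IDs, the class vectors, and the use of binary (0 or 1) entries are assumptions for illustration; the slides state only that the vector has one dimension per feature in the global feature index and that classes are ranked by the cosine of the angle between vectors.

```python
import math

# Hypothetical global feature index: one entry per feature in the hierarchy.
# The numbers stand in for Neo4j internal node IDs.
global_feature_index = [101, 204, 305, 412]

def to_vector(matched_feature_ids):
    """One dimension per indexed feature: 1 if it matched, else 0."""
    matched = set(matched_feature_ids)
    return [1 if fid in matched else 0 for fid in global_feature_index]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

# Hypothetical class vectors and one unlabeled document.
class_vectors = {
    "X": to_vector([101, 204]),
    "Y": to_vector([204, 305]),
    "Z": to_vector([412]),
}
document = to_vector([101, 204])

# Rank classes by similarity to the document, most similar first.
ranking = sorted(
    ((label, cosine_similarity(document, vec))
     for label, vec in class_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With non-negative entries like these, the similarity stays between 0 and 1; the range down to -1 in the quoted definition applies only when vectors can hold negative values.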
  • 36. Graphify for Neo4j
  • 37. Graphify for Neo4j
      Graphify is a Neo4j unmanaged extension used for document and text classification using graph-based hierarchical pattern recognition.
      https://github.com/kbastani/graphify
  • 38. Example Project
      Head over to the GitHub project page and clone it to your local machine.
      Follow the directions listed in the README.md to install the extension.
      Navigate to the /examples directory of the project.
      Run: examples/graphify-examples-author/src/java/org/neo4j/nlp/examples/author/main.java
  • 39. U.S. Presidential Speech Transcript Analysis
  • 40. Identify the Political Affiliation of a Presidential Speech
      In the training phase, this example ingests a set of presidential speech texts, each labeled with the author of that speech.
      After building the training models, unlabeled presidential speeches are classified in the test phase.
  • 41. The Presidents
      • Ronald Reagan: labels: liberal, republican, ronald-reagan
      • George H.W. Bush: labels: conservative, republican, bush41
      • Bill Clinton: labels: liberal, democrat, bill-clinton
      • George W. Bush: labels: conservative, republican, bush43
      • Barack Obama: labels: liberal, democrat, barack-obama
  • 42. Training
      Each president in the example has 6 speeches to analyze.
      4 of the speeches are used to build a natural language parsing model.
      2 of the speeches are used to test the validity of that model.
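The dataset described on slides 41 and 42 can be laid out as below. The labels are copied from slide 41 and the 4/2 split from slide 42. The file paths, and the choice of the first four speeches for training, are hypothetical; this is not the loading code of the Graphify example project.

```python
# Labels per president, copied from the slide listing the presidents.
presidents = {
    "ronald-reagan": ["liberal", "republican", "ronald-reagan"],
    "bush41": ["conservative", "republican", "bush41"],
    "bill-clinton": ["liberal", "democrat", "bill-clinton"],
    "bush43": ["conservative", "republican", "bush43"],
    "barack-obama": ["liberal", "democrat", "barack-obama"],
}

SPEECHES_PER_PRESIDENT = 6
TRAINING_PER_PRESIDENT = 4

training_set, test_set = [], []
for key, labels in presidents.items():
    for number in range(1, SPEECHES_PER_PRESIDENT + 1):
        # ASSUMPTION: hypothetical file layout, one text file per speech.
        path = f"speeches/{key}/{number}.txt"
        if number <= TRAINING_PER_PRESIDENT:
            # Labeled speeches used to build the model.
            training_set.append((path, labels))
        else:
            # Held-out speeches; labels kept only to score predictions.
            test_set.append((path, labels))
```

This gives 20 labeled training speeches and 10 held-out test speeches across the five presidents.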
  • 43. Get Similar Labels/Classes
  • 44. Class Similarity: Ronald Reagan
      republican       0.7182046285385341
      liberal          0.644281223102398
      democrat         0.4854114595950056
      conservative     0.4133639188595147
      bill-clinton     0.4057969121945167
      barack-obama     0.323947855372623
      bush41           0.3222644898334092
      bush43           0.3161309849153592
  • 45. Class Similarity: George H.W. Bush
      conservative     0.7032274806766954
      republican       0.6047256274615608
      liberal          0.4439742461594541
      democrat         0.39114918238853674
      bill-clinton     0.3234223107986785
      ronald-reagan    0.3222644898334092
      barack-obama     0.2929260544514002
      bush43           0.29106733975087984
  • 46. Class Similarity: Bill Clinton
      democrat         0.8375678825642422
      liberal          0.7847858060182163
      republican       0.5561860529059708
      conservative     0.45365774896422445
      barack-obama     0.4507676679770066
      ronald-reagan    0.4057969121945167
      bush43           0.365042482383354
      bush41           0.3234223107986785
  • 47. Class Similarity: George W. Bush
      conservative     0.820636570272315
      republican       0.7056890956512284
      liberal          0.5075788396061254
      democrat         0.4505424322086937
      bill-clinton     0.365042482383354
      barack-obama     0.33801949243378965
      ronald-reagan    0.3161309849153592
      bush41           0.29106733975087984
  • 48. Class Similarity: Barack Obama
      democrat         0.7668017370739147
      liberal          0.7184792203867296
      republican       0.4847680475425114
      bill-clinton     0.4507676679770066
      conservative     0.4149264161292232
      bush43           0.33801949243378965
      ronald-reagan    0.323947855372623
      bush41           0.2929260544514002
  • 49. Get involved in the Neo4j community
  • 50. http://stackoverflow.com/questions/tagged/neo4j
  • 51. http://groups.google.com/group/neo4j
  • 52. https://github.com/neo4j/neo4j/issues
  • 53. http://neo4j.meetup.com/
  • 54. (Thank You)
  • 55. Get in touch
      • Twitter: www.twitter.com/kennybastani
      • LinkedIn: www.linkedin.com/in/kennybastani
      • GitHub: www.github.com/kbastani

Editor's Notes

  1. When we think about data, we tend to think about how things are connected. This is a natural part of how we talk about things, and also of the graph model. “This is also a graph, but with some data attached. Here: we’ve attached names to the nodes and described the type of the relationships.”
  2. “We can take this further, and attach arbitrary key/value pairs” This is the Property Graph Model, which has the following characteristics: It contains Nodes and Relationships, both of which can contain properties (key-value pairs). Relationships are always between exactly 2 nodes. They have a type, and they are directed. “There are other graph models, however everyone in the industry has converged on the idea that this model is the most obvious and the most useful for real humans and the application we’re building”
  3. Let’s review the relational table model to see how it differs from the property graph model.
  4. Start with Customers and Accounts “We have a customer, Alice.” “She’s got 3 accounts” “To keep track of which accounts Alice owns, we need a 3rd table, to store the mapping. Typically called a join table.”
  5. Dashboard, for monitoring of key stats. Node, Relationship and Property “counts” are just estimates (they actually represent the allocated ID space for each graph entity).
  6. “The Console is where you can run graph queries, written in Cypher.” We’ll be using this starting... now.
  7. Disclaimer: This is a graph-based approach to text classification and pattern recognition. This can be done in many different ways, including SVMs, Bayesian networks, belief networks, and many other approaches. I chose to create this on top of Neo4j because, first, it’s a database and, second, it’s already formatted as a network. This gives me the advantage of not worrying about data storage.
  8. Explain how the genetic algorithm works.
  9. I chose this example project because it’s easy to get presidential speeches online and it seemed like a good example to get others going with Graphify.
  10. “Get involved with the community, attend meetups, browse our open source code libraries, including Neo4j, by visiting us on GitHub.”
  11. “Visit stackoverflow.com with the tag Neo4j to get fast answers to your questions. We have a very active community of contributors that provide thorough answers 24/7. If you get stuck, make sure you head there.”
  12. “The same goes for Google Groups, if you prefer that format over Stack Overflow.”
  13. “You can visit us on GitHub to submit or browse issues.”
  14. “Finally, I urge you to check out our website’s meetup page to find out where meetups are happening all around the world. Also we encourage you to share your experience with Neo4j, your applications, and your use cases by speaking at a local meetup. If you’re interested, please reach out to me, my contact details are in the next slide.”
  15. “Thank you for spending some time with me and learning about Neo4j and Cypher.”
  16. “Get in touch with me about meetups and Neo4j community events happening around the world.” “I’ll now open up the floor to questions.”