#Graphorum
Graph Techniques for Natural Language Processing
Sujit Pal, Elsevier Labs
Who am I?
• (Mostly self-taught) data scientist
• Work at Elsevier Labs
• Worked with Deep Learning, Machine Learning, Natural Language
Processing, Search, Backend Web Development, Database
Administration, and Unix System Administration in reverse
chronological order.
• Took Graph Theory in college
• Rekindled interest after Social Network Analysis course on Coursera
• Interested in applications of Graph techniques to NLP
NLP Today
Image Credit: https://www.kaggle.com/general/76963
Typical NLP + Graph problems
• Represent text units as nodes and (similarity based) relationships as
edges in graph
• Leverage intrinsic or extrinsic graphical structure of data
• Intrinsic – co-citations and co-mentions in academic graph
• Extrinsic – text data from social networks
• Leverage external graph structure such as Knowledge Graph to
improve results for NLP task
Case Studies
• Summarization using network metrics
• Document Clustering using Random Walk
• Word Sense Disambiguation using Label Propagation
• Incorporating external knowledge for Topic Identification
Matrices and Graphs are Interchangeable
• Text elements => vectors
• Collection of elements =>
matrix
• Similarity = operation on
pairwise rows of matrix
• Convert to graph
• Graph Methods!
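A minimal sketch of the matrix-to-graph conversion, assuming binary feature rows and Jaccard overlap as the pairwise similarity. The names `jaccard`, `matrix_to_edges`, and the 0.5 threshold are illustrative choices, not taken from the talk.

```python
def jaccard(u, v):
    """Overlap of two binary rows: shared features / features in either row."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def matrix_to_edges(rows, similarity=jaccard, threshold=0.5):
    """Turn a matrix (one row per text element) into weighted graph edges.

    Returns {(i, j): weight} for every row pair i < j whose similarity
    reaches the threshold (the threshold value is an assumption).
    """
    edges = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            weight = similarity(rows[i], rows[j])
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


# Three text elements over four features (hypothetical data).
rows = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
edges = matrix_to_edges(rows)
```

Only rows 0 and 1 share features, so the resulting graph has a single edge between them.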
Case Study #1
Full paper: https://www.sciencedirect.com/science/article/pii/S0020025508004520
Case Study #1: Steps
• Create graph – sentences are nodes, edges connect sentences that
share common meaningful nouns
• Develop 14 summarizers (CN-SUMM) based on various graph metrics,
each summarizer produces a ranked list of sentences
• Voting based ensemble (CN-VOTING), ranks sentences with sum of
rankings from each of the 14 summarizers
• Return top ranked sentences from CN-VOTING as summary
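The voting step can be sketched as below. This assumes each summarizer yields a score per sentence, that rank 0 is best, that the lowest rank sum wins, and that ties are broken by sentence index; the function names and toy scores are hypothetical.

```python
def rank_positions(scores):
    """Map sentence id -> rank (0 = best) from a dict of centrality scores."""
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    return {sentence: rank for rank, sentence in enumerate(ordered)}


def vote(score_tables, top_n):
    """Sum each sentence's rank across summarizers and keep the top_n."""
    totals = {}
    for scores in score_tables:
        for sentence, rank in rank_positions(scores).items():
            totals[sentence] = totals.get(sentence, 0) + rank
    ordered = sorted(totals, key=lambda s: (totals[s], s))
    # Return the winners in document order so the summary reads naturally.
    return sorted(ordered[:top_n])


# Scores from three hypothetical summarizers over four sentences.
degree = {0: 3, 1: 1, 2: 2, 3: 0}
pagerank = {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.2}
closeness = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.1}
summary = vote([degree, pagerank, closeness], top_n=2)
```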
Case Study #1: Implementation
• Extract common nouns from sentences and compute similarity as
overlap
• Construct graph of sentences
• Compute Degree, Strength, Closeness, and PageRank centrality scores per node, mean Shortest Path from each node to every other node, and D-Ring, k-Core, and w-Cuts subgraphs; then determine the K most central nodes by each measure
• Ensemble predictions using Voting to produce summary sentences
• See https://github.com/sujitpal/nlp-graph-examples/tree/master/01-doc-summarization
Case Study #1: Degree and Strength
• Degree – number of edges incident on
a vertex, measured by Degree
Centrality
• Strength – sum of edge weights
incident on the vertex, measured by
Weighted Degree Centrality
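Both measures can be read directly off a weighted edge list. The sketch below assumes an undirected graph stored as `{(u, v): weight}`; the sentence ids and weights are made up for illustration.

```python
def degree_and_strength(weighted_edges):
    """Return (degree, strength) dicts for an undirected weighted graph."""
    degree, strength = {}, {}
    for (u, v), weight in weighted_edges.items():
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1               # edge count
            strength[node] = strength.get(node, 0.0) + weight    # weight sum
    return degree, strength


# Edge weight = number of nouns shared by two sentences (hypothetical).
edges = {
    ("s1", "s2"): 2.0,
    ("s1", "s3"): 1.0,
    ("s2", "s3"): 3.0,
    ("s3", "s4"): 1.0,
}
degree, strength = degree_and_strength(edges)
```

Here s3 has the highest degree (3), while s2 and s3 tie on strength (5.0), which shows how the two measures can rank sentences differently.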
Case Study #1: Closeness
• Closeness Centrality measures how
efficiently a vertex is able to spread
information across the network
• Defined as inverse of average “farness” (distance) to all other nodes
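A small sketch of closeness on an unweighted graph, using breadth-first search for distances. Averaging only over reachable nodes is an assumption made here to cope with disconnected graphs; the adjacency-list format is also illustrative.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(adjacency, node):
    """Reachable node count divided by total distance (inverse mean farness)."""
    dist = bfs_distances(adjacency, node)
    total = sum(dist.values())
    reachable = len(dist) - 1
    return reachable / total if total > 0 else 0.0


# Path graph a - b - c: the middle node is closest to everyone.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = {node: closeness(path, node) for node in path}
```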
Case Study #1: PageRank
• Popularized by Google’s Brin and Page
• Quality and number of in-links to a page are a rough estimate of page quality
• Iterative procedure, until convergence
• Starts with all nodes having same rank
• “Surfer” starts on random page
• Chooses a page randomly from among its
outlinks
• With probability d (d=0.15 for web) jump to some random page on web
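The iterative procedure can be sketched as a plain power iteration. This is an illustrative version, not the implementation used in the case study: `jump_prob` is the random-jump probability (0.15), and spreading the rank of pages without out-links evenly over all pages is a common convention assumed here.

```python
def pagerank(out_links, jump_prob=0.15, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict of node -> list of out-links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # all nodes start with same rank
    for _ in range(max_iter):
        # Rank held by pages with no out-links is shared by every page.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (jump_prob + (1.0 - jump_prob) * dangling) / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                # The surfer follows one of the out-links uniformly at random.
                new_rank[target] += (1.0 - jump_prob) * rank[node] / len(targets)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:  # converged
            break
    return rank


# Tiny hypothetical web: page "c" collects the most in-links.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Page "d" has no in-links, so its rank comes only from random jumps, while "c" ends up with the highest rank.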
Case Study #1: Shortest Paths
• Mean shortest path from each node to every other node
• Compute all-pairs shortest paths
• Algorithm uses a linear number of (min-plus) matrix multiplications
• Order is O(V^4)
• Introduced by Shimbel (1953)
• Compute mean shortest path from each node to all other nodes
• An indirect measure of centrality
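A minimal sketch of all-pairs shortest paths by repeated min-plus matrix multiplication, followed by the per-node mean. It assumes a weight matrix with 0 on the diagonal and infinity for missing edges; the helper names are illustrative.

```python
INF = float("inf")


def min_plus(a, b):
    """Min-plus product: entry (i, j) is min over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def all_pairs_shortest_paths(weights):
    """Extend paths one edge at a time; each product costs O(V^3)."""
    n = len(weights)
    dist = [row[:] for row in weights]  # paths of at most one edge
    for _ in range(n - 2):              # grow to paths of at most n-1 edges
        dist = min_plus(dist, weights)
    return dist


def mean_shortest_path(dist, i):
    """Average distance from node i to every other reachable node."""
    others = [d for j, d in enumerate(dist[i]) if j != i and d != INF]
    return sum(others) / len(others) if others else INF


# Path graph 0 - 1 - 2 - 3 with unit edge weights.
weights = [
    [0, 1, INF, INF],
    [1, 0, 1, INF],
    [INF, 1, 0, 1],
    [INF, INF, 1, 0],
]
dist = all_pairs_shortest_paths(weights)
```

A lower mean shortest path indicates a more central node: here node 1 averages 4/3 hops while the end node 0 averages 2.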
Case Study #1: D-Ring
• Create subgraphs by dilating
• Start with highest degree (or position)
• D-ring is difference of subgraphs created
by consecutive dilations
• Continue to add D-rings until enough
nodes available for summary
• Two measures of centrality – CN-RingL
and CN-RingK
Case Study #1: K-Core
• Create subgraph starting with node with
highest degree (or position) k
• Relax threshold for k and continue adding
nodes until there are enough nodes for the
summary
• Two measures of centrality based on
degree or position – CN-CoreK and CN-CoreL
Case Study #1: W-Cuts
• Create subgraph starting with node
pair with highest edge weight
• Relax edge weight threshold and
continue adding nodes until enough
nodes available for summary
• Two measures of centrality – CN-CutK
and CN-CutL, based on preference
given to position or degree
Case Study #1: Results
Case Study #1: Closing Thoughts
• Generated summaries are good, but biased towards longer sentences
• Strategy described above can be extended to multi-document
summarization as well, e.g., summary of product reviews.
• A variant of the strategy described is used in the gensim summarizer.
End of Case Study #1
Case Study #2
Full paper: https://www.aclweb.org/anthology/N06-1061
Case Study #2: Steps
• Paper asserts that a Language Model (LM) based graph is more effective for clustering
than a term-document (TD) matrix based graph
• Represent each document in corpus as a node, edges connect documents by
cosine similarity of TF-IDF document vectors
• Compute t-step (t=1, 2, 3) random walks for each node, considering only top k
edges (for k ~ 80), and compute generation probabilities
• Cluster resulting graph of document generation probabilities with k-means and
Louvain Modularity
Case Study #2: Implementation
• 20-Newsgroup dataset (18k newsgroup postings, 20 categories)
• Clean text and construct TD matrix
• Construct cosine similarity matrix S, sparsify using top generators (c=80), remove self-edges, and
renormalize
• Run (c=80) random walks on each node for path length = 1, 2, 3
• Compute empirical transition probability matrix (language model!) G from walks
• Construct graph, apply Louvain Community Detection on various graphs
• Compare against K-Means clusters from various document vectors
• See https://github.com/sujitpal/nlp-graph-examples/tree/master/02-docs-clustering
Case Study #2: TD Matrix to Cosine Similarity
Image Credit: https://www.quora.com/What-is-a-tf-idf-vector
• Documents represented as TD Matrix
(n documents x t features)
• Similarity Matrix (n x n) = TD Matrix
(n x t) times its transpose (t x n),
with each entry divided by the norms (|T|) of
the two document vectors to keep similarity
values in range (0, 1)
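One way to sketch this computation is to scale every row of the TD matrix to unit length first, so the product with the transpose yields cosine similarities directly. The plain-list matrix and function names below are illustrative assumptions.

```python
import math


def l2_normalize(row):
    """Scale a document vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)


def cosine_similarity_matrix(td):
    """(n x t) matrix times its transpose (t x n), after row normalization."""
    unit = [l2_normalize(row) for row in td]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


# Three documents over three features (hypothetical TF-IDF weights).
td = [
    [3.0, 0.0, 4.0],
    [3.0, 0.0, 4.0],
    [0.0, 2.0, 0.0],
]
S = cosine_similarity_matrix(td)
```

Documents 0 and 1 are identical, so their similarity is 1, and document 2 shares no features with them, so its similarity to both is 0.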
Case Study #2: Random Walks
Image Credit: https://snap.stanford.edu/node2vec/
• Probabilistic technique used to
“flatten” graph into feature vector
• Intuition – similar nodes are closer to
each other in the graph than
dissimilar nodes
• Compute empirical generation
probabilities
• Other popular applications --
DeepWalk and node2vec
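The empirical generation probabilities can be sketched by running many fixed-length walks from each node and counting where they end. This is an assumption-laden illustration: the adjacency format, the seeded generator, and stepping in proportion to edge weight are choices made here, not details confirmed by the talk.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Take `length` steps, choosing neighbors in proportion to edge weight."""
    node = start
    for _ in range(length):
        neighbors = list(adjacency[node])
        if not neighbors:
            break  # dead end: stay where we are
        weights = [adjacency[node][nb] for nb in neighbors]
        node = rng.choices(neighbors, weights=weights, k=1)[0]
    return node


def generation_probabilities(adjacency, length, walks_per_node=80, seed=0):
    """Estimate P(end node | start node) from repeated random walks."""
    rng = random.Random(seed)
    probs = {}
    for start in adjacency:
        counts = {}
        for _ in range(walks_per_node):
            end = random_walk(adjacency, start, length, rng)
            counts[end] = counts.get(end, 0) + 1
        probs[start] = {end: c / walks_per_node for end, c in counts.items()}
    return probs


# Hypothetical similarity graph over three documents.
graph = {
    "d1": {"d2": 0.9},
    "d2": {"d1": 0.9, "d3": 0.1},
    "d3": {"d2": 0.1},
}
G1 = generation_probabilities(graph, length=1)
G2 = generation_probabilities(graph, length=2)
```

Each row of the result is a probability distribution over end nodes, which can then serve as edge weights for clustering.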
Case Study #2: Louvain Modularity
• Community Detection Algorithm – maximize
modularity score for each community
• Modularity = difference between actual number
of edges between node pair and expected
number of edges, summed over all node pairs in
the same community
• Iterative procedure, run till convergence
• Greedily assign nodes to communities,
optimizing local modularity
• Define a coarse grained network of
communities
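The modularity score that Louvain optimizes can be computed for a given assignment as sketched below. This uses the standard formula for unweighted, undirected graphs, summed per community as (internal edges / m) minus (community degree / 2m) squared; it only scores a partition and does not perform the Louvain search itself.

```python
def modularity(edges, community):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs; community: dict of node -> community id.
    """
    m = len(edges)
    internal = {}      # edges with both endpoints inside each community
    total_degree = {}  # sum of node degrees per community
    for u, v in edges:
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
        for node in (u, v):
            c = community[node]
            total_degree[c] = total_degree.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting at the bridge scores 5/14, while lumping everything into one community scores 0, so the split is the better partition.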
Case Study #2: Results
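• Silhouette Score compares tightness (a, mean
intra-cluster distance) with separation (b, mean
distance to nearest other cluster): (b - a) / max(a, b)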
• Baseline – TD + K-means + Labels close
to 0
• G1, G2, G3 – LM Matrices for n=1,
n={1,2}, and n={1,2,3}.
• LM based graphs outperform TD
matrix based graphs
• Louvain outperforms K-Means
Case Study #2: Closing Thoughts
• Transforming the graph to have edges based on transition
probabilities based on random walks yields better clustering results.
• Random Walks on graph structures are often used to “flatten” the graph
and expose higher-order proximity dependencies that can sometimes
look like semantic similarity
• Community Detection algorithms can be used for clustering, and
often produce more explainable clusters
End of Case Study #2
Case Study #3
Full paper: https://www.aclweb.org/anthology/P05-1049
Case Study #3: Steps
• Choose ambiguous word of interest (https://muse.dillfrog.com/lists/ambiguous)
• Find sentences containing ambiguous word from large corpus
• Manually assign labels to some sentences
• Featurize each sentence using POS of neighboring words, unigrams, and local
collocations
• Create graph with sentences as nodes, edges weighted by cosine similarity and JS
divergence of feature vectors
• Propagate Labels till convergence
• Generate word sense clusters
Case Study #3: Implementation
• We selected the ambiguous word “compound” with these 2 senses
• Chemical compound
• Composite or multiple
• Extracted 670 sentences containing “compound” from SD corpus
• Manually marked up 40 total sentences (19 + 21) ~ 5% of corpus
• Created TD matrix of 1..3 grams + 3-gram POS tags, sparsified (k=5), removed self-
edges, and created graph
• Run Label Propagation to propagate the 40 labels to unlabeled sentences
• See https://github.com/sujitpal/nlp-graph-examples/tree/master/03-word-sense-disambiguation
Case Study #3: Label Propagation
• Label Propagation uses network structure to detect
communities.
• Used here in semi-supervised manner by specifying
labels for a small subset of nodes
• Iterative algorithm
• Initialize nodes each with unique label
• Each node updates its label to the most frequent
label of its neighbors
• Converges when each node has the most
frequent label of its neighbors
• Not guaranteed to converge!
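A sketch of the semi-supervised variant: seed nodes keep their manual labels, unlabeled nodes start without one and repeatedly adopt the label with the highest total edge weight among their labeled neighbors. The iteration cap, the in-place update order, and the alphabetical tie-break are assumptions added here because convergence is not guaranteed.

```python
def propagate_labels(adjacency, seeds, max_iter=100):
    """adjacency: node -> {neighbor: weight}; seeds: node -> fixed label."""
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):
            if node in seeds:
                continue  # manually assigned labels never change
            votes = {}
            for neighbor, weight in adjacency[node].items():
                if neighbor in labels:
                    label = labels[neighbor]
                    votes[label] = votes.get(label, 0.0) + weight
            if not votes:
                continue  # no labeled neighbor yet
            # Heaviest label wins; ties are broken alphabetically.
            best = min(votes, key=lambda label: (-votes[label], label))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical sentence graph: s1 and s4 are hand-labeled, s5 is isolated.
graph = {
    "s1": {"s2": 0.9},
    "s2": {"s1": 0.9, "s3": 0.2},
    "s3": {"s2": 0.2, "s4": 0.8},
    "s4": {"s3": 0.8},
    "s5": {},
}
seeds = {"s1": "chemical", "s4": "composite"}
labels = propagate_labels(graph, seeds)
```

The isolated sentence s5 never receives a label, which is one way a sparse graph can leave sentences unlabeled after propagation.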
Case Study #3: Results
• Of 623 unlabeled sentences, Label Propagation predicts 319 sentences use the
first sense (chemical compound), 7 use the second sense (composite), and misses
298
• Misses are mostly chemical compounds (sense 1)
• Examples:
• Sense #1: ORTEP view of the compound [CuL8(ClO4)2] with the numbering
scheme adopted.
• Sense #2: Sensitive to compound fluorescence.
• Results can probably be improved – we tried increasing the number of initial labels and
starting with denser networks (so LP does not terminate as quickly)
End of Case Study #3
Case Study #4
Full paper: https://www.aclweb.org/anthology/W09-1126
Case Study #4: Steps
• Build up in-memory graph structure for Knowledge Graph (KG)
• Match phrases in document against KG entries
• Compute Personalized PageRank (PPR) biased to matched nodes
• Roll up top scored concepts from PPR to category concepts
• Report top category concepts as document topics
Case Study #4: Implementation
• Annotate ScienceDaily article against Aho-Corasick dictionary of KG concepts
• Using company proprietary KG to build graph, 2 versions
• Lateral relations only
• isChildOf (child -> parent) relations only
• Run Personalized PageRank (PPR) against Lateral Relations graph setting source
nodes to concepts found in article
• Roll up high PPR score concepts to disease category concepts
• Top disease category concepts are document topic labels
• See https://github.com/sujitpal/nlp-graph-examples/tree/master/04-topic-identification
Case Study #4: Aho-Corasick Matching
• Inverted index of terms to
concept ID stored in trie-like
data structure, where every
node is a token in phrase
representing concept name
• Document streamed against
this data structure to produce
list of phrases in document
matched against concepts in
dictionary
Image Credit: https://brunorb.com/aho-corasick/
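The dictionary lookup can be illustrated with a simplified token trie and greedy longest-match scan. Note that this is not a full Aho-Corasick automaton (it has no failure links, so it rescans after a failed match); the concept ids and phrases are invented for the example.

```python
END = None  # sentinel key marking "a concept name ends here"


def build_trie(dictionary):
    """Build a token-level trie from {concept name: concept id}."""
    root = {}
    for phrase, concept_id in dictionary.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[END] = concept_id
    return root


def match(tokens, trie):
    """Return (start, end, concept_id) for the longest match at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        node, j, last = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                last = (i, j, node[END])
        if last:
            matches.append(last)
            i = last[1]  # continue after the matched phrase
        else:
            i += 1
    return matches


dictionary = {"breast cancer": "C1", "cancer": "C2", "type 2 diabetes": "C3"}
trie = build_trie(dictionary)
text = "Breast cancer risk rises with type 2 diabetes"
found = match(text.lower().split(), trie)
```

The longer phrase "breast cancer" wins over the nested entry "cancer" because the scan keeps the longest match at each start position.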
Case Study #4: Personalized PageRank
• In PageRank, surfer doing random walk on graph jumps to some random point in
the graph with some probability d (d=0.15 for web graphs)
• In Personalized PageRank (PPR), surfer will jump to a neighborhood of the graph
specified by a set of nodes (source nodes)
• Overall effect is to assign high PPR to nodes that are in close proximity to the
source nodes.
• Personalized PageRank has been found to be an effective measure for
recommendation systems
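A sketch of PPR as a power iteration in which every random jump lands on one of the source nodes. It assumes an undirected adjacency list, a set of source nodes, a jump probability of 0.15, and a fixed iteration count; it illustrates the idea rather than the Neo4j procedure used in the case study.

```python
def personalized_pagerank(adjacency, sources, jump_prob=0.15, iterations=100):
    """adjacency: node -> list of neighbors; sources: set of source nodes."""
    nodes = list(adjacency)
    # Jumps are spread uniformly over the source nodes only.
    restart = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: jump_prob * restart[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if neighbors:
                share = (1.0 - jump_prob) * rank[n] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # Dead end: send the surfer back to the source nodes.
                for s in sources:
                    new_rank[s] += (1.0 - jump_prob) * rank[n] / len(sources)
        rank = new_rank
    return rank


# Path graph a - b - c - d - e with the walk personalized to "a".
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
ppr = personalized_pagerank(graph, sources={"a"})
```

Scores fall off with distance from the source, so nodes near the matched concepts rank above distant ones.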
Case Study #4: Disease Categories
• Disease Category Concepts are
children of Diseases Concept
• Navigate to parent from
Discovered Concepts until a
Disease Category node is found
(or no parents are found)
• Roll up discovered concepts to
their Disease Categories – these
are the Document Topics
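The roll-up can be sketched as a walk up the parent links until a category node is reached. For simplicity this assumes each concept has at most one parent and that category scores are the sum of the rolled-up concept scores; both are assumptions, and the concept names and scores are hypothetical.

```python
def roll_up(concept, parent_of, categories):
    """Follow child -> parent links until a category (or no parent) is found."""
    seen = set()
    node = concept
    while node is not None and node not in seen:
        if node in categories:
            return node
        seen.add(node)  # guard against cycles in the hierarchy
        node = parent_of.get(node)
    return None


def document_topics(scored_concepts, parent_of, categories, top_n=3):
    """Aggregate concept scores per category and return the top categories."""
    totals = {}
    for concept, score in scored_concepts.items():
        category = roll_up(concept, parent_of, categories)
        if category is not None:
            totals[category] = totals.get(category, 0.0) + score
    return sorted(totals, key=lambda c: (-totals[c], c))[:top_n]


parent_of = {
    "melanoma": "skin neoplasm",
    "skin neoplasm": "Neoplasms",
    "Neoplasms": "Diseases",
    "asthma": "Respiratory Tract Diseases",
    "Respiratory Tract Diseases": "Diseases",
    "aspirin": "analgesic",
}
categories = {"Neoplasms", "Respiratory Tract Diseases"}
scores = {"melanoma": 0.4, "skin neoplasm": 0.2, "asthma": 0.3, "aspirin": 0.1}
topics = document_topics(scores, parent_of, categories)
```

Concepts with no disease category among their ancestors, such as "aspirin" here, contribute nothing to the topic list.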
Case Study #4: Results
Case Study #4: Closing Thoughts
• Topic predictions from rolling up high PPR concepts are serendipitous,
but not necessarily complete
• Better results if combined with topic predictions obtained from rolling
up concepts found in article
End of Case Study #4
Summing up
• Content features and graph structure often reinforce each other
• Can be useful for unsupervised and semi-supervised NLP tasks
• Not necessarily an either-or – BERT based models can coexist with
Graph techniques
Tools
• Originally planned to use Spark + GraphFrames for large graphs and
Neo4j for small / medium graphs
• Neo4j worked well for largest graph (500 K nodes, 1.3 M edges)
• Neo4j algorithms frequently have more functionality
• Allows multiple source nodes for Personalized PageRank
• Allows weighted edges in Label Propagation
• Ended up using Neo4j for all case studies
Reading List
Thank you
• My contact information
• Email: sujit.pal@elsevier.com
• LinkedIn: https://www.linkedin.com/in/sujitpal/
• Twitter: https://twitter.com/palsujit
• Blog: http://sujitpal.blogspot.com/
• Code for this presentation:
• https://github.com/sujitpal/nlp-graph-examples
47

More Related Content

What's hot

Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Nik Spirin
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
Yanchang Zhao
 
Detecting Multiple Aliases in Social Media
Detecting Multiple Aliases in Social MediaDetecting Multiple Aliases in Social Media
Detecting Multiple Aliases in Social Media
Amendra Shrestha
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
Konstantinos Zagoris
 
Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
chetanagavankar
 
Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...
paper_reader
 
AINL 2016: Galinsky, Alekseev, Nikolenko
AINL 2016: Galinsky, Alekseev, NikolenkoAINL 2016: Galinsky, Alekseev, Nikolenko
AINL 2016: Galinsky, Alekseev, Nikolenko
Lidia Pivovarova
 
AINL 2016: Alekseev, Nikolenko
AINL 2016: Alekseev, NikolenkoAINL 2016: Alekseev, Nikolenko
AINL 2016: Alekseev, Nikolenko
Lidia Pivovarova
 
Automatic Personality Prediction with Attention-based Neural Networks
Automatic Personality Prediction with Attention-based Neural NetworksAutomatic Personality Prediction with Attention-based Neural Networks
Automatic Personality Prediction with Attention-based Neural Networks
Jinho Choi
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
Jahnab Kumar Deka
 
Hate speech detection
Hate speech detectionHate speech detection
Hate speech detection
NASIM ALAM
 
multiple dispatch(OOPs concepts)
multiple dispatch(OOPs concepts)multiple dispatch(OOPs concepts)
multiple dispatch(OOPs concepts)
sumitra22
 
Text Mining Using R
Text Mining Using RText Mining Using R
Text Mining Using R
Knoldus Inc.
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalDustin Smith
 
Author Identification of Source Code Segments Written by Multiple Authors Usi...
Author Identification of Source Code Segments Written by Multiple Authors Usi...Author Identification of Source Code Segments Written by Multiple Authors Usi...
Author Identification of Source Code Segments Written by Multiple Authors Usi...
Parvez Mahbub
 
Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)
Rich Heimann
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
Zhenyun Zhuang
 
Text Mining using LDA with Context
Text Mining using LDA with ContextText Mining using LDA with Context
Text Mining using LDA with Context
Steffen Staab
 
Understanding WeboNaver
Understanding WeboNaverUnderstanding WeboNaver
Understanding WeboNaver
Han Woo PARK
 

What's hot (20)

Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
Detecting Multiple Aliases in Social Media
Detecting Multiple Aliases in Social MediaDetecting Multiple Aliases in Social Media
Detecting Multiple Aliases in Social Media
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
 
Summary of SIGIR 2011 Papers
Summary of SIGIR 2011 PapersSummary of SIGIR 2011 Papers
Summary of SIGIR 2011 Papers
 
Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...Easing embedding learning by comprehensive transcription of heterogeneous inf...
Easing embedding learning by comprehensive transcription of heterogeneous inf...
 
AINL 2016: Galinsky, Alekseev, Nikolenko
AINL 2016: Galinsky, Alekseev, NikolenkoAINL 2016: Galinsky, Alekseev, Nikolenko
AINL 2016: Galinsky, Alekseev, Nikolenko
 
AINL 2016: Alekseev, Nikolenko
AINL 2016: Alekseev, NikolenkoAINL 2016: Alekseev, Nikolenko
AINL 2016: Alekseev, Nikolenko
 
Automatic Personality Prediction with Attention-based Neural Networks
Automatic Personality Prediction with Attention-based Neural NetworksAutomatic Personality Prediction with Attention-based Neural Networks
Automatic Personality Prediction with Attention-based Neural Networks
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 
Sigir 2011 proceedings
Sigir 2011 proceedingsSigir 2011 proceedings
Sigir 2011 proceedings
 
Hate speech detection
Hate speech detectionHate speech detection
Hate speech detection
 
multiple dispatch(OOPs concepts)
multiple dispatch(OOPs concepts)multiple dispatch(OOPs concepts)
multiple dispatch(OOPs concepts)
 
Text Mining Using R
Text Mining Using RText Mining Using R
Text Mining Using R
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Author Identification of Source Code Segments Written by Multiple Authors Usi...
Author Identification of Source Code Segments Written by Multiple Authors Usi...Author Identification of Source Code Segments Written by Multiple Authors Usi...
Author Identification of Source Code Segments Written by Multiple Authors Usi...
 
Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Data Tactics Analytics Brown Bag (Aug 22, 2013)
Data Tactics Analytics Brown Bag (Aug 22, 2013)
 
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systemsAOTO: Adaptive overlay topology optimization in unstructured P2P systems
AOTO: Adaptive overlay topology optimization in unstructured P2P systems
 
Text Mining using LDA with Context
Text Mining using LDA with ContextText Mining using LDA with Context
Text Mining using LDA with Context
 
Understanding WeboNaver
Understanding WeboNaverUnderstanding WeboNaver
Understanding WeboNaver
 

Similar to Graph Techniques for Natural Language Processing

Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
Justin Sybrandt, Ph.D.
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
Carlos Badenes-Olmedo
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Denis Parra Santander
 
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Databricks
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Mikel Emaldi Manrique
 
Odsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graphOdsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graph
venkatramanJ4
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
thanhdowork
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
Frank Kelly
 
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
Waqas Nawaz
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
HyeonSeok Choi
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
tm1966
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Spark Summit
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...
Feng Li
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
bintis1
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
Stefan Schlobach
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Databricks
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
MOVING Project
 

Similar to Graph Techniques for Natural Language Processing (20)

Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
Network Visualization guest lecture at #DataVizQMSS at @Columbia / #SNA at PU...
 
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
Multi-Label Graph Analysis and Computations Using GraphX with Qiang Zhu and Q...
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph MiningDetection of Related Semantic Datasets Based on Frequent Subgraph Mining
Detection of Related Semantic Datasets Based on Frequent Subgraph Mining
 
Odsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graphOdsc 2019 entity_reputation_knowledge_graph
Odsc 2019 entity_reputation_knowledge_graph
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
 
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
 
Mining the social web 6
Mining the social web 6Mining the social web 6
Mining the social web 6
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
 
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
Large-Scale Text Processing Pipeline with Spark ML and GraphFrames: Spark Sum...
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...PEARC17:A real-time machine learning and visualization framework for scientif...
PEARC17:A real-time machine learning and visualization framework for scientif...
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
 
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
Keeping Linked Open Data Caches Up-to-date by Predicting the Life-time of RDF...
 

Graph Techniques for Natural Language Processing

  • 1. #Graphorum Produced by #Graphorum Graph Techniques for Natural Language Processing Sujit Pal, Elsevier Labs
  • 2. #Graphorum Who am I? • (Mostly self taught) data scientist • Work at Elsevier Labs • Worked with Deep Learning, Machine Learning, Natural Language Processing, Search, Backend Web Development, Database Administration, and Unix System Administration in reverse chronological order. • Took Graph Theory in college • Rekindled interest after Social Network Analysis course on Coursera • Interested in applications of Graph techniques to NLP 2
  • 3. #Graphorum NLP Today Image Credit: https://www.kaggle.com/general/76963 3
  • 4. #Graphorum Typical NLP + Graph problems • Represent text units as nodes and (similarity based) relationships as edges in graph • Leverage intrinsic or extrinsic graphical structure of data • Intrinsic – co-citations and co-mentions in academic graph • Extrinsic – text data from social networks • Leverage external graph structure such as Knowledge Graph to improve results for NLP task 4
  • 5. #Graphorum Case Studies • Summarization using network metrics • Document Clustering using Random Walk • Word Sense Disambiguation using Label Propagation • Incorporating external knowledge for Topic Identification 5
  • 6. #Graphorum Matrices and Graphs are Interchangeable 6 • Text elements => vectors • Collection of elements => matrix • Similarity = operation on pairwise rows of matrix • Convert to graph • Graph Methods!
  • 7. #Graphorum Case Study #1 Full paper: https://www.sciencedirect.com/science/article/pii/S0020025508004520 7
  • 8. #Graphorum Case Study #1: Steps • Create graph – sentences are nodes, edges connect sentences that share common meaningful nouns • Develop 14 summarizers (CN-SUMM) based on various graph metrics, each summarizer produces a ranked list of sentences • Voting based ensemble (CN-VOTING), ranks sentences with sum of rankings from each of the 14 summarizers • Return top ranked sentences from CN-VOTING as summary 8
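The voting step can be sketched as a rank-sum ensemble. This is a minimal illustration, not the paper's or the repository's code: it assumes every summarizer ranks every sentence, and the function name `rank_sum_vote` and the toy rankings are hypothetical.

```python
def rank_sum_vote(rankings, top_k):
    """Combine ranked lists of sentence ids; a lower summed rank is better."""
    totals = {}
    for ranking in rankings:
        for position, sent_id in enumerate(ranking):
            totals[sent_id] = totals.get(sent_id, 0) + position
    # Sort by summed rank; break ties by sentence id (document order).
    ordered = sorted(totals, key=lambda s: (totals[s], s))
    return ordered[:top_k]


# Three hypothetical summarizers, each ranking sentences 0..3 (best first).
rankings = [
    [2, 0, 1, 3],
    [0, 2, 3, 1],
    [2, 1, 0, 3],
]
summary_ids = rank_sum_vote(rankings, top_k=2)
```

Summing positions is only one way to combine rankings; it is used here because the slide describes ranking by the sum of the individual rankings.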
  • 9. #Graphorum Case Study #1: Implementation • Extract common nouns from sentences and compute similarity as overlap • Construct graph of sentences • Compute Degree, Strength, Closeness, and PageRank centrality scores per node, Shortest Path from each node to every other node, D-Ring, k-Core, and w-Cuts, determine K most central nodes by each measure • Ensemble predictions using Voting to produce summary sentences • See https://github.com/sujitpal/nlp-graph-examples/tree/master/01-doc-summarization 9
  • 10. #Graphorum Case Study #1: Degree and Strength 10 • Degree – number of edges incident on a vertex, measured by Degree Centrality • Strength – sum of edge weights incident on the vertex, measured by Weighted Degree Centrality
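As a small illustration of the two measures, the sketch below computes degree and strength from a weighted edge list. The sentence ids and weights are made up, and the weights stand in for noun-overlap counts.

```python
from collections import defaultdict

# Hypothetical undirected sentence graph: (sentence, sentence, shared-noun weight).
edges = [("s0", "s1", 2.0), ("s0", "s2", 1.0), ("s1", "s2", 3.0), ("s2", "s3", 1.0)]

degree = defaultdict(int)      # number of incident edges
strength = defaultdict(float)  # sum of incident edge weights
for u, v, w in edges:
    for node in (u, v):
        degree[node] += 1
        strength[node] += w

# Rank sentences by strength, most central first (ties broken by id).
by_strength = sorted(strength, key=lambda n: (-strength[n], n))
```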
  • 11. #Graphorum Case Study #1: Closeness 11 • Closeness Centrality measures how efficiently a vertex is able to spread information across the network • Defined as the inverse of the average “farness” (shortest-path distance) to all other nodes
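A minimal sketch of closeness on an unweighted graph, taken as the reciprocal of the mean shortest-path distance to the reachable nodes. It assumes a connected graph stored as an adjacency dict, and the toy path graph is hypothetical.

```python
from collections import deque


def closeness(adj, source):
    """Reciprocal of the mean BFS distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# Path graph a - b - c - d: the middle nodes are the most central.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = {node: closeness(adj, node) for node in adj}
```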
  • 12. #Graphorum Case Study #1: PageRank 12 • Popularized by Google’s Brin and Page • Quality and number of in-links to a page is rough estimate of page quality • Iterative procedure, until convergence • Starts with all nodes having same rank • “Surfer” starts on random page • Chooses a page randomly from among its outlinks • With probability d (d=0.15 for web), jumps to some random page on web instead
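The iterative procedure can be sketched as a plain power iteration. This is an illustrative version rather than the Neo4j implementation used in the talk: the parameter name `jump` (the random-jump probability, 0.15 here), the handling of nodes without outlinks, and the four-page toy graph are assumptions.

```python
def pagerank(out_links, jump=0.15, tol=1e-10, max_iter=200):
    """Power iteration; every node starts with the same rank."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Random-jump mass is spread uniformly over all pages.
        new_rank = {u: jump / n for u in nodes}
        for u in nodes:
            # A page without outlinks is treated as linking to every page.
            targets = out_links[u] or nodes
            share = (1.0 - jump) * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(out_links)
```

Page `c` collects in-links from three pages and ends up with the highest rank, while `d`, which has no in-links, keeps only the random-jump share.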
  • 13. #Graphorum Case Study #1: Shortest Paths 13 • Mean shortest path from each node to every other node • Compute all-pairs shortest paths • Algorithm uses linear number of matrix multiplications • Order is O(V^4) • Introduced by Shimbel (1953) • Compute mean shortest path from each node to all other nodes • An indirect measure of centrality
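The quartic cost comes from repeating a cubic "min-plus" matrix product a linear number of times. A sketch under that reading follows; the function names and the four-node path graph are illustrative, and the weight matrix is assumed to have zeros on the diagonal and infinity for missing edges.

```python
INF = float("inf")


def min_plus(a, b):
    """Min-plus product: entry (i, j) is the best cost of going i -> k -> j."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_shortest_paths(weights):
    """V - 2 products of O(V^3) each give paths of up to V - 1 edges."""
    dist = weights
    for _ in range(len(weights) - 2):
        dist = min_plus(dist, weights)
    return dist


def mean_shortest_path(dist):
    """Mean distance from each node to all other nodes (lower is more central)."""
    n = len(dist)
    return [sum(row[j] for j in range(n) if j != i) / (n - 1)
            for i, row in enumerate(dist)]


# Path graph 0 - 1 - 2 - 3 with unit edge weights.
weights = [
    [0, 1, INF, INF],
    [1, 0, 1, INF],
    [INF, 1, 0, 1],
    [INF, INF, 1, 0],
]
dist = all_pairs_shortest_paths(weights)
means = mean_shortest_path(dist)
```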
  • 14. #Graphorum Case Study #1: D-Ring 14 • Create subgraphs by dilating • Start with highest degree (or position) • D-ring is difference of subgraphs created by consecutive dilations • Continue to add D-rings until enough nodes available for summary • Two measures of centrality – CN-RingL and CN-RingK
  • 15. #Graphorum Case Study #1: K-Core 15 • Create subgraph starting with node with highest degree (or position) k • Relax threshold for k and continue adding nodes until there are enough nodes for the summary • Two measures of centrality based on degree or position – CN-CoreK and CN-CoreL
  • 16. #Graphorum Case Study #1: W-Cuts 16 • Create subgraph starting with node pair with highest edge weight • Relax edge weight threshold and continue adding nodes until enough nodes available for summary • Two measures of centrality – CN-CutK and CN-CutL, based on preference given to position or degree
  • 18. #Graphorum Case Study #1: Closing Thoughts • Generated summaries are good, but biased towards longer sentences • Strategy described above can be extended to multi-document summarization as well, e.g., summary of product reviews. • A variant of the strategy described is used in the gensim summarizer. 18
  • 19. #Graphorum End of Case Study #1 19
  • 20. #Graphorum Case Study #2 Full paper: https://www.aclweb.org/anthology/N06-1061 20
  • 21. #Graphorum Case Study #2: Steps • Paper asserts that Language Model based graph is more effective for clustering than TD matrix based graph • Represent each document in corpus as a node, edges connect documents by cosine similarity of TF-IDF document vectors • Compute t-step (t=1, 2, 3) random walks for each node, considering only top k edges (for k ~ 80), and compute generation probabilities • Cluster resulting graph of document generation probabilities with k-means and Louvain Modularity 21
  • 22. #Graphorum Case Study #2: Implementation • 20-Newsgroup dataset (18k newsgroup postings, 20 categories) • Clean text and construct TD matrix • Construct cosine similarity matrix S, sparsify using top generators (c=80), remove self-edges, and renormalize • Run (c=80) random walks on each node for path length = 1, 2, 3 • Compute empirical transition probability matrix (language model!) G from walks • Construct graph, apply Louvain Community Detection on various graphs • Compare against K-Means clusters from various document vectors • See https://github.com/sujitpal/nlp-graph-examples/tree/master/02-docs-clustering 22
  • 23. #Graphorum Case Study #2: TD Matrix to Cosine Similarity 23 Image Credit: https://www.quora.com/What-is-a-tf-idf-vector • Documents represented as TD Matrix (n documents x t features) • Similarity Matrix (n x n) = TD Matrix (n x t) times its transpose (t x n), with each entry divided by the product of the two document vector norms to keep similarity values in the range [0, 1]
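A small worked version of this step: the sketch below multiplies a toy term-document matrix by its transpose and normalizes each entry by the two row norms. The matrix values are made up, and real TF-IDF matrices would normally be handled with a sparse-matrix library rather than nested lists.

```python
import math


def cosine_similarity_matrix(td):
    """Pairwise cosine similarity between the rows (documents) of td."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in td]
    n = len(td)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(td[i], td[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


# Three documents, three features; documents 0 and 1 point in the same direction.
td = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
sim = cosine_similarity_matrix(td)
```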
  • 24. #Graphorum Case Study #2: Random Walks 24 Image Credit: https://snap.stanford.edu/node2vec/ • Probabilistic technique used to “flatten” graph into feature vector • Intuition – similar nodes are closer to each other in the graph than dissimilar nodes • Compute empirical generation probabilities • Other popular applications -- DeepWalk and node2vec
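One way to read "empirical generation probabilities" is as the fraction of t-step walks from a document that end at each other document. The sketch below follows that reading and is not the paper's exact estimator: the function names, the `seed` parameter, and the 3 x 3 similarity matrix are hypothetical, and `k` plays the role of the top-generator cutoff.

```python
import random


def top_k_transitions(sim, k):
    """Keep the k largest off-diagonal similarities per row, renormalized to sum to 1."""
    n = len(sim)
    rows = []
    for i in range(n):
        nearest = sorted(((sim[i][j], j) for j in range(n) if j != i), reverse=True)[:k]
        total = sum(s for s, _ in nearest)
        rows.append({j: s / total for s, j in nearest} if total > 0 else {})
    return rows


def generation_probs(rows, steps, walks_per_node, seed=0):
    """Estimate P(end node | start node) from fixed-length random walks."""
    rng = random.Random(seed)
    n = len(rows)
    probs = [[0.0] * n for _ in range(n)]
    for start in range(n):
        for _walk in range(walks_per_node):
            node = start
            for _step in range(steps):
                if not rows[node]:
                    break  # isolated node: the walk stays where it is
                targets = list(rows[node])
                weights = [rows[node][t] for t in targets]
                node = rng.choices(targets, weights=weights)[0]
            probs[start][node] += 1.0 / walks_per_node
    return probs


sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
rows = top_k_transitions(sim, k=2)
g2 = generation_probs(rows, steps=2, walks_per_node=500)
```

Each row of `g2` is a probability distribution over documents, which is what lets it be treated as a language-model-style representation of the starting document.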
  • 25. #Graphorum Case Study #2: Louvain Modularity 25 • Community Detection Algorithm – maximize modularity score for each community • Modularity = difference between actual number of edges between node pair and expected number of edges, summed over all node pairs in community • Iterative procedure, run till convergence • Greedily assign nodes to communities, optimizing local modularity • Define a coarse grained network of communities
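The quantity being maximized can be computed directly for a given partition. The sketch below scores a partition of an unweighted, undirected graph using the standard per-community form Q = sum over communities of (L_c / m - (d_c / 2m)^2), where L_c is the number of internal edges, d_c the total degree, and m the number of edges. It only evaluates modularity and does not implement the Louvain search itself; the toy graph is hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition; community maps node -> community label."""
    m = len(edges)
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Splitting along the bridge scores higher than lumping everything together, which is the signal a greedy community assignment follows.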
  • 26. #Graphorum Case Study #2: Results • Silhouette Score compares cluster tightness with cluster separation; it ranges from -1 to 1, higher is better • Baseline – TD + K-means + Labels close to 0 • G1, G2, G3 – LM Matrices for n=1, n={1,2}, and n={1,2,3}. • LM based graphs outperform TD matrix based graphs • Louvain outperforms K-Means 26
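For reference, the silhouette of one item is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster. A minimal sketch on made-up one-dimensional points follows; the function name and the choice to score degenerate cases as 0 are assumptions.

```python
def silhouette_scores(points, labels, dist):
    """Per-item silhouette (b - a) / max(a, b)."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: scored as 0 in this sketch.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                # tightness
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels, dist=lambda p, q: abs(p - q))
mean_score = sum(scores) / len(scores)
```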
  • 27. #Graphorum Case Study #2: Closing Thoughts • Transforming the graph to have edges based on transition probabilities based on random walks yields better clustering results. • Random Walks on graph structures often used to “flatten” the graph and expose higher-order proximity dependencies that can sometimes look like semantic similarity • Community Detection algorithms can be used for clustering, and often produce more explainable clusters 27
  • 28. #Graphorum End of Case Study #2 28
  • 29. #Graphorum Case Study #3 Full paper: https://www.aclweb.org/anthology/P05-1049 29
  • 30. #Graphorum Case Study #3: Steps • Choose ambiguous word of interest (https://muse.dillfrog.com/lists/ambiguous) • Find sentences containing ambiguous word from large corpus • Manually assign labels to some sentences • Featurize each sentence using POS of neighboring words, unigrams, and local collocations • Create graph with sentences as nodes, edges weighted by cosine similarity and JS divergence of feature vectors • Propagate Labels till convergence • Generate word sense clusters 30
  • 31. #Graphorum Case Study #3: Implementation • We selected the ambiguous word “compound” with these 2 senses • Chemical compound • Composite or multiple • Extracted 670 sentences containing “compound” from SD corpus • Manually marked up 40 total sentences (19 + 21) ~ 5% of corpus • Created TD matrix of 1..3 grams + 3-gram POS tags, sparsified (k=5), removed self-edges, and created graph • Run Label Propagation to propagate the 40 labels to unlabeled sentences • See https://github.com/sujitpal/nlp-graph-examples/tree/master/03-word-sense-disambiguation 31
  • 32. #Graphorum Case Study #3: Label Propagation 32 • Label Propagation uses network structure to detect communities. • Used here in semi-supervised manner by specifying labels for a small subset of nodes • Iterative algorithm • Initialize nodes each with unique label • Each node updates its label to the most frequent label of its neighbors • Converges when each node has the most frequent label of its neighbors • Not guaranteed to converge!
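A minimal sketch of the semi-supervised variant, assuming seed labels stay fixed and unlabeled nodes start with no label (rather than a unique one). It is plain Python, not the Neo4j procedure used in the talk; the sentence ids, the alphabetical tie-break, and the `max_iter` cap that guards against non-convergence are assumptions.

```python
from collections import Counter


def propagate_labels(adj, seeds, max_iter=100):
    """Each unlabeled node repeatedly adopts the most frequent label among its neighbors."""
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if node in seeds:
                continue  # manually assigned labels are never overwritten
            votes = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if not votes:
                continue
            # Deterministic tie-break: alphabetically smallest label wins.
            best = max(sorted(votes), key=votes.get)
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break  # every node already carries its neighbors' majority label
    return labels


# Two triangles of sentences joined by the edge s2 - s3.
adj = {
    "s0": ["s1", "s2"], "s1": ["s0", "s2"], "s2": ["s0", "s1", "s3"],
    "s3": ["s2", "s4", "s5"], "s4": ["s3", "s5"], "s5": ["s3", "s4"],
}
seeds = {"s0": "chemical", "s1": "chemical", "s4": "composite", "s5": "composite"}
labels = propagate_labels(adj, seeds)
```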
  • 33. #Graphorum Case Study #3: Results • Of 623 unlabeled sentences, Label Propagation predicts 319 sentences use the first sense (chemical compound), 7 use the second sense (composite), and misses 298 • Misses are mostly chemical compounds (sense 1) • Examples: • Sense #1: ORTEP view of the compound [CuL8(ClO4)2] with the numbering scheme adopted. • Sense #2: Sensitive to compound fluorescence. • Results can probably be improved – tried increasing initial labels, and by starting with denser networks (so LP does not terminate as quickly) 33
  • 34. #Graphorum End of Case Study #3 34
  • 35. #Graphorum Case Study #4 Full paper: https://www.aclweb.org/anthology/W09-1126 35
  • 36. #Graphorum Case Study #4: Steps • Build up in-memory graph structure for Knowledge Graph (KG) • Match phrases in document against KG entries • Compute Personalized PageRank (PPR) biased to matched nodes • Roll up top scored concepts from PPR to category concepts • Report top category concepts as document topics 36
  • 37. #Graphorum Case Study #4: Implementation • Annotate ScienceDaily article against Aho-Corasick dictionary of KG concepts • Using company proprietary KG to build graph, 2 versions • Lateral relations only • isChildOf (child -> parent) relations only • Run Personalized PageRank (PPR) against Lateral Relations graph setting source nodes to concepts found in article • Roll up high PPR score concepts to disease category concepts • Top disease category concepts are document topic labels • See https://github.com/sujitpal/nlp-graph-examples/tree/master/04-topic-identification 37
  • 38. #Graphorum Case Study #4: Aho-Corasick Matching 38 • Inverted index of terms to concept ID stored in trie-like data structure, where every node is a token in phrase representing concept name • Document streamed against this data structure to produce list of phrases in document matched against concepts in dictionary Image Credit: https://brunorb.com/aho-corasick/
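The matching step can be illustrated with a compact token-level Aho-Corasick automaton. This is a sketch, not the dictionary used in the talk: the phrases, the concept ids `C1` to `C3`, and the function names are hypothetical, and tokenization is reduced to whitespace splitting.

```python
from collections import deque


def build_automaton(phrases):
    """phrases maps a tuple of tokens to a concept id."""
    goto = [{}]    # goto[state][token] -> next state (the trie)
    fail = [0]     # failure link per state
    output = [[]]  # (phrase length, concept id) pairs recognized at each state
    for tokens, concept_id in phrases.items():
        state = 0
        for tok in tokens:
            if tok not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][tok] = len(goto) - 1
            state = goto[state][tok]
        output[state].append((len(tokens), concept_id))
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for tok, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and tok not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(tok, 0)
            # Inherit matches that end at the failure state (shorter suffix phrases).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_concepts(tokens, automaton):
    """Stream tokens through the automaton; return (start, end, concept id) spans."""
    goto, fail, output = automaton
    state, matches = 0, []
    for i, tok in enumerate(tokens):
        while state and tok not in goto[state]:
            state = fail[state]
        state = goto[state].get(tok, 0)
        for length, concept_id in output[state]:
            matches.append((i - length + 1, i + 1, concept_id))
    return matches


phrases = {
    ("breast", "cancer"): "C1",
    ("cancer",): "C2",
    ("lung", "cancer", "screening"): "C3",
}
automaton = build_automaton(phrases)
text = "lung cancer screening reduces breast cancer deaths".split()
matches = find_concepts(text, automaton)
```

The document is scanned once regardless of dictionary size, and overlapping concepts such as "cancer" inside "breast cancer" are all reported.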
  • 39. #Graphorum Case Study #4: Personalized PageRank 39 • In PageRank, surfer doing random walk on graph jumps to some random point in the graph with some probability d (d=0.15 for web graphs) • In Personalized PageRank (PPR), surfer will jump to a neighborhood of the graph specified by a set of nodes (source nodes) • Overall effect is to assign high PPR to nodes that are in close proximity to the source nodes. • Personalized PageRank has been found to be an effective measure for recommendation systems
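The only change from plain PageRank is where the random jump lands. A minimal power-iteration sketch follows, assuming the jump mass is split evenly over the source nodes; the parameter names, the fixed iteration count, and the five-concept chain are illustrative and unrelated to the proprietary knowledge graph.

```python
def personalized_pagerank(adj, sources, jump=0.15, iters=100):
    """PageRank whose random jumps always land on one of the source nodes."""
    nodes = sorted(adj)
    restart = {u: (1.0 / len(sources) if u in sources else 0.0) for u in nodes}
    rank = dict(restart)
    for _ in range(iters):
        new_rank = {u: jump * restart[u] for u in nodes}
        for u in nodes:
            if adj[u]:
                share = (1.0 - jump) * rank[u] / len(adj[u])
                for v in adj[u]:
                    new_rank[v] += share
            else:
                # Node without outgoing relations: return its mass to the sources.
                for v in sources:
                    new_rank[v] += (1.0 - jump) * rank[u] / len(sources)
        rank = new_rank
    return rank


# Chain of concepts c0 - c1 - c2 - c3 - c4; only c0 was found in the article.
adj = {
    "c0": ["c1"], "c1": ["c0", "c2"], "c2": ["c1", "c3"],
    "c3": ["c2", "c4"], "c4": ["c3"],
}
ppr = personalized_pagerank(adj, sources={"c0"})
```

Scores fall off with distance from the source concept, which is the proximity effect the slide describes.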
  • 40. #Graphorum Case Study #4: Disease Categories 40 • Disease Category Concepts are children of Diseases Concept • Navigate to parent from Discovered Concepts until a Disease Category node is found (or no parents are found) • Roll up discovered concepts to their Disease Categories – these are the Document Topics
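The roll-up can be sketched as a walk up the child-to-parent relation until a category node is reached. The sketch assumes each concept has at most one parent and that category scores are obtained by summing the PPR scores of the concepts rolled into them; both are assumptions, and the concept names and scores are invented.

```python
def roll_up(concept, parent_of, categories):
    """Follow child -> parent links until a category is found (or parents run out)."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept in categories:
            return concept
        seen.add(concept)  # guards against cycles in the hierarchy
        concept = parent_of.get(concept)
    return None


def topic_scores(ppr_scores, parent_of, categories):
    """Aggregate concept scores per category, best category first."""
    totals = {}
    for concept, score in ppr_scores.items():
        category = roll_up(concept, parent_of, categories)
        if category is not None:
            totals[category] = totals.get(category, 0.0) + score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


parent_of = {
    "melanoma": "skin neoplasms",
    "skin neoplasms": "Neoplasms",
    "asthma": "Respiratory Tract Diseases",
    "aspirin": "analgesics",
}
categories = {"Neoplasms", "Respiratory Tract Diseases"}
ppr_scores = {"melanoma": 0.4, "skin neoplasms": 0.2, "asthma": 0.3, "aspirin": 0.1}
topics = topic_scores(ppr_scores, parent_of, categories)
```

Concepts such as "aspirin" that never reach a disease category are simply dropped from the topic list.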
  • 42. #Graphorum Case Study #4: Closing Thoughts • Topic predictions from rolling up high PPR concepts are serendipitous, but not necessarily complete • Better results if combined with topic predictions obtained from rolling up concepts found in article 42
  • 43. #Graphorum End of Case Study #4 43
  • 44. #Graphorum Summing up • Content features and graph structure often reinforce each other • Can be useful for unsupervised and semi-supervised NLP tasks • Not necessarily an either-or – BERT based models can coexist with Graph techniques 44
  • 45. #Graphorum Tools • Originally planned to use Spark + GraphFrames for large graphs and Neo4j for small / medium graphs • Neo4j worked well for largest graph (500 K nodes, 1.3 M edges) • Neo4j algorithms frequently have more functionality • Allows multiple source nodes for Personalized PageRank • Allows weighted edges in Label Propagation • Ended up using Neo4j for all case studies 45
  • 47. #Graphorum Thank you • My contact information • Email: sujit.pal@elsevier.com • LinkedIn: https://www.linkedin.com/in/sujitpal/ • Twitter: https://twitter.com/palsujit • Blog: http://sujitpal.blogspot.com/ • Code for this presentation: • https://github.com/sujitpal/nlp-graph-examples 47