Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
The Semantic Knowledge Graph:
A compact, auto-generated model for real-time traversal
and ranking of any relationship with...
Trey Grainger
SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Ge...
Terminology / Background
A Graph
DSAA 2016
Montreal
Quebec Canada
Semantic
Knowledge
Graph Paper
Trey
Grainger
Mohammed
Korayem
Andries
Smith
Khali...
“Solr is the popular, blazing-fast,
open source enterprise search
platform built on Apache Lucene™.”
Key Solr Features:
● Multilingual Keyword search
● Relevancy Ranking of results
● Faceting & Analytics
● Highlighting
● Sp...
Term Documents
a doc1 [2x]
brown doc3 [1x] , doc5 [1x]
cat doc4 [1x]
cow doc2 [1x] , doc5 [1x]
… ...
once doc1 [1x], doc5 ...
/solr/select/?q=apache solr
Term Documents
… …
apache doc1, doc3, doc4,
doc5
…
hadoop doc2, doc4, doc6
… …
solr doc1, doc3...
Related Work
Knowledge
Graph
Related Work
• Primarily related to ontology Learning.
• Recently, large-scale knowledge bases that utiliz...
Problem Description
Knowledge
Graph
Challenges we are solving
Because current knowledge bases / ontology learning systems typically
requires e...
Knowledge
Graph
Semantic Data Encoded into Free Text Content
e en eng engi engineer engineers
engineer engineersNode Type:...
Model
id: 1
job_title: Software Engineer
desc: software engineer at a
great company
skills: .Net, C#, java
id: 2
job_title: Regi...
Knowledge
Graph
Set-theory View
Graph View
How the Graph Traversal Works
skill: Java
skill: Scala
skill:
Hibernate
skill:
...
Knowledge
Graph
Graph Model
Structure:
Single-level Traversal / Scoring:
Multi-level Traversal / Scoring:
Knowledge
Graph
Multi-level Traversal
Data Structure View
Graph View
doc 1
doc 2
doc 3
doc 4
doc 5
doc 6
skill:
Java
skill...
Knowledge
Graph
Scoring nodes in the Graph
Foreground vs. Background Analysis
Every term scored against it’s context. The ...
Source: Trey Grainger,
Khalifeh AlJadda, Mohammed
Korayem, Andries Smith.“The
Semantic Knowledge Graph: A
compact, auto-ge...
Knowledge
Graph
Materialization of new nodes through shared documents
engineer
engineers
software engineer*
(materialized ...
Implementation
Open Sourced!
Knowledge
Graph
Populating the Graph
Knowledge
Graph
Knowledge
Graph
Experiments
Knowledge
Graph
Data Cleansing
{ "type":"keywords”, "values":[
{ "value":"hive", "relatedness": 0.9765, "popularity":369 }...
Knowledge
Graph
Predictive Analytics
Knowledge
Graph
Search Expansion
Experiment: Take an initial query, and expand keyword
phrases to include the most related...
The Semantic Search Problem
User’s Query:
machine learning research and development Portland, OR software
engineer AND had...
machine learning
Keywords:
Search Behavior,
Application Behavior, etc.
Job Title Classifier, Skills Extractor, Job Level C...
Knowledge
Graph
Document Summarization
Experiment: Pass in raw text
(extracting phrases as needed), and
rank their similar...
Document Enrichment – Find / Score Relationships
Document Summarization – Rank / Clean Keywords
Knowledge
Graph
Future Work
• Semantic Search (more experiments)
• Search Engine Relevancy Algorithms
• Trending Topics
• ...
Knowledge
Graph
Conclusion
Applications:
The Semantic Knowledge Graph has numerous applications, including
automatically b...
Knowledge
Graph
References
Contact Info
Trey Grainger
trey.grainger@lucidworks.com
@treygrainger
http://solrinaction.com
Other presentations:
http://...
The Semantic Knowledge Graph
Upcoming SlideShare
Loading in …5
×

The Semantic Knowledge Graph

7,720 views

Published on

Presentation of the Semantic Knowledge Graph research paper at the 2016 IEEE 3rd International Conference on Data Science and Advanced Analytics (Montreal, Canada - October 18th, 2016)

Abstract—This paper describes a new kind of knowledge representation and mining system which we are calling the Semantic Knowledge Graph. At its heart, the Semantic Knowledge Graph leverages an inverted index, along with a complementary uninverted index, to represent nodes (terms) and edges (the documents within intersecting postings lists for multiple terms/nodes). This provides a layer of indirection between each pair of nodes and their corresponding edge, enabling edges to materialize dynamically from underlying corpus statistics. As a result, any combination of nodes can have edges to any other nodes materialize and be scored to reveal latent relationships between the nodes. This provides numerous benefits: the knowledge graph can be built automatically from a real-world corpus of data, new nodes - along with their combined edges - can be instantly materialized from any arbitrary combination of preexisting nodes (using set operations), and a full model of the semantic relationships between all entities within a domain can be represented and dynamically traversed using a highly compact representation of the graph. Such a system has widespread applications in areas as diverse as knowledge modeling and reasoning, natural language processing, anomaly detection, data cleansing, semantic search, analytics, data classification, root cause analysis, and recommendations systems. The main contribution of this paper is the introduction of a novel system - the Semantic Knowledge Graph - which is able to dynamically discover and score interesting relationships between any arbitrary combination of entities (words, phrases, or extracted concepts) through dynamically materializing nodes and edges from a compact graphical representation built automatically from a corpus of data representative of a knowledge domain.

Published in: Technology
  • Be the first to comment

The Semantic Knowledge Graph

  1. 1. The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain Trey Grainger SVP of Engineering Lucidworks Khalifeh AlJadda Lead Data Scientist CareerBuilder Mohammed Korayem Data Scientist CareerBuilder Andries Smith Software Engineer CareerBuilder
  2. 2. Trey Grainger SVP of Engineering • Previously Director of Engineering @ CareerBuilder • MBA, Management of Technology – Georgia Tech • BA, Computer Science, Business, & Philosophy – Furman University • Information Retrieval & Web Search - Stanford University Fun outside of CB: • Co-author of Solr in Action, plus a handful of research papers • Frequent conference speaker • Founder of Celiaccess.com, the gluten-free search engine • Lucene/Solr contributor About Me
  3. 3. Terminology / Background
  4. 4. A Graph DSAA 2016 Montreal Quebec Canada Semantic Knowledge Graph Paper Trey Grainger Mohammed Korayem Andries Smith Khalifeh AlJadda in_country Node / Vertex Edge
  5. 5. “Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™.”
  6. 6. Key Solr Features: ● Multilingual Keyword search ● Relevancy Ranking of results ● Faceting & Analytics ● Highlighting ● Spelling Correction ● Autocomplete/Type-ahead Prediction ● Sorting, Grouping, Deduplication ● Distributed, Fault-tolerant, Scalable ● Geospatial search ● Complex Function queries ● Recommendations (More Like This) ● … many more *source: Solr in Action, chapter 2
  7. 7. Term Documents a doc1 [2x] brown doc3 [1x] , doc5 [1x] cat doc4 [1x] cow doc2 [1x] , doc5 [1x] … ... once doc1 [1x], doc5 [1x] over doc2 [1x], doc3 [1x] the doc2 [2x], doc3 [2x], doc4[2x], doc5 [1x] … … Document Content Field doc1 once upon a time, in a land far, far away doc2 the cow jumped over the moon. doc3 the quick brown fox jumped over the lazy dog. doc4 the cat in the hat doc5 The brown cow said “moo” once. … … What you SEND to Lucene/Solr: How the content is INDEXED into Lucene/Solr (conceptually): The inverted index
  8. 8. /solr/select/?q=apache solr Term Documents … … apache doc1, doc3, doc4, doc5 … hadoop doc2, doc4, doc6 … … solr doc1, doc3, doc4, doc7, doc8 … … doc5 doc7 doc8 doc1 doc3 doc4 solr apache apache solr Matching queries to documents
  9. 9. Related Work
  10. 10. Knowledge Graph Related Work • Primarily related to ontology Learning. • Recently, large-scale knowledge bases that utilize ontologies (FreeBase [4], DBpedia [5], and YAGO [6, 7]) have been constructed using structured sources such as Wikipedia infoboxes. • Other approaches (DeepDive [8], Nell2RDF [9], and PROSPERA [10]) crawl the web and use machine learning and natural language processing to build web-scale knowledge graphs.
  11. 11. Problem Description
  12. 12. Knowledge Graph Challenges we are solving Because current knowledge bases / ontology learning systems typically requires explicitly modeling nodes and edges into a graph ahead of time, this unfortunately presents several limitations to the use of such a knowledge graph: • Entities not modeled explicitly as nodes have no known relationships to any other entities. • Edges exist between nodes, but not between arbitrary combinations of nodes, and therefore such a graph is not ideal for representing nuanced meanings of an entity when appearing within different contexts, as is common within natural language. • Substantial meaning is encoded in the linguistic representation of the domain that is lost when the underlying textual representation is not preserved: phrases, interaction of concepts through actions (i.e. verbs), positional ordering of entities and the phrases containing those entities, variations in spelling and other representations of entities, the use of adjectives to modify entities to represent more complex concepts, and aggregate frequencies of occurrence for different representations of entities relative to other representations. • It can be an arduous process to create robust ontologies, map a domain into a graph representing those ontologies, and ensure the generated graph is compact, accurate, comprehensive, and kept up to date.
  13. 13. Knowledge Graph Semantic Data Encoded into Free Text Content e en eng engi engineer engineers engineer engineersNode Type: Term software engineer software engineers electrical engineering engineer engineering software … … … Node Type: Character Sequence Node Type: Term Sequence Node Type: Document id: 1 text: looking for a software engineerwith degree in computer science or electrical engineering id: 2 text: apply to be a software engineer and work with other great software engineers id: 3 text: start a great careerin electrical engineering … …
  14. 14. Model
  15. 15. id: 1 job_title: Software Engineer desc: software engineer at a great company skills: .Net, C#, java id: 2 job_title: Registered Nurse desc: a registered nurse at hospital doing hard work skills: oncology, phlebotemy id: 3 job_title: Java Developer desc: a software engineer or a java engineer doing work skills: java, scala, hibernate field term postings list doc pos desc a 1 4 2 1 3 1, 5 at 1 3 2 4 company 1 6 doing 2 6 3 8 engineer 1 2 3 3, 7 great 1 5 hard 2 7 hospital 2 5 java 3 6 nurse 2 3 or 3 4 registered 2 2 software 1 1 3 2 work 2 10 3 9 job_title java developer 3 1 … … … … field doc term desc 1 a at company engineer great software 2 a at doing hard hospital nurse registered work 3 a doing engineer java or software work job_title 1 Software Engineer … … … Terms-Docs Inverted IndexDocs-Terms Uninverted IndexDocuments Knowledge Graph
  16. 16. Knowledge Graph Set-theory View Graph View How the Graph Traversal Works skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology Data Structure View Java Scala Hibernate docs 1, 2, 6 docs 3, 4 Oncology doc 5
  17. 17. Knowledge Graph Graph Model Structure: Single-level Traversal / Scoring: Multi-level Traversal / Scoring:
  18. 18. Knowledge Graph Multi-level Traversal Data Structure View Graph View doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 skill: Java skill: Java skill: Scala skill: Hibernate skill: Oncology doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 job_title: Software Engineer job_title: Data Scientist job_title: Java Developer …… Inverted Index Lookup Doc Values Index Lookup Doc Values Index Lookup Inverted Index Lookup Java Java Developer Hibernate Scala Software Engineer Data Scientist has_related_job_title has_related_job_title
  19. 19. Knowledge Graph Scoring nodes in the Graph Foreground vs. Background Analysis Every term scored against it’s context. The more commonly the term appears within it’s foreground context versus its background context, the more relevant it is to the specified foreground context. countFG(x) - totalDocsFG * probBG(x) z = -------------------------------------------------------- sqrt(totalDocsFG * probBG(x) * (1 - probBG(x))) { "type":"keywords”, "values":[ { "value":"hive", "relatedness": 0.9765, "popularity":369 }, { "value":”spark", "relatedness": 0.9634, "popularity":15653 }, { "value":".net", "relatedness": 0.5417, "popularity":17683 }, { "value":"bogus_word", "relatedness": 0.0, "popularity":0 }, { "value":"teaching", "relatedness": -0.1510, "popularity":9923 }, { "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] } + - Foreground Query: "Hadoop"
  20. 20. Source: Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith.“The Semantic Knowledge Graph: A compact, auto-generated model for real-time traversal and ranking of any relationship within a domain”. DSAA 2016. Knowledge Graph Multi-level Graph Traversal with Scores software engineer* (materialized node) Java C# .NET .NET Developer Java Developer Hibernate ScalaVB.NET Software Engineer Data Scientist Skill Nodes has_related_skillStarting Node Skill Nodes has_related_skill Job Title Nodes has_related_job_title 0.90 0.88 0.93 0.93 0.34 0.74 0.91 0.89 0.74 0.89 0.780.72 0.48 0.93 0.76 0.83 0.80 0.64 0.61 0.780.55
  21. 21. Knowledge Graph Materialization of new nodes through shared documents engineer engineers software engineer* (materialized node) engineer* (materialized node) Software doc 1 doc 2 doc 3 doc 4 doc 5 doc 6 links_to links_to
  22. 22. Implementation
  23. 23. Open Sourced!
  24. 24. Knowledge Graph Populating the Graph
  25. 25. Knowledge Graph
  26. 26. Knowledge Graph
  27. 27. Experiments
  28. 28. Knowledge Graph Data Cleansing { "type":"keywords”, "values":[ { "value":"hive", "relatedness": 0.9765, "popularity":369 }, { "value":”spark", "relatedness": 0.9634, "popularity":15653 }, { "value":".net", "relatedness": 0.5417, "popularity":17683 }, { "value":"bogus_word", "relatedness": 0.0, "popularity":0 }, { "value":"teaching", "relatedness": -0.1510, "popularity":9923 }, { "value":"CPR", "relatedness": -0.4012, "popularity":27089 } ] } Foreground Query: "Hadoop" Experiment: Data analyst manually annotated 500 pairs of terms found together in real query logs as “relevant” or “not relevant” Results: SKG removed 78% of the terms while maintaining a 95% accuracy at removing the correct noisy pairs from the input data.
  29. 29. Knowledge Graph Predictive Analytics
  30. 30. Knowledge Graph Search Expansion Experiment: Take an initial query, and expand keyword phrases to include the most related entities to that query Example:
  31. 31. The Semantic Search Problem User’s Query: machine learning research and development Portland, OR software engineer AND hadoop, java Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java) Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "artificial intelligence") AND ("research and development"^10 OR "r&d") AND AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
  32. 32. machine learning Keywords: Search Behavior, Application Behavior, etc. Job Title Classifier, Skills Extractor, Job Level Classifier, etc. Semantic Interpretation keywords:((machine learning)^10 OR { AT_LEAST_2: ("data mining"^0.9, matlab^0.8, "data scientist"^0.75, "artificial intelligence"^0.7, "neural networks"^0.55)) } { BOOST_TO_TOP: ( job_title:( "software engineer" OR "data manager" OR "data scientist" OR "hadoop engineer")) } Modified Query: Related Occupations machine learning: {15-1031.00 .58 Computer Software Engineers, Applications 15-1011.00 .55 Computer and Information Scientists, Research 15-1032.00 .52 Computer Software Engineers, Systems Software } machine learning: { software engineer .65, data manager .3, data scientist .25, hadoop engineer .2, } Common Job Titles Query Expansion Related Phrases machine learning: { data mining .9, matlab .8, data scientist .75, artificial intelligence .7, neural networks .55 } Known keyword phrases java developer machine learning registered nurse FST Knowledge Graph in +
  33. 33. Knowledge Graph Document Summarization Experiment: Pass in raw text (extracting phrases as needed), and rank their similarity to the documents using the SKG. Additionally, can traverse the graph to “related” entities/keyword phrases NOT found in the original document Applications: Content-based and multi-modal recommendations (no cold-start problem), data cleansing prior to clustering or other ML methods, semantic search / similarity scoring
  34. 34. Document Enrichment – Find / Score Relationships
  35. 35. Document Summarization – Rank / Clean Keywords
  36. 36. Knowledge Graph Future Work • Semantic Search (more experiments) • Search Engine Relevancy Algorithms • Trending Topics • Recommendation Systems • Root Cause Analysis • Abuse Detection
  37. 37. Knowledge Graph Conclusion Applications: The Semantic Knowledge Graph has numerous applications, including automatically building ontologies, identification of trending topics over time, predictive analytics on timeseries data, root-cause analysis surfacing concepts related to failure scenarios from free text, data cleansing, document summarization, semantic search interpretation and expansion of queries, recommendation systems, and numerous other forms of anomaly detection. Main contribution of this paper: The introduction (and open sourcing) of the the Semantic Knowledge Graph, a novel and compact new graph model that can dynamically materialize and score the relationships between any arbitrary combination of entities represented within a corpus of documents.
  38. 38. Knowledge Graph References
  39. 39. Contact Info Trey Grainger trey.grainger@lucidworks.com @treygrainger http://solrinaction.com Other presentations: http://www.treygrainger.com

×