IBM Research
© 2014 IBM Corporation
A Scalable Graph Representation of Knowledge Bases
and its Uses for Semantic Document Relatedness
Yosi Mass, Dafna Sheinwald (HRL)
Feng Cao, Yuan Ni, Hai Pei Zhang, Qiongkai Xu (CRL)
Introduction – Knowledge Base
A knowledge base (KB) is a representation of knowledge where:
• Nodes represent entities
• Edges represent relationships between entities
• Nodes and edges may have attributes
Linked Open Data
The DBPedia Knowledge base
Usage of Knowledge Bases
1. Semantic understanding of a text by mapping phrases to the knowledge base.
2. Finding relatedness/similarity between two given texts.
In the United Kingdom and Ireland, high school students traditionally do not have 'free
periods' but do have 'break' which normally occurs just after their second lesson of the
day (normally referred to as second period).
Mentions
• United Kingdom - http://en.wikipedia.org/wiki/United_Kingdom
• Ireland - http://en.wikipedia.org/wiki/Ireland
• high school students - http://en.wikipedia.org/wiki/High_school - note the derivation to "high school student" and then the redirect to "High school".
• ‘free periods’ - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
• ‘break’ - http://en.wikipedia.org/wiki/Break_(work) - note the disambiguation.
• lesson - http://en.wikipedia.org/wiki/Lesson
• day - http://en.wikipedia.org/wiki/Day
• period - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
Facet graph use case - find semantic relatedness between two text paragraphs
[Figure: Paragraph 1 and Paragraph 2, with a "?" for their relatedness]
Mention Detection
Graph-based similarity scorers
• Exploit the graph structure to find relationships between pairs of mentions
• Aggregate over all pairs
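The "aggregate over all pairs" step can be illustrated with a minimal sketch. The slides do not say which aggregation is used, so averaging is an assumption here, and `pair_score` and `exact_match` are hypothetical stand-ins for a graph-based similarity scorer.

```python
from itertools import product
from typing import Callable, Iterable


def paragraph_relatedness(
    mentions_a: Iterable[str],
    mentions_b: Iterable[str],
    pair_score: Callable[[str, str], float],
) -> float:
    """Average a pairwise mention score over all cross-paragraph pairs.

    Averaging is an assumption; the slides only say "aggregate over all pairs".
    """
    pairs = list(product(list(mentions_a), list(mentions_b)))
    if not pairs:
        return 0.0
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scorer: 1.0 for identical mentions, 0.0 otherwise.
def exact_match(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


score = paragraph_relatedness(["Ireland", "Day"], ["Ireland", "Lesson"], exact_match)
```

Other aggregations (for example, the best match per mention) would fit the same interface; only the final reduction changes.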
Outline
• Generation of the Facet Graph from DBPedia
• Mention Detection
• Similarity measures on the FacetGraph
The TinkerPop Stack Usage in a project
[Figure: stack diagram. Labels: graph stack library; Titan graph; HBase; Cassandra (planned); Hadoop; MapReduce code to generate the graph; access the graph; shortest path; similarity scorers]
Generate the Knowledge Graph from RDF data
• Input is given as RDF triples.
• Example
subject: http://dbpedia.org/resource/Yehuda_Vilner
predicate: http://dbpedia.org/ontology/birthPlace
object: http://dbpedia.org/resource/Israel
• URIs are translated to vertex IDs
• Adding a triple requires:
1. Add the subject and object as nodes (or get their IDs if they are already in the graph)
2. Add the predicate as an edge between the two nodes
• Callouts on the slide: "This is the most expensive operation"; "Does not scale to millions of triples"
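A minimal sketch of this triple-by-triple insertion, using a toy in-memory graph. `NaiveGraph` and its methods are hypothetical names for illustration, not the Titan API.

```python
from typing import Dict, List, Tuple


class NaiveGraph:
    """Toy in-memory graph; a stand-in for a real graph store."""

    def __init__(self) -> None:
        self.vertex_ids: Dict[str, int] = {}         # URI -> vertex ID
        self.edges: List[Tuple[int, str, int]] = []  # (subject ID, predicate, object ID)

    def get_or_add_vertex(self, uri: str) -> int:
        # Look the URI up, creating a new vertex only if it is not known yet.
        if uri not in self.vertex_ids:
            self.vertex_ids[uri] = len(self.vertex_ids)
        return self.vertex_ids[uri]

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        # Step 1: resolve subject and object to vertex IDs.
        sid = self.get_or_add_vertex(subject)
        oid = self.get_or_add_vertex(obj)
        # Step 2: add the predicate as an edge between the two nodes.
        self.edges.append((sid, predicate, oid))


graph = NaiveGraph()
graph.add_triple(
    "http://dbpedia.org/resource/Yehuda_Vilner",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Israel",
)
```

Here the lookup is a dictionary access. Against a persistent graph store, each triple would need two such lookups, which is presumably why this one-at-a-time approach becomes costly at scale.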
A scalable solution using MapReduce
• What is MapReduce?
• Programming model for expressing distributed computations at a massive scale
• Execution framework for organizing and performing such computations
• Open-source implementation called Hadoop
• Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, v’*) → <k’’, v’’>*
All values with the same key are sent to the same reducer
The execution framework handles everything else…
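The programming model can be illustrated with a small single-process simulation. This is only a sketch of the map, shuffle and reduce contract, not Hadoop itself; `run_mapreduce` and the word-count functions are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def run_mapreduce(
    records: Iterable[Tuple[object, object]],
    mapper: Callable[[object, object], Iterable[Tuple[object, object]]],
    reducer: Callable[[object, List[object]], Iterable[Tuple[object, object]]],
) -> List[Tuple[object, object]]:
    # Map phase: each input pair may emit any number of intermediate pairs.
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            # Shuffle: all values with the same key end up in the same group.
            groups[k2].append(v2)
    # Reduce phase: one call per distinct key, in sorted key order.
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def word_mapper(_line_no, line):
    for word in line.split():
        yield word, 1


def count_reducer(word, counts):
    yield word, sum(counts)


result = run_mapreduce([(0, "a b a"), (1, "b c")], word_mapper, count_reducer)
```

In a real cluster the groups are partitioned across reducer machines; the sketch keeps them in one dictionary to show only the data flow.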
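MapReduce
[Figure: MapReduce data flow. Four map tasks read the input pairs (k1, v1) ... (k6, v6) and emit (a, 1), (b, 2); (c, 3), (c, 6); (a, 5), (c, 2); (b, 7), (c, 8). Shuffle and Sort aggregates values by keys: a → 1, 5; b → 2, 7; c → 2, 3, 6, 8. Three reduce tasks emit (r1, s1), (r2, s2), (r3, s3).]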
Graph generation using MapReduce
Input triples:
(S1, P1, O1)
(S2, P2, O2)
(S3, P3, O1)
(S1, P2, O2)
Job 1 – sort by subjects
map output, keyed by subject (then reduce):
S1 (P1, O1)
S2 (P2, O2)
S3 (P3, O1)
S1 (P2, O2)
Job 2 – add subjects to graph and sort by objects
map output, keyed by object, with each subject replaced by its vertex ID (then reduce):
O1 (P1, SID1)
O2 (P2, SID2)
O1 (P3, SID3)
O2 (P2, SID1)
Job 3 – add objects and edges to graph
map output, the resulting edges:
SID1 -P1-> OID1
SID1 -P2-> OID2
SID3 -P3-> OID1
SID2 -P2-> OID2
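The three jobs can be sketched as a single-process simulation on the four example triples. This is an illustration of the data flow only: `group_by_key` stands in for the shuffle, integer IDs stand in for graph vertex IDs, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, tuple]]) -> Dict[str, List[tuple]]:
    """Stand-in for shuffle and sort: collect values per key, keys sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def build_graph(triples: List[Tuple[str, str, str]]):
    vertex_ids: Dict[str, int] = {}
    edges: List[Tuple[int, str, int]] = []

    # Job 1: key every triple by its subject so equal subjects meet.
    by_subject = group_by_key((s, (p, o)) for s, p, o in triples)

    # Job 2: add each subject once, then re-key its triples by object.
    keyed_by_object = []
    for s, po_list in by_subject.items():
        sid = vertex_ids.setdefault(s, len(vertex_ids))
        for p, o in po_list:
            keyed_by_object.append((o, (p, sid)))
    by_object = group_by_key(keyed_by_object)

    # Job 3: add each object once (it may already exist as a subject),
    # then add one edge per (predicate, subject ID) pair.
    for o, ps_list in by_object.items():
        oid = vertex_ids.setdefault(o, len(vertex_ids))
        for p, sid in ps_list:
            edges.append((sid, p, oid))
    return vertex_ids, edges


triples = [("S1", "P1", "O1"), ("S2", "P2", "O2"), ("S3", "P3", "O1"), ("S1", "P2", "O2")]
vertex_ids, edges = build_graph(triples)
```

Because each job groups by the node being added, every subject and every object is resolved once per group rather than once per triple.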
Facet Graph Architecture
• Implementation based on the Titan graph library with HBase as the backend
• Runs on a cluster of 3 machines
• Each machine has 16 cores, 2 TB disk and 32 GB memory
[Figure: architecture diagram. Components: Application, REST API, Rexster Server, Titan graph 1 ... Titan graph n, HBase, Hadoop cluster]
Facet Graph performance
• Creation (offline)
• Use three MapReduce jobs to index DBPedia into Titan
1. First job sorts subjects
2. Second job adds subjects
3. Third job adds objects and edges
• Access (online)
• Implemented as a Java API that wraps the REST API through the Rexster server
• Performance on a cluster of 3 machines, each with 16 cores, 2 TB disk and 32 GB memory
Graph         #Vertices  #Edges  Creation time  Access time
Semantics FG  14M        72M     3h:45m         1 msec to get node description;
                                                2 sec to get 223K inlinks of a heavy node (USA)
Links FG      19M        152M    7h:18m         4.4 sec to get 447K inlinks of a heavy node (USA)
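As a rough reading of these figures, the rates they imply can be worked out directly. This is only back-of-the-envelope arithmetic on the reported numbers; creation time covers vertices as well as edges, so the edge rates are an approximation.

```python
def per_second(count: float, seconds: float) -> float:
    """Items processed per second."""
    return count / seconds


# Creation: edges indexed per second of total creation time.
semantics_creation = per_second(72e6, 3 * 3600 + 45 * 60)   # 3h:45m
links_creation = per_second(152e6, 7 * 3600 + 18 * 60)      # 7h:18m

# Access: inlinks returned per second for the heavy node (USA).
semantics_inlinks = per_second(223e3, 2.0)
links_inlinks = per_second(447e3, 4.4)
```

Both graphs come out at roughly 5,300 to 5,800 edges per second for creation and around 100,000 inlinks per second for access, so the larger Links FG scales about linearly on both measures.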
Mention detection
[Figure: pipeline from Input Text through Spotting, Candidates Selection and Disambiguation to Annotated Text, with a Lexicon, a Lucene Index and the Facet Graph as resources]
Spotting stage: recognizes in a sentence the phrases (surface forms) that may indicate a mention in the KB
Candidate selection stage: given the surface form, retrieves the set of candidate URIs for disambiguation
Disambiguation stage: uses the context around the spotted phrase to decide on the best candidate.
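The three stages can be sketched with toy data. Everything below is a hypothetical illustration: the lexicon, the context keywords and the overlap-based disambiguation are assumptions, not the Lucene or Facet Graph based implementation the slides refer to.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: surface form -> candidate KB entries.
LEXICON: Dict[str, List[str]] = {
    "break": ["Break_(work)", "Break_(music)"],
    "lesson": ["Lesson"],
}

# Hypothetical context keywords per candidate, used for disambiguation.
CONTEXT: Dict[str, Set[str]] = {
    "Break_(work)": {"school", "lesson", "work"},
    "Break_(music)": {"song", "drum"},
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def spot(text: str) -> List[str]:
    """Spotting: find surface forms that appear in the lexicon."""
    return [token for token in tokenize(text) if token in LEXICON]


def select_candidates(surface: str) -> List[str]:
    """Candidate selection: look up the candidates for a surface form."""
    return LEXICON.get(surface, [])


def disambiguate(candidates: List[str], context: Set[str]) -> str:
    """Disambiguation: pick the candidate sharing most words with the context."""
    return max(candidates, key=lambda c: len(CONTEXT.get(c, set()) & context))


def annotate(text: str) -> List[Tuple[str, str]]:
    context = set(tokenize(text))
    return [(s, disambiguate(select_candidates(s), context)) for s in spot(text)]


annotations = annotate("Students have a break after their second lesson at school")
```

The sketch handles single-word surface forms only; multi-word phrases such as "high school students" would need a phrase-level spotter.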
Pairwise Concept similarity based on wikilinks [1]
[1] Milne D., Witten I. H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from
Wikipedia Links, AAAI, 2008
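The formula on this slide did not survive extraction. As a hedged sketch, the link-based measure from the cited paper [1] compares the sets of articles that link to each concept: relatedness = 1 - (log(max(|A|, |B|)) - log(|A ∩ B|)) / (log(|W|) - log(min(|A|, |B|))), where A and B are the inlink sets and W is the set of all articles. The function name, the integer article IDs and the edge-case handling below are assumptions for illustration.

```python
import math
from typing import Set


def milne_witten_relatedness(
    inlinks_a: Set[int], inlinks_b: Set[int], total_articles: int
) -> float:
    """Link-based relatedness of two concepts from their inlink sets."""
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0  # assumption: no shared inlinks means unrelated
    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        # Degenerate case (inlink sets as large as the whole collection).
        return 1.0 if larger == common else 0.0
    distance = (math.log(larger) - math.log(common)) / denominator
    # Clamp so that very distant concepts score 0 rather than negative.
    return max(0.0, 1.0 - distance)


score = milne_witten_relatedness({1, 2, 3, 4}, {3, 4, 5}, total_articles=100)
```

In the Facet Graph setting, the inlink sets would presumably come from the inlinks query on the Links FG, which is why the inlink access time reported earlier matters for this scorer.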
Our assets on IBM.next
http://ibmnext.stage1.mybluemix.net/assets
Thank You

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Knowledge graphs Yosi Mass

  • 1. IBM Research, © 2014 IBM Corporation. A Scalable Graph Representation of Knowledge Bases and its Uses for Semantic Document Relatedness. Yosi Mass, Dafna Sheinwald (HRL); Feng Cao, Yuan Ni, Hai Pei Zhang, Qiongkai Xu (CRL)
  • 2. Introduction: Knowledge Base. A knowledge base (KB) is a representation of knowledge in which:
      - Nodes represent entities
      - Edges represent relationships between entities
      - Nodes and edges may have attributes
      Linked Open Data
  • 3. The DBPedia Knowledge base
  • 4. Usage of Knowledge Bases
      1. Semantic understanding of a text by mapping phrases to the knowledge base.
      2. Finding relatedness/similarity between two given texts.
      Example text: In the United Kingdom and Ireland, high school students traditionally do not have 'free periods' but do have 'break' which normally occurs just after their second lesson of the day (normally referred to as second period).
      Mentions:
      - United Kingdom - http://en.wikipedia.org/wiki/United_Kingdom
      - Ireland - http://en.wikipedia.org/wiki/Ireland
      - high school students - http://en.wikipedia.org/wiki/High_school - note the derivation to "high school student" and then the redirect to "High school".
      - ‘free periods’ - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
      - ‘break’ - http://en.wikipedia.org/wiki/Break_(work) - note the disambiguation.
      - lesson - http://en.wikipedia.org/wiki/Lesson
      - day - http://en.wikipedia.org/wiki/Day
      - period - http://en.wikipedia.org/wiki/Period_(school) - note the disambiguation.
  • 5. Facet graph use case: find semantic relatedness between two text paragraphs (Paragraph 1, Paragraph 2)
      - Mention Detection
      - Graph-based similarity scorers: exploit the graph structure to find relationships between pairs of mentions, then aggregate over all pairs
  • 6. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the FacetGraph
  • 7. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the FacetGraph
  • 8. The TinkerPop Stack: usage in a project
      - Graph stack library: Titan graph on Hbase (Cassandra planned)
      - Hadoop: MapReduce code to generate the graph
      - Access the graph: shortest path, similarity scorers
  • 9. Generate the Knowledge Graph from RDF data
      - Input is given as RDF triples.
      - Example: subject http://dbpedia.org/resource/Yehuda_Vilner, predicate http://dbpedia.org/ontology/birthPlace, object http://dbpedia.org/resource/Israel
      - URIs are translated to vertex IDs.
      - Adding a triple requires:
          1. Add the subject and object as nodes (or get their IDs if they are already in the graph)
          2. Add the predicate as an edge between the two nodes
        This is the most expensive operation.
      - Does not scale to millions of triples.
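The triple-by-triple insertion described on slide 9 can be sketched in a few lines of Python. This is only an illustrative in-memory stand-in, not the Titan API: the class `TinyGraph` and its methods are hypothetical names, and a real backend would pay a store lookup for each get-or-add call where this sketch uses a dictionary.

```python
from typing import Dict, List, Tuple


class TinyGraph:
    """Hypothetical in-memory graph; stands in for a real graph store."""

    def __init__(self) -> None:
        self.vertex_ids: Dict[str, int] = {}          # URI -> vertex ID
        self.edges: List[Tuple[int, str, int]] = []   # (subject ID, predicate, object ID)

    def get_or_add_vertex(self, uri: str) -> int:
        # Return the existing ID for this URI, or assign the next free one.
        if uri not in self.vertex_ids:
            self.vertex_ids[uri] = len(self.vertex_ids) + 1
        return self.vertex_ids[uri]

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        # Step 1: add subject and object as nodes (or look up their IDs).
        sid = self.get_or_add_vertex(subject)
        oid = self.get_or_add_vertex(obj)
        # Step 2: add the predicate as an edge between the two nodes.
        self.edges.append((sid, predicate, oid))


graph = TinyGraph()
graph.add_triple(
    "http://dbpedia.org/resource/Yehuda_Vilner",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Israel",
)
```

Every call performs two ID lookups against the whole vertex set, which is the per-triple cost the slide flags as not scaling to millions of triples.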
  • 10. A scalable solution using MapReduce
      - What is MapReduce?
          - Programming model for expressing distributed computations at a massive scale
          - Execution framework for organizing and performing such computations
          - Open-source implementation called Hadoop
      - Programmers specify two functions:
          map (k, v) → <k', v'>*
          reduce (k', v'*) → <k'', v''>*
      - All values with the same key are sent to the same reducer
      - The execution framework handles everything else
  • 11. MapReduce (diagram)
      - Input: (k1, v1) (k2, v2) (k3, v3) (k4, v4) (k5, v5) (k6, v6)
      - Four map tasks emit: (a, 1) (b, 2); (c, 3) (c, 6); (a, 5) (c, 2); (b, 7) (c, 8)
      - Shuffle and Sort, aggregate values by keys: a → 1 5; b → 2 7; c → 2 3 6 8
      - Three reduce tasks emit: (r1, s1) (r2, s2) (r3, s3)
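The map, shuffle-and-sort, and reduce flow of slides 10 and 11 can be imitated in a single Python process. This is a minimal sketch, not Hadoop code: `shuffle_and_sort` and `map_reduce` are hypothetical helper names, the four input splits are assumed so that each map task emits the pairs shown in the diagram, and summation is an assumed reduce function (the slide does not say what the reducers compute).

```python
from collections import defaultdict


def shuffle_and_sort(pairs):
    """Group intermediate (key, value) pairs by key, with keys and values sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(groups[key]) for key in sorted(groups)}


def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group by key, then run reducer per key."""
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    grouped = shuffle_and_sort(mapped)
    return [out for key, values in grouped.items() for out in reducer(key, values)]


# Assumed input: one record per map task, carrying the pairs that task emits.
splits = [
    ("split1", [("a", 1), ("b", 2)]),
    ("split2", [("c", 3), ("c", 6)]),
    ("split3", [("a", 5), ("c", 2)]),
    ("split4", [("b", 7), ("c", 8)]),
]

grouped = shuffle_and_sort([pair for _, pairs in splits for pair in pairs])
result = map_reduce(
    splits,
    mapper=lambda key, pairs: pairs,                  # emit the pairs unchanged
    reducer=lambda key, values: [(key, sum(values))]  # assumed reduce: sum per key
)
print(grouped)  # {'a': [1, 5], 'b': [2, 7], 'c': [2, 3, 6, 8]}
print(result)   # [('a', 6), ('b', 9), ('c', 19)]
```

The grouping step is the part a real framework distributes: all values for one key end up at the same reducer, regardless of which map task produced them.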
  • 12. Graph generation using MapReduce
      - Input triples: (S1, P1, O1) (S2, P2, O2) (S3, P3, O1) (S1, P2, O2)
      - Job 1: sort by subjects. Map emits S1 (P1, O1); S2 (P2, O2); S3 (P3, O1); S1 (P2, O2). Reduce receives the records grouped by subject.
      - Job 2: add subjects to graph and sort by objects. Map emits O1 (P1, SID1); O2 (P2, SID2); O1 (P3, SID3); O2 (P2, SID1). Reduce receives the records grouped by object.
      - Job 3: add objects and edges to graph. Resulting edges: SID1 -P1-> OID1; SID1 -P2-> OID2; SID3 -P3-> OID1; SID2 -P2-> OID2
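The three jobs on slide 12 can be traced with a small single-process simulation. This is a sketch under stated assumptions, not the authors' Hadoop or Titan code: `run_job` and `build_graph` are hypothetical names, vertex IDs are plain integers handed out in order, and the choice to add vertices inside the map step of jobs 2 and 3 is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import count


def run_job(records, mapper, reducer):
    """One simulated MapReduce job: map, group by key (sorted), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def build_graph(triples):
    vertex_ids = {}      # URI -> vertex ID
    next_id = count(1)
    edges = []           # (subject ID, predicate, object ID)

    def vertex_id(uri):
        if uri not in vertex_ids:
            vertex_ids[uri] = next(next_id)
        return vertex_ids[uri]

    # Job 1: key each triple by its subject so equal subjects are grouped.
    by_subject = run_job(
        triples,
        mapper=lambda t: [(t[0], (t[1], t[2]))],
        reducer=lambda s, pred_objs: [(s, pred_objs)],
    )

    # Job 2: add each subject once, then re-key its records by object.
    def add_subject(record):
        subject, pred_objs = record
        sid = vertex_id(subject)
        return [(obj, (pred, sid)) for pred, obj in pred_objs]

    by_object = run_job(by_subject, add_subject, lambda o, ps: [(o, ps)])

    # Job 3: add each object once, then add one edge per (predicate, subject ID).
    def add_object_and_edges(record):
        obj, pred_sids = record
        oid = vertex_id(obj)
        edges.extend((sid, pred, oid) for pred, sid in pred_sids)
        return []

    run_job(by_object, add_object_and_edges, lambda k, vs: [])
    return vertex_ids, edges


triples = [("S1", "P1", "O1"), ("S2", "P2", "O2"), ("S3", "P3", "O1"), ("S1", "P2", "O2")]
vertex_ids, edges = build_graph(triples)
```

Because each job groups records by the URI about to be inserted, every vertex is looked up or created once per group instead of once per triple.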
  • 13. Facet Graph Architecture
      - Implementation based on Titan Graph Library with Hbase as the backend
      - Runs on a cluster of 3 machines
      - Each machine has 16 cores, 2 TB disk and 32 GB memory
      - Diagram components: Application, REST API, Rexster Server, Titan graph 1 … Titan graph n, Hbase, Hadoop cluster
  • 14. Facet Graph performance
      - Creation (offline): use three MapReduce jobs to index DBPedia into Titan
          1. First job sorts subjects
          2. Second job adds subjects
          3. Third job adds objects and edges
      - Access (online): implemented as a Java API that wraps the REST API through the Rexster server
      - Performance on a cluster of 3 machines, each with 16 cores, 2 TB disk and 32 GB memory:
          Graph        | #Vertices | #Edges | Creation time | Access time
          Semantics FG | 14M       | 72M    | 3h:45m        | 1 msec to get node description; 2 sec to get 223K inlinks of a heavy node (USA)
          Links FG     | 19M       | 152M   | 7h:18m        | 4.4 sec to get 447K inlinks of a heavy node (USA)
  • 15. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the FacetGraph
  • 16. Mention detection
      - Flow: Input Text → Spotting → Candidate Selection → Disambiguation → Annotated Text
      - Resources shown in the diagram: Lexicon, Lucene Index, Facet Graph
      - Spotting stage: recognizes in a sentence the phrases (surface forms) that may indicate a mention in the KB
      - Candidate selection stage: given the surface form, retrieves the set of candidate URIs for disambiguation
      - Disambiguation stage: uses the context around the spotted phrase to decide on the best candidate
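The three stages on slide 16 can be illustrated with a toy pipeline. Everything in it is an assumption made for illustration: the lexicon, the candidate names (`Period_(punctuation)` is a made-up competing sense), the context keywords, and the overlap-count scoring are all hypothetical, and the slides do not describe how the real system uses its Lexicon, Lucene Index, or Facet Graph.

```python
import re

# Hypothetical lexicon: surface form -> candidate entity -> context keywords.
LEXICON = {
    "period": {
        "Period_(school)": {"school", "lesson", "students"},
        "Period_(punctuation)": {"sentence", "comma", "mark"},
    },
    "lesson": {
        "Lesson": {"school", "teacher"},
    },
}


def spot(tokens):
    """Spotting: keep the tokens that appear as surface forms in the lexicon."""
    return [token for token in tokens if token in LEXICON]


def select_candidates(surface_form):
    """Candidate selection: all entities the surface form may refer to."""
    return LEXICON[surface_form]


def disambiguate(surface_form, tokens):
    """Disambiguation: pick the candidate sharing the most words with the context."""
    context = set(tokens) - {surface_form}
    candidates = select_candidates(surface_form)
    # sorted() makes ties resolve deterministically (alphabetical order).
    return max(sorted(candidates), key=lambda name: len(candidates[name] & context))


def annotate(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(surface, disambiguate(surface, tokens)) for surface in spot(tokens)]


annotations = annotate(
    "High school students have a break after their second lesson, "
    "referred to as second period."
)
print(annotations)  # [('lesson', 'Lesson'), ('period', 'Period_(school)')]
```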
  • 17. Outline • Generation of the Facet Graph from DBPedia • Mention Detection • Similarity measures on the FacetGraph
  • 18. Pairwise concept similarity based on wikilinks [1]
      [1] Milne D., Witten I. H., An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links, AAAI, 2008
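The formula on slide 18 did not survive in the transcript. As a reference point, the link-based measure of [1] is commonly written as a normalized distance over the sets of articles linking to each concept, (log max(|A|, |B|) - log |A ∩ B|) / (log |W| - log min(|A|, |B|)), where A and B are the inlink sets and W is the set of all articles. The sketch below assumes that form and converts the distance to a relatedness score as 1 minus the distance, clipped at 0; the function name and the clipping are choices made here, not something the slides state.

```python
import math


def wikilink_relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness in [0, 1] from two inlink sets (assumed form of [1])."""
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared inlinks: treat the concepts as unrelated
    larger, smaller = max(len(a), len(b)), min(len(a), len(b))
    if total_articles < larger:
        raise ValueError("total_articles must be at least the size of each inlink set")
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator == 0:
        return 1.0  # both sets cover every article, so they are identical
    distance = (math.log(larger) - math.log(len(common))) / denominator
    return max(0.0, 1.0 - distance)


# Two concepts with 4 inlinks each, 2 of them shared, in a 100-article collection.
score = wikilink_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=100)
```

On the Facet Graph, the two inlink sets would come from the inlink queries whose access times are reported on slide 14.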
  • 19. Our assets on IBM.next: http://ibmnext.stage1.mybluemix.net/assets (footer: IBM Confidential, 14/9/)
  • 20. Thank You