From Artwork to Cyber Attacks: Lessons
Learned in Building Knowledge Graphs
using Semantic Web Technologies
Craig Knoblock
USC Information Sciences Institute
U.S. Semantic Technologies Symposium
March 1, 2018
Center on Knowledge Graphs: People
Center on Knowledge Graphs: People (cont.)
Center on Knowledge Graphs: Projects
Center on Knowledge Graphs, USC Information Sciences Institute
Goal: Building Knowledge Graphs
From: raw, messy, disconnected (hard to query, analyze & visualize)
To: clean, organized, linked (easy to query, analyze & visualize)
Questions Addressed in this Talk
1. Where should the Semantic Web data come from?
• Triplestores? Linked data? Schema.org?
2. What is the “best” representation of the data in a knowledge graph?
• Very detailed domain-specific ontologies?
3. How should we deal with incomplete and incorrect information?
• Manual curation? Automated data cleaning?
4. How do we organize and store the data for efficient access?
• RDF? Triplestore?
Steps To Build a KG
[Pipeline diagram: Crawling, Extraction, Mapping To Ontology, Entity Linking & Similarity, Knowledge Graph Deployment, Query & Visualization; stores and vocabularies shown: ElasticSearch, Graph DB, schema.org, geonames]
[Stages: Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface]
Feature Extraction
Illegal Arms Sales
• 100s of web sites
• ATF wants to find people buying and selling across state lines
• Challenge: extract and align the data across sites
Extraction
Structured Extraction
Automated Extraction
[Minton et al., Inferlink]
1. Input: a pile of pages
2. Classify by Templates: pages clustered by template
3. Infer Extractor for each cluster of pages: extractor
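The "classify by templates" step can be pictured with a small sketch. This is not the Inferlink method, whose details the slides do not give. It is a minimal illustration that assumes pages from the same template share the same set of HTML tags and class names. The function names, the 0.7 threshold, and the sample pages are all hypothetical.

```python
import re
from collections import Counter

def tag_signature(html: str) -> Counter:
    """Count (tag, class) pairs of opening tags as a crude template fingerprint."""
    sig = Counter()
    for m in re.finditer(r'<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>', html):
        tag = m.group(1).lower()
        cls = re.search(r'class\s*=\s*"([^"]*)"', m.group(2))
        sig[(tag, cls.group(1) if cls else "")] += 1
    return sig

def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity over the sets of (tag, class) pairs."""
    keys_a, keys_b = set(a), set(b)
    if not keys_a and not keys_b:
        return 1.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)

def cluster_by_template(pages, threshold=0.7):
    """Greedy clustering: a page joins the first cluster whose
    representative signature is similar enough, else starts a new one."""
    clusters = []  # list of (representative_signature, [page indices])
    for i, html in enumerate(pages):
        sig = tag_signature(html)
        for rep, members in clusters:
            if jaccard(sig, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((sig, [i]))
    return [members for _, members in clusters]

# Hypothetical pages: two listings from one template, one from another.
pages = [
    '<html><body><h1 class="title">Item A</h1><span class="price">$500</span></body></html>',
    '<html><body><h1 class="title">Item B</h1><span class="price">$300</span></body></html>',
    '<html><body><table><tr><td class="item">Item C</td><td class="cost">$400</td></tr></table></body></html>',
]
clusters = cluster_by_template(pages)
print(clusters)
```

Once pages are grouped this way, an extractor can be inferred per cluster rather than per page, which is the point of the template step.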
Unsupervised Extraction Tool
Extraction Evaluation
10 websites, 5 pages each
Field         Perfect       Including partial and extra data
Title         1.0 (50/50)   1.0 (50/50)
Desc          .76 (37/49)   .98 (48/49)
Seller        .95 (40/42)   .95 (40/42)
Date          .83 (40/48)   .83 (40/48)
Price         .87 (39/45)   .98 (44/45)
Loc           .51 (23/45)   .84 (38/45)
Cat           .68 (34/50)   .88 (44/50)
Member Since  1.0 (35/35)   1.0 (35/35)
Expires       .52 (15/29)   .55 (16/29)
Views         .76 (19/25)   1.0 (25/25)
ID            .97 (35/36)   1.0 (36/36)
Knowledge Graph for Predicting Cyber Attacks
[Architecture diagram. Sources: Blogs, Twitter, Conferences, CPEs, Darkweb marketplaces, News, CVEs, Darkweb Forums, Abuse.ch, Microsoft Bulletins. Components: Karma models, Cyber Domain Ontology, ElasticSearch]
Cyber Domain Ontology
28 Classes
97 Properties
Based on Schema.org
Karma: Mapping Data to Ontologies
[Knoblock, Szekely, et al., ISWC 2012]
[Diagram. Inputs: Services, Relational Sources, Hierarchical Sources, Cyber Ontology. Tool: Karma. Output: { JSON-LD }]
Map Source to Domain Ontology
Semantic Model: maps source to domain ontology
[Diagram: Domain Ontology with classes Software, Vulnerability, Topic, Post, Person. Properties shown: name, version, author, hasVulnerability, description, isTopicOf, isVulnerabilityOf, location, mentions, datePublished, topic, hasTopic, username, isAuthorOf. The legend distinguishes object properties from data properties]
Source
Column 1                     Column 2  Column 3                        Column 4       Column 5
Bro can you give me a..      English   windows xp sp3                  CVE-2016-1052  303828
… أنا جربت البرنامج وعمل ع   Arabic    jp2_cdef_destroy                               147075
salve a tutti, ultimamento … Italian   cve-2012-4969 execcommand vuln  cve-2012-4969  107075
Semantic Types
Column 1: text (Post)
Column 2: language (Post)
Column 3: name (Topic)
Column 4: name (Vulnerability)
Column 5: userId (Person)
Relationships
Post hasTopic Topic
Post mentions Vulnerability
Post author Person
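To make the idea of a semantic model concrete, here is a minimal sketch of applying one to a row of the forum-post table and emitting JSON-LD. It is not Karma's implementation or model format. The dictionaries, the function name, and the `http://example.org/cyber#` vocabulary URL are assumptions made for illustration. Only the column-to-type assignments and the three relationships come from the slides.

```python
import json

# Semantic types: column -> (ontology class, data property).
SEMANTIC_TYPES = {
    "Column 1": ("Post", "text"),
    "Column 2": ("Post", "language"),
    "Column 3": ("Topic", "name"),
    "Column 4": ("Vulnerability", "name"),
    "Column 5": ("Person", "userId"),
}

# Relationships: (subject class, object property, object class).
RELATIONSHIPS = [
    ("Post", "hasTopic", "Topic"),
    ("Post", "mentions", "Vulnerability"),
    ("Post", "author", "Person"),
]

def row_to_jsonld(row, vocab="http://example.org/cyber#"):
    """Build one nested JSON-LD document rooted at the Post node.
    The vocabulary URL is a placeholder, not a real ontology."""
    nodes = {"Post": {"@type": "Post"}}
    for column, (cls, prop) in SEMANTIC_TYPES.items():
        value = row.get(column)
        if value:  # skip empty cells so no empty nodes are created
            nodes.setdefault(cls, {"@type": cls})[prop] = value
    for subj, pred, obj in RELATIONSHIPS:
        if subj in nodes and obj in nodes:
            nodes[subj][pred] = nodes[obj]
    return {"@context": {"@vocab": vocab}, **nodes["Post"]}

row = {
    "Column 1": "Bro can you give me a..",
    "Column 2": "English",
    "Column 3": "windows xp sp3",
    "Column 4": "CVE-2016-1052",
    "Column 5": "303828",
}
doc = row_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

A row with an empty vulnerability cell simply produces a Post without a `mentions` link, which is one way to tolerate the incomplete data the talk returns to later.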
Cyber KG Dashboard
Karma Learns the Source Models
[Taheriyan et al., ISWC 2013, ICSC 2014]
Inputs: Domain Ontology, Sample Data, Known Semantic Models
Steps:
1. Learn Semantic Types
2. Construct a Graph
3. Generate Candidate Models
4. Rank Results
Learning Semantic Types
Requirements:
• Learn from a small number of examples
• Distinguish both string and numeric values
• Can be learned quickly and is highly scalable to large numbers of semantic types
[Example: a source table with columns name, date, city, state, workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
labeled with semantic types from the Domain Ontology: Person name, Person birthdate, City name, State name, Organization name]
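As a rough illustration of learning semantic types from a handful of examples, the sketch below labels a new column with the type of the most similar labeled column, using cosine similarity over character bigrams. This is a swapped-in toy technique, not the approach of the cited papers. It handles only string values, so the requirement about numeric values is not covered. All names and the label format `Class.property` are hypothetical.

```python
import math
from collections import Counter

def char_ngrams(values, n=2):
    """Bag of character n-grams over all values in a column."""
    counts = Counter()
    for v in values:
        s = f" {str(v).lower()} "
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_semantic_type(column_values, labeled_examples):
    """Return the label whose example values look most like the new column."""
    target = char_ngrams(column_values)
    scored = [(cosine(target, char_ngrams(vals)), label)
              for label, vals in labeled_examples.items()]
    return max(scored)[1]

# Labeled examples taken from the sample table on the slide.
labeled = {
    "Person.name": ["Fred Collins", "Tina Peterson"],
    "City.name": ["Seattle", "New York"],
    "State.name": ["WA", "NY"],
    "Organization.name": ["Microsoft", "Google"],
}
predicted = predict_semantic_type(["Fred Peterson", "Tina Collins"], labeled)
print(predicted)
```

Because each type is summarized by a single bag of n-grams, adding a new semantic type only adds one more comparison per column, which is in the spirit of the scalability requirement.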
Training machine learning model
[Pham et al., ISWC 2016]
Predicting new attribute
Construct a Graph
Construct a graph from semantic types and ontology
Determine Relationships
Select minimal tree that connects all semantic types
A customized Steiner tree algorithm [Kou & Markowsky, 1981]
[Figure: Initial Model]
Refining the Model
Impose constraints on Steiner Tree Algorithm
[Figure: Correct Model]
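The Steiner tree step can be sketched as follows. This is a simplified version of the classic shortest-path heuristic: build the complete graph of shortest-path distances between the terminals, take its minimum spanning tree, and expand each tree edge back into a path. It omits the final re-spanning and leaf-pruning steps of the published algorithm and is not Karma's customized variant. The graph, its edge weights, and the choice of terminals are hypothetical. The graph is assumed to be connected.

```python
import heapq
from itertools import combinations

def dijkstra(graph, source):
    """Shortest distances and predecessor map from source."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path_to(prev, source, target):
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def approx_steiner_tree(graph, terminals):
    """Return a set of undirected edges (frozensets) connecting all terminals."""
    terminals = list(terminals)
    sp = {t: dijkstra(graph, t) for t in terminals}
    # Step 1: complete graph over terminals, weighted by shortest-path distance.
    closure = sorted((sp[a][0][b], a, b) for a, b in combinations(terminals, 2))
    # Step 2: Kruskal's minimum spanning tree over that closure.
    parent = {t: t for t in terminals}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    edges = set()
    for _, a, b in closure:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            # Step 3: expand the closure edge into its underlying path.
            p = path_to(sp[a][1], a, b)
            edges.update(frozenset(e) for e in zip(p, p[1:]))
    return edges

# Hypothetical ontology graph: nodes are classes, weights are link costs.
graph = {
    "Post": {"Topic": 1, "Vulnerability": 1, "Person": 1},
    "Topic": {"Post": 1, "Vulnerability": 3},
    "Vulnerability": {"Post": 1, "Topic": 3, "Software": 1},
    "Person": {"Post": 1},
    "Software": {"Vulnerability": 1},
}
tree = approx_steiner_tree(graph, ["Topic", "Vulnerability", "Person"])
print(sorted(tuple(sorted(e)) for e in tree))
```

In this toy graph the three semantic types are joined through Post, a class none of the columns was typed with directly. Constraints such as the ones mentioned on the refinement slide could be expressed by raising or lowering edge weights.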
Karma semi-automatically builds semantic models
Karma uses semantic models to create Knowledge Graphs
American Art Collaborative
• Consortium of 14 American art museums
• Explore the use of Linked Data for research, education, and outreach
• Build 5-star Linked Data for the museums
• Create tools to support the construction of Linked Data
[Knoblock et al., ISWC 2017]
Example Model of Actor for Amon Carter
Complete Model of Actor for Amon Carter
AAC Data Statistics
AAC Target Mappings
AAC Mapping Validator
Statistics on What Was Mapped
Collective Entity Resolution
[Zhu et al., ISWC'16]
Identifying and linking instances of the same real-world entity
[Figure: Multi-Type Graph. Node types: product, price, description, manufacturer. Product nodes: Product 1 to Product 5. Descriptions: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones. Manufacturers: Bose Electronic, Sony, Bosch, Bose. Prices: 292, 599, 229, 299]
Common Approach: Pairwise Comparisons
Product    Price     Title                                        Manufacturer
Product 5  299       Quiet Comfort 25 Noise Cancelling Headphone  Bose Electronic
Product 4  299, 229  Bose Noise Cancelling Headphones             Bose
Product 3  599       Dish Washer                                  Bosch
Product 2  292       Premium Noise Cancelling Headphones          Sony
Product 1            Noise Cancelling Headphones                  Sony
Similarity measures shown: Jaro 0.5, distance 0.2, Jaccard 0.3
Acceptance Threshold: 0.8
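A pairwise comparison of this kind might look like the sketch below. Several things here are assumptions, not taken from the slide: that 0.5, 0.2, and 0.3 are weights, that they apply to manufacturer, price, and title respectively, and the exact form of the price similarity. The standard library's `difflib.SequenceMatcher` ratio is used as a stand-in for Jaro similarity, which Python does not ship.

```python
from difflib import SequenceMatcher

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two titles."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def string_ratio(a: str, b: str) -> float:
    """Stand-in for Jaro: difflib's matching-characters ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def price_similarity(a, b) -> float:
    """1 minus the relative price difference (assumed form)."""
    if a is None or b is None or max(a, b) == 0:
        return 0.0
    return 1.0 - abs(a - b) / max(a, b)

# Assumed reading of the slide: per-attribute weights and a threshold.
WEIGHTS = {"manufacturer": 0.5, "price": 0.2, "title": 0.3}
THRESHOLD = 0.8

def match_score(x, y) -> float:
    return (WEIGHTS["manufacturer"] * string_ratio(x["manufacturer"], y["manufacturer"])
            + WEIGHTS["price"] * price_similarity(x["price"], y["price"])
            + WEIGHTS["title"] * jaccard(x["title"], y["title"]))

p5 = {"title": "Quiet Comfort 25 Noise Cancelling Headphone",
      "manufacturer": "Bose Electronic", "price": 299}
p4 = {"title": "Bose Noise Cancelling Headphones",
      "manufacturer": "Bose", "price": 299}
p3 = {"title": "Dish Washer", "manufacturer": "Bosch", "price": 599}

for name, (x, y) in {"5 vs 4": (p5, p4), "4 vs 3": (p4, p3)}.items():
    score = match_score(x, y)
    print(name, round(score, 3), "match" if score >= THRESHOLD else "no match")
```

Under these assumptions the two Bose headphone records fall below the threshold even though they plausibly describe the same product, because each attribute is compared in isolation. That is the kind of miss a collective approach aims to recover.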
Graph Summarization: Original Graph
[Figure: the multi-type product graph with price, description, manufacturer, and product nodes]
Similar Nodes simt(x, y)
[Figure: the same product graph]
Graph Summarization: Super-Nodes
[Figure: the same product graph]
Super-Links
[Figure: the same product graph]
Predict Links In Original Graph
[Figure: subgraph with Product 3, Product 4, Product 5 and manufacturers Bose Electronic, Bosch, Bose]
Re-Clustering Improves Reconstruction Quality
[Figure: two views of the same subgraph]
Quality Comparison
            Precision                Recall                   F-measure
            Author  Paper  Product   Author  Paper  Product   Author  Paper  Product
Limes-F     0.958   0.827  0.446     0.864   0.761  0.16      0.909   0.792  0.236
Silk-F      0.846   0.877  0.459     0.986   0.756  0.348     0.91    0.812  0.395
Gsum        0.727   0.668  0.01      0.569   0.624  0.587     0.638   0.645  0.02
CoSum-B     0.993   0.871  0.58      0.94    0.611  0.477     0.966   0.718  0.524
Limes-MO    0.912   0.827  0.446     0.944   0.761  0.16      0.928   0.792  0.236
Silk-MO     0.932   0.877  0.459     0.958   0.756  0.348     0.945   0.812  0.395
Serf        0.985   0.837  0.436     0.687   0.808  0.186     0.809   0.822  0.261
CoSum-P     0.999   0.771  0.639     0.997   0.997  0.695     0.998   0.87   0.666
Commercial                 0.615                    0.63                     0.622
AuthorLDA 0.995
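The F-measure columns can be checked against the precision and recall columns. The sketch below assumes the table reports the balanced F-measure, the harmonic mean of precision and recall, which the slides do not state explicitly. The function name and the choice of rows are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-measure) copied from the table.
rows = {
    "Limes-F / Author": (0.958, 0.864, 0.909),
    "CoSum-P / Author": (0.999, 0.997, 0.998),
    "CoSum-P / Product": (0.639, 0.695, 0.666),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f_measure(p, r):.3f}, reported {reported}")
```

The computed values agree with the reported ones to within rounding for these rows, which supports reading the nine numeric columns as three precision, three recall, and three F-measure columns.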
Counter Human Trafficking
DIG for Counter Human Trafficking
Find the locations where a potential victim was advertised
Successfully deployed and used to find victims and prosecute traffickers
Graph Construction
assembling the data for efficient query & analysis
- Data represented in JSON-LD
- Stored in ElasticSearch
• Cloud-based search engine based on Apache Lucene
• Horizontal scaling, replication, load balancing
• Queries are fast!
• Everything is a document
- bulk loading: massive data imports (> 100M web pages)
- real-time updates: live, changing data (~5,000 pages/hour)
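To show what "JSON-LD stored in ElasticSearch" with bulk loading can mean in practice, here is a minimal sketch that turns JSON-LD documents into the newline-delimited payload accepted by the Elasticsearch bulk endpoint (an action line followed by a document line, with a trailing newline). The index name, document fields, and URLs are hypothetical, and the sketch only builds the payload. Sending it would be an HTTP POST to `_bulk` with content type `application/x-ndjson`.

```python
import json

def to_bulk_ndjson(docs, index_name="dig-webpages"):
    """Build an Elasticsearch bulk payload: one action line plus one
    source line per document. The index name is a placeholder."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical JSON-LD documents, one per crawled web page.
docs = [
    {"@id": "http://example.org/page/1", "@type": "WebPage",
     "url": "http://example.org/ads/1", "dateCreated": "2017-05-01"},
    {"@id": "http://example.org/page/2", "@type": "WebPage",
     "url": "http://example.org/ads/2", "dateCreated": "2017-05-02"},
]
payload = to_bulk_ndjson(docs)
print(payload)
```

Using the JSON-LD `@id` as the document id means re-indexing the same page overwrites the earlier version, which suits both the bulk imports and the hourly updates described above.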
ElasticSearch Data Model
Efficient indexing and query
[Diagram: document types Adult Service, Offer, Person, Phone, Web Page]
Indexing for High Performance Knowledge Graph Queries
[Chart: Avg. Query Times in Milliseconds, Single User Query Load, 1.2 billion triples. Series: State of the Art Graph Database (RDF); DIG indexing deployed in ElasticSearch]
Knowledge Graph Performance in Cyber Domain
• Index time for 16 million documents: ~2.5 hours
• Query times:
• Average query time for keyword searches: 8 msec
• Find a specific CVE: 14 msec
• Get all mentions of a MS Bulletin in all sources: 48 msec
• Get all Malware named ‘Locky’ and sort results by observedDate: 68 msec
• Get all blogs mentioning keyword ‘microsoft’ with a date range: 98 msec
• Aggregate and give document counts for each publisher/sensor: 409 msec
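Two of the benchmark queries could be written in the Elasticsearch query DSL roughly as below. The slides name `observedDate` but not the other fields, so `@type`, `name`, and `publisher` are assumed field names, and the descending sort order is also an assumption. The functions only build the request bodies as Python dictionaries.

```python
import json

def malware_by_name_query(name: str, size: int = 10) -> dict:
    """All Malware documents with the given name, sorted by observedDate."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"@type": "Malware"}},   # assumed keyword field
                    {"match_phrase": {"name": name}},  # assumed text field
                ]
            }
        },
        "sort": [{"observedDate": {"order": "desc"}}],
    }

def counts_per_publisher_query(max_buckets: int = 50) -> dict:
    """Document counts per publisher, with no hits returned."""
    return {
        "size": 0,
        "aggs": {
            "by_publisher": {
                "terms": {"field": "publisher", "size": max_buckets}
            }
        },
    }

locky_query = malware_by_name_query("Locky")
agg_query = counts_per_publisher_query()
print(json.dumps(locky_query, indent=2))
print(json.dumps(agg_query, indent=2))
```

Both bodies would be sent to the index's `_search` endpoint. The aggregation touches every matching document, which is consistent with it being the slowest query in the list.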
Questions Addressed in This Talk: Lessons Learned
1. Where should the Semantic Web data come from?
• Triplestores? Linked data? Schema.org?
The Web!
Waiting for the rest of the world to adopt the Semantic Web and provide the data in RDF is an approach doomed to failure!
2. What is the “best” representation of the data in a knowledge graph?
• Do we want to use the most detailed ontology possible?
The simplest one you need for the problem you are trying to solve
Overly complicated ontologies that attempt to be comprehensive for a domain get in the way of solving the real problems
3. How should we deal with missing and incorrect information?
• Manual curation? Automated data cleaning?
Clean where possible, but we need techniques that can cope with these problems
The world is a messy place and the ability to deal with it allows us to solve real-world problems
4. How do we organize and store the data for efficient access?
• RDF? Triplestore?
In whatever datastore best meets the goals of the problem!
It is a mistake to equate the Semantic Web with triples and triplestores.
Important Directions for Future Research
1. Techniques for extracting data from online sources
2. Approaches to quickly build, refine, and extend ontologies to solve specific problems
3. Methods for semantically annotating data from extracted sources
4. Scalable and configurable techniques for entity resolution
5. Highly scalable algorithms for querying and reasoning
6. Ability to publish and query semantic data on web pages
Thanks!

Assigning semantic labels to data sourcesCraig Knoblock
 
A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...Craig Knoblock
 
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeFrom Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeCraig Knoblock
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisCraig Knoblock
 
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...Craig Knoblock
 
Discovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked DataDiscovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked DataCraig Knoblock
 

More from Craig Knoblock (8)

Learning to Adapt to Sensor Changes and Failures
Learning to Adapt to Sensor Changes and FailuresLearning to Adapt to Sensor Changes and Failures
Learning to Adapt to Sensor Changes and Failures
 
Lessons Learned in Building Linked Data for the American Art Collaborative
Lessons Learned in Building Linked Data for the American Art CollaborativeLessons Learned in Building Linked Data for the American Art Collaborative
Lessons Learned in Building Linked Data for the American Art Collaborative
 
Assigning semantic labels to data sources
Assigning semantic labels to data sourcesAssigning semantic labels to data sources
Assigning semantic labels to data sources
 
A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...
 
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked KnowledgeFrom Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge
 
Semantics for Big Data Integration and Analysis
Semantics for Big Data Integration and AnalysisSemantics for Big Data Integration and Analysis
Semantics for Big Data Integration and Analysis
 
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...A Semantic Approach to Retrieving, Linking, and  Integrating Heterogeneous Ge...
A Semantic Approach to Retrieving, Linking, and Integrating Heterogeneous Ge...
 
Discovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked DataDiscovering Alignments in Ontologies of Linked Data
Discovering Alignments in Ontologies of Linked Data
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

From Artwork to Cyber Attacks: Lessons Learned in Building Knowledge Graphs using Semantic Web Technologies

  • 1. From Artwork to Cyber Attacks: Lessons Learned in Building Knowledge Graphs using Semantic Web Technologies Craig Knoblock USC Information Sciences Institute U.S. Semantic Technologies Symposium March 1, 2018
  • 2. Center on Knowledge Graphs: People
  • 3. Center on Knowledge Graphs: People (cont.)
  • 4. Center on Knowledge Graphs: Projects
  • 5. Goal: Building Knowledge Graphs. Turn data that is raw, messy, and disconnected (hard to query, analyze & visualize) into data that is clean, organized, and linked (easy to query, analyze & visualize).
  • 6. Questions Addressed in this Talk
      1. Where should the Semantic Web data come from?
         • Triplestores? Linked data? Schema.org?
      2. What is the “best” representation of the data in a knowledge graph?
         • Very detailed domain-specific ontologies?
      3. How should we deal with incomplete and incorrect information?
         • Manual curation? Automated data cleaning?
      4. How do we organize and store the data for efficient access?
         • RDF? Triplestore?
  • 7. Steps To Build a KG. Pipeline stages: Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface. Diagram labels: Crawling, Extraction, Mapping To Ontology, Entity Linking & Similarity, Knowledge Graph Deployment, Query & Visualization. Resources shown: schema.org, geonames, Elastic Search, Graph DB. Highlighted stage: Feature Extraction.
  • 8. Illegal Arms Sales
      • 100s of web sites
      • ATF wants to find people buying and selling across state lines
      • Challenge: extract and align the data across sites
  • 11. Automated Extraction [Minton et al., Inferlink]. Input: a pile of pages.
  • 12. Automated Extraction. Input: a pile of pages. Classify by templates; output: pages clustered by template.
  • 13. Automated Extraction. Input: a pile of pages. Classify by templates; output: pages clustered by template. Infer an extractor for each cluster.
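The "classify by templates" step on slides 12 and 13 can be illustrated with a small sketch. It is not the Inferlink method, which the slides do not describe. It is a minimal stand-in that assumes pages generated from the same template share most of their root-to-element tag paths, and it groups pages greedily by Jaccard similarity of those path sets. The class and function names, the sample pages, and the 0.7 threshold are all hypothetical.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collect the set of root-to-element tag paths seen in one page."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def signature(html):
    collector = TagPathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster_by_template(pages, threshold=0.7):
    """Greedy clustering: a page joins the first cluster whose seed page is similar enough."""
    clusters = []  # list of (seed signature, [page ids])
    for page_id, html in pages.items():
        sig = signature(html)
        for seed, members in clusters:
            if jaccard(sig, seed) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((sig, [page_id]))
    return [members for _, members in clusters]


# Hypothetical pages: two listings from one template and one forum page.
pages = {
    "ad1": "<html><body><div><h1>Rifle</h1><span>$300</span></div></body></html>",
    "ad2": "<html><body><div><h1>Pistol</h1><span>$150</span></div></body></html>",
    "forum1": "<html><body><table><tr><td>post</td></tr></table></body></html>",
}
clusters = cluster_by_template(pages)
```

Once pages are grouped this way, an extractor can be inferred per cluster, since fields tend to sit at the same tag path within a template.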
  • 15. Extraction Evaluation (10 websites, 5 pages each)
      Field         Perfect        Including partial and extra data
      Title         1.0 (50/50)    1.0 (50/50)
      Desc          .76 (37/49)    .98 (48/49)
      Seller        .95 (40/42)    .95 (40/42)
      Date          .83 (40/48)    .83 (40/48)
      Price         .87 (39/45)    .98 (44/45)
      Loc           .51 (23/45)    .84 (38/45)
      Cat           .68 (34/50)    .88 (44/50)
      Member Since  1.0 (35/35)    1.0 (35/35)
      Expires       .52 (15/29)    .55 (16/29)
      Views         .76 (19/25)    1.0 (25/25)
      ID            .97 (35/36)    1.0 (36/36)
  • 16. Steps To Build a KG (pipeline diagram repeated from slide 7)
  • 17. Knowledge Graph for Predicting Cyber Attacks. Sources: blogs, Twitter, conferences, CPEs, CVEs, news, darkweb marketplaces, darkweb forums, Abuse.ch, Microsoft Bulletins. Karma models map the sources to the Cyber Domain Ontology; the result is stored in Elastic Search.
  • 18. Cyber Domain Ontology: 28 classes, 97 properties, based on Schema.org
  • 19. Karma: Mapping Data to Ontologies [Knoblock, Szekely, et al., ISWC 2012]. Inputs: services, relational sources, hierarchical sources, and the cyber ontology. Output: JSON-LD.
  • 20. Map Source to Domain Ontology. The semantic model maps the source to the domain ontology.
      Domain ontology classes: Software, Vulnerability, Topic, Post, Person
      Properties shown (object and data properties): name, version, description, author, hasVulnerability, isVulnerabilityOf, hasTopic, isTopicOf, topic, mentions, location, datePublished, username, isAuthorOf
      Source:
      Column 1                      Column 2  Column 3                        Column 4       Column 5
      Bro can you give me a..       English   windows xp sp3                  CVE-2016-1052  303828
      … أنا جربت البرنامج وعمل ع    Arabic    jp2_cdef_destroy                               147075
      salve a tutti, ultimamento …  Italian   cve-2012-4969 execcommand vuln  cve-2012-4969  107075
  • 21. Semantic Types. Each source column is assigned a class and a data property: Column 1 is Post text, Column 2 is Post language, Column 3 is Topic name, Column 4 is Vulnerability name, Column 5 is Person userId. (Same sample source as slide 20.)
  • 22. Relationships. Object properties connect the classes: Post hasTopic Topic, Post mentions Vulnerability, Post author Person. (Same sample source as slide 20.)
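To make the semantic-model idea on slides 20 to 22 concrete, here is a minimal sketch of applying such a model to one row of the sample source and emitting a JSON-LD document. It is not Karma's implementation. The column keys (`col1` to `col5`), the `http://example.org/cyber/` vocabulary URL, and the function name are hypothetical, and the column-to-type assignment is read off the slides as an assumption.

```python
import json

# Assumed semantic types for the sample source: column -> (class, data property).
SEMANTIC_TYPES = {
    "col1": ("Post", "text"),
    "col2": ("Post", "language"),
    "col3": ("Topic", "name"),
    "col4": ("Vulnerability", "name"),
    "col5": ("Person", "userId"),
}
# Assumed object properties linking the Post to the other classes.
RELATIONSHIPS = {"Topic": "hasTopic", "Vulnerability": "mentions", "Person": "author"}


def row_to_jsonld(row, vocab="http://example.org/cyber/"):
    """Group cell values by class, then nest the other nodes under the Post."""
    nodes = {}
    for column, value in row.items():
        if value in (None, ""):
            continue  # an empty cell produces no property
        cls, prop = SEMANTIC_TYPES[column]
        nodes.setdefault(cls, {"@type": cls})[prop] = value
    post = nodes.pop("Post", {"@type": "Post"})
    for cls, node in nodes.items():
        post[RELATIONSHIPS[cls]] = node
    return {"@context": {"@vocab": vocab}, **post}


row = {
    "col1": "Bro can you give me a..",
    "col2": "English",
    "col3": "windows xp sp3",
    "col4": "CVE-2016-1052",
    "col5": "303828",
}
doc = row_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

Keeping the model as plain data (two small dictionaries) is what lets a tool suggest or learn it, rather than hard-coding one transformation per source.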
  • 23. Cyber KG Dashboard
  • 24. Karma Learns the Source Models [Taheriyan et al., ISWC 2013, ICSC 2014]. Inputs: domain ontology, sample data, known semantic models. Steps: learn semantic types, construct a graph, generate candidate models, rank results.
  • 25. Learning Semantic Types
      Requirements:
      • Learn from a small number of examples
      • Distinguish both string and numeric values
      • Can be learned quickly and is highly scalable to large numbers of semantic types
      Example source:
         name           date      city      state  workplace
      1  Fred Collins   Oct 1959  Seattle   WA     Microsoft
      2  Tina Peterson  May 1980  New York  NY     Google
      Semantic types from the domain ontology: Person name, Person birthdate, City name, State name, Organization name
  • 26. Training machine learning model [Pham et al., ISWC 2016]
  • 28. Construct a Graph: construct a graph from the semantic types and the ontology
  • 29. Determine Relationships: select the minimal tree that connects all semantic types, using a customized Steiner tree algorithm [Kou & Markowsky, 1981]. The result is the initial model.
  • 30. Refining the Model: impose constraints on the Steiner tree algorithm to obtain the correct model
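Slides 28 to 30 describe selecting a minimal tree that connects all semantic types. The sketch below shows the general shape of a shortest-path-based Steiner tree approximation of the kind cited on slide 29: build the metric closure over the terminals, take its minimum spanning tree, expand each closure edge into the underlying path, then take a spanning tree of the result and prune non-terminal leaves. The "customized" weighting and the constraints that Karma adds are not given in the slides and are not reproduced here. The toy ontology graph and its unit weights are hypothetical.

```python
import heapq
from itertools import combinations


def dijkstra(graph, source):
    """Return (distance, predecessor) maps from source over an undirected weighted graph."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def kruskal(nodes, edges):
    """Minimum spanning tree; edges are (weight, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def steiner_tree(graph, terminals):
    """Approximate Steiner tree; assumes the graph is connected."""
    paths = {t: dijkstra(graph, t) for t in terminals}
    closure = [(paths[a][0][b], a, b) for a, b in combinations(terminals, 2)]
    # Expand each edge of the closure MST into the real shortest path.
    sub_edges = set()
    for _, a, b in kruskal(terminals, closure):
        prev, node = paths[a][1], b
        while node != a:
            p = prev[node]
            sub_edges.add((graph[p][node], *sorted((p, node))))
            node = p
    # Spanning tree of the expanded subgraph, then prune non-terminal leaves.
    sub_nodes = {n for _, u, v in sub_edges for n in (u, v)}
    tree = kruskal(sub_nodes, sub_edges)
    while True:
        degree = {}
        for _, u, v in tree:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in terminals}
        if not leaves:
            return sorted(tree)
        tree = [e for e in tree if e[1] not in leaves and e[2] not in leaves]


# Hypothetical ontology graph: class nodes joined by unit-weight properties.
graph = {
    "Post": {"Person": 1, "Topic": 1, "Vulnerability": 1},
    "Person": {"Post": 1},
    "Topic": {"Post": 1},
    "Vulnerability": {"Post": 1, "Software": 1},
    "Software": {"Vulnerability": 1},
}
tree = steiner_tree(graph, ["Person", "Topic", "Vulnerability"])
```

In this toy case the three terminals are only connected through `Post`, so `Post` enters the tree as a Steiner node even though no column was typed with it, which is the behavior slide 29 relies on to recover relationships.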
  • 31. Knowledge Graphs
      • Karma uses semantic models to create knowledge graphs
      • Karma semi-automatically builds semantic models
  • 32. American Art Collaborative [Knoblock et al., ISWC 2017]
      • Consortium of 14 American art museums
      • Explore the use of Linked Data for research, education, and outreach
      • Build 5* Linked Data for the museums
      • Create tools to support the construction of Linked Data
  • 33. Example Model of Actor for Amon Carter
  • 34. Complete Model of Actor for Amon Carter
  • 35. AAC Data Statistics
  • 38. Statistics on What Was Mapped
  • 39. Steps To Build a KG (pipeline diagram repeated from slide 7)
  • 40. Collective Entity Resolution [Zhu et al., ISWC’16]: identifying and linking instances of the same real-world entity. Multi-Type Graph with node types product, price, description, and manufacturer. Nodes shown: Product 1 to Product 5; prices 229, 292, 299, 599; descriptions “Quiet Comfort 25 Noise Cancelling Headphone”, “Noise Cancelling Headphones”, “Premium Noise Cancelling Headphones”, “Dish Washer”, “Bose Noise Cancelling Headphones”; manufacturers Bose Electronic, Bose, Bosch, Sony.
  • 41. Collective Entity Resolution [Zhu et al., ISWC’16] (same multi-type graph as slide 40)
  • 42. Common Approach: Pairwise Comparisons
      Product    Price     Title                                        Manufacturer
      Product 5  299       Quiet Comfort 25 Noise Cancelling Headphone  Bose Electronic
      Product 4  299, 229  Bose Noise Cancelling Headphones             Bose
      Product 3  599       Dish Washer                                  Bosch
      Product 2  292       Premium Noise Cancelling Headphones          Sony
      Product 1            Noise Cancelling Headphones                  Sony
      Similarity measures shown: Jaro 0.5, distance 0.2, Jaccard 0.3. Acceptance threshold: 0.8.
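The pairwise approach on slide 42 can be sketched as a weighted sum of per-field similarities compared against the acceptance threshold. Several things here are assumptions, since the slide only lists the numbers: the values 0.5, 0.2, and 0.3 are read as weights for manufacturer, price, and title; `difflib.SequenceMatcher` stands in for Jaro similarity because it ships with Python; and the price similarity formula is illustrative.

```python
from difflib import SequenceMatcher

# Assumed reading of the slide: per-field weights and an acceptance threshold.
WEIGHTS = {"manufacturer": 0.5, "price": 0.2, "title": 0.3}
THRESHOLD = 0.8


def string_similarity(a, b):
    # Stand-in for Jaro similarity (not the same measure).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def price_similarity(a, b):
    """1.0 for equal prices, falling toward 0.0 as the relative gap grows."""
    if a is None or b is None:
        return 0.0
    if max(a, b) == 0:
        return 1.0
    return 1.0 - abs(a - b) / max(a, b)


def token_jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0


def match_score(p, q):
    return (
        WEIGHTS["manufacturer"] * string_similarity(p["manufacturer"], q["manufacturer"])
        + WEIGHTS["price"] * price_similarity(p["price"], q["price"])
        + WEIGHTS["title"] * token_jaccard(p["title"], q["title"])
    )


def is_match(p, q):
    return match_score(p, q) >= THRESHOLD


product5 = {"title": "Quiet Comfort 25 Noise Cancelling Headphone",
            "manufacturer": "Bose Electronic", "price": 299}
product4 = {"title": "Bose Noise Cancelling Headphones",
            "manufacturer": "Bose", "price": 229}
print(round(match_score(product5, product4), 3), is_match(product5, product4))
```

Under these assumptions the two Bose headphone records fall well below the threshold, because each field is judged in isolation. That is the weakness the collective, graph-based approach on the following slides is meant to address.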
  • 43. Graph Summarization: Original Graph (the multi-type graph with product, price, description, and manufacturer nodes)
  • 45. Graph Summarization: Super-Nodes (same graph as slide 43)
  • 47. Super-Links (same graph as slide 43)
  • 48. Predict Links In Original Graph. Nodes shown: Bose Electronic, Bose, Bosch, Product 3, Product 4, Product 5.
  • 49. Predict Links In Original Graph (continued, same nodes as slide 48)
  • 50. Predict Links In Original Graph (continued, same nodes as slide 48)
  • 51. Re-Clustering Improves Reconstruction Quality. Nodes shown: Bose Electronic, Bose, Bosch, Product 3, Product 4, Product 5.
  • 52. Quality Comparison
                Precision                Recall                   F-measure
      System    Author  Paper  Product   Author  Paper  Product   Author  Paper  Product
      Limes-F   0.958   0.827  0.446     0.864   0.761  0.16      0.909   0.792  0.236
      Silk-F    0.846   0.877  0.459     0.986   0.756  0.348     0.91    0.812  0.395
      Gsum      0.727   0.668  0.01      0.569   0.624  0.587     0.638   0.645  0.02
      CoSum-B   0.993   0.871  0.58      0.94    0.611  0.477     0.966   0.718  0.524
      Limes-MO  0.912   0.827  0.446     0.944   0.761  0.16      0.928   0.792  0.236
      Silk-MO   0.932   0.877  0.459     0.958   0.756  0.348     0.945   0.812  0.395
      Serf      0.985   0.837  0.436     0.687   0.808  0.186     0.809   0.822  0.261
      CoSum-P   0.999   0.771  0.639     0.997   0.997  0.695     0.998   0.87   0.666
      Additional rows whose column positions are not recoverable from the transcript: Commercial 0.615, 0.63, 0.622; AuthorLDA 0.995.
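The F-measure columns on slide 52 are consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes a few Author values from the table. The tolerance of 0.002 is an assumption that allows for the table's rounding.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) for the Author column of slide 52.
author_rows = {
    "Limes-F": (0.958, 0.864, 0.909),
    "Silk-F": (0.846, 0.986, 0.91),
    "CoSum-P": (0.999, 0.997, 0.998),
}
for system, (p, r, reported) in author_rows.items():
    computed = f_measure(p, r)
    print(system, round(computed, 3), reported)
    assert abs(computed - reported) < 0.002
```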
  • 53. Steps To Build a KG (pipeline diagram repeated from slide 7)
  • 54. Counter Human Trafficking
  • 55. DIG for Counter Human Trafficking
  • 56. Find the locations where a potential victim was advertised
  • 57. Successfully deployed and used to find victims and prosecute traffickers
  • 58. Graph Construction: assembling the data for efficient query & analysis
      - Data represented in JSON-LD
      - Stored in ElasticSearch
         • Cloud-based search engine based on Apache Lucene
         • Horizontal scaling, replication, load balancing
         • Queries are fast!
         • Everything is a document
      - Bulk loading: massive data imports (> 100M web pages)
      - Real-time updates: live, changing data (~5,000 pages/hour)
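As an illustration of the bulk-loading point on slide 58, the sketch below builds the newline-delimited request body that Elasticsearch's `_bulk` endpoint accepts: one action line followed by one document line per JSON-LD document. It only constructs the payload and sends nothing. The index name `webpages`, the sample documents, and the use of the JSON-LD `@id` as the document id are assumptions, not details from the talk.

```python
import json


def bulk_payload(documents, index="webpages"):
    """Build an NDJSON body: an index action line, then the document itself."""
    lines = []
    for doc in documents:
        action = {"index": {"_index": index, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Hypothetical JSON-LD documents.
documents = [
    {"@id": "http://example.org/page/1", "@type": "WebPage", "text": "first page"},
    {"@id": "http://example.org/page/2", "@type": "WebPage", "text": "second page"},
]
payload = bulk_payload(documents)
print(payload)
```

Because every JSON-LD node is already a JSON document, it can be indexed as-is, which fits the "everything is a document" point on the slide.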
  • 59. ElasticSearch Data Model: efficient indexing and query. Document types shown: Adult Service, Offer, Person, Phone, Web Page.
  • 60. Indexing for High Performance Knowledge Graph Queries. Chart: average query times in milliseconds under a single-user query load on 1.2 billion triples, comparing a state-of-the-art graph database (RDF) with DIG indexing deployed in ElasticSearch.
  • 61. Knowledge Graph Performance in Cyber Domain
      • Index time for 16 million documents: ~2.5 hours
      • Query times:
         • Average query time for keyword searches: 8 msec
         • Find a specific CVE: 14 msec
         • Get all mentions of a MS Bulletin in all sources: 48 msec
         • Get all Malware named ‘Locky’ and sort results by observedDate: 68 msec
         • Get all blogs mentioning keyword ‘microsoft’ with a date range: 98 msec
         • Aggregate and give document counts for each publisher/sensor: 409 msec
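Two of the queries timed on slide 61 could be expressed in the Elasticsearch query DSL roughly as follows. This is a sketch of the request bodies only. The field names (`type`, `name`, `text`, `datePublished`) and the descending sort order are assumptions, since the talk does not show the index mapping; only `observedDate` appears on the slide.

```python
def malware_by_name(name):
    """All Malware documents with the given name, sorted by observedDate."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": "Malware"}},
                    {"match": {"name": name}},
                ]
            }
        },
        "sort": [{"observedDate": {"order": "desc"}}],
    }


def blogs_mentioning(keyword, start, end):
    """Blog documents that mention a keyword within a date range."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": keyword}}],
                "filter": [
                    {"term": {"type": "Blog"}},
                    {"range": {"datePublished": {"gte": start, "lte": end}}},
                ],
            }
        }
    }


locky_query = malware_by_name("Locky")
blog_query = blogs_mentioning("microsoft", "2017-01-01", "2017-12-31")
```

Putting the type and date conditions in `filter` rather than `must` keeps them out of relevance scoring, which is one reason such lookups can stay fast on a document index.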
  • 62. Questions Addressed in This Talk: Lessons Learned
      1. Where should the Semantic Web data come from?
         • Triplestores? Linked data? Schema.org?
      2. What is the best representation of the data in a knowledge graph?
         • Do we want to use the most detailed ontology possible?
      3. How should we deal with missing and incomplete information?
         • Manual curation? Automated data cleaning?
      4. How do we organize and store the data for efficient access?
         • RDF? Triplestore?
  • 63. Lessons Learned, question 1: Where should the Semantic Web data come from? (Triplestores? Linked data? Schema.org?)
      Answer: The Web! Waiting for the rest of the world to adopt the Semantic Web and provide the data in RDF is an approach doomed to failure!
  • 64. Lessons Learned, question 2: What is the “best” representation of the data in a knowledge graph? (Do we want to use the most detailed ontology possible?)
      Answer: The simplest one you need for the problem you are trying to solve. Overly complicated ontologies that attempt to be comprehensive for a domain get in the way of solving the real problems.
  • 65. Lessons Learned, question 3: How should we deal with missing and incorrect information? (Manual curation? Automated data cleaning?)
      Answer: Clean where possible, but we need techniques that can cope with these problems. The world is a messy place, and the ability to deal with it allows us to solve real-world problems.
  • 66. Lessons Learned, question 4: How do we organize and store the data for efficient access? (RDF? Triplestore?)
      Answer: In whatever datastore best meets the goals of the problem! It is a mistake to equate the Semantic Web with triples and triplestores.
  • 67. Important Directions for Future Research
      1. Techniques for extracting data from online sources
      2. Approaches to quickly build, refine, and extend ontologies to solve specific problems
      3. Methods for semantically annotating data from extracted sources
      4. Scalable and configurable techniques for entity resolution
      5. Highly scalable algorithms for querying and reasoning
      6. Ability to publish and query semantic data on web pages

Editor's Notes

  1. Karma offers suggestions on how to do the mapping
  2. Tokenize the values in a given labeled column into pure alphabetic, numeric, and symbol tokens. Extract features from the tokens and the column name, and associate them with the column’s semantic type.
  3. Waiting for the rest of the world to adopt the Semantic Web and provide the data in RDF is an approach doomed to failure!
  4. Overly complicated ontologies that attempt to be comprehensive for a domain get in the way of solving the real problems
  5. The world is a messy place and the ability to deal with it allows us to solve real-world problems
  6. It is a mistake to equate the Semantic Web with triples and triplestores.
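Editor's note 2 describes tokenizing column values into pure alphabetic, numeric, and symbol tokens and deriving features from them. A minimal sketch of that preprocessing step is shown below. It is not the published classifier. The regular expression, the feature naming scheme (`kind=`, `tok=`, `colname=`), and the function names are hypothetical.

```python
import re
from collections import Counter

# Runs of letters, runs of digits, or a single symbol character.
TOKEN = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def tokenize(value):
    """Split a cell value into pure alphabetic, numeric, and symbol tokens."""
    return TOKEN.findall(value)


def token_kind(token):
    if token.isalpha():
        return "alpha"
    if token.isdigit():
        return "numeric"
    return "symbol"


def column_features(name, values):
    """Count token kinds and lowercase tokens for a column, plus its name."""
    features = Counter()
    for value in values:
        for token in tokenize(value):
            features["kind=" + token_kind(token)] += 1
            features["tok=" + token.lower()] += 1
    features["colname=" + name.lower()] += 1
    return features


tokens = tokenize("CVE-2016-1052")
features = column_features("Column 4", ["CVE-2016-1052", "cve-2012-4969"])
```

Feature counts like these can be paired with a labeled column's semantic type (for example Vulnerability name) as training data, so that an unlabeled column with a similar token profile receives the same type as a suggestion.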