Semantic Analytics, Smart Data

SPEAKERS:
Dr. Arthur Keen, Principal
Keen Analytics
Thomas Kelly, Practice Director
Cognizant Technology Solutions, Inc.

Operator, get me Klondike 5-397

Data Ecosystems are Growing in Complexity
• Tens of thousands of databases
• Millions to billions of data elements
• Dozens of markets
• Hundreds to thousands of social media sites

Semantics Manages the Complexity of Data Variety
Analytics without semantics is like having a multi-lingual conversation without interpreters

Semantic Analytics
Data Science
Domains
Technologies
Analytic Methods
Semantics
• Knowledge
• Expertise
• Abstraction & Diversity
• Consistency
Semantics
• Data Meaning
• Context
• Relationships
• Vocabulary

Semantic Analytics
• Emphasis is on data relationships, not just the data
• Data focus is on data concepts (abstraction), not the diversity of implementation details
• Data assumptions are made explicit in the semantic model
• The semantics guide the analytics process, rather than just the analyst’s knowledge
• SPARQL is a key component, but not the only tool in the semantic analytics toolbox

Challenges in Semantic Analytics
• Semantic models that do not abstract data concepts from their implementation details
• Semantic models that are missing semantics
• Semantic data that is missing a semantic model
• Rich, accurate provenance is required to establish confidence in the analytics results
• Data cleansing must meet requirements for accuracy, consistency, and fitness for the purpose of the analytic task and result

Semantic Analytics in the Data to Action Loop
Figure: data-to-action loop relating analytics steps, semantics, and the Intelligence Pyramid.
• Analytics steps: Analyze, Transform, Classify, Correlate, Predict, Interpret, COAs
• Semantics: Representation/Provenance, Tag/Inference, Inference
• Questions for semantics: Which relationships are relevant? What class? What kind of group? Define new relationships?
• Intelligence Pyramid (bottom to top): Data, Information, Knowledge, Wisdom
Clustering Through Semantic Tagging
Image Credit: historyinthecity.blogspot.com
Semantic Tags are keywords used to describe a resource (webpages, documents, business transactions)
Semantic Tags
• Tend to be user- or publisher-defined based on preferences, including terminology and depth of attribution
• May have ambiguities to resolve (synonyms, reuse/overuse, too specific, language, jargon)
Source-Directed Tags
• Manual selection and entry by the author
• Automated population by the publisher, such as professional literature or publication websites
• Automatically excerpted from a corpus through semantic analysis of the content, guided by a controlled vocabulary
Key Benefits
• Faster search of content
• Greater precision of search results

Peer-Pressure Clustering
Source: Implementing Iterative Algorithms with SPARQL http://ceur-ws.org/Vol-1133/paper-36.pdf

Assign Each Tag Vertex to a Cluster
# Initialization: every vertex with an outgoing hasLink edge starts in its own cluster.
DROP GRAPH <urn:ga/g/xjz0> ;
CREATE GRAPH <urn:ga/g/xjz0> ;
INSERT { GRAPH <urn:ga/g/xjz0>
    { ?s <urn:ga/p/inCluster> ?s } }
WHERE {
  { SELECT DISTINCT ?s WHERE {
      ?s <urn:ga/p/hasLink> ?o . } } }

For Each Tag Vertex, Populate Cluster Assignments of Neighbors
# Iteration step: [i] and [i+1] appear to stand for the current and next iteration numbers.
# Each vertex takes the cluster that occurs most often among its neighbors.
DROP GRAPH <urn:ga/g/xjz[i+1]> ;
CREATE GRAPH <urn:ga/g/xjz[i+1]> ;
INSERT { GRAPH <urn:ga/g/xjz[i+1]>
    { ?s <urn:ga/p/inCluster> ?clus3 } }
WHERE {
  { SELECT ?s (SAMPLE(?clus) AS ?clus3) WHERE {
      { SELECT ?s (MAX(?clusCt) AS ?maxClusCt) WHERE {
          SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
          WHERE { ?s <urn:ga/p/hasLink> ?o .
                  GRAPH <urn:ga/g/xjz[i]>
                    { ?o <urn:ga/p/inCluster> ?clus }
          } GROUP BY ?s ?clus
      } GROUP BY ?s }
      { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
        WHERE { ?s <urn:ga/p/hasLink> ?o .
                GRAPH <urn:ga/g/xjz[i]>
                  { ?o <urn:ga/p/inCluster> ?clus }
        } GROUP BY ?s ?clus
      } FILTER (?clusCt = ?maxClusCt)
  } GROUP BY ?s } }

Strengths
• Effective over large volumes of data
• Comprehensive use of RDF data structure features
Observation
• No use of semantics features, such as vocabulary and knowledge management capabilities
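The two SPARQL steps above can be sketched outside a triple store. This is a minimal Python illustration, not the cited paper's implementation. The `links` dictionary stands in for the hasLink edges and its contents are hypothetical. Ties are broken by the smallest label, where the SPARQL uses SAMPLE and picks arbitrarily.

```python
from collections import Counter

def peer_pressure_step(links, cluster):
    """One iteration: each vertex adopts the most common cluster among its neighbors."""
    next_cluster = {}
    for vertex, neighbors in links.items():
        votes = Counter(cluster[n] for n in neighbors if n in cluster)
        if not votes:
            # No neighbor has a cluster; keep the current assignment (assumption).
            next_cluster[vertex] = cluster[vertex]
            continue
        top = max(votes.values())
        # Deterministic tie-break: smallest label among the most frequent clusters.
        next_cluster[vertex] = min(c for c, count in votes.items() if count == top)
    return next_cluster

def peer_pressure(links, max_iter=20):
    # Initialization: every vertex starts in its own cluster.
    cluster = {vertex: vertex for vertex in links}
    for _ in range(max_iter):
        next_cluster = peer_pressure_step(links, cluster)
        if next_cluster == cluster:
            break
        cluster = next_cluster
    return cluster

# Hypothetical data: two disconnected triangles of tag vertices.
links = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
clusters = peer_pressure(links)
```

On this input each triangle settles into a single cluster. Synchronous updates like these can oscillate on some graphs, which is why the sketch caps the number of iterations.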

Common Taxonomy for Semantic Tags
Figure: scale from Positive to Negative.
• Terms used in Semantic Tags: Ecstatic, Pleased, Okay

Knowledge-based Taxonomy for Semantic Tags
Figure: scale from Positive to Negative.
• Preferred Terms, Synonyms, and Common Misspellings: Ecstatic (misspellings: Estatic, Extatic, Egstatic), Inspired, Charged, Excited
• Frequently-Used Generalizations and Degrees of Specificity: Exceeds Need, Very Satisfied, Satisfied, Somewhat Satisfied

Clustering through Semantic Tagging
Process
1. Select a set of triples containing URIs of the resources, as well as the semantic tags assigned to the resources
2. Map the semantic tags to an N-level taxonomy of preferred tags, based on exact and synonym matches, and desired degree of specificity
3. Cluster resources with highest-frequency semantic tag pairs
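The process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speakers' implementation. The `PREFERRED` mapping is a hypothetical one-level taxonomy and the resource names and tags are invented sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy: raw tag (synonym or misspelling) -> preferred tag.
PREFERRED = {
    "ecstatic": "Ecstatic", "estatic": "Ecstatic", "extatic": "Ecstatic",
    "pleased": "Pleased",
    "okay": "Okay",
}

def tag_pair_counts(resource_tags):
    """Map tags to preferred terms, then count co-occurring tag pairs per resource."""
    counts = Counter()
    for tags in resource_tags.values():
        preferred = sorted({PREFERRED[t.lower()] for t in tags if t.lower() in PREFERRED})
        # Each unordered pair of distinct preferred tags on a resource counts once.
        counts.update(combinations(preferred, 2))
    return counts

# Hypothetical sample data: resource -> semantic tags as entered.
resource_tags = {
    "page1": ["Estatic", "okay"],
    "page2": ["ecstatic", "Okay", "pleased"],
    "page3": ["Okay", "extatic"],
}
pair_counts = tag_pair_counts(resource_tags)
top_pair, top_count = pair_counts.most_common(1)[0]
```

Sorting the preferred tags before pairing stores each pair in one canonical order, so the same two tags are never counted under two different keys.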

Clustering through Semantic Tagging

Insert Sample Data
:Webpage1 :hasTag "10101" .
:Webpage1 :hasTag "1030303B" .
:Webpage3 :hasTag "10201A" .
:Webpage4 :hasTag "10302A" .
:Webpage5 :hasTag "10101A" . …

Find Preferred/Generalized Tag Value
# :<SPECIFICITY> is a placeholder for the desired degree of specificity.
INSERT { ?SemanticTagURI :clusterTagValue ?clusterTagLabel }
WHERE {
  ?SemanticTagURI rdf:type :SemanticTag ;
      :hasTag ?tagLabel .
  # Match the tag against preferred terms, synonyms, and misspellings.
  ?Concept rdf:type skos:Concept ;
      ( skos:prefLabel|skos:altLabel|skos:hiddenLabel ) ?tagLabel .
  # Case 1: the matched concept is at the desired degree of specificity.
  OPTIONAL {
    ?Concept :degreeOfSpecificity :<SPECIFICITY> ;
        skos:prefLabel ?clusterTagLabel . }
  # Case 2: follow skos:broader to a concept at the desired degree of specificity.
  OPTIONAL {
    ?Concept :degreeOfSpecificity ?Specificity .
    ?Concept skos:broader* ?BroaderConcept .
    ?BroaderConcept :degreeOfSpecificity ?BroaderSpecificity .
    FILTER ( ?BroaderSpecificity = :<SPECIFICITY> )
    ?BroaderConcept skos:prefLabel ?clusterTagLabel . } }

Generate Tag Pairs
# Note: ?resource is not bound by the WHERE clause as shown on the slide; the
# pattern that ties both tags to the same resource is not shown.
INSERT
{ ?SemanticTagEdgeURI
    rdf:type :SemanticTagEdge ;
    :resourceURI ?resource ;
    :edgeNode1 ?clusterTagLabel1 ;
    :edgeNode2 ?clusterTagLabel2 . }
WHERE {
  ?SemanticTagURI1
      rdf:type :SemanticTag ;
      :clusterTagValue ?clusterTagLabel1 .
  ?SemanticTagURI2
      rdf:type :SemanticTag ;
      :clusterTagValue ?clusterTagLabel2 .
  FILTER ( ?clusterTagLabel1 != ?clusterTagLabel2 )
  BIND ( URI( CONCAT( str(?resource),
      ?clusterTagLabel1, ?clusterTagLabel2 ) ) AS
      ?SemanticTagEdgeURI ) }

Taxonomy
Concept
- Preferred Tag Term
- Synonyms, Misspellings
- Broader/Generalized Concepts
- Degree of Specificity

Results
• Highest Frequency Tag Pairs
• Highest Frequency Solitary Tags *
• Triple and Quadruple Tag Sets *
* Not depicted
Semantic Analytics in Two Flavors
Semantics on Analytics
• Semantic-assisted analysis: money laundering, fraud detection, community detection, insider trading…
Analytics on Semantics
• Understanding risk (financial trading & cyber security), transaction optimization, vulnerability assessment…

Discover Abnormal Behavior
Figure: Probability versus Degree Centrality, with three labeled regions: Rare Occurrence (Infrequent Communication), Normal Communication Levels, and Rare Occurrence (Frequent Communication).
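One way to make the idea concrete is to compute degree centrality for each node and flag the nodes that sit far from the mean. The sketch below is an assumption-laden illustration: the z-score threshold, the function names, and the star-shaped sample network are all hypothetical, and the slide does not say how "rare" is defined.

```python
from statistics import mean, pstdev

def degree_centrality(edges):
    """Degree centrality of an undirected graph: degree divided by (n - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {v: len(nb) / (n - 1) for v, nb in neighbors.items()}

def rare_nodes(centrality, z_threshold=2.0):
    """Return nodes whose centrality is at least z_threshold deviations from the mean."""
    values = list(centrality.values())
    if not values:
        return {}
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}
    scores = {v: (c - mu) / sigma for v, c in centrality.items()}
    return {v: z for v, z in scores.items() if abs(z) >= z_threshold}

# Hypothetical communication network: one hub talking to ten otherwise quiet nodes.
edges = [("hub", f"leaf{i}") for i in range(10)]
centrality = degree_centrality(edges)
flagged = rare_nodes(centrality)
```

Here only the hub is flagged, as an unusually frequent communicator. A node with a strongly negative score would correspond to the infrequent-communication tail.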

Identifying and predicting behavior changes
Figure: Network Density plotted over Time, annotated with the Observe, Orient, Decide, Act phases.
Classify and predict group behavior using communication network density
What kind of organization is this?
What is their objective/intent?
Distributing food? Terrorist attack? Cyber attack?
Merger/Acquisition? Bank robbery?
When are they going to act?
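Network density for an undirected graph is the number of observed links divided by the number of possible links, 2E / (N(N - 1)). The sketch below computes it per time window so that a rising trend can be seen. The window names and edges are hypothetical, and the node set is taken from the edges observed in each window, which is an assumption.

```python
def network_density(edges):
    """Density of an undirected graph: 2E / (N * (N - 1)), ignoring self-loops."""
    nodes = {v for edge in edges for v in edge}
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(unique_edges) / (n * (n - 1))

# Hypothetical communication links among four people, by time window.
windows = {
    "week1": [("a", "b"), ("c", "d")],
    "week2": [("a", "b"), ("b", "c"), ("c", "d")],
    "week3": [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")],
}
density_by_window = {name: network_density(edges) for name, edges in windows.items()}
```

In this sample the density climbs from one third to one half to a fully connected group. A classifier or forecaster would take a series like this as its input.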

Understanding Risk: Systemic Risk Analysis
Transitive risk exposure in a network of trading partners and holding companies
Figure: network of nodes A through R, each of type Company, connected by relationship edges.
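Transitive exposure can be read as reachability: a company is exposed to every company it can reach by following relationships, not only to its direct partners. The sketch below uses a breadth-first search. The sample edges are hypothetical, since the edges of the slide's network cannot be recovered, and treating every relationship as two-way is an assumption.

```python
from collections import deque

def exposed_companies(relationships, start):
    """Return every company reachable from `start` by following relationships."""
    graph = {}
    for a, b in relationships:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)  # assumption: exposure runs both ways
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen

# Hypothetical relationships between companies.
relationships = [("A", "B"), ("B", "C"), ("D", "E")]
exposure_of_a = exposed_companies(relationships, "A")
```

Company A has no direct relationship with C, yet C appears in its exposure set through B. D and E form a separate component and do not appear.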

Systemic Risk Analysis:
Figure (build sequence): the company network (nodes A through R) with edges typed as controlledBy or tradesWith. Successive builds classify nodes as Company, Bank, or BankHoldingCompany.
A bank holding company controls a bank or controls a bank holding company
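Because the definition refers to itself, classifying holding companies means applying the rule repeatedly until nothing new is inferred. A semantic reasoner would normally do this. The sketch below is a plain Python stand-in with hypothetical company names, where each pair reads "controlled, controller" to mirror the controlledBy edge.

```python
def infer_bank_holding_companies(banks, controlled_by):
    """Apply the rule to a fixed point.

    controlled_by is a list of (controlled, controller) pairs.
    """
    holding = set()
    changed = True
    while changed:
        changed = False
        for controlled, controller in controlled_by:
            controls_bank_or_holding = controlled in banks or controlled in holding
            if controls_bank_or_holding and controller not in holding:
                holding.add(controller)
                changed = True
    return holding

# Hypothetical data: H1 controls bank B1, H2 controls H1, X controls a non-bank.
banks = {"B1"}
controlled_by = [("H1", "H2"), ("B1", "H1"), ("C1", "X")]
holding_companies = infer_bank_holding_companies(banks, controlled_by)
```

H2 is only recognized on the second pass, after H1 has been classified, which is the recursive part of the definition. X controls an ordinary company and is never classified.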

Figure: the same network (nodes A through R, controlledBy and tradesWith edges, Bank and BankHoldingCompany nodes), now annotated with risk.

In SPARQL
# Retrieve the rank of every node that appears in a :to relationship.
PREFIX : <http://pagerank/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT DISTINCT ?node ?rank
WHERE {
  GRAPH <http://pagerank> {
    { ?node :to [] . } UNION { [] :to ?node }
    ?node rank:hasRDFRank ?rank .
  }
}
ORDER BY ?node

# Set the RDFRank epsilon parameter.
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:epsilon rank:setParam "0.001" . }
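For readers without a triple store, the kind of ranking these queries read back can be approximated with a short PageRank power iteration. This is a sketch only, not the store's RDFRank implementation. The damping factor, the handling of nodes without outgoing links, and the use of `epsilon` as a convergence threshold are assumptions, and the sample edges are hypothetical.

```python
def pagerank(edges, damping=0.85, epsilon=0.001, max_iter=100):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        next_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in out_links[v]:
                next_rank[target] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(next_rank[v] - rank[v]) for v in nodes)
        rank = next_rank
        if delta < epsilon:  # stop once the total change is small enough
            break
    return rank

# Hypothetical :to relationships.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
```

Node c receives links from both a and b and ends up ranked highest. Node b has no incoming links and ends up lowest.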

Questions?

Thank you!

Speakers
Thomas (Tom) Kelly
Practice Director, Enterprise Information Management, Cognizant
Thomas Kelly is a Director in Cognizant’s Enterprise Information Management
(EIM) Practice and heads its Semantic Technology Center of Excellence. He has
20-plus years of technology consulting experience in leading data warehousing,
business intelligence and big data projects, focused primarily on the life sciences,
healthcare, and financial services industries. Tom can be reached at
Thomas.Kelly@cognizant.com.
Dr. Arthur Keen
Principal, Keen Analytics
Arthur Keen possesses a deep understanding of graph analytics, predictive
modeling, unstructured data, categorization, text mining, natural language
processing, data mining algorithms, neural networks, and Artificial Intelligence.
He has used his expertise in these areas to provide thought leadership and
develop applications and evaluations in multiple domains including
intelligence/security informatics, business intelligence, cyber security, financial
analysis, corporate governance, retail and energy. Arthur can be reached at
akeen@keenassoc.com.
