Semantic Analytics, Smart Data
1. K E E N A N A L Y T I C S 1
Semantic
SPEAKERS:
Dr. Arthur Keen, Principal
Keen Analytics
Thomas Kelly, Practice Director
Cognizant Technology Solutions, Inc.
2. Operator, get me Klondike 5-397
3. Data Ecosystems are Growing in Complexity
Tens of thousands of databases
Millions to billions of data elements
Dozens of markets
Hundreds to thousands of social media sites
4. Semantics Manages the Complexity of Data Variety
Analytics without semantics is like having a multi-lingual conversation without interpreters
5. Semantic Analytics
Data Science
Domains
Technologies
Analytic Methods
Semantics
• Knowledge
• Expertise
• Abstraction & Diversity
• Consistency
Semantics
• Data Meaning
• Context
• Relationships
• Vocabulary
6. Semantic Analytics
• Emphasis is on data relationships, not just the data
• Data focus is on data concepts (abstraction), not the diversity of implementation details
• Data assumptions are made explicit in the semantic model
• The semantics guide the analytics process, rather than just the analyst’s knowledge
• SPARQL is a key component, but not the only tool in the semantic analytics toolbox
7. Challenges in Semantic Analytics
1. Semantic data that is missing a semantic model
2. Semantic models that are missing semantics
3. Semantic models that do not abstract data concepts from their implementation details
4. Rich, accurate provenance is required to establish confidence in the analytics results
5. Data cleansing must meet requirements for accuracy, consistency, and fitness for the purpose of the analytic task and result
8. Semantic Analytics in the Data to Action Loop
[Diagram: data-to-action loop. The Intelligence Pyramid (Data, Information, Knowledge, Wisdom) is repeated along the loop. Labels: Semantics, Analytics, Semantic Analytics]
Analytics steps: Analyze, Transform, Classify, Correlate, Predict, Interpret, COA’s
Semantics contributions: Representation/Provenance, Tag/Inference, Inference
Questions for semantics: Which relationships are relevant? What class? What kind of group? Define new relationships?
9. Clustering Through Semantic Tagging
Image Credit: historyinthecity.blogspot.com
Semantic Tags are keywords used to describe a resource (webpages, documents, business transactions)
Semantic Tags
• Tend to be user- or publisher-defined based on preferences, including terminology and depth of attribution
• May have ambiguities to resolve (synonyms, reuse/overuse, too specific, language, jargon)
Key Benefits
• Faster search of content
• Greater precision of search results
Source-Directed Tags
• Manual selection and entry by the author
• Automated population by the publisher, such as professional literature or publication websites
• Automatically excerpted from a corpus through semantic analysis of the content, guided by a controlled vocabulary
10. Clustering Through Semantic Tagging
Source: Implementing Iterative Algorithms with SPARQL http://ceur-ws.org/Vol-1133/paper-36.pdf
Peer-Pressure Clustering
Assign Each Tag Vertex to a Cluster
# Initialization: every vertex that has a link starts in its own cluster
DROP GRAPH <urn:ga/g/xjz0> ;
CREATE GRAPH <urn:ga/g/xjz0> ;
INSERT { GRAPH <urn:ga/g/xjz0> { ?s <urn:ga/p/inCluster> ?s } }
WHERE {
  { SELECT DISTINCT ?s WHERE { ?s <urn:ga/p/hasLink> ?o . } }
}
For Each Tag Vertex, Populate Cluster Assignments of Neighbors
# [i] and [i+1] stand for the current and next iteration numbers
DROP GRAPH <urn:ga/g/xjz[i+1]> ;
CREATE GRAPH <urn:ga/g/xjz[i+1]> ;
INSERT { GRAPH <urn:ga/g/xjz[i+1]> { ?s <urn:ga/p/inCluster> ?clus3 } }
WHERE {
  { SELECT ?s (SAMPLE(?clus) AS ?clus3) WHERE {
      # Highest neighbor vote count for each vertex
      { SELECT ?s (MAX(?clusCt) AS ?maxClusCt) WHERE {
          { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
              ?s <urn:ga/p/hasLink> ?o .
              GRAPH <urn:ga/g/xjz[i]> { ?o <urn:ga/p/inCluster> ?clus }
            } GROUP BY ?s ?clus }
        } GROUP BY ?s }
      # Vote count for each (vertex, cluster) pair
      { SELECT ?s ?clus (COUNT(?clus) AS ?clusCt) WHERE {
          ?s <urn:ga/p/hasLink> ?o .
          GRAPH <urn:ga/g/xjz[i]> { ?o <urn:ga/p/inCluster> ?clus }
        } GROUP BY ?s ?clus }
      # Keep only the clusters that reach the highest count
      FILTER (?clusCt = ?maxClusCt)
    } GROUP BY ?s }
}
Observation
• No use of semantics features, such as vocabulary and knowledge management capabilities
Strengths
• Effective over large volumes of data
• Comprehensive use of RDF data structure features
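For readers who want to trace the peer-pressure clustering logic outside a triple store, here is a minimal Python sketch of the same two steps: start every linked vertex in its own cluster, then repeatedly move each vertex to the cluster most common among its neighbors. The function name, the sample graph, and the tie-break rule are assumptions for illustration. SPARQL's SAMPLE may pick any tied cluster, whereas this sketch picks the smallest label so the result is repeatable.

```python
from collections import Counter


def peer_pressure_clusters(links, max_iterations=20):
    """links maps each vertex to the list of vertices it links to."""
    # Initialization: every vertex with outgoing links is its own cluster.
    cluster = {s: s for s in links}
    for _ in range(max_iterations):
        updated = {}
        for s, neighbors in links.items():
            # Each neighbor that already has a cluster casts one vote.
            votes = Counter(cluster[o] for o in neighbors if o in cluster)
            if not votes:
                continue  # no votes means no assignment, as in the query
            top = max(votes.values())
            # Assumed tie-break: smallest label among the top-voted clusters.
            updated[s] = min(c for c, n in votes.items() if n == top)
        if updated == cluster:
            break  # assignments are stable
        cluster = updated
    return cluster


# Hypothetical sample: two disconnected triangles of tag vertices.
links = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
clusters = peer_pressure_clusters(links)
print(clusters)
```

Because all vertices update at once, some graphs can oscillate between assignments, which is why the sketch caps the number of iterations.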
11. Clustering Through Semantic Tagging
Common Taxonomy for Semantic Tags
[Diagram: taxonomy with top-level categories Positive and Negative. Terms used in semantic tags include Ecstatic, Pleased, Okay]
12. Clustering Through Semantic Tagging
Knowledge-based Taxonomy for Semantic Tags
[Diagram: taxonomy with top-level categories Positive and Negative]
Terms shown: Ecstatic, Inspired, Charged, Excited, Exceeds Need, Very Satisfied, Satisfied, Somewhat Satisfied
Misspellings shown: Estatic, Extatic, Egstatic
The taxonomy captures:
• Preferred Terms, Synonyms, and Common Misspellings
• Frequently-Used Generalizations and Degrees of Specificity
13. Clustering through Semantic Tagging
Process
1. Select a set of triples containing URIs of the resources, as well as the semantic tags assigned to the resources
2. Map the semantic tags to an N-level taxonomy of preferred tags, based on exact and synonym matches, and desired degree of specificity
3. Cluster resources with highest frequency semantic tag pairs
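The process above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' implementation. The PREFERRED lookup table, the resource names, and the function names are hypothetical. The sketch covers only exact and synonym matching and leaves out generalization to a chosen degree of specificity. Tags unknown to the taxonomy pass through unchanged.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy lookup: lower-cased tag or misspelling -> preferred tag.
PREFERRED = {
    "ecstatic": "Ecstatic", "estatic": "Ecstatic",
    "extatic": "Ecstatic", "egstatic": "Ecstatic",
    "inspired": "Inspired", "excited": "Excited",
}


def preferred_tag(raw_tag):
    """Map a raw tag to its preferred term; unknown tags pass through."""
    cleaned = raw_tag.strip()
    return PREFERRED.get(cleaned.lower(), cleaned)


def tag_pair_counts(resource_tags):
    """Count how often each unordered pair of preferred tags shares a resource."""
    counts = Counter()
    for tags in resource_tags.values():
        normalized = sorted({preferred_tag(t) for t in tags})
        counts.update(combinations(normalized, 2))
    return counts


# Hypothetical input: resource -> tags as entered by authors.
resource_tags = {
    "page1": ["Estatic", "Inspired"],
    "page2": ["ecstatic", "inspired", "Excited"],
    "page3": ["Extatic", "Excited"],
}
pair_counts = tag_pair_counts(resource_tags)
print(pair_counts.most_common())
```

Sorting the tags before pairing means each pair is counted once per resource, whatever order the tags were entered in.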
14. Clustering through Semantic Tagging
Insert Sample Data
:Webpage1 :hasTag "10101" .
:Webpage1 :hasTag "1030303B" .
:Webpage2 :hasTag "10201" .
:Webpage2 :hasTag "1030301" .
:Webpage3 :hasTag "1030303B" .
:Webpage3 :hasTag "10201A" .
:Webpage4 :hasTag "10101B" .
:Webpage4 :hasTag "10302A" .
:Webpage5 :hasTag "1030301" .
:Webpage5 :hasTag "10101A" . …
Find Preferred/Generalized Tag Value
INSERT { ?SemanticTagURI :clusterTagValue ?clusterTagLabel }
WHERE {
  ?SemanticTagURI rdf:type :SemanticTag ;
                  :hasTag ?tagLabel .
  ?Concept rdf:type skos:Concept ;
           ( skos:prefLabel|skos:altLabel|skos:hiddenLabel ) ?tagLabel .
  OPTIONAL {
    ?Concept :degreeOfSpecificity :<SPECIFICITY> ;
             skos:prefLabel ?clusterTagLabel . }
  OPTIONAL {
    ?Concept :degreeOfSpecificity ?Specificity .
    ?Concept skos:broader* ?BroaderConcept .
    ?BroaderConcept :degreeOfSpecificity ?BroaderSpecificity .
    FILTER ( ?BroaderSpecificity = :<SPECIFICITY> )
    ?BroaderConcept skos:prefLabel ?clusterTagLabel . } }
Generate Tag Pairs
INSERT {
  ?SemanticTagEdgeURI
    rdf:type :SemanticTagEdge ;
    :resourceURI ?resource ;
    :edgeNode1 ?clusterTagLabel1 ;
    :edgeNode2 ?clusterTagLabel2 . }
WHERE {
  ?SemanticTagURI1
    rdf:type :SemanticTag ;
    :resourceURI ?resource ;
    :clusterTagValue ?clusterTagLabel1 .
  ?SemanticTagURI2
    rdf:type :SemanticTag ;
    :resourceURI ?resource ;
    :clusterTagValue ?clusterTagLabel2 .
  FILTER ( ?clusterTagLabel1 != ?clusterTagLabel2 )
  BIND ( URI( CONCAT( str(?resource), ?clusterTagLabel1, ?clusterTagLabel2 ) )
         AS ?SemanticTagEdgeURI ) }
Taxonomy
Concept
- Preferred Tag Term
- Synonyms, Misspellings
- Broader/Generalized Concepts
- Degree of Specificity
Results
• Highest Frequency Tag Pairs
• Highest Frequency Solitary Tags *
• Triple and Quadruple Tag Sets *
* Not depicted
15. Semantic Analytics in Two Flavors
Semantics on Analytics
Semantic assisted analysis: Money laundering, fraud detection, community detection, insider trading…
Analytics on Semantics
Understanding Risk (financial trading & cyber security), transaction optimization, vulnerability assessment…
16. Discover Abnormal Behavior
[Chart: probability distribution of degree centrality. X-axis: Degree Centrality. Y-axis: Probability]
• Low-degree tail: Rare Occurrence (Infrequent Communication)
• Middle: Normal Communication Levels
• High-degree tail: Rare Occurrence (Frequent Communication)
17. Identifying and predicting behavior changes
Observe, Orient, Decide, Act
[Chart: Network Density over Time]
Classify and predict group behavior using communication network density
• What kind of organization is this?
• What is their objective/intent? Distributing food? Terrorist attack? Cyber attack? Merger/Acquisition? Bank robbery?
• When are they going to act?
18. Understanding Risk: Systemic Risk Analysis
Transitive risk exposure in a network of trading partners and holding companies
[Diagram: network of nodes A through R. Legend: Company (node), relationship (edge)]
19. Systemic Risk Analysis: Transitive risk exposure in a network of trading partners and holding companies
[Diagram: network of nodes A through R. Legend: Company (node); controlledBy, tradesWith (edges)]
20. Systemic Risk Analysis: Transitive risk exposure in a network of trading partners and holding companies
[Diagram: network of nodes A through R. Legend: Company, Bank (nodes); controlledBy, tradesWith (edges)]
21. Systemic Risk Analysis: Transitive risk exposure in a network of trading partners and holding companies
[Diagram: network of nodes A through R. Legend: Bank, BankHoldingCompany (nodes); controlledBy, tradesWith (edges)]
A bank holding company controls a bank or controls a bank holding company
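The bank holding company rule is recursive, so it has to be applied repeatedly until nothing new is inferred. The Python sketch below shows one way to do that. It is an illustration only: the function name and the sample controlledBy pairs are assumptions, and the slide's actual topology is not recoverable from the transcript.

```python
def infer_bank_holding_companies(banks, controlled_by):
    """Apply the rule until no new bank holding company is found.

    controlled_by is a set of (controlled, controller) pairs.
    """
    holding = set()
    changed = True
    while changed:
        changed = False
        for controlled, controller in controlled_by:
            controls_bank_or_bhc = controlled in banks or controlled in holding
            if controls_bank_or_bhc and controller not in holding:
                holding.add(controller)
                changed = True
    return holding


# Hypothetical sample: A is a bank, D controls A, H controls D, E controls B.
banks = {"A"}
controlled_by = {("A", "D"), ("D", "H"), ("B", "E")}
holding_companies = infer_bank_holding_companies(banks, controlled_by)
print(sorted(holding_companies))
```

In this sample D qualifies because it controls a bank, H qualifies because it controls D, and E does not qualify because B is neither a bank nor a bank holding company.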
22. Systemic Risk Analysis: Transitive risk exposure in a network of trading partners and holding companies
[Diagram: network of nodes A through R. Legend: Bank, BankHoldingCompany (nodes); controlledBy, tradesWith, risk (edges)]
23. In SPARQL
# Retrieve the RDFRank of every node that appears in a :to relationship
PREFIX : <http://pagerank/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT DISTINCT ?node ?rank
WHERE {
  GRAPH <http://pagerank> {
    { ?node :to [] . } UNION { [] :to ?node }
    ?node rank:hasRDFRank ?rank .
  }
}
ORDER BY ?node
# Set the RDFRank epsilon parameter
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { rank:epsilon rank:setParam "0.001" . }
26. Speakers
Thomas (Tom) Kelly
Practice Director, Enterprise Information Management, Cognizant
Thomas Kelly is a Director in Cognizant’s Enterprise Information Management (EIM) Practice and heads its Semantic Technology Center of Excellence. He has 20-plus years of technology consulting experience in leading data warehousing, business intelligence and big data projects, focused primarily on the life sciences, healthcare, and financial services industries. Tom can be reached at Thomas.Kelly@cognizant.com.
Dr. Arthur Keen
Principal, Keen Analytics
Arthur Keen possesses a deep understanding of graph analytics, predictive modeling, unstructured data, categorization, text mining, natural language processing, data mining algorithms, neural networks, and Artificial Intelligence. He has used his expertise in these areas to provide thought leadership and develop applications and evaluations in multiple domains including intelligence/security informatics, business intelligence, cyber security, financial analysis, corporate governance, retail and energy. Arthur can be reached at akeen@keenassoc.com.
Editor's Notes
Recent sessions on semantic analytics focus on data prep
-- how semantics supports analytics
We’ll talk about how the analytics are based on and enabled by semantics
Bell Telephone Operator story
-- everyone becomes an operator
-- everyone will become a data analyst or data scientist
How can semantics fundamentally change how analytics are performed?
-- by experts and non-experts
We’re seeing an explosion of data resources that are available to support analytic activities. But with each new set of data comes new terminology, data formats, definitions, context, and more. Our ability to analyze a set of data is no longer impeded by technology’s limitations but, rather, our own abilities to absorb and understand the variety of data that we will leverage to solve problems.
While many technologies can ingest standard data formats, disparate encoding formats and the lack of data meaning captured by these systems cause lengthy delays in onboarding new data into an organization’s data ecosystem. Further, once the data has been onboarded and rationalized, the variety of data repositories and the types of data that they manage far exceed a human’s ability to remember which databases and data elements are best for which purposes.
Data Science combines expertise in Domains, Technology, and Analytic Methods
Semantics capture and embed data meaning, context, and predefined data integration.
Semantic data that is missing a semantic model
-- We may have triples, but no class or property definitions
Semantic models that are missing semantics
-- Data and relationship definitions, but no properties or annotations that describe the data
Semantic models that do not abstract data concepts from their implementation details
--
Optional – Check if known term in hierarchy. If not, bind tag and stop
3-4 Minutes: Semantic analytics comes in two flavors. The first is Semantics on Analytics, or semantically assisted analytics, where semantic models including vocabularies, ontologies, and provenance are consulted before, during, and after analytics.
Typical use cases include fraud detection, money laundering detection, community detection, and insider trading detection. The second, Analytics on Semantics, applies the analytics to the semantic model (network) itself. It is used to understand the behavior of complex systems, for example in risk analysis, cyber security, and vulnerability assessments.
3-4 minutes: We are looking for abnormal behavior in communication. In a complex graph with high dimensionality (lots of properties) we need guidance on which relationships constitute communication. We can use this to restrict the relationships being considered. In this example we use degree centrality as a metric and compute a frequency distribution for it in order to find rare behavior.
Seeking anomalies at either tail of the distribution
-- Hacker with no internal communication
-- Sharp increase/decrease in communication between trading partners
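As an illustration of the degree-centrality approach described above, the following Python sketch computes each node's degree, builds the frequency distribution, and flags nodes whose degree value is rare. The function names, the 0.1 probability threshold, and the sample network are assumptions, not part of the talk.

```python
from collections import Counter


def degree_centrality(nodes, edges):
    """Number of distinct communication partners for every node."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {n: len(peers) for n, peers in neighbors.items()}


def rare_nodes(degrees, max_probability=0.1):
    """Nodes whose degree value occurs with probability <= max_probability."""
    frequency = Counter(degrees.values())
    total = len(degrees)
    return {n: d for n, d in degrees.items()
            if frequency[d] / total <= max_probability}


# Hypothetical network: ten staff in a ring, one hub, one silent account.
ring = [f"n{i}" for i in range(10)]
nodes = ring + ["hub", "loner"]
edges = [(ring[i], ring[(i + 1) % 10]) for i in range(10)]
edges += [("hub", n) for n in ring]

degrees = degree_centrality(nodes, edges)
outliers = rare_nodes(degrees)
print(outliers)
```

The hub lands in the frequent-communication tail and the silent account in the infrequent tail. The ten ring members share the common degree and are treated as normal.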
3-4 Minutes: Describe how metrics like network density (or triangle counts) indicate kinds of behavior and behavior changes and how this can be used to predict behavior
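A rough Python sketch of the network density metric follows, assuming an undirected communication graph and a fixed set of members observed over successive time windows. The function names, the 0.2 change threshold, and the sample windows are assumptions chosen only to show the calculation.

```python
def network_density(nodes, edges):
    """Density of an undirected graph: actual edges / possible edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-loops and count each unordered pair once.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(unique) / (n * (n - 1) / 2)


def density_jumps(densities, threshold=0.2):
    """Indexes of time windows where density changed by more than threshold."""
    return [i for i in range(1, len(densities))
            if abs(densities[i] - densities[i - 1]) > threshold]


# Hypothetical observations: who communicated with whom in three time windows.
members = ["a", "b", "c", "d"]
windows = [
    [("a", "b")],
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")],
]
densities = [network_density(members, w) for w in windows]
jumps = density_jumps(densities)
print(densities, jumps)
```

A sudden rise in density, as in the third window here, is the kind of behavior change the metric is meant to surface. What the change means still depends on the semantic context of the group.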
Slides 13 to 18 are like flash cards. 3-4 minutes for the set. Spend no more than 20 seconds on each. Given a complex graph representing interactions between trading partners, we would like to understand the relative risk absorbed by different entities as a result of the trading activity.
PREFIX : <http://risk/example/>
INSERT DATA {
:A :tradesWith :D.
:D :tradesWith :A.
:B :tradesWith :D.
:D :tradesWith :B.
:C :tradesWith :D.
:D :tradesWith :C.
:D :tradesWith :E.
:E :tradesWith :D.
:D :tradesWith :I.
:I :tradesWith :D.
:D :tradesWith :H.
:H :tradesWith :D.
:D :tradesWith :G.
:G :tradesWith :D.
:E :tradesWith :F.
:F :tradesWith :E.
:F :tradesWith :H.
:H :tradesWith :F.
:H :tradesWith :J.
:J :tradesWith :H.
:G :tradesWith :J.
:J :tradesWith :G.
:H :tradesWith :L.
:L :tradesWith :H.
:J :tradesWith :L.
:L :tradesWith :J.
:I :tradesWith :K.
:K :tradesWith :I.
:K :tradesWith :L.
:L :tradesWith :K.
}
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA { _:b1 rank:compute _:b2. }
PREFIX : <http://risk/example/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT DISTINCT *
WHERE {
  GRAPH <http://risk/example> {
    ?bank :tradesWith [] .
    ?bank rank:hasRDFRank ?rank .
  }
}
ORDER BY DESC(?rank)
#    bank                     rank
1    http://risk/example/D    "1.00"^^xsd:float
2    http://risk/example/H    "0.51"^^xsd:float
3    http://risk/example/J    "0.38"^^xsd:float
4    http://risk/example/L    "0.38"^^xsd:float
5    http://risk/example/E    "0.27"^^xsd:float
6    http://risk/example/I    "0.27"^^xsd:float
7    http://risk/example/F    "0.26"^^xsd:float
8    http://risk/example/K    "0.26"^^xsd:float
9    http://risk/example/G    "0.25"^^xsd:float
10   http://risk/example/A    "0.13"^^xsd:float
11   http://risk/example/B    "0.13"^^xsd:float
12   http://risk/example/C    "0.13"^^xsd:float
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
INSERT DATA {_:b1 rank:computeIncremental "true"}
We identify the controlledBy and tradesWith relationships as relationships that propagate risk. In a real deployment you would use actual transactions rather than abstracting them like this.
We identify the organizations that are banks.
We then infer the bank holding companies using the bank holding company rule.
We apply the PageRank algorithm to the topology to provide an overall picture of the relative risk of the organizations.
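To make the ranking step concrete without a triple store, here is a minimal power-iteration PageRank in Python, run over the tradesWith pairs listed in these notes. It is a textbook sketch, not Ontotext's RDFRank: the damping factor, tolerance, and function name are assumptions, and the scores are probabilities that sum to 1 rather than values scaled so the top node is 1.00. It also assumes every node has at least one outgoing link, which holds for this data.

```python
def pagerank(out_links, damping=0.85, tolerance=1e-9, max_iterations=200):
    """Power-iteration PageRank; assumes every node has an outgoing link."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            # Each node splits its damped rank evenly among its targets.
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tolerance:
            break
    return rank


# The 15 trading pairs from the notes; tradesWith is stored in both directions.
pairs = ["AD", "BD", "CD", "DE", "DI", "DH", "DG", "EF",
         "FH", "HJ", "GJ", "HL", "JL", "IK", "KL"]
out_links = {}
for a, b in pairs:
    out_links.setdefault(a, []).append(b)
    out_links.setdefault(b, []).append(a)

rank = pagerank(out_links)
ranked = sorted(rank, key=rank.get, reverse=True)
print(ranked)
```

Because tradesWith is symmetric here, the scores track each node's number of trading partners closely, which is why D, with seven partners, comes out on top.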
If running out of time, skip over this one. Briefly, here is how this is done in SPARQL. Can explain in more detail after session. If asked, I used a combination of Ontotext, Linkurious, and Neo4j to do this.