An Ontology-based Technique for Online Profile Resolution

www.insight-centre.org

Keith Cortis, Simon Scerri, Ismael Rivera,
Siegfried Handschuh

International Conference on Social Informatics
Kyoto, Japan

27th November 2013

Introduction (1)



• Instance Matching: deciding whether two instances or representations refer to the same real-world entity, e.g., a person

• Research Challenge: discovering multiple online profiles that refer to the same person identity across heterogeneous social networks

Introduction (2)



• Improved profile matching system extended with:
– Named Entity Recognition
– Linked Open Data
– Semantic Matching

• Additional Benefit: an ontology is used as a background schema
– Advantage: a standard schema enables cross-network interoperability
Motivation

• Contact Matcher Applications:
– Control sharing of personal data
– Detection of fully or partly anonymous contacts (> 83 million fake accounts)
– New contact suggestions that are of direct interest to the user

Profile Resolution Technique

[Figure: pipeline of the profile resolution technique]
1. User Profile Data Extraction
2. Semantic Lifting (to NCO)
3. Named Entity Recognition (ANNIE IE System with a Large KB Gazetteer; entities shown: Name, Surname, City, Country)
4. Hybrid Matching Process
a) Attribute Value Matching
b) Semantic-based Matching Extension (e.g., City, Country)
c) Attribute Weighting Function
5. Online Profile Suggestions
6. Online Profile Merging


Semantic Lifting

• Lifting semi-/un-structured profile information from a remote schema

• Transform information to instances of the Contact Ontology (NCO)
• NCO: identity-related online profile information
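
As a rough illustration of the lifting step, the sketch below maps a raw profile record from a remote schema onto NCO-style property names. The remote field names, the `lift_profile` helper and the selection of properties are assumptions made for illustration; the talk does not show the actual mapping.

```python
# Hypothetical mapping from one network's field names to NCO-style properties.
# Both the remote field names and the selected properties are assumptions.
FIELD_TO_NCO = {
    "full_name": "nco:fullname",
    "first_name": "nco:nameGiven",
    "last_name": "nco:nameFamily",
    "city": "nco:locality",
    "country": "nco:country",
}


def lift_profile(raw: dict) -> dict:
    """Return an NCO-style description of a raw profile record.

    Fields without a mapping and empty values are dropped;
    remaining values are whitespace-trimmed.
    """
    lifted = {}
    for field, value in raw.items():
        prop = FIELD_TO_NCO.get(field)
        if prop is None or value is None:
            continue
        text = str(value).strip()
        if text:
            lifted[prop] = text
    return lifted


raw_profile = {"full_name": " Justin Bieber ", "city": "London, Ontario", "followers": 42}
print(lift_profile(raw_profile))
```

Once every network's profiles are expressed with the same property names, later matching steps can compare attributes without knowing which network a profile came from.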


Attribute Value Matching

• Direct Value Comparison

• String Matching
– Best string matching metric for each attribute type
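
A minimal sketch of choosing a metric per attribute type follows. The attribute names, the assignment of metrics to attributes and the use of `difflib.SequenceMatcher` as the string metric are all assumptions; the talk does not state which metric was selected for which attribute.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> float:
    """Direct value comparison: 1.0 if equal after normalisation, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def ratio_match(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio, a stand-in metric)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Hypothetical assignment of a metric to each attribute type.
METRIC_FOR_ATTRIBUTE = {
    "email": exact_match,
    "birth_date": exact_match,
    "name": ratio_match,
    "city": ratio_match,
}


def attribute_similarity(attribute: str, a: str, b: str) -> float:
    """Compare two values with the metric registered for the attribute type."""
    metric = METRIC_FOR_ATTRIBUTE.get(attribute, ratio_match)
    return metric(a, b)


print(attribute_similarity("name", "Joffrey Baratheon", "Joff Baratheon"))
print(attribute_similarity("birth_date", "286AL", "286AL"))
```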


Semantic-based Matching

• Indirect semantic relations at a schema level
• Use-case: location-related profile attributes
• Location sub-entities being semantically compared are: city, region and country
• Find the semantic relations between the sub-entities in question in a bi-directional manner
• E.g. Galway (profile 1) vs. Ireland (profile 2)
[Figure: relations between Galway and Ireland in both directions; labels shown: locatedWithin, country, isPartOf, isLocationOf, containsLocation, capital, largestCity]
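
The bi-directional lookup can be sketched over a handful of triples. The triples below are toy data standing in for a Linked Open Data source, and the direction assigned to each relation label is an assumption; `relations_between` and `semantically_related` are hypothetical helper names.

```python
# Toy triples standing in for a Linked Open Data source (hypothetical data).
TRIPLES = {
    ("Galway", "locatedWithin", "Ireland"),
    ("Galway", "country", "Ireland"),
    ("Ireland", "containsLocation", "Galway"),
    ("Dublin", "country", "Ireland"),
    ("Ireland", "capital", "Dublin"),
}


def relations_between(a: str, b: str) -> dict:
    """Collect predicates linking a and b, looking in both directions."""
    forward = sorted(p for (s, p, o) in TRIPLES if s == a and o == b)
    backward = sorted(p for (s, p, o) in TRIPLES if s == b and o == a)
    return {"forward": forward, "backward": backward}


def semantically_related(a: str, b: str) -> bool:
    """True if the two locations are equal or directly linked by any relation."""
    if a == b:
        return True
    found = relations_between(a, b)
    return bool(found["forward"] or found["backward"])


print(relations_between("Galway", "Ireland"))
print(semantically_related("Ireland", "Galway"))
print(semantically_related("Galway", "Dublin"))
```

Only direct links are checked here, so two cities in the same country are reported as unrelated; following longer paths would be a separate design decision.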


Attribute Weighting Function

• Approach 1: Direct Similarity Score
Name: Justin Bieber vs. J. Bieber
Similarity Value: 0.90

• Approach 2: Normalised Similarity Score based on a threshold for each attribute type
Attribute Threshold for Name: 0.70
Name: Justin Bieber vs. J. Bieber
Metric Similarity Value: 0.90
Similarity Value: 1.0
Name: Justin Bieber vs. Joffrey Baratheon
Metric Similarity Value: 0.4
Similarity Value: 0.0
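
The two scoring approaches can be written as a pair of small functions. This is a sketch under one assumption: a metric value exactly equal to the threshold is treated as a match, which the slides do not specify.

```python
def direct_score(metric_value: float) -> float:
    """Approach 1: use the metric's similarity value unchanged."""
    return metric_value


def normalised_score(metric_value: float, threshold: float) -> float:
    """Approach 2: 1.0 at or above the attribute's threshold, else 0.0.

    Treating equality as a match is an assumption.
    """
    return 1.0 if metric_value >= threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for Name from the example above

print(direct_score(0.90))
print(normalised_score(0.90, NAME_THRESHOLD))
print(normalised_score(0.4, NAME_THRESHOLD))
```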

Online Profile Suggestions

Name: Joffrey Baratheon vs. Joff Baratheon
City: King’s Landing vs. King’s Landing
Role: King vs. King
Date of Birth: 286AL vs. 286AL
Similarity Score: 0.95
Similarity Threshold: 0.90

Name: Joffrey Baratheon vs. Joffrey Bieber
City: King’s Landing vs. London, Ontario
Role: King vs. Singer
Date of Birth: 286AL vs. 01/03/1994
Similarity Score: 0.30
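
One way to turn per-attribute scores into a single profile similarity is a weighted average compared against the similarity threshold. The talk does not give the weighting formula, so the weighted average, the weights and the per-attribute scores below are all hypothetical and are not meant to reproduce the 0.95 and 0.30 scores shown above.

```python
def profile_similarity(scores: dict, weights: dict) -> float:
    """Weighted average of per-attribute scores over attributes that have a weight."""
    shared = [a for a in scores if a in weights]
    total_weight = sum(weights[a] for a in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[a] * scores[a] for a in shared) / total_weight


def suggest_match(scores: dict, weights: dict, threshold: float = 0.90) -> bool:
    """Suggest the pair as the same person if the score reaches the threshold."""
    return profile_similarity(scores, weights) >= threshold


# Hypothetical weights and per-attribute scores, not values from the talk.
WEIGHTS = {"name": 0.4, "city": 0.2, "role": 0.1, "birth_date": 0.3}
close_pair = {"name": 0.9, "city": 1.0, "role": 1.0, "birth_date": 1.0}
distant_pair = {"name": 0.5, "city": 0.1, "role": 0.0, "birth_date": 0.0}

print(suggest_match(close_pair, WEIGHTS))
print(suggest_match(distant_pair, WEIGHTS))
```

Dividing by the total weight of the attributes actually present keeps the score in [0, 1] when a profile is missing some attributes.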

Experiments & Evaluation

• Two-stage evaluation:
1. Technique
a) Which attribute similarity score approach performs best
b) Whether NER and the semantic-based matching extension improve the overall technique
c) The computational performance of the hybrid technique against the syntactic-based one
d) A similarity threshold that determines profile equivalence within a satisfactory degree of confidence

2. Usability
e) Level of precision for the profile matching

Technique Evaluation

• Two Datasets:
1. A controlled dataset of public profiles obtained from the Web (LinkedIn and Twitter)
• 182 online profiles
– 112 profiles of ambiguous real-world persons (common attributes)
– 70 refer to 35 well-known sports journalists
• Maximised false positives

2. Private personal and contact-list profiles obtained from 5 consenting participants

Technique Evaluation – Experiment 1

• Profile attribute similarity score that fares best
[Figure: Normalised Approach; precision, recall and F1-measure against threshold value (0.7 to 0.9)]
[Figure: Direct Approach; precision, recall and F1-measure against threshold value (0.7 to 0.9)]
• Direct Approach outperforms Normalised Approach
• 8631 online profile pair comparisons


Technique Evaluation – Experiment 2

• String-based technique vs. String + NER + Semantic-based technique
[Figure: String Technique; precision, recall and F1-measure against threshold value (0.7 to 0.8)]
[Figure: Hybrid Technique; precision, recall and F1-measure against threshold value (0.7 to 0.8)]
• New hybrid technique improves the results considerably over the string-only one
• F-measure is more or less stable for thresholds of 0.75 and 0.8


Technique Evaluation – Experiment 3

• Computational performance of hybrid technique vs. syntactic-only one
• For this test we selected profile pairs:
– Having a number of common attributes
– At least 1 attribute candidate for semantic matching
[Figure: Time (ms, 0 to 40) against Number of Common Attributes (1 to 15) for the Syntactic and Hybrid techniques]
• On average the hybrid technique takes ≈15 ms more


Technique Evaluation – Experiment 4

• Find a deterministic similarity threshold with the highest degree of confidence

Threshold    0.80   0.82   0.84   0.86   0.88   0.90   0.92   0.94   0.96
Precision    0.290  0.317  0.550  0.694  0.806  0.876  0.940  0.947  0.988
Recall       0.805  0.784  0.654  0.600  0.584  0.573  0.508  0.486  0.454
F1-Measure   0.426  0.452  0.598  0.643  0.677  0.693  0.660  0.643  0.622

• Optimal threshold is 0.9, giving an F-measure of 0.693
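
The optimal threshold can be checked by recomputing the F1-measure as the harmonic mean of precision and recall, using the precision and recall values reported for each threshold in this experiment. The variable names are illustrative only.

```python
THRESHOLDS = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96]
PRECISION = [0.290, 0.317, 0.550, 0.694, 0.806, 0.876, 0.940, 0.947, 0.988]
RECALL = [0.805, 0.784, 0.654, 0.600, 0.584, 0.573, 0.508, 0.486, 0.454]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = [f1(p, r) for p, r in zip(PRECISION, RECALL)]

# Pick the threshold whose F1-measure is highest.
best_index = max(range(len(THRESHOLDS)), key=lambda i: f1_scores[i])
best_threshold = THRESHOLDS[best_index]

print(best_threshold, round(f1_scores[best_index], 3))
```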

Usability Evaluation (1)

• Quantitative & Qualitative
• Performance of the profile matching technique
• Contact matcher run against the two social networks on which the user is most active
• Social networks chosen: LinkedIn, Facebook and Twitter
• Number of participants: 16
• Person suggestion page
• Short survey about their user experience


• Usability Evaluation Results:
– Distinct profiles: 8,415
– Average profiles per social network per participant: 262
– Comparisons: 1,041,279
– Person matching suggestions: 1,195
– Correct matches: 975
– Incorrect matches: 220
– Precision rate: 0.816
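
The precision rate follows directly from the match counts above; a short check, with illustrative variable names:

```python
correct_matches = 975
incorrect_matches = 220

# Every suggestion was judged either correct or incorrect.
suggestions = correct_matches + incorrect_matches
precision = correct_matches / suggestions

print(suggestions, round(precision, 3))
```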


• Statistics & Results:
Social Network Integration
– 56.25% : LinkedIn and Facebook
– 25% : LinkedIn and Twitter
– 18.75% : Facebook and Twitter

User Satisfaction
– 50% : Extremely
– 43.8% : Quite a bit
– 0% : Moderately
– 6.3% : A little
– 0% : Not at all


Application 1: Management & Sharing

Application 2: Enhanced Security

Application 3: Networking & Suggestions

Limitations

• A person’s gender is not provided by all social network APIs
– Identify gender based on first name or surname through NER
• Weights of some profile attributes, e.g., first name and surname, are too high
– In some cases they impact the final result too strongly
– More experiments will be conducted to fine-tune these weights

Future Work

• Consider identification of higher degrees of semantic relatedness
• Enrich technique with other LOD cloud datasets
• Target additional social networks

Conclusion

• Profile matching algorithm with:
– Semantic Lifting
– NER on semi-/un-structured profile information
– Linked Open Data to improve the NER process
– Semantic matching at the schema level to find any possible indirect semantic relations
– Weighted Profile Attribute Matching

• Quantitative & Qualitative Evaluation

Thank you for your attention

Related Work Comparison

• Existing Profile Matching Approaches based on:
– User’s friends
– Specific Inverse Functional Properties, e.g., email address
– String matching of all profile attributes
– Semantic relatedness between text, depending on remote Knowledge Bases, e.g., Wikipedia

• Evaluation of these Approaches:
– Technique Evaluation on controlled datasets
– No Usability Evaluation
