An Ontology-based Technique for Online Profile Resolution

This paper was presented at the 5th International Conference on Social Informatics (http://www.socinfo2013.com/) in Kyoto, Japan on 27 November 2013.

The full paper can be found at: http://link.springer.com/chapter/10.1007%2F978-3-319-03260-3_25

  1. An Ontology-based Technique for Online Profile Resolution
     • Keith Cortis, Simon Scerri, Ismael Rivera, Siegfried Handschuh (www.insight-centre.org)
     • International Conference on Social Informatics, Kyoto, Japan, 27th November 2013
  2. Introduction (1)
     • Instance matching: deciding whether two instances (representations) refer to the same real-world entity, e.g., a person
     • Research challenge: discovering multiple online profiles that refer to the same person identity across heterogeneous social networks
  3. Introduction (2)
     • Improved profile matching system, extended with:
       - Named Entity Recognition
       - Linked Open Data
       - Semantic Matching
     • Additional benefit: an ontology is used as a background schema
     • Advantage: a standard schema enables cross-network interoperability
  4. Motivation
     • Contact Matcher applications:
       - Controlling the sharing of personal data
       - Detecting fully or partly anonymous contacts (> 83 million fake accounts)
       - Suggesting new contacts that are of direct interest to the user
  5. Profile Resolution Technique (pipeline diagram)
     • 1 User Profile Data Extraction
     • 2 Semantic Lifting (into the NCO)
     • 3 Named Entity Recognition (ANNIE IE System, Large KB Gazetteer), yielding Name, Surname, City and Country
     • 4 Hybrid Matching Process: (a) Attribute Value Matching, (b) Semantic-based Matching Extension, (c) Attribute Weighting Function
     • 5 Online Profile Suggestions
     • 6 Online Profile Merging
  6. Profile Resolution Technique (diagram, steps 1 and 2: User Profile Data Extraction, Semantic Lifting)
  7. Semantic Lifting
     • Lifting semi-structured and unstructured profile information from a remote schema
     • Transforming the information into instances of the Contact Ontology (NCO)
     • NCO: models identity-related online profile information
  8. Profile Resolution Technique (diagram, steps 1 to 4a: up to Attribute Value Matching)
  9. Attribute Value Matching
     • Direct value comparison
     • String matching, using the best string matching metric for each attribute type
  10. Profile Resolution Technique (diagram, steps 1 to 4b: up to the Semantic-based Matching Extension)
  11. Semantic-based Matching
     • Finds indirect semantic relations at a schema level
     • Use case: location-related profile attributes
     • Location sub-entities being semantically compared: city, region and country
     • Semantic relations between the sub-entities in question are sought in both directions
     • Example: Galway (profile 1) vs. Ireland (profile 2), where Galway locatedWithin Ireland; the diagram's candidate relations also include country, isPartOf, isLocationOf, containsLocation, capital and largestCity
  12. Profile Resolution Technique (diagram, steps 1 to 4c: up to the Attribute Weighting Function)
  13. Attribute Weighting Function
     • Approach 1: Direct Similarity Score
       - Name: Justin Bieber vs. J. Bieber → similarity value 0.90
     • Approach 2: Normalised Similarity Score, based on a threshold for each attribute type (threshold for Name: 0.70)
       - Name: Justin Bieber vs. J. Bieber → metric similarity 0.90 → similarity value 1.0
       - Name: Justin Bieber vs. Joffrey Baratheon → metric similarity 0.4 → similarity value 0.0
  14. Profile Resolution Technique (diagram, steps 1 to 5: up to Online Profile Suggestions)
  15. Online Profile Suggestions
     • Similarity threshold: 0.90
     • Profile pair with similarity score 0.95 (above the threshold):
       - Name: Joffrey Baratheon vs. Joff Baratheon
       - City: King’s Landing vs. King’s Landing
       - Role: King vs. King
       - Date of Birth: 286AL vs. 286AL
     • Profile pair with similarity score 0.30 (below the threshold):
       - Name: Joffrey Baratheon vs. Joffrey Bieber
       - City: King’s Landing vs. London, Ontario
       - Role: King vs. Singer
       - Date of Birth: 286AL vs. 01/03/1994
  16. Online Profile Suggestions
  17. Profile Resolution Technique (full diagram, steps 1 to 6: up to Online Profile Merging)
  18. Experiments & Evaluation
     • Two-stage evaluation:
       1. Technique
          a) Which attribute similarity score approach is best
          b) Whether NER and the semantic-based matching extension improve the overall technique
          c) The computational performance of the hybrid technique against the syntactic-based one
          d) A similarity threshold that determines profile equivalence with a satisfactory degree of confidence
       2. Usability
          e) Level of precision of the profile matching
  19. Technique Evaluation
     • Two datasets:
       1. A controlled dataset of public profiles obtained from the Web (LinkedIn and Twitter)
          - 182 online profiles: 112 of ambiguous real-world persons (sharing common attributes) and 70 referring to 35 well-known sports journalists
          - False positives maximised
       2. Private personal and contact-list profiles obtained from 5 consenting participants
  20. Technique Evaluation – Experiment 1
     • Which profile attribute similarity score approach fares best
     • Charts: precision, recall and F1-measure against threshold values from 0.7 to 0.9, for the Normalised Approach and for the Direct Approach
     • Result: the Direct Approach outperforms the Normalised Approach
     • 8631 online profile pair comparisons
  21. Technique Evaluation – Experiment 2
     • String-based technique vs. string + NER + semantic-based (hybrid) technique
     • Charts: precision, recall and F1-measure against threshold values 0.7, 0.75 and 0.8, for the String Technique and for the Hybrid Technique
     • Result: the new hybrid technique improves the results considerably over the string-only one
     • The F-measure is more or less stable for thresholds of 0.75 and 0.8
  22. Technique Evaluation – Experiment 3
     • Computational performance of the hybrid technique vs. the syntactic-only one
     • Profile pairs selected for this test:
       - having a number of common attributes
       - with at least 1 attribute that is a candidate for semantic matching
     • Chart: time (ms, 0 to 40) against the number of common attributes (1 to 15), for the Syntactic and Hybrid techniques
     • On average, the hybrid technique takes ≈15 ms more
  23. Technique Evaluation – Experiment 4
     • Find a deterministic similarity threshold with the highest degree of confidence
     • Results:
         Threshold    0.80   0.82   0.84   0.86   0.88   0.90   0.92   0.94   0.96
         Precision    0.290  0.317  0.550  0.694  0.806  0.876  0.940  0.947  0.988
         Recall       0.805  0.784  0.654  0.600  0.584  0.573  0.508  0.486  0.454
         F1-Measure   0.426  0.452  0.598  0.643  0.677  0.693  0.660  0.643  0.622
     • Optimal threshold: 0.9, with an F1-measure of 0.693
  24. Usability Evaluation (1)
     • Quantitative and qualitative evaluation of the profile matching technique's performance
     • The contact matcher was run against the two social networks on which each participant is most active
     • Social networks chosen: two of LinkedIn, Facebook and Twitter per participant
     • Number of participants: 16
     • Person suggestion page
     • Short survey about the participants' user experience
  25. Usability Evaluation (2)
     • Results:
       - Distinct profiles: 8,415
       - Average profiles per social network per participant: 262
       - Comparisons: 1,041,279
       - Person matching suggestions: 1,195
       - Correct matches: 975
       - Incorrect matches: 220
       - Precision rate: 0.816
  26. Usability Evaluation (3)
     • Social network integration:
       - LinkedIn and Facebook: 56.25%
       - LinkedIn and Twitter: 25%
       - Facebook and Twitter: 18.75%
     • User satisfaction:
       - Extremely: 50%
       - Quite a bit: 43.8%
       - Moderately: 0%
       - A little: 6.3%
       - Not at all: 0%
  27. Usability Evaluation (4)
     • Application 1: Management & Sharing
     • Application 2: Enhanced Security
     • Application 3: Networking & Suggestions
  28. Limitations
     • A person’s gender is not provided by all social network APIs
       - Gender could be identified from the first name or surname through NER
     • The weights of some profile attributes (e.g., first name, surname) are too high
       - In some cases they impact the final result too strongly
       - More experiments will be conducted to fine-tune these weights
  29. Future Work
     • Consider identifying higher degrees of semantic relatedness
     • Enrich the technique with other LOD cloud datasets
     • Target additional social networks
  30. Conclusion
     • Profile matching algorithm with:
       - Semantic lifting
       - NER on semi-/un-structured profile information
       - Linked Open Data to improve the NER process
       - Semantic matching at the schema level to find any possible indirect semantic relations
       - Weighted profile attribute matching
     • Quantitative and qualitative evaluation
     Thank you for your attention
  31. Related Work Comparison
     • Existing profile matching approaches are based on:
       - The user’s friends
       - Specific Inverse Functional Properties, e.g., email address
       - String matching of all profile attributes
       - Semantic relatedness between texts, relying on remote knowledge bases, e.g., Wikipedia
     • Evaluation of these approaches:
       - Technique evaluation on controlled datasets
       - No usability evaluation
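Step 3 of the pipeline on slide 5 recognises named entities such as cities and countries in free-text profile fields. It does this with the ANNIE IE System and a large KB gazetteer. The sketch below does not use ANNIE. Instead, it substitutes a plain dictionary lookup over a tiny, hypothetical gazetteer (`GAZETTEER`, `tag_entities`) to illustrate the idea, and it handles only single-word entries.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping lower-cased entries to entity types.
# The technique on slide 5 uses a large KB gazetteer and the ANNIE IE System instead.
GAZETTEER: Dict[str, str] = {
    "galway": "City",
    "kyoto": "City",
    "ireland": "Country",
    "japan": "Country",
}


def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Return (token, entity type) pairs for tokens found in the gazetteer."""
    hits: List[Tuple[str, str]] = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")  # drop surrounding punctuation
        entity_type = GAZETTEER.get(token.lower())
        if entity_type is not None:
            hits.append((token, entity_type))
    return hits


print(tag_entities("PhD student living in Galway, Ireland."))
# [('Galway', 'City'), ('Ireland', 'Country')]
```

A real gazetteer would also need multi-word entries (e.g. "London, Ontario") and disambiguation, which this lookup ignores.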
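Slide 7 lifts profile data from a social network's remote schema into NCO instances. The following is a minimal sketch under several assumptions. The remote profile is flat, with hypothetical field names (`first_name`, `last_name`, ...), and a hand-written mapping (`FIELD_TO_NCO`) links those fields to NCO properties. The NCO namespace and property names reflect our understanding of the NEPOMUK Contact Ontology and should be checked against its specification. Triples are plain Python tuples rather than an RDF library graph.

```python
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NCO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"

# Assumed mapping from hypothetical remote-schema field names to NCO properties.
FIELD_TO_NCO: Dict[str, str] = {
    "first_name": NCO + "nameGiven",
    "last_name": NCO + "nameFamily",
    "full_name": NCO + "fullname",
    "nickname": NCO + "nickname",
}

Triple = Tuple[str, str, str]


def lift_profile(profile_uri: str, remote_profile: Dict[str, str]) -> List[Triple]:
    """Turn a flat remote profile into NCO-style (subject, predicate, object) triples.

    Fields with no NCO mapping, or with blank values, are skipped.
    """
    triples: List[Triple] = [(profile_uri, RDF_TYPE, NCO + "PersonContact")]
    for field, value in remote_profile.items():
        predicate = FIELD_TO_NCO.get(field)
        if predicate is not None and value.strip():
            triples.append((profile_uri, predicate, value.strip()))
    return triples


for triple in lift_profile("urn:example:profile1",
                           {"first_name": "Joffrey", "last_name": "Baratheon", "headline": "King"}):
    print(triple)
```

Unmapped fields such as `headline` are dropped here. A fuller lifting step would route free-text fields like that to NER instead.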
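Slide 9 combines direct value comparison with string matching, using the best metric for each attribute type. The slides do not say which metrics were chosen, so the per-attribute mapping below (`METRIC_FOR`) is an assumption. It uses the standard-library `difflib.SequenceMatcher` ratio and a token Jaccard overlap as stand-ins.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def exact_match(a: str, b: str) -> float:
    """Direct value comparison: 1.0 if the normalised values are equal, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def sequence_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lower-cased word tokens of a and b."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical choice of metric per attribute type (not taken from the paper).
METRIC_FOR: Dict[str, Callable[[str, str], float]] = {
    "birthDate": exact_match,
    "email": exact_match,
    "name": sequence_ratio,
    "role": token_jaccard,
}


def attribute_similarity(attribute: str, a: str, b: str) -> float:
    """Score two values of the same attribute with that attribute's metric."""
    metric = METRIC_FOR.get(attribute, sequence_ratio)  # fall back to a generic metric
    return metric(a, b)


print(attribute_similarity("birthDate", "286AL", "286AL"))  # 1.0
print(attribute_similarity("role", "King", "Singer"))       # 0.0
print(round(attribute_similarity("name", "Justin Bieber", "J. Bieber"), 2))
```

Keeping the metric choice in a table means each attribute type can be tuned independently, which is the spirit of "best string matching metric for each attribute type".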
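Slide 11 looks for indirect relations between location sub-entities in both directions. The sketch below shows the bidirectional lookup. It assumes a small in-memory set of triples (`KB`) in place of a Linked Open Data source. The relation names come from the Galway/Ireland example, but the triples themselves are illustrative.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples standing in for a Linked Open Data source.
KB: Set[Triple] = {
    ("Galway", "locatedWithin", "Ireland"),
    ("Galway", "country", "Ireland"),
    ("Ireland", "containsLocation", "Galway"),
    ("Ireland", "capital", "Dublin"),
}


def relations_between(a: str, b: str, kb: Iterable[Triple]) -> List[Triple]:
    """Return every triple linking a and b, checking both a->b and b->a."""
    return sorted((s, p, o) for s, p, o in kb if (s, o) in ((a, b), (b, a)))


def semantically_related(a: str, b: str, kb: Iterable[Triple]) -> bool:
    """True if at least one relation links the two location values."""
    return bool(relations_between(a, b, kb))


print(semantically_related("Galway", "Ireland", KB))  # True
print(semantically_related("Galway", "Dublin", KB))   # False
for triple in relations_between("Ireland", "Galway", KB):
    print(triple)
```

Only direct relations are checked. Chaining several hops corresponds to the "higher degrees of semantic relatedness" listed under future work.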
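The two scoring approaches on slide 13 differ only in what happens after the string metric runs. The direct approach keeps the metric value. The normalised approach turns it into 1.0 or 0.0 by comparing it with the attribute's threshold. Here is a minimal sketch; whether the comparison at the threshold is inclusive (`>=`) is an assumption.

```python
def direct_score(metric_similarity: float) -> float:
    """Approach 1: the string-metric similarity is used as the attribute score."""
    return metric_similarity


def normalised_score(metric_similarity: float, threshold: float) -> float:
    """Approach 2: map the metric similarity to 1.0 or 0.0 using a per-attribute threshold."""
    return 1.0 if metric_similarity >= threshold else 0.0


NAME_THRESHOLD = 0.70  # attribute threshold for Name, as on slide 13

# Justin Bieber vs. J. Bieber (metric similarity 0.90)
print(direct_score(0.90), normalised_score(0.90, NAME_THRESHOLD))  # 0.9 1.0
# Justin Bieber vs. Joffrey Baratheon (metric similarity 0.4)
print(direct_score(0.4), normalised_score(0.4, NAME_THRESHOLD))    # 0.4 0.0
```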
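Slide 15 suggests a profile pair as the same person when its overall similarity score meets the 0.90 threshold. The sketch below makes several assumptions. Per-attribute scores are aggregated with a weighted mean, and the threshold check is inclusive. The weights in `WEIGHTS` are hypothetical: the slides only note that some of them, such as first name and surname, were too high. For these reasons the sketch does not reproduce the 0.95 and 0.30 scores shown.

```python
from typing import Dict

# Hypothetical attribute weights; the slides do not publish the real ones.
WEIGHTS: Dict[str, float] = {"name": 0.4, "city": 0.2, "role": 0.2, "birthDate": 0.2}
SUGGESTION_THRESHOLD = 0.90  # similarity threshold from slide 15


def profile_similarity(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of the attribute scores that have a weight."""
    shared = [attr for attr in scores if attr in weights]
    total_weight = sum(weights[attr] for attr in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[attr] * scores[attr] for attr in shared) / total_weight


def suggest_match(scores: Dict[str, float],
                  weights: Dict[str, float] = WEIGHTS,
                  threshold: float = SUGGESTION_THRESHOLD) -> bool:
    """Suggest the pair as the same person if the overall score meets the threshold."""
    return profile_similarity(scores, weights) >= threshold


near_identical = {"name": 0.9, "city": 1.0, "role": 1.0, "birthDate": 1.0}
different = {"name": 0.5, "city": 0.0, "role": 0.0, "birthDate": 0.0}
print(round(profile_similarity(near_identical, WEIGHTS), 2), suggest_match(near_identical))  # 0.96 True
print(round(profile_similarity(different, WEIGHTS), 2), suggest_match(different))            # 0.2 False
```

Dividing by the weight of the attributes actually present lets pairs with missing attributes (e.g. no date of birth) still be compared on what they share.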
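The optimal threshold on slide 23 is the one with the highest F1-measure, F1 = 2PR / (P + R). Recomputing F1 from the rounded precision and recall values in that table reproduces the choice of 0.9. A few recomputed values (at 0.82, 0.86 and 0.94) differ from the slide in the third decimal place, presumably because the slide's figures were computed before rounding.

```python
# Precision and recall per similarity threshold, as reported on slide 23.
thresholds = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96]
precision = [0.290, 0.317, 0.550, 0.694, 0.806, 0.876, 0.940, 0.947, 0.988]
recall = [0.805, 0.784, 0.654, 0.600, 0.584, 0.573, 0.508, 0.486, 0.454]


def f1_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


f1_scores = [f1_measure(p, r) for p, r in zip(precision, recall)]
best_threshold, best_f1 = max(zip(thresholds, f1_scores), key=lambda pair: pair[1])
print(best_threshold, round(best_f1, 3))  # 0.9 0.693
```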
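The precision rate on slide 25 is the share of suggestions that were correct: 975 / (975 + 220) = 975 / 1,195 ≈ 0.816. A quick check of that arithmetic:

```python
def precision_rate(correct: int, incorrect: int) -> float:
    """Share of suggestions that were correct: TP / (TP + FP)."""
    return correct / (correct + incorrect)


correct_matches, incorrect_matches = 975, 220  # from slide 25
print(correct_matches + incorrect_matches, round(precision_rate(correct_matches, incorrect_matches), 3))
# 1195 0.816
```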
