Discovering Semantic Equivalence of People behind Online Profiles (RED 2012 - ESWC 2012)

This paper was presented at the Fifth International Workshop on Resource Discovery (RED 2012: http://www.labf.usb.ve/RED2012/) at the ESWC 2012 conference (http://2012.eswc-conferences.org/) in Heraklion, Crete, Greece, on 27 May 2012.

The full paper can be found at: http://ceur-ws.org/Vol-862/REDp5.pdf

  • Users are currently required to create and separately manage duplicated personal data in numerous, heterogeneous online account services. Walled garden: the separate handling of data builds a wall around a user's connections and personal data, as reflected in the image, which raises issues of portability, identity, linkability and privacy. The personal data held in these accounts ranges from static identity-related information to more dynamic information, including physical and online presence.
  • Why this is not a straightforward task: (1) no common standards exist for modelling profile data in online accounts, so retrieving and integrating federated, heterogeneous personal data is hard from the outset; (2) some personal data (known contacts and presence information) is dynamic, so dealing with a user's multiple digital identities can become complex.
  • Aim: enable users to create, aggregate and merge multiple online profiles into one digital identity. This single identity is provided through the di.me userware, which (i) offers a single access point to the user's personal information sphere and (ii) covers personal data on the user's multiple devices, such as laptops, tablets and smartphones, including online profiles, their attributes and shared posts. Focus: integrating multiple online profiles (e.g. health, bank, government and social accounts); our current focus is on social networks. Proposal: a comprehensive ontology framework that serves as a standard format for handling static and dynamic profile data, built from a set of re-used, extended and new vocabularies.
  • Pyramid of the OSCAF ontologies adopted by the di.me framework (reused, extended and new). The PIM representation builds on these ontologies, mainly PIMO, NCO and DLPO. For the problem in question (multiple identity integration), the most relevant are: NCO, for modelling profile attributes; PIMO, for modelling the user's interests and who knows whom, which also glues together the knowledge represented by all the other domain and upper-level ontologies (NCO and PIMO are both established); and the LivePost Ontology (DLPO), for modelling online posts, one of several new ontologies being engineered. Other targeted domains: user presence (DPO), context (DCON), history (DUHO), rules (DRMO), devices (DDO) and accounts (DAO). In di.me, a number of established ontologies have been brought together, reused, extended or complemented with new ones, to offer a representation tailored to the project's objectives.
  • Inverse functional property (IFP): a property that uniquely identifies a user. Linking based on IFPs alone is shallow, since users can create multiple accounts within the same social network using a different email address. The Personal Information Model (PIM), an instance of the PIMO ontology, is the main knowledge base (KB) for semantic matching, complemented by knowledge from external KBs. The PIM is initially populated with any personal information integrated from a particular online account or crawled from a device; if a particular entity has no match, a new instance is created (initially there is a single user profile). Advantage of the PIM: it contains information of direct interest to the user, so it is more relevant to the user than an external KB and is expected to yield more accurate results. Remote KBs such as DBpedia, or any other dataset in the LOD cloud, are accessed to determine possible semantic relationships when no relevant data exists in the PIM.
  • The online profile matching approach involves four successive stages (A to D), as outlined in the image.
  • A. User profile data: the user's profile information is retrieved through the service accounts' APIs. Targeted information: the user's own identity-related information, online posts and contacts' information. All crawled information is aggregated into what we refer to as the user's 'super profile'.
  • B. Ontology mapping: the attributes of each online profile are mapped to the equivalent attributes of the super profile. Because ontologies and RDF are the main data representation, the mapping considers both syntactic and semantic similarities between online profile data.
  • Identity-related online profile information is stored as an instance of the NCO ontology, which represents information related to a particular contact. The user's presence and online post data is stored as instances of DLPO, which represents personal presence information commonly shared in online accounts, e.g. status messages and check-ins.
  • Contacts (NCO instances) and liveposts (DLPO instances) are linked to account instances (dao:Account), each referring to a particular account, e.g. di.me, LinkedIn, Facebook or Twitter.
  • C. Matching attributes: the user profile attributes are compared at both a semantic and a syntactic level, through four successive processes as outlined in the image.
  • 1. Linguistic analysis: applied to profile attributes that may contain complex or unstructured information, such as a postal address, as opposed to attributes with an atomic value (a person's name, a phone number). It is required to discover further knowledge from a particular value. Hyperlinks are also resolved if the profile does not contain enough information.
  • 2. Syntactic matching: value matching is used for attributes of a non-string literal type (e.g. date of birth or geographic position), since these have a strict, predefined structure. Direct string matching is used for attributes of type string whose ontology type (e.g. name, address) is either known beforehand or discovered through named entity recognition (NER). Indirect string matching is applied over all PIM instances, regardless of their type, if an attribute's entity type remains unknown even after NER. The string matching metric, Monge-Elkan, compares the user's online profile attribute values with the attributes stored in the PIM KB.
  • 3. Semantic search extension: determines whether two attributes are semantically related when they do not match syntactically. The user's PIM is the main KB; remote KBs such as DBpedia, or any other dataset in the LOD cloud, are also used to determine possible semantic relationships if the required data is not found within the PIM.
  • 4. Ontology-enhanced attribute weighting: an appropriate metric is required to weight the attributes that were syntactically and/or semantically matched.
  • D. Online profile resolution: based on the ontology-enhanced attribute weighting metric, we establish a threshold that determines semantic equivalence between a user's online profile and the personal identity already known and represented in the PIM. If two profiles are semantically equivalent, the user can be prompted to merge the profile information known across multiple online accounts. Integrating semantically equivalent personal information across distributed sources creates a unique user representation in the PIM.
  • XSPARQL: the transformation of XML social data into our RDF representation (Turtle) is expressed declaratively as an XSPARQL query. JSONLib: used to translate JSON into XML. ANNIE: contains several main processing resources for common NLP tasks, such as a tokeniser, sentence splitter, POS tagger, gazetteer, finite state transducer, orthomatcher and coreference resolver. Its pre-defined gazetteers cover common entity types (e.g. locations, organisations), and we extended them with acronyms and abbreviations where necessary. Large KB Gazetteer: makes use of the information stored within the user's PIM, since it can be populated dynamically by loading any ontology from RDF data.
  • The user's Personal Information Model (PIM) glues together personal information from different sources, in this case an online account (OnlineAccountX) and the user's super profile (Digital.MeAccount). The attributes of the user's online profiles are mapped to their corresponding properties within the di.me ontology framework; five identity-related profile attributes are mapped within NCO: affiliation, organization, phone number, person name and postal address. For example, the organisation label held by the nco:org property, 'Digital Enterprise Research Institute', is matched against other organisation instances within the PIM, such as the super profile instance 'DERI', which has the same type. Presence-related profile information is available in the form of a complex type, 'livepost', which is composed of… For example, linguistic analysis of the online post "Having a beer with Anna @ESWC12 in Iraklion" classifies it as a status, a check-in and an event post. Semantic search example: the user's address in the super profile, 'Iraklion', is related to a pimo:City instance, 'Heraklion', whereas the user's address in the online profile, 'GR', is related to a pimo:Country instance, 'Greece'. The two addresses do not match syntactically but are semantically related: through the PIM KB, the system knows that the city and country instances are linked by the 'locatedWithin' property, which yields a partial semantic match. Advantage of using ontologies: resources can be linked at the semantic level rather than at the syntactic or format level, for instance through the pimo:groundingOccurrence property, which relates an 'abstract' but unique subject to one or more of its occurrences. In the figure, the upper part (T-Box) shows the ontological classes and attributes, the lower part (A-Box) shows examples of how the ontologies can be used in practice, and straight lines between the A-Box and T-Box denote an instance-of relationship.
  • Future work: integrating further online service accounts into our current system, e.g. health (RunKeeper), bank, government and social accounts (Foursquare, Dropbox, Flickr). Metric: will take into account all the resulting weighted matches, whether syntactic and/or semantic, full or partial. Threshold: will determine whether two or more online profiles refer to the same person. Evaluation: to be performed on three levels: (i) syntactic matching, (ii) semantic matching, and (iii) a combination of both.
  • Overall di.me objective: integrating all personal data in a personal information sphere through a single, user-controlled point of access, the di.me userware. Our part in di.me: WP3 (objectives and tasks listed on the slide).
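The ontology mapping notes (stage B, with identity data stored as NCO instances linked to an account) can be illustrated with a small sketch that turns one crawled profile into Turtle statements. This is a minimal sketch under stated assumptions, not the di.me implementation: the raw field names, the `PROFILE_TO_NCO` table, the `dao:source` link (suggested by the "source" edges in the slide 11 diagram) and the `ex:` resource names are all hypothetical, prefix declarations are omitted, and real NCO data nests values such as phone numbers and addresses in their own resources rather than using flat literals.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from raw profile field names to NCO properties.
# Simplified: only NCO properties that take plain literal values are used.
PROFILE_TO_NCO: Dict[str, str] = {
    "name": "nco:fullname",
    "nickname": "nco:nickname",
    "birthday": "nco:birthDate",
}

# Datatype for typed literals (xsd:date for birth dates).
DATATYPES: Dict[str, str] = {"nco:birthDate": "xsd:date"}


def turtle_literal(value: str, datatype: Optional[str] = None) -> str:
    """Quote a value as a Turtle string literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^{datatype}' if datatype else f'"{escaped}"'


def map_profile(profile: Dict[str, object], contact: str, account: str) -> List[str]:
    """Map one crawled profile onto Turtle statements about an NCO contact.

    Fields without a mapping are skipped here; a fuller pipeline would pass
    them on to the later matching stages.
    """
    lines = [
        f"{contact} a nco:PersonContact ;",
        f"    dao:source {account} ;",  # assumed property linking to a dao:Account
    ]
    for field, value in profile.items():
        prop = PROFILE_TO_NCO.get(field)
        if prop is None:
            continue
        lines.append(f"    {prop} {turtle_literal(str(value), DATATYPES.get(prop))} ;")
    # A Turtle statement ends with '.', so swap the trailing ';' on the last line.
    lines[-1] = lines[-1][:-1] + "."
    return lines
```

For example, `map_profile({"name": "Keith Cortis", "nickname": "keith", "city": "Galway"}, "ex:contact1", "ex:linkedinAccount")` yields a `nco:PersonContact` with `nco:fullname` and `nco:nickname`; the unmapped `city` field is left out for the matching stages.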
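The syntactic matching note names Monge-Elkan as the string matching metric. Monge-Elkan scores a token set A against a token set B as ME(A, B) = (1/|A|) Σ_{a∈A} max_{b∈B} sim(a, b): each token of the first string is paired with its best-matching token in the second, and the best scores are averaged. The sketch below is one possible instance; the inner token similarity (1 minus normalised Levenshtein distance) and the lower-cased whitespace tokenisation are our assumptions, since the notes do not say which inner measure was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def token_similarity(a: str, b: str) -> float:
    """Inner similarity in [0, 1]: 1 - normalised Levenshtein distance (assumed)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def monge_elkan(s: str, t: str) -> float:
    """Monge-Elkan: mean over tokens of s of their best similarity to a token of t."""
    a_tokens, b_tokens = s.lower().split(), t.lower().split()
    if not a_tokens or not b_tokens:
        return 0.0
    return sum(max(token_similarity(a, b) for b in b_tokens)
               for a in a_tokens) / len(a_tokens)
```

Because tokens are matched independently, `monge_elkan("Keith Cortis", "Cortis Keith")` returns 1.0, which suits names whose word order differs across profiles. The measure is asymmetric; a symmetric variant can average `monge_elkan(s, t)` and `monge_elkan(t, s)`.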
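The semantic search extension, together with the Iraklion/GR example in the notes, relies on the PIM knowing that a city instance is located within a country instance. The sketch below mimics that check over a toy in-memory set of triples. The instance names (`pim:Heraklion`, `pim:Greece`), the label index and the set-of-tuples representation are hypothetical stand-ins for a real RDF store; only the `pimo:City`, `pimo:Country` and `locatedWithin` terms come from the notes.

```python
from collections import deque

# Toy PIM (hypothetical instance names): a city located within a country.
PIM = {
    ("pim:Heraklion", "rdf:type", "pimo:City"),
    ("pim:Greece", "rdf:type", "pimo:Country"),
    ("pim:Heraklion", "pimo:locatedWithin", "pim:Greece"),
}

# Hypothetical label index: surface forms found in profiles -> PIM instances.
LABELS = {
    "heraklion": "pim:Heraklion",
    "iraklion": "pim:Heraklion",
    "greece": "pim:Greece",
    "gr": "pim:Greece",
}


def containers(pim, start):
    """All instances reachable from `start` via pimo:locatedWithin (transitively)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in pim:
            if s == node and p == "pimo:locatedWithin" and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def semantically_related(a, b, pim=PIM, labels=LABELS):
    """True if both values resolve to PIM instances that are identical or
    linked, in either direction, through pimo:locatedWithin."""
    ra = labels.get(a.strip().lower())
    rb = labels.get(b.strip().lower())
    if ra is None or rb is None:
        # Not in the PIM; per the notes, remote KBs (e.g. DBpedia) would be
        # consulted at this point (not implemented here).
        return False
    return ra == rb or rb in containers(pim, ra) or ra in containers(pim, rb)
```

With this data, `semantically_related("Iraklion", "GR")` is true even though the two strings share no characters, which is the partial semantic match described for the address attributes.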
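The attribute weighting and profile resolution notes describe a metric over weighted matches and a threshold for semantic equivalence, both listed as future work. One plausible shape is a weighted mean of per-attribute match scores compared against a threshold. The weights, the 0.8 threshold and the scoring convention (1.0 for a full match, values in between for partial matches) below are purely hypothetical illustrations, not values from the paper.

```python
# Hypothetical relative weights per NCO attribute (integers keep the arithmetic exact).
WEIGHTS = {
    "nco:fullname": 4,
    "nco:hasPhoneNumber": 2,
    "nco:hasPostalAddress": 2,
    "nco:org": 1,
    "nco:hasAffiliation": 1,
}


def equivalence_score(matches, weights=WEIGHTS):
    """Weighted mean of per-attribute match scores in [0, 1].

    `matches` maps attribute -> score (1.0 full match, 0.0 no match, partial
    in between) for attributes present in both profiles; attributes without
    a weight are ignored.
    """
    scored = {a: s for a, s in matches.items() if a in weights}
    total = sum(weights[a] for a in scored)
    if total == 0:
        return 0.0
    return sum(weights[a] * s for a, s in scored.items()) / total


def semantically_equivalent(matches, threshold=0.8, weights=WEIGHTS):
    """Decide equivalence by comparing the score with a (hypothetical) threshold."""
    return equivalence_score(matches, weights) >= threshold
```

Normalising by the weights of the attributes actually compared keeps sparse profiles from being penalised for fields they never filled in; whether that is desirable is one of the design questions the metric definition would have to settle.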
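The implementation note says JSONLib translates JSON from service APIs into XML so that an XSPARQL query can lift it to RDF. JSONLib's own API is not shown here; as a stand-in, the sketch below performs the same kind of JSON-to-XML conversion with Python's standard library. The element naming scheme (a `profile` root, `item` for array entries, lower-case `true`/`false` for booleans) is our assumption, and it presumes that JSON keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="profile"):
    """Recursively convert parsed JSON into an ElementTree element.

    Objects become child elements named after their keys, arrays become
    repeated <item> children, and scalars become element text.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml(child, "item"))
    elif isinstance(value, bool):
        elem.text = "true" if value else "false"  # keep JSON's spelling
    elif value is not None:
        elem.text = str(value)
    return elem
```

For example, `ET.tostring(to_xml(json.loads('{"name": "Keith", "tags": ["a", "b"]}')), encoding="unicode")` produces `<profile><name>Keith</name><tags><item>a</item><item>b</item></tags></profile>`, which an XML-to-RDF query can then walk.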
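The worked example in the notes matches the organisation label 'Digital Enterprise Research Institute' against the PIM instance 'DERI', and the ANNIE gazetteers were extended with acronyms and abbreviations for this kind of case. A simple acronym check can be sketched as below; the heuristic (initials of each word, skipping a small hypothetical stopword list) is our assumption and not the authors' method.

```python
import re

# Hypothetical list of function words usually dropped from acronyms.
STOPWORDS = {"of", "for", "and", "the", "in", "on", "at"}


def acronym(name: str) -> str:
    """Upper-cased initials of the non-stopword words in `name`."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(w[0] for w in words if w.lower() not in STOPWORDS).upper()


def is_acronym_of(short: str, full: str) -> bool:
    """True if `short`, ignoring dots and case, equals the acronym of `full`."""
    return short.replace(".", "").upper() == acronym(full)
```

`is_acronym_of("DERI", "Digital Enterprise Research Institute")` holds, as does the dotted form "D.E.R.I.". Acronyms that keep function words or use non-initial letters would still need an explicit gazetteer entry.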
  • Transcript

    • 1. Discovering Semantic Equivalence of People behind Online Profiles
      Keith Cortis, Simon Scerri, Ismael Rivera, Siegfried Handschuh
      Digital Enterprise Research Institute (www.deri.ie)
      Resource Discovery (RED) Workshop at ESWC 2012, 27 May 2012
      Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Enabling Networked Knowledge
    • 2. Motivation
      Current situation:
      o Personal data is unnecessarily duplicated over different platforms
      o No possibility to merge or port such data
      o Separate handling of this data
      [Image: "Social Networking Sites as Walled Gardens", David Simonds]
    • 3. Problem Specification
      o No common standards exist for modelling profile data in online accounts
      o Personal data (known contacts and presence information) is dynamic and continuously changing
    • 4. Objectives
      Aim: user represented through one digital identity
      Main challenge: discovery of semantic equivalence between contacts described in online profiles
      Proposal: use a comprehensive ontology framework for handling online profile data
    • 5. di.me Ontology Framework
    • 6. Related Work Comparison
      Existing profile linking approaches based on:
      o Specific inverse functional properties (e.g. email address)
      o Syntactic matching of all profile attributes
      o The user's friends
      o Semantic relatedness between texts, depending on knowledge bases (KB) such as Wikipedia
      Our approach: similarity measure based on the user's Personal Information Model (PIM)
    • 7. Approach (1)
      [Figure: approach pipeline. A. User Profile Data; B. Ontology Mapping; C. Matching Attributes: 1. Linguistic Analysis, 2. Syntactic Matching (Value Matching, Direct String Matching, Indirect String Matching), 3. Semantic Search Extension, 4. Ontology-enhanced Attribute Weighting; D. Online Profile Resolution]
    • 8. Approach (2)
      [Figure: approach pipeline, as on slide 7]
    • 9. Approach (3)
      [Figure: approach pipeline, as on slide 7]
    • 10. Approach (3)
      o Identity-related online profile information: NCO
      o Presence and online post data for the user: DLPO
    • 11. Approach (3)
      Account Ontology (DAO): for modelling service account representations
      [Figure: DAO, DLPO and NCO classes and properties. DAO: Account, Credentials (userID, password: xsd:string), with source, hasCredentials, hasCustomAttribute, nao:externalIdentifier, rdfs:label. DLPO: LivePost, MultimediaPost, PresencePost, WebDocumentPost, Message. NCO: Contact, PersonContact, OrganizationContact, ContactGroup, EmailAddress, PostalAddress, PhoneNumber, IMAccount, Name, with photo, key, sound, foafUrl, websiteUrl, blogUrl, hasEmailAddress, belongsToGroup, hasPostalAddress, hasPhoneNumber, hasLocation (geo:Point), hasIMAccount, hasName.]
    • 12. Approach (4)
      [Figure: approach pipeline, as on slide 7]
    • 13. Approach (4)
      [Figure: approach pipeline, as on slide 7]
    • 14. Approach (4)
      [Figure: approach pipeline, as on slide 7]
    • 15. Approach (4)
      [Figure: approach pipeline, as on slide 7]
    • 16. Approach (4)
      [Figure: approach pipeline, as on slide 7]
    • 17. Approach (5)
      [Figure: approach pipeline, as on slide 7]
    • 18. Implementation
      o Transformation
      o Linguistic analysis: ANNIE Information Extraction System; Large KB Gazetteer lookup
      [Figure: the address “DERI, Lower Dangan, Galway, Ireland” is looked up against the PIM and annotated as Organisation, Street, City and Country]
    • 19. Final Objective
    • 20. Summary
      Objectives
      o Aggregated profile data is lifted onto a unique PIM representation and integrated in a super profile
      o Determination of semantic equivalence between contacts described in online profiles
      Future Work
      o Integration of further online accounts
      o Semantic extension to the syntactic-based profile attribute matching
      o Definition of a metric
      o Analysis of online posts from multiple accounts
      o Evaluation of artefact
      Thank you for your attention. keith.cortis@deri.org
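Slide 18 shows the address "DERI, Lower Dangan, Galway, Ireland" being annotated against the PIM as Organisation, Street, City and Country. The sketch below mimics that gazetteer lookup with a plain dictionary. It is not ANNIE or the Large KB Gazetteer: the gazetteer entries, the comma-based segmentation and the function name are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteer built from PIM instance labels: label -> entity type.
GAZETTEER: Dict[str, str] = {
    "deri": "Organisation",
    "digital enterprise research institute": "Organisation",
    "lower dangan": "Street",
    "galway": "City",
    "ireland": "Country",
}


def annotate_address(address: str,
                     gazetteer: Dict[str, str] = GAZETTEER
                     ) -> List[Tuple[str, Optional[str]]]:
    """Split an address on commas and look each segment up in the gazetteer.

    Segments with no entry are typed None; per the notes, such values would be
    left to indirect string matching over all PIM instances.
    """
    segments = [s.strip() for s in address.split(",") if s.strip()]
    return [(s, gazetteer.get(s.lower())) for s in segments]
```

`annotate_address("DERI, Lower Dangan, Galway, Ireland")` returns the four segments typed as on the slide. A real gazetteer would also handle multi-word matches inside a segment and addresses that are not comma-separated.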
