1. PhD candidate: Hussein Hazimeh
Director: Prof. Philippe Cudré-Mauroux / UNI-FR
Co-Director: Prof. Elena Mugellini / HES-SO
28.06.2019
Automatic Knowledge Graph Entity Refinement Based on Social Networks
4. • What is a knowledge graph?
Objectives:
• KGs allow users to visualize knowledge facts about real-world entities (nodes) and the interrelations between them (edges).
Data source:
• Incorporate knowledge from structured repositories such as DBpedia.
• Extract knowledge from semi-structured web resources such as Wikipedia.
Privacy:
• KGs are divided into private (their knowledge cannot be used or analysed) and public (their knowledge can be used and analysed).
29.06.2019 Hussein Hazimeh PhD presentation 4
Introduction: Knowledge Graphs (1/3)
Introduction Challenges & RQs SOA Contributions Conclusions Future work
5. Introduction: Knowledge Graphs (2/3)
• Why use knowledge graphs?
Transform data into knowledge.
Knowledge is represented in the form of entities and relations.
Connect different types of data.
Improve decisions by finding things faster.
Re-use of publicly available industry graphs and ontologies.
Readable by humans and machines.
Already in use across many industries (gas, pharmaceutical, banking, and retail).
6. Introduction: Knowledge Graphs facts (3/3)
[Figures: Google knowledge panel, Wikidata KG, YAGO KG]
8. Research problems – (1) knowledge graphs
• Importance of solving this problem:
1. Automation
• Manual => time-consuming
• Automatic => faster
2. Utility for other systems
• Digital libraries
• Recommendation systems
• Recruiting systems
• Problem (P1) – missing links in knowledge graphs:
• Knowledge graphs (KGs) are missing entities' social links (Facebook, Twitter).
• Leaving this problem unsolved leads to a time-consuming manual search for these links.
• Consequently, because many systems (data sources) rely on these KGs, they will be missing these links as well.
9. • Problem (P2) – matching challenges on social networks:
• The number of Online Social Network (OSN) users is increasing => the user profile de-anonymization task becomes harder.
75,980 Facebook users are named “John Smith”.
• Privacy and access control policies that users apply limit the access to certain information about individuals.
• Outdated user profile data (location, work, profile image, etc.).
• Non-identical information (users can share different information on their OSN profiles).
• Importance of solving this problem
• Automate the user linking process to make it time-efficient.
• Find new privacy-aware attributes for matching (biographies).
• Find new context-aware attributes for matching (life events).
Research Challenges – (2) online social networks
[Figure: number of users, 2008: < 100M; 2019: > 2B]
10. • 100 entities per class are examined.
• Both private (Google) and public (Wikidata) knowledge graphs are considered.
• Examination results show that:
• The average # of social links does not exceed 50%.
• Academic entities have the lowest # of social links (< 4%).
Motivating scenario
12. 1. General research question to address:
• Given a certain knowledge graph, how can we reinforce the entities inside this graph?
2. Specific research questions to address:
• RQ1: How can we profile the users of online social networking platforms?
• RQ2: Given an entity and its knowledge graph, what are the likely
corresponding OSN profile links of this entity?
• RQ3: Given a user profile on Facebook, what are the corresponding user profile
links on other OSNs?
Research Questions
Concerns P1.
Concerns P2.
14. C1: User Profile Analytics and Discovery on Social Networks.
C2: Comparative Review of Social Network Profile Matching Methods.
C3: Novel Method for Interlinking User Profiles on Social Networks.
C4: Automatic Embedding of Academic Entity Social Links into Knowledge Graphs.
C5: Automatic Embedding of Social Event Sentiment Polarities into Knowledge Graphs.
Contributions
[Figure: contributions overview. Labels: data sources (Google Scholar, knowledge bases), profiling, biographies, life events, sentiment features, find social link and sentiment, embedding into existing KGs, building new KGs.]
Answers RQ1.
Answers RQ3.
Answers RQ2.
15. • Knowledge graph embedding:
• Approaches for knowledge graph completion:
• General methods: internal and external.
• Translation-based methods: translate an entity/relation from head to tail: TransE [9], TransH [102], TransR [56], TranSparse [43].
• Dataset profiling:
• Social networks [65]: YouTube, Flickr, Orkut.
• Other datasets: knowledge graphs [78].
• Sentiment analysis on social events:
• Lexicon-based [48, 105].
• Supervised-based [58].
Related work
Reference | Method | Sources
[103] | External | Search engines
[46] | Internal | Reinforcement learning
[96] | External | Social media
[86] | External | DBLP, Microsoft Academic Search
[31] | Internal | LSH
[19] | Internal, external | External knowledge graphs
[92] | External | Social networks
[114, 89, 26] | Internal | Machine learning
[Figure labels: car driver, farmer, Farming, Skills]
16. C1: User Profile Analytics and Discovery on Social Networks
17. • Dataset profiling is the task of creating descriptive metadata about entities.
An entity can be a person, organization, database, etc.
• Profiling methods and measures.
• Global measures: measure the facts of a dataset: its size, its shape, etc.
• Platform measures: study the dataset features according to its form (for example, a database or a graph dataset).
• Our task: analyze datasets of user profiles from 4 OSNs:
• Facebook (D_F), Google+ (D_G+), LinkedIn (D_L), Twitter (D_T)
User Profile Analytics and Discovery (UPAD)
I. Platform-based measures: (1) Attribute availability, (2) Activity frequency, (3) Profile completeness, (4) Mutability index
II. Entity-based measure: (1) Profile confidentiality
18. • For each attribute in D_S, study its percentage of availability.
• Availability of A_S on D_S.
• Results:
• Highest available: screennames.
• LinkedIn and Google+: highest and lowest image availability respectively.
• Google+: lowest education compared to Facebook; location rarely available.
• LinkedIn: location available; however, medium range between Facebook and Twitter.
• Bio: highly available on LinkedIn and Google+, and available at similar rates on Facebook and Twitter.
Platform-based Measure: (1) Attribute Availability
[Figure: attribute availability results; values shown: 35.6%, 38.1%]
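The availability measure above can be illustrated with a minimal sketch. It assumes that an attribute counts as available when a profile exposes a non-empty public value for it; the function name, the attribute names, and the toy profiles are hypothetical and are not data from the study.

```python
from typing import Dict, List, Optional

Profile = Dict[str, Optional[str]]


def attribute_availability(profiles: List[Profile], attributes: List[str]) -> Dict[str, float]:
    """Return, for each attribute, the percentage of profiles with a non-empty value."""
    total = len(profiles)
    result: Dict[str, float] = {}
    for attr in attributes:
        # Assumption: None and "" both mean "not publicly available".
        present = sum(1 for p in profiles if p.get(attr) not in (None, ""))
        result[attr] = 100.0 * present / total if total else 0.0
    return result


# Hypothetical toy dataset, for illustration only.
sample = [
    {"screenname": "alice", "image": "a.png", "location": None},
    {"screenname": "bob", "image": None, "location": "Fribourg"},
    {"screenname": "carol", "image": "c.png", "location": ""},
    {"screenname": "dave", "image": "d.png", "location": None},
]
availability = attribute_availability(sample, ["screenname", "image", "location"])
print(availability)
```

Running the same function once per OSN dataset would give one availability profile per platform, which is the kind of comparison reported in the results.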
19. • Activity frequency: the amount of public content (tweets / posts / shares / retweets) published on a user timeline (wall).
• 5 features analyzed
• Only the highly available features are considered.
• Results
• Text is the most available content type on all OSNs except LinkedIn.
• Facebook: highest amount of content per user.
Platform-based Measure: (2) Activity Frequency
[Figure: activity frequency results. Features: number of links (l), text (t), check-ins (c), photos (p), tags (t), mentions (m)]
20. • Profile completeness: the portion of public-only attributes.
• Classes: fully complete, complete-without-A, only-A, incomplete.
• Results:
• Surprisingly, only a very small portion of user profiles on OSNs are totally incomplete.
Platform-based Measure: (3) User Profile Completeness
21. • Mutability index:
• A user U has an attribute A, where the old value of A is A_o and the new value of A is A_n.
• Results:
• Facebook, Google+, and Twitter: the biography is the most mutable attribute.
• LinkedIn: publication is the most mutable attribute.
• In contrast, screenname, gender, and birthdate were the least mutable profile attributes.
Platform-based Measure: (4) Mutability Index
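The slide names the old and new values of an attribute but does not give the formula for the index. As a sketch only, the code below assumes that the mutability index of an attribute is the fraction of users whose value changed between two snapshots; this definition, the function name, and the toy data are assumptions, not the thesis's method.

```python
from typing import Dict, List, Tuple


def mutability_index(observations: List[Tuple[str, str]]) -> float:
    """Fraction of (old value, new value) pairs in which the value changed."""
    if not observations:
        return 0.0
    changed = sum(1 for old, new in observations if old != new)
    return changed / len(observations)


# Hypothetical snapshots: one (A_o, A_n) pair per user and attribute.
snapshots: Dict[str, List[Tuple[str, str]]] = {
    "biography": [("PhD student", "Postdoc"), ("Runner", "Runner"), ("Hi", "Hello there")],
    "gender": [("F", "F"), ("M", "M"), ("F", "F")],
}
index = {attr: mutability_index(pairs) for attr, pairs in snapshots.items()}
print(index)
```

Under this assumed definition, an attribute that never changes scores 0 and one that always changes scores 1, which matches the ordering described in the results (biography high, gender low).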
22. • Confidentiality model: estimates the probability that a user account is confident (real) or non-confident (fake).
• To study the confidentiality, we consider a set of profile attributes and assign a weight to each one.
• Results:
• LinkedIn and Facebook profiles are the most confident compared to other OSNs, with 95.3% and 83.78% respectively.
• Google+ and Twitter: mostly “unlikely confident”, with a non-confident score of 22.2% for Google+ and 37.76% for Twitter.
Entity-based Measure: (1) Profile Confidentiality
[Figure: confidentiality model and results; values shown: 95.3%, 83.7%, 37.7%, 22.2%]
Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled. « Reliable User Profile Analytics and Discovery on
Social Networks. » In 8th International Conference on Software and Computer Applications - ACM (ICSCA
2019). Penang, Malaysia
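The weighted-attribute idea behind the confidentiality model can be sketched as follows. The source does not list the attributes, the weights, or how the weights are combined, so the attribute set, the weight values, and the normalized weighted sum used here are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical weights; the actual attributes and values are not given in the source.
WEIGHTS: Dict[str, int] = {"image": 3, "biography": 2, "location": 1, "work": 2, "education": 2}


def confidentiality_score(profile: Dict[str, Optional[str]], weights: Dict[str, int] = WEIGHTS) -> float:
    """Weighted share of attributes that are present, in [0, 1]."""
    total = sum(weights.values())
    present = sum(w for attr, w in weights.items() if profile.get(attr))
    return present / total


# Hypothetical profile: image, biography, and work are filled in.
profile = {
    "image": "me.png",
    "biography": "Researcher",
    "location": None,
    "work": "HES-SO",
    "education": None,
}
score = confidentiality_score(profile)
print(score)
```

A threshold on this score could then separate confident from non-confident accounts; where that threshold lies would have to come from the original model.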
23. C2: Comparative Review of Social Network Profile Matching Methods
24. • Problem:
Existence of user profiles belonging to a single
user across different social networking sites.
• Challenge:
Linking the same user's profiles across OSNs is hard because the profiles contain different information.
• How?
Leverage a set of features from user profiles and
social network.
Introduce a similarity measure depending on the
context of the attribute value (text, date, etc.).
Develop the matching algorithm.
Profile matching definition
Analogy (Kiwifruit): match the kiwifruit in both baskets (b1 and b2) using its features:
• Color: green
• Shape: oval
• Seeds: edible
• Texture: soft
[Figure: baskets b1 and b2; John's profiles on multiple OSNs]
25. • More than 16 distinct features have been used.
• Attribute-based:
• Location, Work, Image
• Behavioral-based:
• Timestamps, Writing style, Topic detection.
• Are all features used by the SOA?
• NO!
• Do all features lead to efficient matching results?
• NO!
• Behavioral-based cons:
• The trade-off in activity between different accounts.
• Attribute-based features cons:
• Privacy and access control methods users apply.
Profile matching approaches: comparative analysis
Resources: Facebook 36%, Twitter 33%, Google+ 9%, LinkedIn 12%, Flickr 10%
Associations: FB<->TW 44%, FB<->LIn 17%, TW<->LIn 22%, TW<->FL 17%
Feature types: content and behavioral 39%, profile attributes 55%, both 6%
Publications by year (2007–2015): 1, 1, 3, 2, 3, 3, 5, 6, 6
[Figure: similarity methods tree]
Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled, Philippe Cudré-Mauroux. «Linking user profiles in social networks: a
comparative review. » International Journal of Social Network Mining, Volume 2: 333-361 - 2017.
26. C3: Automatic Embedding of Academic Entity Social Links into Knowledge Graphs
27. • We propose a query-based approach for
social link embedding.
1. Query: entity name
A. Knowledge acquisition:
i. Google Scholar mainly
ii. Wikidata Metaphacts.
2. Profile matching
A. Knowledge base to social network matching
i. F-Link
ii. K-Link
3. Machine learning
A. Bottom-up paradigm
i. Clustering
ii. Classification
4. Enrichment and storage
A. Results embedding to Wikidata
B. Semantic storage and visualization
Embedding social profile links to knowledge graphs: architecture
28. • F-Link is an algorithm that finds the Facebook profile link of a specific entity.
• It is composed of 5 main steps:
1. Knowledge base construction.
2. Facebook search (by name).
3. Similarity calculation.
4. Classification/clustering.
5. Facebook profile link output (with knowledge base).
Embedding social profile links to knowledge graphs
F-Link algorithm (Matcher 1 – M1)
F-Link steps: Knowledge base → Facebook search → Similarity calculation → Classification/Clustering → Facebook profile link
29. • Knowledge acquisition
• The initial Knowledge Base (KB) is constructed
from Google Scholar (GS) and Wikidata (WD).
𝐾𝐵 = 𝐺𝑆 ⊕ 𝑊𝐷
• Knowledge from both resources is integrated into
one KB.
• GS feature extraction:
• Profile headers.
• Profile publications.
• Wikidata feature extraction
• The Wikidata knowledge graph of the corresponding
entity name:
• Result: knowledge base.
F-link method: Knowledge base construction
[Figures: Google Scholar sample and Wikidata sample, combined (⊕) into the KB]
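The integration step 𝐾𝐵 = 𝐺𝑆 ⊕ 𝑊𝐷 can be sketched as a feature-wise union of the two sources. The source does not specify how ⊕ resolves overlapping or conflicting values, so the union semantics, the `integrate` function, and the feature names below are assumptions; the sample values are illustrative only.

```python
from typing import Dict, Set

KnowledgeBase = Dict[str, Set[str]]


def integrate(gs: KnowledgeBase, wd: KnowledgeBase) -> KnowledgeBase:
    """Merge two feature maps; values of a shared feature are unioned."""
    kb: KnowledgeBase = {}
    for source in (gs, wd):
        for feature, values in source.items():
            kb.setdefault(feature, set()).update(values)
    return kb


# Illustrative features, as they might be extracted from Google Scholar and Wikidata.
gs = {"name": {"Robert Tibshirani"}, "affiliation": {"Stanford University"}}
wd = {"name": {"Robert Tibshirani"}, "occupation": {"statistician"}}
kb = integrate(gs, wd)
print(kb)
```

Keeping sets of values per feature means that duplicates from the two sources collapse automatically, while source-specific features (publications from Google Scholar, occupation from Wikidata) are preserved side by side.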
30. • F-link: F stands for Facebook; goal: find the Facebook profile link.
1. Facebook (FB):
• Profile data extraction.
• Extracted data is stored in knowledge base 𝐾𝐵𝑓.
• Other profile extractions, such as: workplace and living place.
2. Google scholar:
• Profile headers.
• Biography from PDF publications.
3. Wikidata
4. F-Link produces a link 𝐿𝑓, where 𝐿𝑓 = 𝐾𝐵 ⊗ 𝐾𝐵𝑓.
F-link algorithm
Extracted Facebook features:
Screenname: first name and last name
Biography: short description about the profile owner
Content: collection of posts, shares, etc.
[Figure: a name search returns n candidate profiles; 𝐾𝐵 ⊗ 𝐾𝐵𝑓 selects 1 profile, e.g. facebook.com/rob.tibshirani]
31. • Each profile feature has a context:
• Semantic
• Syntactic
• Pairs {𝑠1, 𝑠2} of features from 𝐾𝐵 and 𝐾𝐵𝑓 are matched using the compatible similarity measure.
• Entity name
• N-gram
• [𝐾𝐵: “Robert Tibshirani”], [𝐾𝐵𝑓: “Rob Tibshirani”]
• Affiliation and biography
• Stop words removal
• NER
• Cosine similarity
F-link method: similarity measures
Affiliation similarity workflow: Affiliation (text) → Tokenize (NER) → Syntactic (metric)
Example: “Professor at the University of Stanford” → e = [“University of Stanford”: Organization, “Professor”: Object] → similarity score in [0, 1]
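The two similarity measures named above can be sketched with the standard library. This is an illustration under assumptions: the name measure is implemented as the Jaccard overlap of character bigrams (the slide only says "N-gram"), the stop-word list is a tiny hypothetical one, and the NER step of the workflow is omitted.

```python
import math
from collections import Counter
from typing import Set

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"at", "the", "of", "a", "an", "in", "and"}


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Set of character n-grams of the lower-cased text with spaces removed."""
    s = text.lower().replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-grams, in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors after stop-word removal."""
    va = Counter(w for w in a.lower().split() if w not in STOP_WORDS)
    vb = Counter(w for w in b.lower().split() if w not in STOP_WORDS)
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


name_score = ngram_similarity("Robert Tibshirani", "Rob Tibshirani")
affiliation_score = cosine_similarity(
    "Professor at the University of Stanford", "Stanford University professor"
)
print(name_score, affiliation_score)
```

Character n-grams tolerate abbreviated first names such as "Rob" for "Robert", while the bag-of-words cosine ignores word order, so differently phrased affiliations with the same content words still score close to 1.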
32. • K-link: K stands for Twitter and LinkedIn; goal: find the Twitter and LinkedIn profile links, i.e. the pairs (FB, TW) and (FB, LIn).
• It is composed of 5 main steps:
1. Knowledge base construction.
2. Twitter/LinkedIn search (by name).
3. Similarity calculation.
4. Classification/clustering.
5. Twitter/LinkedIn profile link output.
Embedding social profile links to knowledge graphs
K-Link algorithm (Matcher 2 – M2)
K-Link steps: Knowledge base → Twitter/LinkedIn search → Similarity calculation → Classification/Clustering → Twitter/LinkedIn profile link
33. • Matcher initial input: the set of “reliable” Facebook profile links produced by the F-Link matcher.
• The knowledge base is constructed from the data inside each
profile.
• Screenname, affiliation, life events, biography, etc.
• The key features of this matcher:
1. Life events
2. Biographies
K-link algorithm
[Figure: a name search returns n candidate profiles; 𝐾𝐵𝑓 ⊗ 𝐾𝐵𝑘 selects 1 profile, e.g. linkedin.com/rob.tibshirani]
34. • Facebook, Twitter, and LinkedIn
1. Basic profile attributes
2. Life events
3. Biography
• Formally, let T, L, and F represent the Twitter, LinkedIn, and Facebook OSNs, respectively.
• The profile of a user i in either T, L, or F is represented as $P_i^s$, where $s \in \{T, L, F\}$.
• The profile attributes of a user i are modeled as follows:
$P_i^s = \{n_i^s, l_i^s, e_i^s, b_i^s, p_i^s, d_i^s\}$.
• n denotes the screenname, l the location, e the life events, b the profile biography, p the profession, and d the date of birth.
K-link method: feature extraction
Life event similarity workflow: Life event (text) → Semantic (LDA)
Example: “Started a new job at Google” → e = “new job”
[Figure: life event search mechanism]
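The profile model above maps naturally onto a small record type. The sketch below mirrors the six attributes (n, l, e, b, p, d) and the network tag s; the class name, field names, string-typed birthdate, and sample values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NETWORKS = {"T", "L", "F"}  # Twitter, LinkedIn, Facebook


@dataclass
class Profile:
    network: str                                          # s
    screenname: str                                       # n
    location: Optional[str] = None                        # l
    life_events: List[str] = field(default_factory=list)  # e
    biography: Optional[str] = None                       # b
    profession: Optional[str] = None                      # p
    birthdate: Optional[str] = None                       # d

    def __post_init__(self) -> None:
        # Reject networks outside the modeled set {T, L, F}.
        if self.network not in NETWORKS:
            raise ValueError(f"unknown network: {self.network!r}")


# Hypothetical example profile.
example = Profile(
    network="F",
    screenname="Rob Tibshirani",
    life_events=["new job"],
    profession="Professor",
)
print(example)
```

Optional fields default to empty because, as the profiling results show, many attributes are private or missing on a given OSN.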
35. • Similarity is measured over a set of attributes:
• Screennames
• Life events
• Biographies
• Affiliations
• Locations
• Birthdates
• IDs
• Each attribute is matched with a specific similarity metric.
K-link method: similarity measures
36. • In step 3, the matching results are processed in a bottom-up machine learning paradigm.
1. Clustering
2. Classification
• The dataset used for this task includes the similarity result of each pair of
attributes from two social networks.
• Why did we combine clustering and classification models?
Combine clustering and classification methods to find the social profile links
[Figure: clustering vs. classification. Labels shown: label prediction, more powerful, class prediction, insufficient labeled data]
37. • Clustering
• Each profile similarity represents a feature vector.
• Each cluster average is calculated.
• The clusters are filtered, and only the one with the highest average, i.e. the highest confidence (𝐶𝑓), is considered in the classification task.
1- Data clustering
Example: C1 = (0.2, 0.2, 0.5), C2 = (0.4, 0.4, 0.5), C3 = (0.8, 0.8, 0.8). C3 has the highest average, so 𝐶𝑓 = 0.8.
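The cluster filtering step can be reproduced on the slide's own example values. The sketch assumes that each cluster is summarized by the listed similarity scores and that the confidence 𝐶𝑓 is the plain arithmetic mean; the function name is hypothetical, and the clustering algorithm that produces the clusters is not shown.

```python
from statistics import mean
from typing import Dict, List, Tuple


def select_confident_cluster(clusters: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the name and average of the cluster with the highest average score."""
    averages = {name: mean(values) for name, values in clusters.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Values taken from the slide's example.
clusters = {
    "C1": [0.2, 0.2, 0.5],
    "C2": [0.4, 0.4, 0.5],
    "C3": [0.8, 0.8, 0.8],
}
best, confidence = select_confident_cluster(clusters)
print(best, confidence)
```

Only the profiles in the selected cluster are passed on to the classifier, which keeps the labeled-data requirement of the classification step small.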
38. • Classification
• Each profile in the highest-confidence cluster from the previous step is classified.
• The binary classification task is composed of two classes: “match” or “not match”.
• Only the cluster with the highest confidence (𝐶𝑓) is used.
• We use a naïve Bayes classifier (BNC).
• The feature vector is $v = [x_1, x_2, \dots, x_n]$, where each $x_k$ is the similarity between a pair of attributes (e.g. life events).
• In addition, we compared the performance of BNC with two other methods: SVM and decision trees.
• Result of the classifier:
• Facebook link (𝐿𝑓) and Twitter/LinkedIn link (𝐿𝑘).
2- Data classification
[Figure: BNC tree example]
39. • Data sources
• 5,694 are used in M1 and M2.
• The maximum number of life events per user profile is 8. In addition, each event class has at most 2.2K events in total.
• Up to 83 name matches from Facebook.
• Machine learning dataset:
• Manually labeled 300 instances.
• Each new classified instance is added to the main
dataset to increase its performance.
Implementation: dataset facts
Source | Type
Google Scholar | Scholarly
Wikidata | Knowledge graph
Facebook, Twitter, and LinkedIn | Online social networks
[Figure: life events statistics]
[Figure: M1 name search matching results; values shown: 83, 50, 25, 10]
40. • We compare the precision and recall trade-offs across multiple domains.
• Four domains are considered in our study:
• (CS = Computer Science, P = Physics, C =
Chemistry, M = Medicine).
• Results
• LIn has the highest precision and recall.
• Platform is more structured.
• User information is more confident.
• Profiles are updated regularly.
Evaluation on different domains: precision and recall
[Figure: multiple domain comparison results]
We validate that our approach works across multiple domains.
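For reference, precision and recall for a link-matching task can be computed as in the sketch below. It assumes that a prediction is an (entity, profile) pair and that it counts as correct only when the same pair appears in the ground truth; the identifiers and pairs are hypothetical and are not results from the evaluation.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (entity id, profile id)


def precision_recall(predicted: Set[Link], actual: Set[Link]) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / actual."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical links for five entities; one prediction is wrong, one is missing.
actual = {("e1", "p1"), ("e2", "p2"), ("e3", "p3"), ("e4", "p4"), ("e5", "p5")}
predicted = {("e1", "p1"), ("e2", "p2"), ("e3", "p9"), ("e4", "p4")}
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Computing the pair per domain (CS, P, C, M) and per OSN would give the kind of comparison described on this slide.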
41. • HYDRA [61]: a system for linking identical user accounts by analyzing and comparing the
behavior of users.
• BM25 [39]: an approach for identifying a user across social networks by comparing their
tagging practice and usernames.
• MOBIUS [113]: connects user profiles across social networks by comparing behavioral characteristics such as the timestamps between posts.
• OPL [115]: an approach for connecting social networking user profiles using internal and
external features.
Benchmarking with baselines (1/2): precision and recall
42. • Comparison with other
knowledge graphs
• Automatic method better than
manual.
• Multiple domains are compared.
• K-link baselines
Benchmarking with baselines (2/2): precision and recall
• Baseline: other approaches.
• HYDRA [61]: link users by behavior matching.
• Baseline: with/without named entities.
[Figures: successfully found links; comparison with other approaches]
We validate that our approach outperforms many baselines and is similar to HYDRA in precision and recall.
43. • Cases: only supervised methods, only unsupervised methods, combination.
• With biographies (WB) and without (WoB).
• With life events (WL) and without (WoL).
• Using both methods yields higher precision and recall than using only one.
Evaluation on supervised, unsupervised, and both: precision and recall
[Figure: comparing the matching results on 3 machine learning cases]
We validate that biographies and life events enhance the precision and recall. Combining classification and clustering also enhances the results in our case.
44. • We study the impact of considering
the profile biographies that exist
inside the PDF publications.
• We show that the number of matches clearly increases after using one or more biographies.
• Why does the number of matches increase?
• Extra information inside biographies.
Positive impact of PDF biographies
Comparing the results of matching 4 researchers before and after using biographies:
Name | #matches before using biographies (b=0) (Total=100) | #matches after using biographies (b≥1) (Total=100)
Jeff Offutt | 64 | 78
Trevor Hastie | 55 | 66
Eric Yu | 61 | 70
Robert Tibshirani | 71 | 84
Average | 62.75 | 74.5 (+11.75 percentage points)
We validate that biographies can enhance the accuracy of the matching results.
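The averages in the table can be checked with a few lines. The match counts are copied from the table; since each count is out of 100 candidates, the difference between the two averages is read here as a gain in percentage points, which is an interpretation of the "11.75%" figure.

```python
from statistics import mean

# Match counts per researcher (out of 100), copied from the table.
before = {"Jeff Offutt": 64, "Trevor Hastie": 55, "Eric Yu": 61, "Robert Tibshirani": 71}
after = {"Jeff Offutt": 78, "Trevor Hastie": 66, "Eric Yu": 70, "Robert Tibshirani": 84}

avg_before = mean(list(before.values()))
avg_after = mean(list(after.values()))
gain = avg_after - avg_before

print(avg_before, avg_after, gain)
```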
45. • Pros of including biographies:
• Solve the problem of private user profile information.
• Image is the highest available attribute.
• Related approaches did not consider it.
• Pros of including life events:
• Content characteristic:
• Solve the content trade-offs among OSNs (text-only on Facebook vs. image-only on Twitter).
• Solve the problem of outdated profiles (e.g. the last post on Twitter is 3 months older than the last post on Facebook).
• Solve the volume trade-off problem: e.g. zero tweets on Twitter compared to n posts on Facebook and LinkedIn.
• Solve the Language difference issue (English on Facebook VS Chinese on
Twitter).
Pros of our approaches: case studies
Biography vs. attribute-based approaches
Life events vs. behavioral-based approaches
46. • We study the accuracy of each similarity
function.
• We consider 6 profile attributes: screenname,
location, life event, biography, profession, date
of birth, and gender.
• Similarity scores for biographies mostly fall within [0.4, 0.7].
• Location, birthdate, and profession usually have scores within [0.8, 1].
• These attributes take specific and limited values (e.g., gender: male, female).
Similarity measure scores for different attributes
[Figures: K-Link scores, F-Link scores]
Hussein Hazimeh, Elena Mugellini, Simon Ruffieux, Omar Abou Khaled, Philippe Cudré-Mauroux.« Automatic Embedding of Social Network Profile Links into Knowledge
Graphs. » In 9th International Symposium on Info & Communication Technology - ACM (SoICT 2018). Da Nang, Vietnam.
Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled, Philippe Cudré-Mauroux. «SocialMatching++: A Novel Approach for Interlinking User Profiles on Social Networks. » In
PROFILES@ISWC 2017. Vienna, Austria.
48. • In this thesis, we introduced new methods to reinforce entities in a knowledge graph.
• Main contributions recap:
• (1) comparative review on user profile matching on OSNs.
• (2) profiling online social networking users.
• (3) embedding social network profile links to academic entities extracted from the Wikidata knowledge graph.
• (4) introduced a new method for linking social profiles across different OSNs.
• (5) calculated and added sentiment polarities for social event entities extracted from the Wikidata knowledge graph. (Not presented because of the time limit; it can be discussed in the Q&A.)
Conclusions
• Methods:
• We used new resources and features in all of our
methods, which are not used in the related work.
• Results:
• We show that our methods can outperform the existing
methods in terms of precision, recall, and accuracy.
49. Lessons learned and limitations
Lessons learned
UPAD
• Why some attributes lack information.
• Which attributes contain more information than others.
• User engagement with OSNs.
Finding social links
• Life event importance.
• Machine learning model efficiency.
Finding sentiment
• Integrating features other than text can increase the certainty of the sentiment polarity.
• Temporal sentiment tracking showed remarkable changes.
Limitations
• User profile level: location detection.
• Profile matching level: matching failure between a particular pair of events.
• Multimedia contents were not studied.
• OSN API updates and structure modifications (recently: Facebook, Twitter, and LinkedIn).
50. Open problems
• Resources: integrate additional resources; more OSNs (Medium, Reddit, …).
• Entities: cover more entities; measure the quality of links.
• Features: take into consideration images, tags, and check-ins.
• Matching algorithms: develop matching algorithms for multimedia contents.
51. Publications
1. Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled, Philippe Cudré-Mauroux. «Linking user profiles in social
networks: a comparative review. » International Journal of Social Network Mining, Volume 2: 333-361 - 2017.
2. Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled. « Reliable User Profile Analytics and Discovery on Social
Networks. » In 8th International Conference on Software and Computer Applications - ACM (ICSCA 2019).Penang,
Malaysia.
3. Hussein Hazimeh, Elena Mugellini, Omar Abou Khaled, Philippe Cudré-Mauroux. «SocialMatching++: A Novel
Approach for Interlinking User Profiles on Social Networks. » In PROFILES@ISWC 2017. Vienna, Austria.
4. Hussein Hazimeh, Elena Mugellini, Simon Ruffieux, Omar Abou Khaled, Philippe Cudré-Mauroux. « Automatic
Embedding of Social Network Profile Links into Knowledge Graphs. » In 9th International Symposium on Info &
Communication Technology - ACM (SoICT 2018). Da Nang, Vietnam.
5. Hussein Hazimeh, Mohammad Harissa, Elena Mugellini, Omar Abou Khaled. « Temporal Sentiment Analysis and
Tracking of Large-scale Social Events. » In 8th International Conference on Software and Computer Applications -
ACM (ICSCA 2019). Penang, Malaysia.
52. Publications
6. Hussein Hazimeh, Ahmad Traboulsi, Hasan Noureddine, Elena Mugellini, Omar Abou Khaled. « Social Networks
Serving Web Feeds: An Approach for Web Feed Enrichment. » In 10th International Conference on Information
Management and Engineering - ACM (ICIME 2018). Manchester, UK.
7. Sajida Chamass, Hussein Hazimeh, Jawad Makki, Elena Mugellini, Omar Abou Khaled. «Lexicon-based sentiment
analysis approach for ranking event entities. » In International Journal of Services and Standards, Volume 12: 126-139.
(The first author was a master's student under my supervision.)
8. H Hussein, Y Iman, M Jawad, N Hassan, T Julien, AK Omar, M Elena. «Leveraging Co-authorship and Biographical
Information for Author Ambiguity Resolution in DBLP. » The 30-th IEEE International Conference on Advanced
Information Networking, AINA 2016. Crans-Montana, Switzerland.
53. [32] O. Goga, H. Lei, S. Hari, G. Friedland, R. Sommer, and R. Teixeira. Exploiting innocuous activity for correlating users across sites. In 22nd International World Wide Web
Conference, WWW 2013, pages 447–458.
[87] Y. Sha, Q. Liang, and K. Zheng. Matching user accounts across social networks based on user message. In International Conference on Computational Science, ICCS 2016, pages
2423–2427.
[112] R. Zafarani and H. Liu. Connecting corresponding identities across communities. In ICWSM.
[33] O. Goga, P. Loiseau, R. Sommer, R. Teixeira, and K.P. Gummadi. On the reliability of profile matching across large online social networks. In KDD.
[#] N. Bennacer, C.N. Jipmo, A. Penta, and G. Quercini. Matching user profiles across social networks. In Advanced Information Systems Engineering - 26th International Conference,
CAiSE 2014, pages 424–438.
[95] T. Van Le, T.N. Truong, and T. Vu Pham. A content-based approach for user profile modeling and matching on social networks. In Multi-disciplinary Trends in Artificial
Intelligence - 8th International Workshop, MIWAI 2014, pages 232–243.
[82] E. Raad, R. Chbeir, and A. Dipanda. User profile matching in social networks. In The 13th International Conference on Network-Based Information Systems, NBiS 2010, pages
297–304.
[41] P. Jain, P. Kumaraguru, and A. Joshi. @i seek ’fb.me’: identifying users across multiple online social networks. In WWW (Companion Volume).
[61] S. Liu, S. Wang, F. Zhu, J. Zhang, and R. Krishnan. Hydra: large scale social identity linkage via heterogeneous behavior modeling. In International Conference on Management of
Data, SIGMOD 2014, pages 51–62.
[113] R. Zafarani and H. Liu. Connecting users across social media sites: a behavioral-modeling approach. In The 19th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD 2013, pages 41–49.
[73] A. Nunes, P. Calado, and B. Martins. Resolving user identities over social networks through supervised learning and rich similarity features. In Proceedings of the 27th Annual
ACM Symposium on Applied Computing SAC 2012, pages 728–729.
[66] M. Motoyama and G. Varghese. I seek you: searching and matching individuals in social networks. In Proceedings of the 5th ACM International Conference on Web Search and
Data Mining (WSDM) 2009, pages 67–75.
References (1/2)
54. [2] S. Bartunov, A. Korshunov, S. Taek Park, W. Ryu, and H. Lee. Joint link-attribute user identity resolution in online social networks. In SNAKDD Workshop.
[98] J. Vosecky, D. Hong, and V.Y. Shen. User identification across multiple social networks. In Networked Digital Technologies, First International Conference 2009.
[88] Y. Shen and H. Jin. Controllable information sharing for user accounts linkage across multiple online social networks. In CIKM.
[54] W. Liang, B. Meng, and L. Xianchao. Gcm: A greedy-based cross-matching algorithm for identifying users across multiple online social networks. In PAISI.
[84] R. Roedler, D. Kergl, and G. Dreo Rodosek. Profile matching across online social networks based on geo-tags. In Proceedings of the 7th World Congress on Nature and
Biologically Inspired Computing (NaBIC) 2015, pages 417–428.
[75] A. Panchenko, D. Babaev, and S. Obiedkov. Large-scale parallel matching of social network profiles. In AIST.
[79] O. Peled, M. Fire, and Y. Elovici. Matching entities across online social networks. In Neurocomputing, pages 91–206.
[40] P. Jain and P. Kumaraguru. Other times, other values: leveraging attribute history to link user profiles across online social networks. In ACM (HT).
[80] D. Perito, C. Castelluccia, M. Ali Kaafar, and P. Manils. How unique and traceable are usernames? In Privacy Enhancing Technologies (PETS).
[93] M. Szomszor, I. Cantador, and H. Alani. Correlating user profiles from multiple folksonomies. In Proceedings of the 19th ACM Conference on Hypertext and Hypermedia 2008,
pages 33–42.
[39] T. Iofciu, P. Fankhauser, F. Abel, and K. Bischoff. Identifying users across social tagging systems. In Proceedings of the Fifth International Conference on Weblogs and Social
Media 2011.
[62] A. Malhotra, L.C. Totti, W. Meira Jr., P. Kumaraguru, A. Virgílio, and F. Almeida. Studying user footprints in different online social networks. In International Conference on
Advances in Social Networks Analysis and Mining, ASONAM 2012, pages 1065–1070.
[115] H. Zhang, M-Y. Kan, Y. Liu, and S. Ma. Online social network profile linkage. In Information Retrieval Technology - 10th Asia Information Retrieval Societies Conference,
AIRS 2014, pages 197–208.
[99] S. Vosoughi, H. Zhou, and D. Roy. Digital stylometry: linking profiles across social networks. In 8th International Conference Social Informatics, SocInfo.
References (2/2)
55. PhD candidate: Hussein Hazimeh
Director: Prof. Philippe Cudré-Mauroux / UNI-FR
Co-Director: Prof. Elena Mugellini / HES-SO
28.06.2019
Automatic Knowledge Graph
Entity Reinforcement Based on
Social Networks