User Identity Linkage: Data Collection, Dataset
Biases, Method, Control and Application
Rishabh Kaushal
PhD15008
Committee Members:
Prof. Sanjay Jha
Dr. Alessandra Sala
Prof. Anwitaman Datta
Prof. Ponnurangam Kumaraguru (PK), Advisor
PhD Defense Presentation
Who Am I ?
Sponsored PhD Student, Precog Research Group, IIIT, Delhi.
Serving as Assistant Professor, IT Dept, IGDTUW.
MS by Research from IIIT, Hyderabad.
Research Interest: Social Computing.
Outline of Talk
Identity in Physical World
[Figure: one person's identities in the physical world: student, teacher, software engineer, father]
Identity in Online World
Identity has three dimensions - profile, content, and network
User joins multiple social networks
[Figure: world of social networks: professional, personal, news]
Problem: User Identity Linkage (UIL)
UIL refers to the problem of determining whether two input user
identities, taken from two different social networks A and B, belong to
the same person or not.
$(I_a, I_b)$: Linked User Identity Pair
Motivation
Thesis Statement
“Computational approaches can be proposed for the analysis of data
collection methods, investigation of biases in identity linkage datasets,
linkage of user identities across social networks, control-ability of user
identity linkage, and application of user identity linkage solution to
solve extraneous problems.”
Accepted at 12th IEEE International Conference on Social Computing (SocialCom 2019). Xiamen, China.
Data Collection Methods
Social Aggregation (SA)
Social aggregation platforms are sites on which users create an account
and list their accounts on multiple social networks.
Perito et al. → Google profiles, Liu et al. → About.me profiles
Cross Platform Sharing
Cross-platform sharing refers to the user behavior of posting the same
content across multiple social networks (Correa et al.).
Self Disclosure
On their profile page on one social network, users themselves disclose
their identity on another social network platform (Chen et al.).
Social Network Coverage
Distribution of #Identities per User
Linked Identity Pairs
Only the top 6 social networks, where we got the best coverage,
are plotted.
Data Collection - Conclusion
Computational approaches can be implemented to collect linked user
identity pairs.
Each data collection method depends on a particular user behavior,
which is leveraged to collect the linked identities of that user.
Accepted at 35th ACM/SIGAPP Symposium on Applied Computing (SAC 2020). Brno, Czech Republic.
Why study dataset biases?
Every data collection approach depends on the typical behaviors of
users who maintain identities across multiple social networks.
As a consequence, the behavioral biases exhibited by these users
manifest in the resulting user identity linkage datasets.
Scope of our work
We focus on two identity linkage datasets, SD and CPS, derived by
leveraging two user behaviors, self-disclosure and cross-platform
posting respectively, on Twitter and Instagram.
(1) Detection & Impact: Do dataset biases exist? What is their impact
on ML models?
(2) Quantification: How can the amount of dataset bias be measured?
UIL as Supervised Learning Problem
Negative Class Generation: Unlinked user identity pairs, i.e. user identities that do not
belong to the same person, are created in two ways: random pairing and similar pairing.
1. Jaccard Similarity on the ‘username’ of a user identity pair.
2. Edit Distance on the ‘display name’ of a user identity pair.
+ve pair: (rishabhk_, rk.iiit)
-ve pairs: (rishab, rk.iiit), (rahul, rk.iiit)
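The two similarity measures named above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the use of character bigrams for Jaccard similarity, the lower-casing, and the length normalization of edit distance are assumptions, and all function names are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams (assumed tokenization; the slides do not specify one)."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """|A ∩ B| / |A ∪ B| over the character n-gram sets of two usernames."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so 0.0 means identical strings."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

For similar pairing, a candidate negative pair would be one whose similarity to the true identity is high even though the two identities belong to different people.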
DataSet Details
User Behavioral Features
Jaccard Similarity (JS) on usernames
[Figure; y-axis: proportion of users]
50% of user identity pairs from SD have a JS value of 0.9, as opposed to
only 23% from CPS.
Edit Distance (ED) on display names
[Figure; y-axis: proportion of users]
58% of user identity pairs obtained through SD have display names with
an ED of 0.0, as compared to 35% from CPS.
Impact of biases on model
Experiments were run in two ways: (1) the same dataset for training and testing,
and (2) different datasets for training and testing.
Across all learning algorithms adopted, the precision of models trained and tested on the same
dataset is better than that of models trained and tested on different datasets.
Quantification of Bias
We have detected behavioral biases in user identities, characterized
them, and measured their impact on identity linkage models.
We propose a design that quantifies these biases by leveraging a
well-established discrimination measurement approach, namely
‘situational testing’.
Situational Testing (ST)
[Figure panels: Background; Quantification Metric]
Applying ST to quantify biases
Data Record: Person → User Identity Pair
Protected Attribute: Gender (male or female) → Data Collection Method (SD or CPS)
Class Label: (Selected / Not-Selected) → (Linked / Not-Linked)
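A minimal sketch of situational testing in its k-nearest-neighbour form, applied to the mapping above. The metric used in the thesis is not given in the slide text, so this formulation is an assumption: for each record, compare how often its nearest neighbours collected by the same method share its class label with how often its nearest neighbours from the other method do. The record fields (`x`, `method`, `label`), the feature vector, and `k` are hypothetical.

```python
from math import dist


def same_decision_rate(record, pool, k):
    """Share of the k nearest records in pool that carry the same label as record."""
    nearest = sorted(pool, key=lambda r: dist(r["x"], record["x"]))[:k]
    if not nearest:
        raise ValueError("pool must contain at least one record")
    return sum(r["label"] == record["label"] for r in nearest) / len(nearest)


def situational_diff(record, records, k=3):
    """Difference between same-method and other-method agreement rates.

    A value of 0 means neighbours receive the same decision regardless of the
    data collection method; values away from 0 suggest the method matters.
    """
    same = [r for r in records
            if r is not record and r["method"] == record["method"]]
    other = [r for r in records if r["method"] != record["method"]]
    return same_decision_rate(record, same, k) - same_decision_rate(record, other, k)
```

Here `x` could hold behavioral features such as (JS, ED), `method` is "SD" or "CPS", and `label` is 1 for linked and 0 for not linked.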
Results
RQ: Are both decision classes (linked and unlinked) equally affected by biases?
A t-value of 0 means no bias.
However, the probability distributions of t-values are spread over both
the positive (t>0) and negative (t<0) sides, which indicates that
behavioral biases affect many data records.
Dataset Biases - Conclusion
Behavioral biases exist in identity linkage datasets. They can be
detected and quantified.
We recommend collecting linked user identities using more than one
data collection method.
Mitigating biases in identity linkage datasets remains an open problem.
Accepted at International School & Conference on Network Science (NetSciX, 2020), Tokyo, Japan.
Propose: NeXLink Framework
Can we obtain effective node representations such that node embeddings of users
belonging to Cross-Network Linkages (CNLs) are closer in embedding space than
other nodes?
[Figure: input and output of the framework]
More formally
The goal of the embedding function is to transform each user identity $u_i^X$ and $u_j^Y$ into
low-dimensional vectors $z_i^X$ and $z_j^Y$ of size $d$ such that, if $u_i^X$ and $u_j^Y$ belong to the
same person, their embedding vectors $z_i^X$ and $z_j^Y$ are close in the embedding
space, and far apart otherwise.
NeXLink Framework
Structural similarities of nodes within their respective networks are preserved.
Similarities of nodes across the two networks are preserved based on
common friendship relations.
Local Node Embeddings*
The joint probability of $u_i^X$ and $u_k^X$ is expressed in terms of their embedding vectors $z_i^X$ and $z_k^X$.
The empirical probability between $u_i^X$ and $u_k^X$ within the same network is defined by
their normalized edge weights.
Optimization: Minimize the KL-divergence between these distributions.
* LINE algorithm: Tang et al.
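As a hedged reference, the first-order proximity of LINE (the algorithm cited on this slide) defines these quantities as below. The symbols $w_{ik}$ (weight of the edge between $u_i^X$ and $u_k^X$), $E^X$ (edge set of network X) and $W$ are notation assumed here for illustration, and the thesis may use a different variant.

```latex
p(u_i^X, u_k^X) = \frac{1}{1 + \exp\left(-{z_i^X}^{\top} z_k^X\right)}

\hat{p}(u_i^X, u_k^X) = \frac{w_{ik}}{W}, \qquad W = \sum_{(m,n) \in E^X} w_{mn}

O = -\sum_{(i,k) \in E^X} w_{ik} \log p(u_i^X, u_k^X)
```

The objective $O$ equals $\mathrm{KL}(\hat{p} \,\|\, p)$ up to an additive constant and the positive factor $1/W$, so minimizing one minimizes the other.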
Global Node Embeddings
To construct global node embeddings, we construct a
global graph $G$ as follows.
$G(V) = V^X + V^Y$
$G(E) = \mathrm{CNL} + \mathrm{NCNL}$
Positive Edge Generation (CNL): Linked identity pairs
belonging to the same person across social networks.
Negative Edge Generation (NCNL): For every node pair $(u_i^X, u_j^Y)$, we perform a random
walk of length $t$ starting at node $u_i^X$ and add $(u_i^X, u_k^Y)$ to NCNL (Non Cross Network Links)
if $u_k^Y$ appears in the random walk.
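The negative edge generation step can be sketched as below. Several details are assumptions made for illustration: the walk runs over a merged graph that contains the edges of both networks plus the CNL edges (so a walk started in X can reach Y), known linked pairs are never added to NCNL, and node names are unique across networks. All function names are hypothetical.

```python
import random


def build_adjacency(edges):
    """Undirected adjacency map from a list of (node, node) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` steps starting at `start`."""
    walk = [start]
    for _ in range(length):
        neighbours = sorted(adj.get(walk[-1], ()))  # sorted for reproducibility
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_ncnl(edges_x, edges_y, cnl, walk_length, seed=0):
    """Collect (u_x, u_y) pairs where u_y is visited by a walk from u_x but is not linked to it."""
    rng = random.Random(seed)
    adj = build_adjacency(list(edges_x) + list(edges_y) + list(cnl))
    nodes_y = {n for edge in edges_y for n in edge} | {y for _, y in cnl}
    linked = set(cnl)
    ncnl = set()
    for u_x, _ in cnl:
        for node in random_walk(adj, u_x, walk_length, rng):
            if node in nodes_y and (u_x, node) not in linked:
                ncnl.add((u_x, node))
    return ncnl
```

Longer walks reach more distant nodes of Y and therefore tend to produce more, and easier, negative pairs.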
Global Node Embeddings
To learn node embeddings, we perform biased walks (node2vec*) guided by the
common friends (CF) metric, which sets the transition probability.
[Equation: transition probability in terms of CF]
* node2vec algorithm: Grover et al.
Datasets
We evaluated the NeXLink framework on two datasets.
Augmented Dataset: Two sub-graphs sampled from a large Facebook friendship
network comprising 63,713 nodes and 817,090 edges. (Man et al.)
Real-world Dataset: Twitter (5,120 users and 130,575 edges) and Instagram (5,313
users and 54,233 edges) with 1,288 common users. (Kong et al.)
Evaluation Metric
For a given node $u_i^X$, our goal is to find the node $u_j^Y$ that belongs to the same person.
Therefore, we count a hit if $z_j^Y$ is present in the top-$k$ node embeddings, ordered by
cosine similarity.
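The hit counting described above can be sketched as follows. It is a minimal pure-Python illustration; the dictionary layout of the embeddings and the function names are assumptions, and a real evaluation would likely use a vectorized library.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(query, candidates, k):
    """Names of the k candidate embeddings most similar to the query vector."""
    ranked = sorted(candidates,
                    key=lambda name: cosine_similarity(query, candidates[name]),
                    reverse=True)
    return ranked[:k]


def hit_rate_at_k(emb_x, emb_y, ground_truth, k):
    """Fraction of X nodes whose true counterpart in Y appears in their top-k list."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(1 for u_x, u_y in ground_truth.items()
               if u_y in top_k_candidates(emb_x[u_x], emb_y, k))
    return hits / len(ground_truth)
```

Here `emb_x` and `emb_y` map node names to embedding vectors, and `ground_truth` maps each node of X to its linked node in Y.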
Evaluation - Comparison with others
We compare our proposed NeXLink (LINE-node2vec) framework with two
other approaches.
IONE: Input-Output Network Embedding (IONE) for the task of network alignment
REGAL: Representation Learning based Graph Alignment
NeXLink Framework - Conclusion
A node representation learning based approach can be proposed to
effectively learn embedding vectors for extracting linked user identities.
Accepted at 9th International Conference on Social Informatics (SocInfo, 2017), University of Oxford, UK.
Linkability Nudge
Can we help users control the linkability of their identities across social
networks?
We design and implement a linkability nudge: gentle interventions that
help users make an informed decision.
The user sets a range of linkability threshold (score) for each identity
pair. (dynamic web portal)
Whenever user behavior goes beyond the pre-configured range, the
user is nudged. (web browser extension)
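The range check behind the nudge can be sketched as below. This covers only the decision of whether to nudge; how the linkability score is computed is not shown here. The assumption that scores lie in [0, 1], and the names `LinkabilityRange` and `should_nudge`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkabilityRange:
    """User-configured acceptable score range for one identity pair (assumed to lie in [0, 1])."""
    low: float
    high: float

    def __post_init__(self):
        if not 0.0 <= self.low <= self.high <= 1.0:
            raise ValueError("expected 0 <= low <= high <= 1")


def should_nudge(score: float, allowed: LinkabilityRange) -> bool:
    """True when the current linkability score falls outside the configured range."""
    return not (allowed.low <= score <= allowed.high)
```

In this sketch the web portal would store one `LinkabilityRange` per identity pair, and the browser extension would call `should_nudge` after each post or profile change.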
Linkability Nudge Architecture
Linkability Score - Displayed to User
Content Driven Color Nudge
Attribute Driven Notify Nudge
Nudge Evaluation
Controlled lab experiment with a control period and a treatment period.
Participants were recruited and asked to perform tasks related to
making a post and changing a profile attribute.
We observed the impact of the linkability nudge on participants.
[Figure: axes are participants and minutes since the start of the experiment]
Accepted at 7th International Conference on Mining Intelligence & Knowledge Exploration (MIKE 2019), NIT, Goa.
Clone Detection
Clone: A user identity that looks similar to the victim's identity within the
same social network.
Why detect clone identities?
Contributions Summary
Performed comparative analysis of data collection methods.
Investigated biases in identity linkage datasets.
Proposed node embedding framework for user identity linkage.
Helped users control linkability of their identities across OSNs.
Applied UIL solution to detect clones and flag their behaviors.
Limitations & Future Directions
Data collection is a challenge. Other social media platforms, such as
Goodreads and Strava, need to be explored.
We employed situational testing in the detection of dataset biases. Other methods
from algorithmic fairness studies need to be explored.
Our NeXLink node embedding framework takes only network information.
Leveraging content and profile features could be helpful.
We performed a controlled lab study. Deploying the linkability nudge in field trials remains future work.
Acknowledgements
PhD Advisor: Prof PK
Monitoring Committee: Prof Arun Balaji Buduru, Prof Rajiv Ratn Shah
Co-authors and Peers
Members of Precog
My family
Thanks

More Related Content

What's hot

Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Matthew Rowe
 
06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)Duke Network Analysis Center
 
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...Diego Armando
 
Optimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-medianOptimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-medianjournalBEEI
 
$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...mhmt82
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturergomathi chlm
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisScott Gomer
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using KerasIRJET Journal
 
Link Prediction Survey
Link Prediction SurveyLink Prediction Survey
Link Prediction SurveyPatrick Walter
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachAndry Alamsyah
 
Learning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using SupportLearning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using Supportceya
 
Making the invisible visible through SNA
Making the invisible visible through SNAMaking the invisible visible through SNA
Making the invisible visible through SNAMYRA School of Business
 
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS csandit
 
Overview Of Network Analysis Platforms
Overview Of Network Analysis PlatformsOverview Of Network Analysis Platforms
Overview Of Network Analysis PlatformsNoah Flower
 
Community detection in complex social networks
Community detection in complex social networksCommunity detection in complex social networks
Community detection in complex social networksAboul Ella Hassanien
 

What's hot (19)

Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
 
06 Community Detection
06 Community Detection06 Community Detection
06 Community Detection
 
Q046049397
Q046049397Q046049397
Q046049397
 
17 Statistical Models for Networks
17 Statistical Models for Networks17 Statistical Models for Networks
17 Statistical Models for Networks
 
06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)
 
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
 
Optimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-medianOptimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-median
 
$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...
 
Node similarity
Node similarityNode similarity
Node similarity
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturer
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using Keras
 
Link Prediction Survey
Link Prediction SurveyLink Prediction Survey
Link Prediction Survey
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network Approach
 
Learning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using SupportLearning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using Support
 
Making the invisible visible through SNA
Making the invisible visible through SNAMaking the invisible visible through SNA
Making the invisible visible through SNA
 
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
 
Overview Of Network Analysis Platforms
Overview Of Network Analysis PlatformsOverview Of Network Analysis Platforms
Overview Of Network Analysis Platforms
 
Community detection in complex social networks
Community detection in complex social networksCommunity detection in complex social networks
Community detection in complex social networks
 

Similar to User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application

IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction TechniquesIRJET Journal
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...BAINIDA
 
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET Journal
 
security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...Venkat Projects
 
Studying user footprints in different online social networks
Studying user footprints in different online social networksStudying user footprints in different online social networks
Studying user footprints in different online social networksIIIT Hyderabad
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networkseSAT Publishing House
 
Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.SAFAD ISMAIL
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSIJDKP
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksIJDKP
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks IJECEIAES
 
01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)Duke Network Analysis Center
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measuresdnac
 
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...Subhajit Sahu
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115Divita Madaan
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeHarish Vaidyanathan
 
IRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET Journal
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social NetworksIJCSIS Research Publications
 

Similar to User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application (20)

IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
 
security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...
 
Studying user footprints in different online social networks
Studying user footprints in different online social networksStudying user footprints in different online social networks
Studying user footprints in different online social networks
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networks
 
Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.
 
Ppt
PptPpt
Ppt
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large Networks
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
 
01 Network Data Collection (2017)
01 Network Data Collection (2017)01 Network Data Collection (2017)
01 Network Data Collection (2017)
 
01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures
 
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
 
IRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social Networks
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 

More from IIIT Hyderabad

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasIIIT Hyderabad
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
 
User Identity Linkage: Data Collection, Dataset Biases, Method, Control and Application

  • 1. User Identity Linkage: Data Collection, Dataset Biases, Method, Control and Application. PhD Defense Presentation by Rishabh Kaushal (PhD15008). Committee Members: Prof. Sanjay Jha, Dr. Alessandra Sala, Prof. Anwitaman Datta, and Prof. Ponnurangam Kumaraguru (PK), Advisor.
  • 2. Who Am I? Sponsored PhD student, Precog Research Group, IIIT Delhi. Serving as Assistant Professor, IT Dept, IGDTUW. MS by Research from IIIT Hyderabad. Research interest: Social Computing.
  • 4. Identity in Physical World: Student, Teacher, Software Engineer, Father.
  • 5. Identity in Online World. Identity has three dimensions: profile, content, and network. A user joins multiple social networks (professional, personal, news).
  • 6. Problem: User Identity Linkage (UIL). UIL refers to the problem of determining whether two input user identities, taken from two different social networks A and B, belong to the same person or not. $(I_a, I_b)$: linked user identity pair.
  • 9. Thesis Statement. “Computational approaches can be proposed for the analysis of data collection methods, investigation of biases in identity linkage datasets, linkage of user identities across social networks, controllability of user identity linkage, and application of user identity linkage solution to solve extraneous problems.”
  • 10. Outline of Talk. Accepted at the 12th IEEE International Conference on Social Computing (SocialCom 2019), Xiamen, China.
  • 12. Social Aggregation (SA). Social aggregation platforms are sites on which users create an account and provide details of their accounts on multiple social networks. Examples: Perito et al. → Google profiles; Liu et al. → About.me profiles.
  • 13. Cross Platform Sharing. Cross platform sharing refers to a user behavior in which a user posts the same content across multiple social networks (Correa et al.).
  • 14. Self Disclosure. On their profile page, users themselves disclose their identity on another social network platform (Chen et al.).
  • 17. Linked Identity Pairs. Only the top six social networks, where we got the best coverage, are plotted.
  • 18. Data Collection - Conclusion. Computational approaches to collect linked user identity pairs can be implemented. Each data collection method depends on a particular user behavior, which is leveraged to collect the linked identities of that user.
  • 19. Outline of Talk. Accepted at the 35th ACM/SIGAPP Symposium on Applied Computing (SAC 2020), Brno, Czech Republic.
  • 20. Why study dataset biases? Every data collection approach depends on the typical behaviors of users who maintain identities across multiple social networks. As a consequence, the behavioral biases exhibited by these users manifest in the resulting user identity linkage datasets.
  • 21. Scope of our work. We focus on two identity linkage datasets (SD and CPS), derived by leveraging two user behaviors, namely self-disclosure and cross platform posting respectively, on Twitter and Instagram. (1) Detection & Impact: Does dataset bias exist? What is the impact of dataset biases on ML models? (2) Quantification: How can the amount of dataset bias be measured?
  • 22. UIL as Supervised Learning Problem. Negative class generation: unlinked user identity pairs, i.e. user identities that do not belong to the same person, are created in two ways: random pairing and similar pairing. Similarity measures: (1) Jaccard similarity on the ‘username’ of a user identity pair; (2) edit distance on the ‘display name’ of a user identity pair. Example positive pair: (rishabhk_, rk.iiit). Example negative pairs: (rishab, rk.iiit), (rahul, rk.iiit).
  • 24. User Behavioral Features: Jaccard Similarity (JS) on usernames. 50% of user identity pairs from SD have a JS value of 0.9, as opposed to only 23% from CPS. (Plot axis: proportion of users.)
  • 25. User Behavioral Features: Edit Distance (ED) on display names. 58% of user identity pairs obtained through SD have display names with an ED of 0.0, compared to 35% from CPS. (Plot axis: proportion of users.)
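The two similarity features named on slides 22 to 25 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the slides do not say which sets the Jaccard similarity is computed over, so character bigrams are assumed here, and the edit distance is plain (unnormalized) Levenshtein distance. The function names are hypothetical.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of a string (n=2 is an assumption, not from the slides)."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a, b, n=2):
    """Jaccard similarity |A & B| / |A | B| over character n-gram sets."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

An edit distance of 0 corresponds to identical display names, which is the case slide 25 reports for 58% of SD pairs.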
  • 26. Impact of biases on model. Experiments were run in two ways: (1) the same dataset for train and test; (2) different datasets for train and test. Across all learning algorithms adopted, the precision of models trained and tested on the same dataset is better than that of models trained and tested on different datasets.
  • 27. Quantification of Bias. We have detected behavioral biases in user identities, characterized them, and measured their impact on identity linkage models. We propose a design that quantifies biases by leveraging a well-established discrimination measurement approach, namely ‘situational testing’.
  • 28. Situational Testing (ST): Background and Quantification Metric.
  • 29. Applying ST to quantify biases. Data record: Person → User Identity Pair. Protected attribute: Gender (male or female) → Data Collection Method (SD or CPS). Class label: (Selected / Not-Selected) → (Linked / Not-Linked).
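The mapping above can be illustrated with a small sketch of situational testing. The slides do not give the exact metric, so this assumes the common k-nearest-neighbour formulation: for each record, compare the share of "linked" labels among its nearest neighbours collected by the same method with the share among its nearest neighbours collected by the other method. The record layout (`x`, `method`, `label`), the Euclidean distance, and the function names are all hypothetical.

```python
import math


def knn_positive_rate(record, pool, k, positive="linked"):
    """Share of `positive` labels among the k records in `pool` nearest to `record`."""
    ranked = sorted(pool, key=lambda r: math.dist(record["x"], r["x"]))
    nearest = ranked[:k]
    if not nearest:
        raise ValueError("empty comparison group")
    return sum(r["label"] == positive for r in nearest) / len(nearest)


def t_value(record, dataset, k=3):
    """Difference in 'linked' rates between same-method and other-method neighbours.

    A value of 0 means the neighbourhoods agree; a non-zero value suggests the
    label depends on the data collection method (SD or CPS).
    """
    same = [r for r in dataset
            if r is not record and r["method"] == record["method"]]
    other = [r for r in dataset if r["method"] != record["method"]]
    return knn_positive_rate(record, same, k) - knn_positive_rate(record, other, k)
```

Computing `t_value` for every record and plotting the resulting distribution gives the kind of picture discussed on the results slide.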
  • 30. Results. RQ: Are both decision classes (linked and unlinked) equally affected by biases? A t-value of 0 means no bias. However, the probability distributions of t-values are spread on both the positive (t > 0) and negative (t < 0) sides, which indicates that behavioral biases affect many data records.
  • 31. Dataset Biases - Conclusion. Behavioral biases exist in identity linkage datasets. They can be detected and quantified. We recommend collecting linked user identities using more than one data collection method. Mitigating biases in identity datasets remains an open problem.
  • 32. Outline of Talk. Accepted at the International School & Conference on Network Science (NetSciX 2020), Tokyo, Japan.
  • 33. Proposed: NeXLink Framework. Can we obtain effective node representations such that the node embeddings of users belonging to Cross-Network Linkages (CNLs) are closer in the embedding space than those of other nodes?
  • 34. More formally: the goal of the embedding function is to transform each user identity $u_i^X$ and $u_j^Y$ into low-dimensional vectors $z_i^X$ and $z_j^Y$ of size $d$ such that, if $u_i^X$ and $u_j^Y$ belong to the same person, their embedding vectors $z_i^X$ and $z_j^Y$ are close in the embedding space, and far apart otherwise.
  • 35. NeXLink Framework. (1) Structural similarities of nodes within their respective networks are preserved. (2) Similarities of nodes across the two networks are preserved based on the common friendship relation.
  • 36. Local Node Embeddings (LINE algorithm: Tang et al.). The joint probability of $u_i^X$ and $u_k^X$ is expressed in terms of their embedding vectors $z_i^X$ and $z_k^X$ (formula on slide). The empirical probability between $u_i^X$ and $u_k^X$ within the same network is defined by their normalized edge weights (formula on slide). Optimization: minimize the KL-divergence between these two distributions.
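The formulas themselves are not in the transcript. Assuming the local step uses the first-order proximity objective of LINE (Tang et al.) unchanged, they would read as follows, where $w_{ik}$ is the weight of the edge between $u_i^X$ and $u_k^X$ and $E^X$ is the edge set of network X (both symbols are introduced here for illustration):

```latex
% Model (joint) probability from the embedding vectors
p(u_i^X, u_k^X) = \frac{1}{1 + \exp\!\left(-\,{z_i^X}^{\top} z_k^X\right)}

% Empirical probability from normalized edge weights
\hat{p}(u_i^X, u_k^X) = \frac{w_{ik}}{W}, \qquad W = \sum_{(i,k) \in E^X} w_{ik}

% Minimizing KL(\hat{p} \,\|\, p) and dropping constants gives the objective
O = -\sum_{(i,k) \in E^X} w_{ik} \, \log p(u_i^X, u_k^X)
```

The same objective would be applied separately to network Y to obtain $z_j^Y$.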
  • 37. Global Node Embeddings. To construct global node embeddings, we build a global graph $G$ with $G(V) = V^X + V^Y$ and $G(E) = \mathrm{CNL} + \mathrm{NCNL}$. Positive edge generation (CNL): linked identity pairs belonging to the same person across social networks. Negative edge generation (NCNL, Non Cross Network Links): for every node pair $(u_i^X, u_j^Y)$, we perform a random walk of length $t$ starting at node $u_i^X$ and add $(u_i^X, u_k^Y)$ to NCNL if $u_k^Y$ appears in the random walk.
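The negative edge generation step can be sketched roughly as below. Several details are assumptions, since the slide does not specify them: the walk is taken over a combined adjacency list `adj` that lets it cross from network X into network Y (for example through known linked pairs), and the true partner of a node is skipped so that a positive pair is never added as a negative. All names are hypothetical.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` steps over adjacency lists."""
    walk = [start]
    for _ in range(length):
        neighbours = adj.get(walk[-1], [])
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk


def negative_edges(adj, cnl, y_nodes, length, seed=0):
    """Collect (u_x, u_k) pairs where a walk from u_x reaches a Y node u_k.

    `cnl` is a list of known linked pairs (u_x, u_y); the true partner u_y is
    excluded from the negatives (an assumption, not stated on the slide).
    """
    rng = random.Random(seed)
    ncnl = set()
    for u_x, u_y in cnl:
        for node in random_walk(adj, u_x, length, rng):
            if node in y_nodes and node != u_y:
                ncnl.add((u_x, node))
    return ncnl
```

The returned pairs would then be added to the global graph alongside the CNL edges.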
  • 38. Global Node Embeddings. To learn node embeddings, we perform biased walks (node2vec algorithm: Grover et al.) guided by the common friends (CF) metric, which determines the transition probability (formula on slide).
  • 39. Datasets. We evaluated the NeXLink framework on two datasets. Augmented dataset: two sub-graphs sampled from a large Facebook friendship network comprising 63,713 nodes and 817,090 edges (Man et al.). Real-world dataset: Twitter (5,120 users and 130,575 edges) and Instagram (5,313 users and 54,233 edges) with 1,288 common users (Kong et al.).
  • 40. Evaluation Metric. For a given node $u_i^X$, our goal is to find the node $u_j^Y$ that belongs to the same person. We therefore count a hit if $z_j^Y$ is present among the top-k node embeddings, ordered by cosine similarity.
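A minimal sketch of this hit-at-k evaluation, assuming embeddings are stored as plain dictionaries from node id to vector and that the reported score is the fraction of linked pairs that produce a hit (the function names and data layout are hypothetical):

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_at_k(z_x, emb_y, true_id, k):
    """True if `true_id` is among the k nodes of Y most similar to z_x."""
    ranked = sorted(emb_y, key=lambda node: cosine(z_x, emb_y[node]), reverse=True)
    return true_id in ranked[:k]


def hit_rate_at_k(emb_x, emb_y, links, k):
    """Fraction of linked pairs (i, j) for which j is ranked in the top k for i."""
    hits = sum(hit_at_k(emb_x[i], emb_y, j, k) for i, j in links)
    return hits / len(links)
```

Sorting every candidate is quadratic over all queries; for graphs of the size listed on the datasets slide this is still manageable, but an approximate nearest-neighbour index would be the usual choice at larger scale.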
  • 41. Evaluation - Comparison with others. We compare our proposed NeXLink (LINE-node2vec) framework with two other approaches. IONE: Input-Output Network Embedding for the task of network alignment. REGAL: Representation Learning based Graph Alignment.
  • 42. NeXLink Framework - Conclusion. A node representation learning based approach can be proposed to effectively learn embedding vectors for extracting linked user identities.
  • 43. Outline of Talk. Accepted at the 9th International Conference on Social Informatics (SocInfo 2017), University of Oxford, UK.
  • 44. Linkability Nudge. Can we help users control the linkability of their identities across social networks? We design and implement a linkability nudge: gentle interventions that help users make an informed decision. The user sets a range of linkability threshold (score) for each identity pair (dynamic web portal). Whenever user behavior goes beyond the pre-configured range, the user is nudged (web browser extension).
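The trigger condition described above can be sketched as a simple range check. How the linkability score itself is computed is not given in the transcript, so the score is treated here as an input, and the function name and the inclusive range are assumptions.

```python
def should_nudge(score, low, high):
    """Return True when the linkability score leaves the user's configured range.

    `low` and `high` are the bounds the user chose for one identity pair;
    the range is treated as inclusive (an assumption).
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return not (low <= score <= high)
```

In a browser extension, a check like this would run after each post or profile change, with the nudge shown only when it returns True.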
  • 46. Linkability Score - Displayed to User.
  • 49. Nudge Evaluation. Controlled lab experiment with a control period and a treatment period. Participants were recruited and asked to perform tasks that involved making a post and changing a profile attribute. We observed the impact of the linkability nudge on participants.
  • 50. Nudge Evaluation. (Plot axes: minutes since the start of the experiment; participants.)
  • 51. Outline of Talk. Accepted at the 7th International Conference on Mining Intelligence & Knowledge Exploration (MIKE 2019), NIT Goa.
  • 52. Clone Detection. Clone: a user identity that looks similar to the victim identity within the same social network.
  • 53. Why detect clone identities?
  • 54. Contributions Summary. Performed comparative analysis of data collection methods. Investigated biases in identity linkage datasets. Proposed a node embedding framework for user identity linkage. Helped users control the linkability of their identities across OSNs. Applied the UIL solution to detect clones and flag their behaviors.
  • 55. Limitations & Future Directions. Data collection is a challenge; other social media platforms such as Goodreads, Strava, etc. need to be explored. We employed situational testing in the detection of dataset biases; other methods from fairness-algorithm studies need to be explored. Our NeXLink node embedding framework uses only network information; leveraging content and profile features could help. We performed a controlled lab study; deploying the linkability nudge in field trials remains future work.
  • 56. Acknowledgements. PhD Advisor: Prof. PK. Monitoring Committee: Prof. Arun Balaji Buduru, Prof. Rajiv Ratn Shah. Co-authors and peers. Members of Precog. My family.