SlideShare a Scribd company logo
Antonio Fernández Anta
Unsupervised Scalable Statistical
Method for Identifying Influential
Users in Online Social Networks
Team
•  Universidad Carlos III
de Madrid
  Rubén Cuevas
  Henry Laniado
  Rosa E. Lillo
  Juan Romo
  Carlos Sguera
•  IMDEA Networks
Institute
  Arturo Azcorra
  Luis F. Chiroque
  A.F.A.
Motivation
•  Online Social Networks (OSNs) are used everyday
by billions of people
•  They are invaluable to extract information and to
actuate in advertising, marketing, politics, etc.
•  A recurring problem in OSNs analyses is to identify
“interesting” or “influential” users
•  Usually the characterization of influential users is
given a priori, and algorithms to find these
characteristics are proposed
9
Characterizing Influential Users
•  Several characterization that have been used for
influential OSN users:
 Large number of followers [Cha HBG 2010][Pastor-
Satorras Vespignani 2001] [Cohen EbAH 2001]
 Capacity of engagement [Domingos Richardson 2001]
[D’Agostino ANT 2015]
 High infection capacity in an epidemic model [Kitsak
GHLMSM 2010] [Morone Makse 2015] [Kempe Kleinberg
Tardos 2015]
•  Each of these characterizations may miss
important interesting users
•  They disregard many available attributes of the
users 10
Contributions
•  We propose a new unsupervised method to identify
“interesting” users: Massive Unsupervised Outlier
Detection (MUOD)
•  MOUD finds outliers in the multidimensional data
available from the users
•  These outliers can later be explored further to
identify their nature: MUOD identifies multiple
types of outliers to make this easier
•  MUOD scales to millions of users, so it is usable in
large OSN
•  We successfully tested MUOD in data of Google+
with 170M users over 2 years
11
Problem Statement
•  We have a set of n OSN users
•  For every user we have d attributes:
 Connectivity: Number of friends,
followers, centrality metrics, etc.
 Activity: Number of posts, likes,
reposts, etc.
 Profile: user’s name, location (e.g.,
city where she lives), job,
education, gender, and related data
Un
d
Outliers
•  The objective is to find
the outliers in the set
of OSN users
Multidimensional Data
•  Detecting outliers in multidimensional data is not
easy
Multidimensional Data
•  With more than three dimensions, it is practically
impossible to graphically visualize the observations
using Cartesian coordinates.
•  Convenient alternative: parallel coordinates
[Wegman 1990]
•  Observation x Rd can be seen as real function
defined on an arbitrary set of equally spaced
domain points, e.g., {1, . . . , d }, and x can be
expressed as x = {x (1), . . . , x (d)} [López-Pintado
Romo 2009]
15
Functional Data Analysis
•  Each observation/user is expressed as a curve, and
the outliers are curves that are different from “the
mass” [Hubert Rousseeuw Segaert 2015] in
 Magnitude
 Amplitude
 Shape
16
The Method
•  In MOUD we assign to each user an index that gives
the outlier intensity of each type:
 The shape index IS is based on the correlation
coefficient between the functions
 The amplitude index IA is based on the slope of linear
regression curves between the functions
 The magnitude index IM is based on the constant term of
linear regression curves between the functions
•  The higher the corresponding index, the more
likely the user is an outlier
17
Shape Index
Let us consider the set of users
Where each user is a vector of d values
The shape index of a user x is computed as
Where is the Pearson correlation
coefficient
18
X = {x1, x2, . . . , xn}
IS(x, X) =
1
n
nX
j=1
⇢(x, xj) 1
⇢(x, xj)
Shape Index Example
19
0 10 20 30 40 50 60 70
-2
0
2
4
6
8
10
12
0 20 40 60 80 100 120
0
0.2
0.4
0.6
0.8
1
1.2
1.4
Magnitude and Amplitude Indices
We use linear regression
To obtain the magnitude and amplitude indices
IM (x, X) =
1
n
nX
j=1
ˆ↵j IA(x, X) =
1
n
nX
j=1
ˆj 1
x
xj
↵j
j
ˆj = Cov(x, xj)/Var(xj) ˆ↵j = x ˆjxj
Magnitude and Amplitude Indices
21 0 20 40 60 80 100 120
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0 20 40 60 80 100 120
0
2
4
6
8
10
12
14
16
0 10 20 30 40 50 60 70
−5
0
5
10
15
20
25
0 100 200 300 400 500 600 700
-2
0
2
4
6
8
10
12
Which are Outliers?
•  Given the index IS of each user we can obtain the
set of outliers:
 Sort by IS
 Cut by point given by the tangent method [Louail 2014]
Sets of Outliers
•  Given the sets of outliers of shape, magnitude and
amplitude, we have up to 7 different outliers
subsets to consider, given their possible
intersections
23
Outliers groups. Simulation
outliers
shape
magnitude
amplitude
fbplot
Performance Evaluation
24
Performance Results
25
Mixed Outliers
26
Decomposed Results
27
Implementation
•  We have implemented the outlier detection
algorithm MUOD in R
•  We had to implement it in C++ and add it to the R
system, since R functions did not allow the
required memory control
•  The implementation allows parallel execution in p
cores, with time complexity O(n2d/p)
•  It has been made available in a public repository:
https://github.com/luisfo/muod.outliers
28
Performance
29
MOUD in Google+
•  We have data of n=170M Google+ users and 2 years
of activity (2011-2013), with d=21 features for
each (of profile, activity, and connectivity)
•  We use the 5.6M active
•  We find:
 4K outliers of MAS
 2K outliers of MS
 2K outliers of AS
 294K outliers of only SHA
30
Exploration of the Outlier Sets
31
24681012
medians(log)
NumActivities
NumAtts
NumPlusOnes
NumReplies
NumReshares
NumFriends
NumFollowers
NumFields
PerBidir
accountAge
accountRec
gender
job
numVideos
numPhotos
numAlbums
numArticles
numHangouts
numEvents
numWithGeo
pageRank
FBPLOT
SHA
MS
AS
MAS
mass (sample)
followers
engagement
activity
centrality
Epidemic Behavior
•  We run 10 SI (susceptible-infected) simulations in
the connected component (170M users)
32
1 2 3 4 5
0102030405060
infection process
steps
millionsofusers
FBPLOT
SHA
MS
AS
MAS
mass
Examples of Outlier Users
33
Conclusions and Future Work
•  We propose to use an unsupervised outlier
detection method to identify “interesting” users in
OSN
•  Then, explore what are the outliers
•  We propose a new method that scales to millions
of users and test it with a real data set
•  In the future we plan to use the method in
multiple contexts where identify outliers in
multidimensional data is useful (fraud detection,
faulty images, etc.)
34
Ongoing Work
•  Data from Twitter (MAG 2, AMP 226, SHA 6871, MA
5, MS 165, MAS 25, rest 138280)
35
Thank you!!
Azcorra, A., Chiroque, L. F., Cuevas, R., Fernández Anta, A., Laniado, H., Lillo, R. E.,
Romo, J., and Sguera, C. (2018), “Unsupervised Scalable Statistical Method for
Identifying Influential Users in Online Social Networks” Scientific Reports (2018).
https://github.com/luisfo/muod.outliers
36

More Related Content

Similar to Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks

Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...
Nikolaos Aletras
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and How
Julián Urbano
 
Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018
Fabien Gandon
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
Paolo Missier
 
IIR 2017, Lugano Switzerland
IIR 2017, Lugano SwitzerlandIIR 2017, Lugano Switzerland
IIR 2017, Lugano Switzerland
Marco Polignano
 
A recommender system for social learning platforms
A recommender system for social learning platformsA recommender system for social learning platforms
A recommender system for social learning platforms
Soudé Fazeli
 
Building Social Life Networks 130818
Building Social Life Networks 130818Building Social Life Networks 130818
Building Social Life Networks 130818
Ramesh Jain
 
Kp-Data Analytics-ts.pptx
Kp-Data Analytics-ts.pptxKp-Data Analytics-ts.pptx
Kp-Data Analytics-ts.pptx
CloudBusiness2
 
Tutorial on User Profiling with Graph Neural Networks and Related Beyond-Acc...
Tutorial on User Profiling with Graph Neural Networks  and Related Beyond-Acc...Tutorial on User Profiling with Graph Neural Networks  and Related Beyond-Acc...
Tutorial on User Profiling with Graph Neural Networks and Related Beyond-Acc...
Erasmo Purificato
 
Open science - Science 2.0
Open science - Science 2.0Open science - Science 2.0
Open science - Science 2.0
Jesus Enrique Aldana S.
 
TL_Thompson.pptx.ppt
TL_Thompson.pptx.pptTL_Thompson.pptx.ppt
TL_Thompson.pptx.ppt
RGowthamRao
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processing
jins0618
 
Data science guide
Data science guideData science guide
Data science guide
gokulprasath06
 
Ultimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment ToolsUltimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment Tools
State of Place
 
Social Network Analysis report
Social Network Analysis reportSocial Network Analysis report
Social Network Analysis report
Matthieu Cisel
 
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
yeung2000
 
CoMo Game Dev - usability and user experience methods
CoMo Game Dev - usability and user experience methods CoMo Game Dev - usability and user experience methods
CoMo Game Dev - usability and user experience methods
Isa Jahnke
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Turi, Inc.
 
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
Erasmo Purificato
 
Can we predict your sentiments by listening to your peers?
Can we predict your sentiments by listening to your peers?Can we predict your sentiments by listening to your peers?
Can we predict your sentiments by listening to your peers?
International Federation for Information Technologies in Travel and Tourism (IFITT)
 

Similar to Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks (20)

Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...Predicting user demographics in social networks - Invited Talk at University ...
Predicting user demographics in social networks - Invited Talk at University ...
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and How
 
Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
IIR 2017, Lugano Switzerland
IIR 2017, Lugano SwitzerlandIIR 2017, Lugano Switzerland
IIR 2017, Lugano Switzerland
 
A recommender system for social learning platforms
A recommender system for social learning platformsA recommender system for social learning platforms
A recommender system for social learning platforms
 
Building Social Life Networks 130818
Building Social Life Networks 130818Building Social Life Networks 130818
Building Social Life Networks 130818
 
Kp-Data Analytics-ts.pptx
Kp-Data Analytics-ts.pptxKp-Data Analytics-ts.pptx
Kp-Data Analytics-ts.pptx
 
Tutorial on User Profiling with Graph Neural Networks and Related Beyond-Acc...
Tutorial on User Profiling with Graph Neural Networks  and Related Beyond-Acc...Tutorial on User Profiling with Graph Neural Networks  and Related Beyond-Acc...
Tutorial on User Profiling with Graph Neural Networks and Related Beyond-Acc...
 
Open science - Science 2.0
Open science - Science 2.0Open science - Science 2.0
Open science - Science 2.0
 
TL_Thompson.pptx.ppt
TL_Thompson.pptx.pptTL_Thompson.pptx.ppt
TL_Thompson.pptx.ppt
 
Jeffrey xu yu large graph processing
Jeffrey xu yu large graph processingJeffrey xu yu large graph processing
Jeffrey xu yu large graph processing
 
Data science guide
Data science guideData science guide
Data science guide
 
Ultimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment ToolsUltimate Guide to Walkability Assessment Tools
Ultimate Guide to Walkability Assessment Tools
 
Social Network Analysis report
Social Network Analysis reportSocial Network Analysis report
Social Network Analysis report
 
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
DeepScan: Exploiting Deep Learning for Malicious Account Detection in Locatio...
 
CoMo Game Dev - usability and user experience methods
CoMo Game Dev - usability and user experience methods CoMo Game Dev - usability and user experience methods
CoMo Game Dev - usability and user experience methods
 
Collective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social NetworksCollective Spammer Detection in Evolving Multi-Relational Social Networks
Collective Spammer Detection in Evolving Multi-Relational Social Networks
 
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
Leveraging Graph Neural Networks for User Profiling: Recent Advances and Open...
 
Can we predict your sentiments by listening to your peers?
Can we predict your sentiments by listening to your peers?Can we predict your sentiments by listening to your peers?
Can we predict your sentiments by listening to your peers?
 

More from Facultad de Informática UCM

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
Facultad de Informática UCM
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
Facultad de Informática UCM
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
Facultad de Informática UCM
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
Facultad de Informática UCM
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
Facultad de Informática UCM
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
Facultad de Informática UCM
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
Facultad de Informática UCM
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
Facultad de Informática UCM
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
Facultad de Informática UCM
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Facultad de Informática UCM
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Facultad de Informática UCM
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
Facultad de Informática UCM
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
Facultad de Informática UCM
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
Facultad de Informática UCM
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
Facultad de Informática UCM
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
Facultad de Informática UCM
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Facultad de Informática UCM
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
Facultad de Informática UCM
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Facultad de Informática UCM
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
Facultad de Informática UCM
 

More from Facultad de Informática UCM (20)

¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?¿Por qué debemos seguir trabajando en álgebra lineal?
¿Por qué debemos seguir trabajando en álgebra lineal?
 
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
TECNOPOLÍTICA Y ACTIVISMO DE DATOS: EL MAPEO COMO FORMA DE RESILIENCIA ANTE L...
 
DRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation ComputersDRAC: Designing RISC-V-based Accelerators for next generation Computers
DRAC: Designing RISC-V-based Accelerators for next generation Computers
 
uElectronics ongoing activities at ESA
uElectronics ongoing activities at ESAuElectronics ongoing activities at ESA
uElectronics ongoing activities at ESA
 
Tendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura ArmTendencias en el diseño de procesadores con arquitectura Arm
Tendencias en el diseño de procesadores con arquitectura Arm
 
Formalizing Mathematics in Lean
Formalizing Mathematics in LeanFormalizing Mathematics in Lean
Formalizing Mathematics in Lean
 
Introduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented ComputingIntroduction to Quantum Computing and Quantum Service Oriented Computing
Introduction to Quantum Computing and Quantum Service Oriented Computing
 
Computer Design Concepts for Machine Learning
Computer Design Concepts for Machine LearningComputer Design Concepts for Machine Learning
Computer Design Concepts for Machine Learning
 
Inteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuroInteligencia Artificial en la atención sanitaria del futuro
Inteligencia Artificial en la atención sanitaria del futuro
 
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 Design Automation Approaches for Real-Time Edge Computing for Science Applic... Design Automation Approaches for Real-Time Edge Computing for Science Applic...
Design Automation Approaches for Real-Time Edge Computing for Science Applic...
 
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
Estrategias de navegación para robótica móvil de campo: caso de estudio proye...
 
Fault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error CorrectionFault-tolerance Quantum computation and Quantum Error Correction
Fault-tolerance Quantum computation and Quantum Error Correction
 
Cómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intentoCómo construir un chatbot inteligente sin morir en el intento
Cómo construir un chatbot inteligente sin morir en el intento
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Type and proof structures for concurrency
Type and proof structures for concurrencyType and proof structures for concurrency
Type and proof structures for concurrency
 
Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...Hardware/software security contracts: Principled foundations for building sec...
Hardware/software security contracts: Principled foundations for building sec...
 
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
Jose carlossancho slidesLa seguridad en el desarrollo de software implementad...
 
Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?Do you trust your artificial intelligence system?
Do you trust your artificial intelligence system?
 
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.Redes neuronales y reinforcement learning. Aplicación en energía eólica.
Redes neuronales y reinforcement learning. Aplicación en energía eólica.
 
Challenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore windChallenges and Opportunities for AI and Data analytics in Offshore wind
Challenges and Opportunities for AI and Data analytics in Offshore wind
 

Recently uploaded

Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
Kamal Acharya
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdfThe Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
Nettur Technical Training Foundation
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 

Recently uploaded (20)

Water billing management system project report.pdf
Water billing management system project report.pdfWater billing management system project report.pdf
Water billing management system project report.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdfThe Role of Electrical and Electronics Engineers in IOT Technology.pdf
The Role of Electrical and Electronics Engineers in IOT Technology.pdf
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 

Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks

  • 1. Antonio Fernández Anta Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks
  • 2. Team •  Universidad Carlos III de Madrid   Rubén Cuevas   Henry Laniado   Rosa E. Lillo   Juan Romo   Carlos Sguera •  IMDEA Networks Institute   Arturo Azcorra   Luis F. Chiroque   A.F.A.
  • 3. Motivation •  Online Social Networks (OSNs) are used everyday by billions of people •  They are invaluable to extract information and to actuate in advertising, marketing, politics, etc. •  A recurring problem in OSNs analyses is to identify “interesting” or “influential” users •  Usually the characterization of influential users is given a priori, and algorithms to find these characteristics are proposed 9
  • 4. Characterizing Influential Users •  Several characterization that have been used for influential OSN users:  Large number of followers [Cha HBG 2010][Pastor- Satorras Vespignani 2001] [Cohen EbAH 2001]  Capacity of engagement [Domingos Richardson 2001] [D’Agostino ANT 2015]  High infection capacity in an epidemic model [Kitsak GHLMSM 2010] [Morone Makse 2015] [Kempe Kleinberg Tardos 2015] •  Each of these characterizations may miss important interesting users •  They disregard many available attributes of the users 10
  • 5. Contributions •  We propose a new unsupervised method to identify “interesting” users: Massive Unsupervised Outlier Detection (MUOD) •  MOUD finds outliers in the multidimensional data available from the users •  These outliers can later be explored further to identify their nature: MUOD identifies multiple types of outliers to make this easier •  MUOD scales to millions of users, so it is usable in large OSN •  We successfully tested MUOD in data of Google+ with 170M users over 2 years 11
  • 6. Problem Statement •  We have a set of n OSN users •  For every user we have d attributes:  Connectivity: Number of friends, followers, centrality metrics, etc.  Activity: Number of posts, likes, reposts, etc.  Profile: user’s name, location (e.g., city where she lives), job, education, gender, and related data Un d
  • 7. Outliers •  The objective is to find the outliers in the set of OSN users
  • 8. Multidimensional Data •  Detecting outliers in multidimensional data is not easy
  • 9. Multidimensional Data •  With more than three dimensions, it is practically impossible to graphically visualize the observations using Cartesian coordinates. •  Convenient alternative: parallel coordinates [Wegman 1990] •  Observation x Rd can be seen as real function defined on an arbitrary set of equally spaced domain points, e.g., {1, . . . , d }, and x can be expressed as x = {x (1), . . . , x (d)} [López-Pintado Romo 2009] 15
  • 10. Functional Data Analysis •  Each observation/user is expressed as a curve, and the outliers are curves that are different from “the mass” [Hubert Rousseeuw Segaert 2015] in  Magnitude  Amplitude  Shape 16
  • 11. The Method •  In MOUD we assign to each user an index that gives the outlier intensity of each type:  The shape index IS is based on the correlation coefficient between the functions  The amplitude index IA is based on the slope of linear regression curves between the functions  The magnitude index IM is based on the constant term of linear regression curves between the functions •  The higher the corresponding index, the more likely the user is an outlier 17
  • 12. Shape Index Let us consider the set of users Where each user is a vector of d values The shape index of a user x is computed as Where is the Pearson correlation coefficient 18 X = {x1, x2, . . . , xn} IS(x, X) = 1 n nX j=1 ⇢(x, xj) 1 ⇢(x, xj)
  • 13. Shape Index Example 19 0 10 20 30 40 50 60 70 -2 0 2 4 6 8 10 12 0 20 40 60 80 100 120 0 0.2 0.4 0.6 0.8 1 1.2 1.4
  • 14. Magnitude and Amplitude Indices We use linear regression To obtain the magnitude and amplitude indices IM (x, X) = 1 n nX j=1 ˆ↵j IA(x, X) = 1 n nX j=1 ˆj 1 x xj ↵j j ˆj = Cov(x, xj)/Var(xj) ˆ↵j = x ˆjxj
  • 15. Magnitude and Amplitude Indices 21 0 20 40 60 80 100 120 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 20 40 60 80 100 120 0 2 4 6 8 10 12 14 16 0 10 20 30 40 50 60 70 −5 0 5 10 15 20 25 0 100 200 300 400 500 600 700 -2 0 2 4 6 8 10 12
  • 16. Which are Outliers? •  Given the index IS of each user we can obtain the set of outliers:  Sort by IS  Cut by point given by the tangent method [Louail 2014]
  • 17. Sets of Outliers •  Given the sets of outliers of shape, magnitude and amplitude, we have up to 7 different outliers subsets to consider, given their possible intersections 23 Outliers groups. Simulation outliers shape magnitude amplitude fbplot
  • 22. Implementation •  We have implemented the outlier detection algorithm MUOD in R •  We had to implement it in C++ and add it to the R system, since R functions did not allow the required memory control •  The implementation allows parallel execution in p cores, with time complexity O(n2d/p) •  It has been made available in a public repository: https://github.com/luisfo/muod.outliers 28
  • 24. MOUD in Google+ •  We have data of n=170M Google+ users and 2 years of activity (2011-2013), with d=21 features for each (of profile, activity, and connectivity) •  We use the 5.6M active •  We find:  4K outliers of MAS  2K outliers of MS  2K outliers of AS  294K outliers of only SHA 30
  • 25. Exploration of the Outlier Sets 31 24681012 medians(log) NumActivities NumAtts NumPlusOnes NumReplies NumReshares NumFriends NumFollowers NumFields PerBidir accountAge accountRec gender job numVideos numPhotos numAlbums numArticles numHangouts numEvents numWithGeo pageRank FBPLOT SHA MS AS MAS mass (sample) followers engagement activity centrality
  • 26. Epidemic Behavior •  We run 10 SI (susceptible-infected) simulations in the connected component (170M users) 32 1 2 3 4 5 0102030405060 infection process steps millionsofusers FBPLOT SHA MS AS MAS mass
  • 28. Conclusions and Future Work •  We propose to use an unsupervised outlier detection method to identify “interesting” users in OSN •  Then, explore what are the outliers •  We propose a new method that scales to millions of users and test it with a real data set •  In the future we plan to use the method in multiple contexts where identify outliers in multidimensional data is useful (fraud detection, faulty images, etc.) 34
  • 29. Ongoing Work •  Data from Twitter (MAG 2, AMP 226, SHA 6871, MA 5, MS 165, MAS 25, rest 138280) 35
  • 30. Thank you!! Azcorra, A., Chiroque, L. F., Cuevas, R., Fernández Anta, A., Laniado, H., Lillo, R. E., Romo, J., and Sguera, C. (2018), “Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks” Scientific Reports (2018). https://github.com/luisfo/muod.outliers 36