A broad overview of active attacks in social networks, starting from the motivations and ending with some privacy-preserving solutions studied in later years.
Active attacks in social networks
1. Active Attacks
In Social Networks
Tanasache Florin & Ragonese Alberto
Seminar in Web Security and Privacy
2. Why Social Networks?
Human interaction and socialization have
changed along with the evolution of
technology.
In particular, emotions, feelings, thoughts,
opinions can all be shared instantly by
simply pressing a button in our favourite
social network application.
«Users publish detailed
personal information about
their preferences and daily
life»
3. Social Networks
Social networks model social relationships by graph structures
using nodes and edges.
Nodes correspond to people or other social entities and edges
correspond to social relationships between them.
World's biggest social networks:
1. Facebook (1.9 billion users)
2. WhatsApp (1.2 billion users as of February 2017)
3. Messenger (1.2 billion users as of April 2017)
4. YouTube (1 billion users)
5. WeChat/Weixin (889 million users)
6. QQ (869 million users)
7. Instagram (700 million users)
8. Qzone (638 million users)
9. Twitter (328 million users)
10. Weibo (313 million users)
Facebook properties
5. Research on
Social Networks
Digital traces of human social interactions in a wide variety of
online settings:
Public Data: when users explicitly choose to disclose:
no privacy!
Sensitive Data: email, phone and messaging networks
need privacy protection!
6. Anonymized
Social Networks
In designing studies of such systems, one needs to set up the
data to protect the privacy of individual users while preserving
the global network properties for the research studies.
Anonymization: a simple procedure in which each individual’s
“name” is replaced by a random userID, but the connections
between the people are revealed.
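The anonymization step described above can be sketched in a few lines. This is only an illustration: the function name `anonymize`, the toy friendship list and the use of integer IDs are assumptions, not part of the original slides.

```python
import random

def anonymize(edges, seed=None):
    """Replace every name by a random integer ID; all edges are kept."""
    rng = random.Random(seed)
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    rng.shuffle(ids)                      # random assignment of IDs to names
    mapping = dict(zip(names, ids))       # kept secret by the data curator
    released = {frozenset((mapping[u], mapping[v])) for u, v in edges}
    return released, mapping

# Toy example (hypothetical users).
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
released, secret_mapping = anonymize(friendships, seed=1)
```

Only `released` would be published. The structure of the graph (who is connected to whom) is untouched, which is exactly what the attacks in the following slides exploit.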
7. Attacks on
Anonymized Networks
Can anonymization protect users’ privacy?
Identifying nodes and learning about the edge relations
among them compromises privacy:
Privacy Breach!
There are two types of attack:
Passive Attack: An adversary tries to learn the identities of the nodes
only after the anonymized network has been released
Active Attack: An adversary tries to compromise privacy by strategically
creating new user accounts and links before the anonymized network
is released
8. Active Attack
First Step: before the anonymized network G of n-k nodes is
released, the attacker:
Chooses a set of b targeted users in G.
Creates a subgraph H containing k new nodes.
Attaches H to the targeted nodes.
The secret subgraph H constructed for the attack can be thought
of as a kind of structural steganography.
10. Active Attack
Second Step: after the release of the anonymized network:
Find the subgraph H in the graph G
Follow edges from H to locate the b target nodes and their true location in G
Determine all edges among these b nodes: privacy is breached
11. Graph Isomorphism
The construction of H succeeds if:
i. There is no subgraph S≠H in G such that G[S] and G[H] are isomorphic.
ii. The subgraph H can be efficiently found, given G.
iii. The subgraph H has no non-trivial automorphism.
An isomorphism between two sets of nodes P and Q in G is a one-to-one
correspondence f : P -> Q that maps edges to edges and non-edges to
non-edges: two vertices u and v in P are connected if and only if their
corresponding nodes f(u) and f(v) are connected in Q.
An automorphism is an isomorphism from a set of nodes to itself.
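The two definitions above can be checked mechanically on small examples. The sketch below is a brute-force illustration only; the helper names `is_isomorphism` and `automorphisms` and the three-node path are assumptions made for this example, and the enumeration over all permutations is far too slow for real graphs.

```python
from itertools import permutations

def is_isomorphism(edges, P, Q, f):
    """True if f: P -> Q is one-to-one and maps edges to edges, non-edges to non-edges."""
    if set(f) != set(P) or set(f.values()) != set(Q):
        return False
    if len(set(f.values())) != len(set(P)):      # f must be one-to-one
        return False
    for u in P:
        for v in P:
            if u != v:
                before = frozenset((u, v)) in edges
                after = frozenset((f[u], f[v])) in edges
                if before != after:
                    return False
    return True

def automorphisms(edges, P):
    """All isomorphisms from the node set P to itself (brute force)."""
    nodes = list(P)
    maps = (dict(zip(nodes, perm)) for perm in permutations(nodes))
    return [f for f in maps if is_isomorphism(edges, nodes, nodes, f)]

# Toy graph: the path a - b - c.
edges = {frozenset("ab"), frozenset("bc")}
path_automorphisms = automorphisms(edges, ["a", "b", "c"])
```

The path a - b - c has two automorphisms, the identity and the reversal, so it would violate condition iii: a subgraph H with a non-trivial automorphism cannot be used to tell its own nodes apart.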
12. Walk-Based Attack
The construction of H
Construction of H:
H = set of nodes X with size k = (2+δ) log n for a small constant δ > 0.
W = {w1, w2, …, wb} set of targeted users with size b = O(log² n)
• e.g. n = 1000M (log n ≈ 30), b = 900, k ≈ 60
13. Walk-Based Attack
The construction of H
Construction of H:
Choose two constants d0 ≤ d1 = O(log n) and, for each node xi, choose an
external degree Δi ∈ [d0, d1] specifying the number of edges xi will have to
nodes in G-H.
Each wj connects to a set of nodes Nj ⊆ X.
The sets Nj must have size at most c = 3 and must be distinct across all nodes wj.
14. Walk-Based Attack
The construction of H
Construction of H:
Add arbitrary edges from each xi to nodes in G-H until its external degree is Δi
Add internal edges in H: edge (xi, xi+1) for each i
Add each additional internal edge (xi, xj) independently with probability 0.5
Therefore, each node xi has total degree Δ'i = Δi + (#internal edges)
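The construction steps of the walk-based attack can be put together as in the following sketch. It is an illustration under assumptions, not the authors' implementation: the function name `build_walk_attack`, the account names x1..xk, the greedy way of picking the sets Nj, and the choice to pad only with non-target nodes are all made up for this example.

```python
import random
from itertools import combinations

def build_walk_attack(g_nodes, targets, k, d0, d1, c=3, seed=None):
    """Sketch of the walk-based construction of H (names are illustrative)."""
    rng = random.Random(seed)
    X = [f"x{i}" for i in range(1, k + 1)]          # new accounts x1..xk
    delta = {x: rng.randint(d0, d1) for x in X}     # external degree Δi
    ext = {x: set() for x in X}                     # neighbours of xi in G-H

    # Give every target wj a distinct set Nj ⊆ X with |Nj| <= c,
    # never pushing a node xi beyond its external degree Δi.
    subsets = [s for r in range(1, c + 1) for s in combinations(X, r)]
    rng.shuffle(subsets)
    pool = iter(subsets)                            # each subset is tried once
    N = {}
    for w in targets:
        for s in pool:
            if all(len(ext[x]) < delta[x] for x in s):
                N[w] = set(s)
                for x in s:
                    ext[x].add(w)
                break
        else:
            raise ValueError("k or d0 too small for this many targets")

    # Pad each xi with arbitrary non-target nodes until it has Δi external edges.
    others = [v for v in g_nodes if v not in N]
    for x in X:
        ext[x].update(rng.sample(others, delta[x] - len(ext[x])))

    # Internal edges: the path (xi, xi+1), every other pair with probability 0.5.
    internal = set()
    for i in range(k):
        for j in range(i + 1, k):
            if j == i + 1 or rng.random() < 0.5:
                internal.add(frozenset((X[i], X[j])))

    total = {x: delta[x] + sum(x in e for e in internal) for x in X}  # Δ'i
    return X, N, ext, internal, total
```

The attacker keeps `internal` and `total` (the values Δ'i): these are what the search on the released graph relies on, while `N` tells which target each combination of xi nodes points to.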
15. Walk-Based Attack
Finding H
When the graph G is released, we identify H by searching along
k-node paths in G, looking for a k-node path P for which the edges
induced among the nodes of P have precisely the structure of H.
Therefore, for every k-node path P = {y1, y2, …, yk} in G, we visit the nodes
of P in order, declaring P to have failed in the comparison to H as soon as
we reach a node yi that fails one of the following two tests:
Degree test: The degree of node yi should be equal to the value Δ'i, which we
know to be the degree of node xi in G.
Internal structure test: For each j < i, there should be an edge (yj, yi) in G if
and only if (xj, xi) is an edge of H.
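The search with its two tests can be sketched as a depth-first extension of partial paths. The function name `find_h`, the adjacency-dictionary representation and the toy graph are assumptions for this example; H is described by the list of total degrees Δ'i and by its internal edges given as pairs of 0-based positions.

```python
def find_h(adj, total_degree, internal):
    """Return every k-node path of G that passes both tests at every step.

    adj: dict node -> set of neighbours in the released graph G
    total_degree: [Δ'1, ..., Δ'k]
    internal: set of frozenset({i, j}) position pairs that are edges of H
    """
    k = len(total_degree)
    matches = []

    def extend(path):
        i = len(path)
        if i == k:
            matches.append(list(path))
            return
        # (x_{i-1}, x_i) is always an edge of H, so only neighbours of the
        # last node on the path can continue it.
        candidates = adj[path[-1]] if path else adj
        for y in candidates:
            if y in path:
                continue
            if len(adj[y]) != total_degree[i]:          # degree test
                continue
            if any((path[j] in adj[y]) != (frozenset((j, i)) in internal)
                   for j in range(i)):                  # internal structure test
                continue
            extend(path + [y])

    extend([])
    return matches

# Toy released graph: nodes 10, 11, 12 play the role of x1, x2, x3.
edges = [(10, 11), (11, 12), (10, 1), (11, 1), (11, 2), (12, 3), (12, 4),
         (1, 2), (2, 3), (3, 4), (2, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

found = find_h(adj, total_degree=[2, 4, 3],
               internal={frozenset((0, 1)), frozenset((1, 2))})
```

The set of partial paths explored by `extend` corresponds to the search tree T of the next slide; paths are abandoned at the first failing node, which is what keeps the tree small.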
16. Walk-Based Attack
Finding H
Finally, if we reach the end of the path P
without failing either test,
P is a copy of H in G.
Search tree T: every node αi in T has a
corresponding node f(αi) in G.
Every path of nodes α1, α2, …, αj from the
root must have a corresponding path in G
formed by nodes f(α1), f(α2), …, f(αj) with
the same degree sequence and internal edges as x1, x2, …, xj.
18. Walk-Based Attack
Analysis
Theorem 1 [Uniqueness]:
With high probability, there is no subset of nodes S≠X in G such that G[S] is isomorphic to
G[X] = H. Formally:
H is a random subgraph and G is arbitrary
Edges between H and G – H are arbitrary
There are edges (xi, xi+1)
Then with high probability no subgraph of G is isomorphic to H
Theorem 2 [Efficiency]:
Search tree T does not grow too large. Formally:
For every ε > 0, with high probability the size of T is O(n^(1+ε))
19. Walk-Based Attack
Experiment
Data: Network of friends on
LiveJournal
4.4M nodes and 77M edges
Anonymized it
Uniqueness: With only 7 new nodes, an average of
70 nodes can be de-anonymized,
even though the theoretical bound is (2+δ) log(4.4M) ≈ 44
Efficiency: |T| is typically ~ 9·10^4
Detectability: the figure shows the
success frequency for two different
choices of d0 and d1 (intervals [10,20]
and [20,60]) and varying values of k. In
both cases, only 7 nodes give a
high success rate.
20. The Cut-Based Attack
Any active attack has a theoretical asymptotic lower bound on the number of new
nodes: Ω(√(log n))
Subgraph H = (V, E), V = {x1, x2, …, xk}, k = O(√(log n))
How many nodes can be compromised?
b = Θ(√(log n))
21. The Cut-Based Attack
Construction of H
For W = {w1, w2, …, wb} targeted users:
Create k = 3b+3 new user accounts X = {x1, x2, …, xk}
Create a link between each pair (xi, xj) with probability 0.5
Choose b arbitrary nodes {x1, x2, …, xb}
Connect xi to wi for each i ≤ b
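The cut-based construction is short enough to write down directly. The sketch below assumes a hypothetical function name `build_cut_attack` and account names x1..xk; taking the first b new accounts as the "arbitrary" ones is also a choice made only for this example.

```python
import random

def build_cut_attack(targets, seed=None):
    """Sketch of the cut-based construction of H for the targets w1..wb."""
    rng = random.Random(seed)
    b = len(targets)
    k = 3 * b + 3                                   # number of new accounts
    X = [f"x{i}" for i in range(1, k + 1)]
    # Each pair (xi, xj) is linked independently with probability 0.5.
    internal = {frozenset((X[i], X[j]))
                for i in range(k) for j in range(i + 1, k)
                if rng.random() < 0.5}
    # The first b accounts are each connected to exactly one target.
    attach = {X[i]: targets[i] for i in range(b)}
    return X, internal, attach

targets = ["w1", "w2", "w3", "w4", "w5", "w6"]
X, internal, attach = build_cut_attack(targets, seed=3)
```

Only b edges leave H, one per target, which is why the cut between H and G-H has size b.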
23. The Cut-Based Attack
Properties
With high probability:
H has no non-trivial automorphism
b is the size of the cut between H and G-H
All internal cuts in H have size > b
Cuts of size ≤ b are external.
Therefore these cuts will never break H.
26. The Cut-Based Attack
Recovering H from G
Algorithm:
1. Compute the Gomory-Hu tree of G: O(nm)
2. Delete all edges of weight at most b from the tree
3. Iterate over all components of size exactly k,
testing isomorphism to H.
4. H has no non-trivial automorphisms, so from the component found
we can identify the nodes x1,...,xb and hence the
targeted users {w1, w2,…, wb}
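Steps 2 and 3 of the algorithm can be sketched as follows, assuming the Gomory-Hu tree has already been computed elsewhere and is given as a list of (u, v, weight) triples. The function name `candidate_components` and the toy tree are assumptions for this example; the isomorphism test against H is left out.

```python
def candidate_components(nodes, tree_edges, b, k):
    """Delete tree edges of weight <= b and return the components of size k."""
    parent = {v: v for v in nodes}

    def find(v):                       # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, weight in tree_edges:
        if weight > b:                 # step 2: only edges heavier than b survive
            parent[find(u)] = find(v)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    # Step 3: only components with exactly k nodes can be a copy of H.
    return [group for group in groups.values() if len(group) == k]

# Toy tree on 7 nodes; with b = 1 and k = 3 only {0, 1, 2} remains a candidate.
toy_tree = [(0, 1, 5), (1, 2, 4), (2, 3, 1), (3, 4, 1), (4, 5, 2), (5, 6, 1)]
candidates = candidate_components(range(7), toy_tree, b=1, k=3)
```

Each returned component would then be tested for isomorphism to H; since H has no non-trivial automorphism, a successful match identifies every xi individually.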
27. The Cut-Based Attack
Some Statistics
Number of active users (billions of users), based on Statistica.com:
            2015    2016    2017
Facebook    1.49    1.7     1.936
Instagram   0.4     0.5     0.7
LinkedIn    0.35    0.4     0.5
Twitter     0.3     0.31    0.328
28. The Cut-Based Attack
Specific numbers
Facebook
N = 1.968 billion users: by creating k = 21 new user accounts we can succeed in
identifying b = 6 targeted users
Instagram
N = 0.7 billion users, k = 18 and b = 5
LinkedIn
N = 0.467 billion users, k = 18 and b = 5
Twitter
N = 0.328 billion users, k = 18 and b = 5
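The figures above can be reproduced with a small calculation, under two assumptions that are not stated on the slide: that b is √(log n) with a base-2 logarithm, rounded to the nearest integer, and that k = 3b+3 as in the construction of H. The function name `cut_attack_size` is hypothetical.

```python
import math

def cut_attack_size(n):
    """Return (b, k) for a network of n users, assuming b = round(sqrt(log2 n))."""
    b = round(math.sqrt(math.log2(n)))   # number of targeted users
    k = 3 * b + 3                        # number of new accounts
    return b, k

sizes = {
    "Facebook": cut_attack_size(1.968e9),
    "Instagram": cut_attack_size(0.7e9),
    "LinkedIn": cut_attack_size(0.467e9),
    "Twitter": cut_attack_size(0.328e9),
}
```

Under these assumptions the values of b and k match the ones listed for the four networks, which illustrates how slowly √(log n) grows with the size of the network.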
29. Walk vs Cut
Walk-Based
✓ Fast recovery algorithm
✓ Hard to detect
Χ Needs more new nodes: Θ(log n)
✓ Can de-anonymize b = Θ(log² n) = Θ(k²) users
Cut-Based
Χ More expensive recovery algorithm
Χ Easier to detect because H is dense
and tends to stand out
✓ Needs fewer new nodes: O(√(log n))
(close to the theoretical asymptotic
lower bound Ω(√(log n)))
Χ Can de-anonymize only Θ(√(log n)) = Θ(k) users
30. Active vs Passive
Active attack
✓ More effective: works with high
probability in any network.
✓ Can choose the victims
Χ Risk of being detected
Passive attack
Χ Attackers may not be able to
identify themselves after seeing
the released anonymized network.
Χ The victims are only those linked to
the attackers (neighbors).
✓ Harder to detect
32. Semi-passive attack
Semi-passive attack
A coalition of existing users colludes to attack specific users
Only additional links to the targeted nodes are created;
no additional nodes are added.
33. Conclusions
An anonymized network is not safe, regardless of
how privacy is defined
For the curator of sensitive data, it is very difficult
to detect the subgraph H without knowing its structure.
Data utility vs. privacy
Differential privacy
34. Conclusions
Countermeasures
1. Detect fake accounts when created. Fake accounts send random friend
requests at the time they are created. If all the friends of an account belong
to different communities, this is very suspicious.
35. Conclusions
Countermeasures
2. Add random perturbation
E.g. delete m edges and add m other edges.
A model is needed to bias the perturbation in order to preserve the main
properties of the graph
3. Add non-random perturbation
(k,l)-anonymity: expresses that a user can't be re-identified with
probability higher than 1/k by an active attacker able to
introduce l sybil nodes in the graph
It can be shown that real-life social graphs tend to be
(1,1)-anonymous, which is the lowest privacy level
A new approach [3] consists in transforming a graph into another with
higher anonymity than (1,1)-anonymity by only adding edges
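The random perturbation of countermeasure 2 (delete m edges, add m other edges) could look like the sketch below. It picks edges uniformly at random, so it does not include the biasing model mentioned above; the function name `perturb` and the toy graph are assumptions, and listing all non-edges is only practical for small graphs.

```python
import random
from itertools import combinations

def perturb(nodes, edges, m, seed=None):
    """Delete m random edges and add m random edges that were not in the graph."""
    rng = random.Random(seed)
    edges = {frozenset(e) for e in edges}
    # Sort before sampling so that the result depends only on the seed.
    removed = set(rng.sample(sorted(edges, key=sorted), m))
    non_edges = [frozenset(pair) for pair in combinations(sorted(nodes), 2)
                 if frozenset(pair) not in edges]
    added = set(rng.sample(non_edges, m))
    return (edges - removed) | added

# Toy graph: a path on 6 nodes, perturbed with m = 2.
original = {frozenset((i, i + 1)) for i in range(5)}
perturbed = perturb(range(6), original, m=2, seed=4)
```

The number of edges stays the same, but 2m adjacencies change, which may break the exact degree and structure tests an attacker relies on at the cost of some data utility.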
36. Conclusions
Countermeasures
Original: Original G graph
Random approach: Anonymized with random approach
Our Approach: Anonymized by adding edges such that G is not (1,1)-anonymous
38. References
Thank you!
[1] Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography.
[2] Vitaly Shmatikov. Technical Perspective: Anonymity Is Not Privacy.
[3] Sjouke Mauw, Rolando Trujillo-Rasua, and Bochuan Xuan. Counteracting Active Attacks in Social Network Graphs.
And several other minor resources.