On A Quest for Combating Filter Bubbles and Misinformation.
Invited Talk, Chinese University of Hong Kong at Shenzhen, Dec 13, 2022.
Social media have greatly facilitated access to information and news and have enhanced users' ability to share with peers their views on issues. However, they have unfortunately led to increased societal polarization. At the center of this phenomenon are filter bubbles and misinformation. Filter bubbles are the result of excessive personalization which enhances relevance of content at the price of limiting exposure to a specific viewpoint. These bubbles are amplified by the so-called echo chambers that exist in social media, whereby members of a community mutually reinforce a fixed opinion or viewpoint on an issue. Misinformation as well as disinformation, on the other hand, tends to propagate through the network, often faster and more virally than truth.
Both problems manifest themselves in the form of groups of actors working in concert and providing mutual reinforcement. How can we recognize these groups? Having detected them, how can we counteract these problems? The first question can benefit from an examination of techniques developed to search for dense subgraphs in an underlying network. As for the second question, a natural approach for countering filter bubbles is to launch some kind of counter-campaign to balance users' exposure to viewpoints. Countermeasures for misinformation propagating through a network depend on the party planning the countermeasure. The network host can intervene and take steps to limit the propagation of misinformation, but these actions come with a cost. Besides the political sensitivity and cost of limiting freedom of expression, what if the intervention is mistakenly applied to genuine information? On the other hand, a third party interested in countering the propagation of misinformation may launch a counter-campaign. Some of the ideas behind designing such campaigns have strong connections to a classic problem called Influence Maximization, studied in a very different context and driven by different applications such as viral marketing, infection containment, and revenue or welfare maximization. In this talk, we will examine research on detecting dense subgraphs as well as competitive influence maximization and discuss how they can inspire techniques for addressing the two problems above.
cuhk-fb-mi-talk.pdf
1. On A Quest for Combating Filter Bubbles and Misinformation
Laks V.S. Lakshmanan
University of British Columbia
Vancouver, BC, Canada
2. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 2
3. Prolegomenon
• What this talk is not about and will not do for you:
• Classify different kinds of “fake news”: e.g., mis/disinformation ...
• Computational Fact Checking or Claim Verification
• Offer a comprehensive solution to the filter bubble/echo chambers or “fake news” problems.
• The scope of both stretches beyond just tech (e.g., models and algorithms).
• Even the “tech-restricted” versions we won’t get to completely solve today (in this talk).
4. Prolegomenon
• Instead, we will examine some (necessarily restricted) models and formulations of problems.
• Offer a view of how research done in some different contexts may inspire techniques for solving restricted versions of the filter bubbles / echo chambers and the misinformation problems.
• In case I missed your work, …
5. Not long ago, or maybe long ago …
6. And then came …
Which led to many great things
but arguably also these …
8. Outline: Filter Bubbles and Echo Chambers
9. Filter Bubbles and Echo Chambers exacerbate polarization
["Political polarization 1994-2017." Pew Research Center, Washington, DC, October 2017].
10. Filter Bubbles and Echo Chambers exacerbate polarization
["Political polarization 1994-2017." Pew Research Center, Washington, DC, October 2017].
11. Political Echo Chambers
● Members of densely connected groups are likely to have the same opinions and attitudes.
● The study focuses on opposing political echo chambers (~250K each) on Twitter in Japan.
● Political echo chambers have denser and more core-periphery information spreading structures than those of most other communities.
[Asatani et al. Dense and influential core promotion of daily viral information spread in political echo chambers. Scientific Reports 2021].
12. The Price of Filter Bubbles
• Filter bubbles and echo chambers can impede natural opinion formation
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
• Can lead to one-sided policy decisions
[Perrone and Wieder. Pro-painkiller echo chamber shaped policy amid drug epidemic. The Center for Public Integrity, 2016].
• And erosion of societal trust
[Nguyen. Echo chambers and epistemic bubbles. Episteme, 2020].
13. Outline: Misinformation
15. Economic Cost of Misinformation
16. Economic Impact of Misinformation
FAKE NEWS: ELECTIONS
THE U.S. TO SPEND $200 MILLION ALONE ADVANCING FAKE NEWS
$400 MILLION SPENT GLOBALLY ON FAKE POLITICAL NEWS
[U. of Baltimore and CHEQ. The economic cost of bad actors on the internet. Fake News 2019].
COVID-19 Vaccine Misinformation and Disinformation Costs an Estimated $50 to $300 Million Each Day
[Bruns, Hosangadi, Trotochaud, and Sell. Johns Hopkins Center for Health Security. 2021].
17. Misinformation Propagation (US Politics)
● The connections between misinformation spreaders are denser than connections between fact-checkers.
● Increasing the value of k takes us from the periphery to the denser inner core structure.
[Figure: k-core decomposition of the pre-Election retweet network. Orange = fact-checks and purple = claims.]
[Shao, Hui, Wang et al. Anatomy of an online misinformation network. PLoS ONE 2018].
18. Misinformation Propagation + Bubbles (Covid-19)
● Echo chambers with misinformed sub-communities are much denser than those with informed sub-communities.
[Figure panels: (a) Retweet, (b) Mention, (c) Reply, (d) Retweet+Mention+Reply.]
[Memon and Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. CEUR Workshop 2020].
19. Outline: Detecting Densest Subgraphs – Undirected
20. Densest Subgraphs: Undirected
• What is a good notion of density?
• Classical: average degree: $\rho(G) = \frac{m}{n}$.
• Average #motifs/vertex: $\rho(G, \Psi) = \frac{\mu(G, \Psi)}{n}$, where $\mu(G, \Psi)$ = #instances of Ψ (motif) in G. $\rho_{opt}$ − optimal density.
• E.g., Δ-density.
• More generally, Ψ-density for pattern Ψ (e.g., h-clique).
• Intuition: densest subgraphs may indicate echo chambers.
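The two density notions above can be illustrated with a minimal sketch. It assumes an undirected simple graph given as a vertex list and an edge list; the function names and the 4-clique example are hypothetical, chosen only for illustration, and triangles are counted by brute force rather than by any method from the talk.

```python
from itertools import combinations


def edge_density(nodes, edges):
    """rho(G) = m / n: number of edges divided by number of vertices."""
    return len(edges) / len(nodes)


def triangle_density(nodes, edges):
    """rho(G, triangle) = (#triangles in G) / n, counted by brute force."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triangles = sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return triangles / len(nodes)


# Hypothetical example: a 4-clique has 6 edges and 4 triangles on 4 vertices.
k4_nodes = [0, 1, 2, 3]
k4_edges = list(combinations(k4_nodes, 2))
print(edge_density(k4_nodes, k4_edges), triangle_density(k4_nodes, k4_edges))
```

The brute-force triangle count is cubic in the number of vertices, so this is only meant for small illustrative graphs.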
21. Different notions of density.
• Edge-densest subgraph: density = 11/7.
• Δ-densest subgraph: density = 2/4.
• Clique-density.
• Pattern-density.
22. k-cores and k-clique-cores
[Figure: example graph with regions labeled by k-core number (3, 2, 1, 0) and by (k, Δ)-core number (0, 1, 2, 3).]
(k, Ψ)-core of G – maximal subgraph where each vertex participates in ≥ k instances of Ψ.
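For the classical case where Ψ is a single edge, the (k, Ψ)-core reduces to the ordinary k-core, and core numbers can be obtained by peeling. The following is a minimal sketch under that assumption; the adjacency-dict representation, the function name, and the toy graph are hypothetical, and the simple minimum search makes it quadratic rather than linear-time.

```python
def core_numbers(adj):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex.

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a vertex of minimum residual degree.
        v = min(remaining, key=lambda u: degree[u])
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical toy graph: a triangle a-b-c, a pendant vertex d, an isolated vertex e.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
print(core_numbers(toy))
```

Generalizing to h-cliques would replace the degree by the number of h-clique instances a vertex participates in, which is where the clique-enumeration cost discussed later comes from.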
23. Densest Subgraph Discovery
Problem: Given a graph G(V, E) and an h-clique $\Psi(V_\Psi, E_\Psi)$, find the subgraph D with the highest h-clique density $\rho(D, \Psi)$.
Ψ can be any pattern: e.g., a 3-star, Δ, etc.
Focus of this talk: h-cliques.
24. SOTA¹: Densest Subgraph Discovery: Exact
• Binary search to guess the density
• Construct the flow network
• Based on guessed density and original graph
• Use max-flow algorithm to check the feasibility
• Example: l = 0, r = 1 (max triangle deg)
• α = (l + r)/2 = 0.5.
• Run time: $O\left(n \binom{d-1}{h-1} + n\Lambda + \min(n, \Lambda)^3\right)$, where Λ = #instances of Ψ.
[Mitzenmacher, Pachocki, Peng, Tsourakakis, and Xu. Scalable large near-clique detection in large-scale networks via sampling. KDD 2015].
¹ As of 2017.
25. DS Discovery – A Triangle Example
[Figure: flow network for a graph on vertices A, B, C, D, with source s, sink t, and one node per triangle instance (four Ψ nodes). Edge capacities shown: 0, 1, 3α, and +∞. Two cases are illustrated: α = 0.5 and α = 1/3.]
26. SOTA¹: Densest Subgraph Discovery: Approximation
• Approximation algorithm: PeelApp
• Iteratively peel the vertex w/ smallest h-clique-degree.
• Let $S_1, S_2, \ldots$ be the list of residual subgraphs generated.
• Return the $S_i$ with the highest density.
• Approximation:
• The density of S is at least $\frac{1}{|V_\Psi|} \cdot \rho_{opt} = \frac{1}{h} \cdot \rho_{opt}$.
• Running time: $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
[Tsourakakis. The k-clique densest subgraph problem. WWW 2015].
¹ As of 2017.
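The peeling idea can be sketched for the simplest case h = 2, where the h-clique-degree is the ordinary degree and the density is edges per vertex. This is an illustrative sketch only, not the PeelApp implementation: the function name, the adjacency-dict input, and the example graph are assumptions, and the graph is assumed to be non-empty.

```python
def peel_densest(adj):
    """Greedy peeling for edge density (h = 2).

    adj maps each vertex to the set of its neighbours (undirected, non-empty graph).
    Returns (best_vertex_set, best_density) over all residual subgraphs.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    m = sum(degree.values()) // 2  # number of edges in the residual graph
    best_set, best_density = set(remaining), m / len(remaining)
    while len(remaining) > 1:
        # Peel a vertex with the smallest residual degree.
        v = min(remaining, key=lambda u: degree[u])
        remaining.remove(v)
        m -= degree[v]
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
        density = m / len(remaining)
        if density > best_density:
            best_set, best_density = set(remaining), density
    return best_set, best_density


# Hypothetical example: a 4-clique {0, 1, 2, 3} with a path 3-4-5 attached.
g = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3, 5},
    5: {4},
}
print(peel_densest(g))
```

On this example the peeling removes the path vertices first and the best residual subgraph is the 4-clique; in general the result is only guaranteed to be within the approximation factor stated above.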
27. DSD: SOTA Limitations
• Initial bounds on α not tight.
• Size of flow network can be large: e.g., large G with many instances of Ψ.
• Flow network built from original G each time.
• Even PeelApp does redundant work.
Can we “bound” the densest subgraph?
(k, Ψ)-core to the rescue!
28. Bounding Densest Subgraphs with Cores
• Theorem: G, k, Ψ as before. H a (k, Ψ)-core of G. Then: $\frac{k}{|V_\Psi|} \le \rho(H, \Psi) \le k_{max}$, where $|V_\Psi| = h$ for an h-clique.
Special case: the $k_{max}$-core has density in $\left[\frac{k_{max}}{2}, k_{max}\right]$.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
29. Bounding DSG with cores: An Example
For $k_{max} = 2$ and a 2-core, LB = 1 and UB = 2.
ρ = 1. ρ = 5/4, 9/6, 13/8, ⋯ → 2.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
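The limit in this example can be checked with a short calculation. The general term below is an assumption inferred from the three listed values (the slide's figure is not available), with i = 1, 2, 3, … indexing the graphs in the family:

```latex
\rho_i = \frac{4i + 1}{2i + 2}, \qquad
\rho_1 = \frac{5}{4},\; \rho_2 = \frac{9}{6},\; \rho_3 = \frac{13}{8}, \qquad
2 - \rho_i = \frac{3}{2i + 2} \xrightarrow[i \to \infty]{} 0 .
```

Under that assumption the density stays strictly below the upper bound $k_{max} = 2$ but approaches it, while ρ = 1 meets the lower bound, so both ends of the interval are approached.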
30. Bounding Densest Subgraphs with Cores
• Lemma: The DSG of G must be contained in its $(\lceil \rho_{opt} \rceil, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
31. Exact algorithm: CoreExact
• Our algorithm: CoreExact
• Follows the same framework as the existing exact algorithm:
• Binary search to guess the density
• Construct the flow network
• Based on guessed density and original graph
• Use max-flow algorithm to check the feasibility
• Three core-based optimization techniques:
1. Tighter bounds derived from cores: $\left[\frac{k_{max}}{|V_\Psi|}, k_{max}\right]$
2. Build the flow network on cores
3. Locate the clique-densest subgraph in even smaller cores after each check
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
32. Approximation Algorithms
• IncApp:
• Do a (k, Ψ)-core decomposition of G: $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
• Return the $(k_{max}, \Psi)$-core.
• $\frac{1}{|V_\Psi|} = \frac{1}{h}$-approximation.
• Finding (repeatedly) clique-degree can be expensive for large cliques.
• CoreApp: Heuristic to directly find the $(k_{max}, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
33. Approximation Algorithms
CoreApp:
1. Sort vertices of G in decreasing order of their h-clique-based core number, using a cheaper proxy.
2. Obtain the max core & its core number k from the top-s vertices.
3. If the max degree of the remaining vertices is larger than k:
• s = 2 × s, repeat 2.
• Otherwise, output the max core.
Same worst-case time complexity as IncApp and PeelApp (SOTA) but much faster in practice.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
34. Sample Experiment Results
As-Caida (n = 26K, m = 106K). Friendster (n = 20M, m = 106M).
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
35. Mini Case Study: Covid-19
• Covid-19 Retweets.
1,025,937 retweets involving 660,730 users.
⇒ (660,730 nodes, 835,193 edges).
• Largest connected component:
(399,962 nodes, 663,506 edges)
Courtesy: Thirumuruganathan, QCRI.
36. [Figure: the two densest subgraphs of the Covid-19 retweet network, annotated by topic.]
Densest subgraph: 86 vertices, 18-core, density: 12.5407
Top-2 densest subgraph: 1134 vertices, 13-core, density: 10.0150
Cross edges: 296
Topics: Side effects of vaccine. Modes of transmission of virus. Case counts in diff states and countries.
37. Mini Case Study II: Voter Fraud 2020
Tweets on US Presidential Election 2020.
Number of nodes : 1,385,225
Number of edges : 6,631,720
Number of Tweets: 8,085,323
Size of the largest connected component:
Number of nodes: 1,356,657
Number of edges : 6,611,465
Courtesy: Thirumuruganathan, QCRI.
39. [Figure: subgraphs of the election tweet network, annotated with the content summaries below.]
Repeated allegations of voter fraud. Retweeting Sidney Powell’s tweet warning states against certifying the election. Quoting Trump “dirty rolls ==> dirty polls”. Big tech is colluding with Dems to defeat Trump. Vote in person to fight against mail-in voter fraud. FBI said many military mail-in votes, all for Trump, were thrown away in a ditch in PA. Biggest voter fraud in American history. Voting machines known to be insecure. Need proof of citizenship and photo ID to prevent fraud.
Fact-checkers from AP, Politifact, & Reuters confirm -- no evidence of widespread election fraud. Experts confirm elections are secure; most of the interference comes from misinformation campaigns. GOP and Trump team are sowing disinfo. and panic. Need to protect democracy. Trump’s narrower margin wins in 2016 vs Biden’s wider ones in 2020. Debunk “Deborah Jean Christiansen’s vote is fraud” by quoting her. More former Trump aides getting infected than voter fraud cases!
Quotes of Sidney Powell’s tweet; replies that there is no evidence of widespread fraud; Biden brags about having “the most extensive and inclusive VOTER FRAUD organization in the history of American politics”; (CNN) dishonesty taxonomy of Trump rally; Philly mayor hiding info. from people. Anyone caught cheating with Voter Fraud games should be federally charged; State officials from both parties stated the election went well. Losing side refusing to recognize clear winner; weaving conspiracy theories and strangling faith and belief.
40. Mini Case Study III: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015.
• 265,383 nodes.
• 3,898,972 edges.
• Largest connected component:
• 258,756 nodes.
• 3,771,999 edges.
Courtesy: Thirumuruganathan, QCRI.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
41. [Figure: two dense subgraphs of the Nepal earthquake graph, annotated by topic.]
1463 vertices, 129-core, density: 105.328
370 vertices, 115-core, density: 71.9378
129 edges
Topics: Requests for help. Info on earthquake – magnitude, distance to cities affected from capital. Reports on damage and ruin.
42. Recent Progress on DSGs
WWW 2020: Provide near optimal via multiple peeling. (1 + ε)-approx within $O\left(\frac{\Delta \log n}{\lambda^*} \cdot \frac{1}{\varepsilon^2}\right)$ proved by [SODA 2022].
STOC 2020: (1 + ε)-approximation on dynamic graph, with $O(\log^4 n \cdot \varepsilon^{-6})$ per edge insertion/deletion.
WWW 2020: Define and find minimal DSG. Minimal: no proper subgraph is a DSG.
SODA 2022: A flow-based (1 + ε)-approx algo, with $\tilde{O}\left(\frac{m}{\varepsilon}\right)$.
[Boob, Gao, Peng et al. Flowless: Extracting densest subgraphs without flow computations. WWW 2020].
[Sawlani and Wang. Near-optimal fully dynamic densest subgraph. STOC 2020].
[Chang and Qiao. Deconstruct Densest Subgraphs. WWW 2020].
[Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].
43. Outline: Detecting Densest Subgraphs – Directed
44. Directed Densest Subgraphs
[Figure: example digraph on vertices a, b, c, d, e with the sets S* and T* marked.]
A directed densest subgraph (DDS) of a digraph is a pair of vertex sets (S, T). Its density is
$\rho(S, T) = \frac{|E(S, T)|}{\sqrt{|S| \cdot |T|}}$
-- generalizes edge density from undirected graphs.
Problem: Find $S^*, T^*$ with max. ρ.
[Kannan and Vinay. Analyzing the structure of large graphs. Tech Report 1999].
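The directed density can be computed directly from its definition. The sketch below assumes a digraph given as a list of distinct (source, target) pairs; the function name and the five-vertex example are hypothetical and are not taken from the slide's figure.

```python
from math import sqrt


def directed_density(edges, S, T):
    """rho(S, T) = |E(S, T)| / sqrt(|S| * |T|), where E(S, T) are the edges from S to T."""
    S, T = set(S), set(T)
    e_st = sum(1 for u, v in edges if u in S and v in T)
    return e_st / sqrt(len(S) * len(T))


# Hypothetical digraph: a and b each point to c and d; e points to a.
example_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "a")]
print(directed_density(example_edges, {"a", "b"}, {"c", "d"}))
```

Note that S and T may overlap, so the same vertex can be counted both as a source and as a target.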
45. SOTA¹ DDS Discovery: Exact
• Repeatedly solve Max-flow, similarly to the undirected case.
• For each value of $a = \frac{|S|}{|T|}$, $0 < |S|, |T| \le n$:
• Find the max density by binary search.
• Build flow network and solve Max-flow.
• Overall time: $O(n^2 \cdot t_{MaxFlow})$.
• > 2 days on ~1,200 vertices and ~2,600 edges.
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
¹ As of 2019.
46. SOTA DDS Discovery: Approximation
Greedy Peeling Algorithm:
• Build a bipartite graph (L, R, E) where L = R = V.
• The edges are all from the L copy to the R copy.
• Each time remove a node with least degree.
• Report densest subgraph among those obtained.
• $O(m + n)$ time.
Approximation?
[Figure: example digraph G on vertices a, b, c, d, e.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
47. SOTA DDS Discovery: Approximation
• Fix [personal communication with authors].
• 2-approximation algorithm
• $O(n(n + m))$
[Figure: counterexample for the original KS-Approx (labels: 18 vertices, 36 vertices). KS-Approx density: 2.75; ground-truth density: 6; approximation ratio 6/2.75 = 2.18. A second panel enlarges the graph and gives the numbers of a, b, and c nodes, the ground-truth density, and the KS-Approx density in terms of a size parameter.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
48. Densest Directed Subgraph: An Exact Algorithm
• (x, y)-core: An (S, T)-induced subgraph:
• Every node in S has outdegree ≥ x.
• Every node in T has indegree ≥ y.
• S and T not necessarily disjoint.
• H = ({a,b}, {c,d}) is a (2, 2)-core.
[Figure: example digraph on vertices a, b, c, d, e.]
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020].
[Ma et al. On Densest Subgraph Discovery. TODS 2021].
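The (x, y)-core definition suggests a simple fixed-point computation: keep dropping vertices of S with too few out-neighbours in T and vertices of T with too few in-neighbours in S. The sketch below is an illustration under assumptions, not the authors' algorithm: the function name and example digraph are hypothetical, x and y are assumed to be at least 1, and each round recomputes degrees from scratch for clarity rather than speed.

```python
def xy_core(edges, x, y):
    """Return (S, T) for the (x, y)-core of a digraph given as (source, target) pairs.

    Every vertex kept in S has at least x out-edges into T, and every vertex
    kept in T has at least y in-edges from S. Assumes x >= 1 and y >= 1.
    """
    edges = set(edges)
    S = {u for u, _ in edges}
    T = {v for _, v in edges}
    changed = True
    while changed:
        changed = False
        out_deg = {u: 0 for u in S}
        in_deg = {v: 0 for v in T}
        for u, v in edges:
            if u in S and v in T:
                out_deg[u] += 1
                in_deg[v] += 1
        drop_s = {u for u in S if out_deg[u] < x}
        drop_t = {v for v in T if in_deg[v] < y}
        if drop_s or drop_t:
            S -= drop_s
            T -= drop_t
            changed = True
    return S, T


# Hypothetical digraph: a and b each point to c and d; e points to a.
core_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("e", "a")]
print(xy_core(core_edges, 2, 2))
```

Dropping all violating vertices in the same round is safe because a vertex that fails the degree test at any stage can never satisfy it again once more vertices are removed.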
49. Densest Directed Subgraph: Core-Exact
Theorem: The DDS of G is contained in the $\left(\frac{\rho^*}{2\sqrt{a}}, \frac{\sqrt{a} \cdot \rho^*}{2}\right)$-core.
• $a = \frac{|S^*|}{|T^*|}$ -- unknown; search through all $\frac{i}{j}$: $0 < i, j \le n$.
• $\rho^*$ -- unknown: start with good bounds and use binary search.
• E.g., lower bound = any 2-approx. solution and upper bound = 2 × lower bound.
• Still $O(n^2 \cdot t_{MaxFlow})$ but much faster in practice – smaller flow graphs.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
50. Densest Directed Subgraph: DC-Exact
• Uses a “divide and conquer” approach.
• For a given ratio $\frac{i}{j}$, result of binary search for “best” (S,T) pair gives enough info. about subranges of ratios that can be skipped.
• Algorithm DC-Exact: $O(k \cdot t_{MaxFlow})$, e.g., …
• $k \ll n^2$ in practice.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
51. Densest Directed Subgraph: Core-Approx
• G[S,T] – (x,y)-core of G. Then $\rho(S, T) \ge \sqrt{xy}$.
• Let $[x^*, y^*]$ be the max core-number pair, i.e., it
maximizes $xy$ among all $(x, y)$-cores.
• $\rho^* \le 2\sqrt{x^* y^*}$.
• → The $(x^*, y^*)$-core is a 2-approx. solution to DDS.
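The first bullet can be checked on a toy graph. The sketch below assumes the usual definitions: in an (x, y)-core G[S,T], every node of S has at least x out-edges into T and every node of T has at least y in-edges from S, and the density is |E(S,T)| / sqrt(|S|·|T|). The peeling loop is a deliberately simple illustration (function names are made up here), not the efficient algorithm from the cited papers.

```python
import math


def xy_core(edges, x, y):
    """Peel a directed edge set down to its (x, y)-core (assumes x, y >= 1).

    Returns (S, T, core_edges): sources, targets, and surviving edges.
    """
    live = set(edges)
    while True:
        out_deg, in_deg = {}, {}
        for u, v in live:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        # Dropping all edges of a low-degree endpoint removes it from S (or T).
        kept = {(u, v) for u, v in live if out_deg[u] >= x and in_deg[v] >= y}
        if kept == live:
            break
        live = kept
    S = {u for u, _ in live}
    T = {v for _, v in live}
    return S, T, live


def density(S, T, core_edges):
    """Directed density |E(S,T)| / sqrt(|S| * |T|)."""
    if not S or not T:
        return 0.0
    return len(core_edges) / math.sqrt(len(S) * len(T))


# Three sources pointing at two targets, plus one pendant edge (5 -> 3).
toy_edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4), (5, 3)]
S, T, core = xy_core(toy_edges, x=2, y=3)
rho = density(S, T, core)
```

On this toy graph the pendant node 5 is peeled away and the density of the (2, 3)-core meets the bound sqrt(2·3) with equality.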
12/13/22 CUHK-Shenzhen, China
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
52
52. Densest Directed Subgraph: Core-Approx
• Naïve implementation: for each $x$, compute all $(x, y)$-cores,
$0 < y < n$, and return the $(x^*, y^*)$-core
→ $O(n(m + n))$ time.
• Can we do better?
12/13/22 CUHK-Shenzhen, China
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
53
53. Densest Directed Subgraph: Core-Approx
12/13/22 CUHK-Shenzhen, China
[Figure: grid of core-number pairs (x, y); the candidates for $[x^*, y^*]$ are marked.]
Main idea:
for each $x \le \gamma$, search for the
largest $y$;
for each $y \le \gamma$, search for the
largest $x$;
$O(\sqrt{m} \cdot (n + m))$ time.
Max equal pair: $(\gamma, \gamma)$.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
54
54. Sample Experiment Results: Exact Algorithms
12/13/22 CUHK-Shenzhen, China
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
55
55. Sample Experiment Results: Approx Algorithms
12/13/22 CUHK-Shenzhen, China
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
AR: (~3.4M, ~5.8M)
BA: (~2.1M, ~17.8M)
TW: (~52.6M, ~1.96B)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
[Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].
56
56. Better Approximation Ratio?
• Propose a new LP formulation for DDS problem
• A divide-and-conquer algorithmic framework
• An efficient $(1 + \epsilon)$-approximation algorithm, for any real positive number $\epsilon$
• An efficient exact algorithm
• Up to 3 orders of magnitude faster than the state-of-the-art
exact and approximation algorithms
12/13/22 CUHK-Shenzhen, China
[Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery.
SIGMOD 2022].
57
57. Recent Progress on DDS
• A concurrent work from SODA 2022
• Gives a $(1 + \epsilon)$-approximation in $\tilde{O}(m/\epsilon)$ time via network
flow for undirected graphs
• Can also be extended to directed graphs at extra time cost
• It would be interesting to compare the two algorithms empirically
12/13/22 CUHK-Shenzhen, China
[Chekuri, Quanrud, and Torres. “Densest Subgraph: Supermodularity, Iterative Peeling, and Flow.” SODA 2022].
58
58. Mini Case Study: Covid-19
• Covid-19 retweets:
1,025,937 retweets involving 660,730 users
→ (660,730 nodes, 835,193 edges).
• Largest connected component:
(399,962 nodes, 663,506 edges).
12/13/22 CUHK-Shenzhen, China
Courtesy: Thirumuruganathan, QCRI.
59
59. Directed Densest Subgraph from Covid-19
12/13/22 CUHK-Shenzhen, China
Source Nodes = 777
Target Nodes = 15
Common Nodes = 2
(5, 70)-core.
Density: 55.8826
777 nodes “influenced” by
15 “initiators”.
Vaccine side
effects,
Modes of
Transmission.
60
60. Mini Case Study II: Nepal Earthquake
12/13/22 CUHK-Shenzhen, China
• Graph constructed from cascades of tweets collected following the Nepal earthquake,
April 2015.
• 265383 nodes.
• 3898972 edges.
• largest connected component:
• 258756 nodes.
• 3771999 edges.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
Courtesy: Thirumuruganathan, QCRI.
61
61. Directed Densest Subgraph from Nepal
12/13/22 CUHK-Shenzhen, China
Source Nodes: 122637
Target Nodes: 25233
Common nodes: 20713
(1,51)-core
density: 34.309
Tens of thousands of “initiators”
and more than a hundred thousand
“influenced”.
Info on damage
and requests for
help.
62
62. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
•Combating via Mitigation: A Refresher on
Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 63
63. Propagation/Diffusion Models
12/13/22 CUHK-Shenzhen, China
• How does influence/information
travel in networks?
• Example Phenomena: infection,
product adoption, information,
opinion, rumor, etc.
• Stochastic diffusion models –
discrete/continuous time.
• How can we launch campaigns
to optimize design objectives?
[Kempe,Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
64
64. Influence Maximization
• Core optimization problem in IM: Given a diffusion model M, a network
G = (V, E), model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a
seed set $S \subset V$ under budget that maximizes $\sigma_M(S)$:
the expected number of adopters given
initial adopters S (spread).
12/13/22 CUHK-Shenzhen, China 65
65
65. Complexity of IM
• Theorem: The IM problem is NP-hard for several major diffusion models
under both discrete time and continuous time.
12/13/22 CUHK-Shenzhen, China
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
66
66. Complexity of Spread Computation
• Theorem: It is #P-hard to compute the expected spread of a node set
under major diffusion models. E.g., by reduction from counting simple paths in a digraph.
[Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009].
[Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model.
ICDM 2010].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
12/13/22 CUHK-Shenzhen, China
67
67. Properties of Spread Function
$\sigma(S)$ is
monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.
12/13/22 CUHK-Shenzhen, China
68
68. Properties of Spread Function
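$\sigma(S)$ is
submodular:
$S \subset S' \subset V,\ x \in V \setminus S' \implies \sigma(x \mid S') \le \sigma(x \mid S)$, where
$\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the
marginal gain.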
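Both properties can be verified by brute force on a tiny ground set. The sketch below is only illustrative: it uses set coverage as a stand-in for the spread function (an assumption made here for simplicity; the spread function of a diffusion model would need simulation), and contrasts it with a monotone function that is not submodular.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f, ground):
    """Check f(S) <= f(S + x) for every subset S and every x outside S."""
    ground = frozenset(ground)
    return all(f(S) <= f(S | {x}) for S in subsets(ground) for x in ground - S)


def is_submodular(f, ground):
    """Check that marginal gains never grow when the base set grows."""
    ground = frozenset(ground)
    for S in subsets(ground):
        for extra in subsets(ground - S):
            big = S | extra  # S is a subset of big
            for x in ground - big:
                if f(big | {x}) - f(big) > f(S | {x}) - f(S):
                    return False
    return True


# Stand-in for spread: number of elements covered by the chosen blocks.
blocks = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}


def coverage(S):
    covered = set()
    for name in S:
        covered |= blocks[name]
    return len(covered)


def squared_size(S):
    """Monotone, but marginal gains increase, so not submodular."""
    return len(S) ** 2
```

The check enumerates all nested pairs of subsets, so it is only practical for ground sets of a handful of elements.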
12/13/22 CUHK-Shenzhen, China
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
69
69. Approximation of Submodular Function
Maximization
• Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function, with $f(\emptyset) = 0$. Let
$S^{Grd}$ and $S^*$ resp. be the greedy and optimal solutions. Then
$f(S^{Grd}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$, where $f(S^*)$ is OPT.
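The guarantee can be observed on a small instance. The sketch below is an illustration under stated assumptions: it assumes a cardinality budget k, uses set coverage as the monotone submodular function, and compares greedy against a brute-force optimum; the instance and function names are invented for this example.

```python
import math
from itertools import combinations


def coverage(sets, chosen):
    """Number of elements covered by the chosen set indices."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, k):
    """Repeatedly add the set with the largest marginal coverage."""
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(len(sets)) if i not in chosen]
        best = max(remaining, key=lambda i: coverage(sets, chosen + [i]))
        chosen.append(best)
    return chosen


def brute_force_opt(sets, k):
    """Exact optimum by trying every k-subset (tiny instances only)."""
    return max(coverage(sets, c) for c in combinations(range(len(sets)), k))


sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6, 7, 8}, {7, 8}]
k = 2
greedy_value = coverage(sets, greedy(sets, k))
optimum = brute_force_opt(sets, k)
```

On many practical instances greedy does much better than the worst-case factor; the bound only says it can never do worse.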
[Nemhauser, Wolsey, and Fisher. An analysis of the approximations for maximizing submodular set functions. Math. Prog. 1978].
12/13/22 CUHK-Shenzhen, China
70
70. Approximation of Submodular Function
Maximization
• Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under
various major diffusion models, for both discrete and continuous time.
12/13/22 CUHK-Shenzhen, China
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
71
71. Baseline Approximation Algorithm
Monte Carlo simulations for estimating
expected spread.
Lazy Forward optimization to save useless
updates.
→ Greedy is still extremely slow on large networks.
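The baseline can be sketched as below: Monte Carlo estimation of the expected spread under the Independent Cascade model, wrapped in a lazy-forward greedy loop. This is a minimal sketch, not the cited implementations; the graph encoding (`graph[u]` as a list of `(v, p)` pairs), the function names, and the run count are assumptions made for the example.

```python
import heapq
import random


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed list."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def lazy_greedy(graph, nodes, k, runs=200, seed=0):
    """Greedy seed selection with lazy-forward evaluation of marginal gains."""
    rng = random.Random(seed)
    # Heap entries: (-marginal_gain, node, number of seeds when gain was computed).
    heap = [(-spread(graph, [v], runs, rng), v, 0) for v in nodes]
    heapq.heapify(heap)
    chosen, current = [], 0.0
    while len(chosen) < k and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # Gain is up to date; by submodularity no stale entry can beat it.
            chosen.append(v)
            current += -neg_gain
        else:
            gain = spread(graph, chosen + [v], runs, rng) - current
            heapq.heappush(heap, (-gain, v, len(chosen)))
    return chosen, current


# Deterministic toy graph (all probabilities 1): 0 -> {1, 2}, 3 -> 4.
toy_graph = {0: [(1, 1.0), (2, 1.0)], 3: [(4, 1.0)]}
seeds, value = lazy_greedy(toy_graph, nodes=[0, 1, 2, 3, 4], k=2)
```

Lazy evaluation skips re-estimating nodes whose stale gain is already too small to win, but every re-estimate still costs a full batch of simulations, which is why the approach stays slow on large networks.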
[Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance.
Cost-effective outbreak detection in networks. KDD 2007].
[Kempe, Kleinberg, and Tardos. Maximizing the spread
of influence through a social network. KDD 2003].
12/13/22 CUHK-Shenzhen, China
72
72. Reverse Influence Sampling
• A series of algorithms that guarantee a
$\left(1 - \frac{1}{e} - \epsilon\right)$-approximation to the optimal
expected spread.
• Key: use random reverse reachable sets
(rr-sets) to gauge quality of (candidate) seeds.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
12/13/22 CUHK-Shenzhen, China 73
73. Reverse Reachable Sets (RR-Sets)
[Figure: example graph on nodes A, B, C, D, E with edge propagation probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.]
• rr-set = sample subgraph of G.
• example of rr-set generation under IC model.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
12/13/22 74
12/13/22 CUHK-Shenzhen, China
74. Reverse Reachable Sets (RR-Sets)
[Figure: example graph on nodes A, B, C, D, E with edge propagation probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.]
Start from a random node:
RR-set = {A}
• rr-set = sample subgraph of G.
• example of rr-set generation under IC model.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014]
12/13/22 75
12/13/22 CUHK-Shenzhen, China
75. Reverse Reachable Sets (RR-Sets)
• An RR-set is a subgraph sample of G
• Generation of RR-sets under the IC model:
start from a random node;
sample its/their incoming edges;
add the sampled neighbors.
[Figure: example graph on nodes A, B, C, D, E with edge propagation probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.]
RR-set = {A, C, B, E}
• Intuition:
– An rr-set is a sample set of nodes that can
influence node A
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014]
12/13/22 76
12/13/22 CUHK-Shenzhen, China
76. Influence Estimation with RR-Sets
• Theorem: Pr[S overlaps a random rr-set] = $\frac{1}{n}$ × expected spread of S.
• Family of approx. algorithms: TIM, IMM, Stop-and-Stare, …
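The theorem suggests a simple estimator: sample rr-sets, count the fraction that a seed set hits, and scale by n. The sketch below assumes the IC model and a reverse adjacency encoding (`in_edges[v]` as a list of `(u, p)` pairs for edges u → v); names and the sample count are illustrative, and none of the sample-size analysis of TIM or IMM is included.

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one reverse reachable set under the IC model."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        # Flip each incoming edge of v once; keep the sources that succeed.
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def estimate_spread(nodes, in_edges, seeds, num_samples=20000, seed=0):
    """Estimate expected spread as n * Pr[seeds overlap a random rr-set]."""
    rng = random.Random(seed)
    seeds = set(seeds)
    hits = sum(
        1 for _ in range(num_samples) if seeds & random_rr_set(nodes, in_edges, rng)
    )
    return len(nodes) * hits / num_samples


# Deterministic chain A -> B -> C (all probabilities 1).
chain_nodes = ["A", "B", "C"]
chain_in_edges = {"B": [("A", 1.0)], "C": [("B", 1.0)]}
spread_from_a = estimate_spread(chain_nodes, chain_in_edges, {"A"})
spread_from_c = estimate_spread(chain_nodes, chain_in_edges, {"C"})
```

Because one batch of rr-sets can score every candidate seed set, seed selection reduces to a maximum-coverage problem over the sampled sets.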
[Tang et al., “Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency”, SIGMOD 2014]
[Tang et al., “Influence Maximization in Near-Linear Time: A Martingale Approach”, SIGMOD 2015]
[Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018].
[Nguyen et al., “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks”,
SIGMOD 2016] → arXiv
[K. Huang, S. Wang, G. Bevilacqua, X. Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence
Maximization, PVLDB 2017]
12/13/22 CUHK-Shenzhen, China 77
77. What if objective is not submodular?
12/13/22 CUHK-Shenzhen, China
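• Maximizing a non-decreasing,
non-submodular function:
$f(S^{Grd}) \ge \frac{1}{\alpha}\left(1 - e^{-\alpha\gamma}\right) \cdot \text{OPT}$,
where $\alpha$ is the curvature and $\gamma$ the submodularity ratio of $f$.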
[Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].
78
80. What if the objective is not submodular?
12/13/22 CUHK-Shenzhen, China
Sandwich approximation to the rescue!
[Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016].
• $f$ – monotone but not submodular.
• $\mu, \nu$ – monotone and submodular, and
$\mu$ ($\nu$) lower (resp. upper) bounds $f$.
• Let $S_f$ ($S_\mu$, $S_\nu$) be the Greedy solution to
$\max_{S \subseteq V,\, |S| \le k} f(S)$ (resp. …) and $S_{sand} \in \{S_f, S_\mu, S_\nu\}$
be the best w.r.t. $f(\cdot)$.
Then
81
81. What if the objective is not submodular?
12/13/22 CUHK-Shenzhen, China
Sandwich approximation to the rescue!
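[Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016].
$f(S_{sand}) \ge \max\left\{ \frac{f(S_\nu)}{\nu(S_\nu)},\ \frac{\mu(S_f^{opt})}{f(S_f^{opt})} \right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S_f^{opt})$,
where $f(S_f^{opt})$ is OPT.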
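The sandwich strategy can be sketched generically: run greedy on the objective and on each bound, then keep whichever solution scores best on the true objective. The toy objective below (a weight sum plus a bonus when two complementary items are both chosen) and its bounds are invented for illustration and are not taken from the cited paper.

```python
def greedy_max(func, ground, k):
    """Plain greedy: add the element giving the largest value, k times."""
    chosen = set()
    for _ in range(k):
        candidates = [v for v in ground if v not in chosen]
        if not candidates:
            break
        chosen.add(max(candidates, key=lambda v: func(chosen | {v})))
    return chosen


def sandwich(f, lower, upper, ground, k):
    """Greedy on f and on both bounds; return the solution that is best under f."""
    solutions = [greedy_max(g, ground, k) for g in (f, lower, upper)]
    return max(solutions, key=f)


# Toy instance: items 0 and 1 are complementary (bonus only when both are picked).
weights = {0: 1.0, 1: 1.0, 2: 1.5, 3: 1.5}
BONUS = 3.0


def lower(S):
    """Modular lower bound: ignore the bonus."""
    return sum(weights[v] for v in S)


def f(S):
    """True objective: monotone but not submodular because of the bonus."""
    return lower(S) + (BONUS if {0, 1} <= S else 0.0)


def upper(S):
    """Modular upper bound: credit half the bonus to each complementary item."""
    return lower(S) + BONUS / 2 * len(S & {0, 1})


best = sandwich(f, lower, upper, ground=[0, 1, 2, 3], k=2)
```

Here greedy on `f` alone settles for items 2 and 3, while the solution found through the upper bound picks the complementary pair, which is why evaluating all three candidates under `f` pays off.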
82
82. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
•Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 83
83. Filter Bubbles, Echo Chambers, and Polarization
• Selective exposure to viewpoints/issues can engender/worsen
polarization.
[Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011].
[Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015].
• Aggravated by echo chambers in social media.
[Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009].
[Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014].
[Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks.
TKDD 2019].
[Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018].
[Garimella, de Morales, Gionis, and Mathioudakis. Quantifying Controversy over Social Media. TOCS 2018].
12/13/22 CUHK-Shenzhen, China 84
84. Balancing Exposure by Connections
• Link Recommendation
[Amelkin and A. K. Singh. Fighting opinion control in social networks via link recommendation. KDD
2019].
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW
2018].
[Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link
Recommendation. NeurIPS 2021].
12/13/22 CUHK-Shenzhen, China 85
85. Interdisciplinary Approach
• Comprehensive solution goes beyond CS: e.g.,
Polarization Lab https://www.polarizationlab.com
• Interdisciplinary (CS, stats, sociology) approach.
• Real-life experiment by recruiting Democrat and Republican
volunteers incentivized to follow bots tweeting posts initially
aligned with their ideology but gradually from the other side of
the aisle.
• Complemented with offline tracking and study.
[Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].
12/13/22 CUHK-Shenzhen, China 86
86. Balancing via Information Campaigns
• Smart Algorithm Bursts Social Networks' "Filter
Bubbles"
• “Instead of building echo chambers, Facebook, Twitter and
company can tweak their code to broaden exposure to wider
ranges of views.”
• “… results suggest that targeting a strategic group of social
media users and feeding them the right content is more
effective for propagating diverse views through a social media
network …”
12/13/22 CUHK-Shenzhen, China
[IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].
87
87. Balancing via Information Campaigns
• Information Campaign Approach
[Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS
2018].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE
2020].
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
• Common assumptions:
• awareness = adoption.
• Adoption of opposing views is independent.
12/13/22 CUHK-Shenzhen, China 88
88. Opinions can have complex interaction
12/13/22 CUHK-Shenzhen, China
Adopted and propagated independently?!
The Liberals claim that … they can
cut Canada’s greenhouse gas
emissions by 40 to 45% below
2005 levels by 2030. They passed
a climate plan, C-12, to set legally
binding emissions targets to reach
net-zero emissions in 2050.
New Democrats supported the
Liberals’ net-zero legislation and
have set an emissions reduction
target of 50 per cent below 2005
levels by 2030.
The Conservatives opposed the
Liberals’ net-zero emissions
legislation and say their climate
plan will meet Paris climate
commitments of 30 per cent below
2005 levels by 2030.
The People’s Party platform argues
that there is “no scientific
consensus” that human activity is
driving climate change and has said
warnings of looming
environmental catastrophe are
exaggerated.
Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change
89
89. Opinions can have complex interaction
12/13/22 CUHK-Shenzhen, China
Pure competition.
90
90. Opinions can have complex interaction
12/13/22 CUHK-Shenzhen, China
Partial competition.
91
91. Opinions can have complex interaction
12/13/22 CUHK-Shenzhen, China
Complementation/reinforcement.
92
92. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
•Mitigating Filter Bubbles
•A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
A useful digression.
12/13/22 CUHK-Shenzhen, China 93
93. Awareness vs adoption
Higher utility!!
Awareness spreads like epidemic, but adoption depends on UTILITY
[Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].
12/13/22 CUHK-Shenzhen, China 94
95. Welfare Maximization: complementary
setting
• Problem: Given social network G = (V,E), propagation
model, item utility model, and budget vector. Find an
allocation of seed nodes to items that maximizes the
expected social welfare.
Expected sum of utilities of
itemsets adopted by users.
12/13/22 CUHK-Shenzhen, China 96
96. What does the theory say?
12/13/22 CUHK-Shenzhen, China
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
97
97. A simple greedy still works
GREEDY ALGORITHM
Does not require specific
utility-parameters as input
$\left(1 - \frac{1}{e}\right)$-approximation
12/13/22 CUHK-Shenzhen, China
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
98
98. Prefix-preserving seed selection - PRIMA
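[Figure: one seed ordering whose prefixes of sizes $b_1$, $b_2$, and $b_{max} = \max_i b_i$ attain $\left(1 - \frac{1}{e}\right) OPT_{b_1}$, $\left(1 - \frac{1}{e}\right) OPT_{b_2}$, and $\left(1 - \frac{1}{e}\right) OPT_{b_{max}}$, respectively.]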
Select enough samples corresponding to every
budget of the budget vector
○ Challenge: The number of samples required is not monotone in
budget
12/13/22 CUHK-Shenzhen, China
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
99
100. Welfare Maximization: competing setting
• Problem: Given social network G = (V,E), propagation
model, item utility model, budget vector, and a fixed
(partial) allocation of seed nodes to items, find an allocation
of seed nodes to items that maximizes the expected
social welfare.
Expected sum of utilities of
itemsets adopted by users.
12/13/22 CUHK-Shenzhen, China 101
101. How hard is (the) competition?
12/13/22 CUHK-Shenzhen, China
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
102
102. General case algorithm - SeqGRD
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
• Instance-dependent approximation: $\frac{u_{min}}{u_{max}} \left(1 - \frac{1}{e}\right) OPT$
• Sort the items based on their utilities – $\{u_1 > u_2 > \cdots > u_m\}$
[Figure: a PRIMA+ seed sequence of total length $\sum_i b_i$, split into blocks $b_1, b_2, \ldots, b_m$ for the items with utilities $u_1, u_2, \ldots, u_m$.]
• $u_{max}$ = max exp. utility of any bundle.
• $u_{min}$ = exp. min utility of any item.
12/13/22 CUHK-Shenzhen, China
103
103. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
•Mitigating Filter Bubbles
• A User Utility Perspective
•A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 104
104. Filter bubble problem
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
YAY!
NAY!
• Items (opinions) are complementary objective-wise
• Items (opinions) are competing propagation-wise
[Garrett Echo chambers online?: Politically motivated selective exposure among Internet news users. Journal of computer-mediated communication 2009].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
12/13/22 CUHK-Shenzhen, China 105
105
105. Problem: Key Ingredients
§ Competition parameter
§ After being influenced, adopt the second item w.p. $q$, $0 \le q < 1$
§ (Host’s) reward of adoption is supermodular, which models
complementarity
§ $c$, for the first item
§ $c + \Delta$, for the second item, $c < \Delta$
§ Expected (host) utility for a user adopting both: $c + q\Delta$
§ Goal is to maximize the sum of utilities under a competition-driven
diffusion
12/13/22 CUHK-Shenzhen, China 106
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
106
106. Filter bubble mitigation
• There is an existing bubble
• A more general setting
Item A
Problem FB Mitigation (FBM): Given graph $G = (V, E, p)$,
competition parameter $q$, $0 < q < 1$, fixed A seeds $S_A$, and
budget $k$, find B seeds $S_B$ such that $|S_B| \le k$ and the
expected welfare is maximized.
12/13/22 CUHK-Shenzhen, China
107
107. Inherent Challenges – Strike One
• FBM is neither monotone nor submodular.
• Restricted (sequential) setting: propagation of the follower
doesn’t start before that of the leader ends. FBM in the
sequential setting is monotone and submodular! ☺
• But wait! FBM can be arbitrarily worse than FBM$_{seq}$ and
vice versa! ☹
12/13/22 CUHK-Shenzhen, China 108
108. Another Attempt
12/13/22 CUHK-Shenzhen, China
Item A
First
Level
Competition
Item B
• Expected reward at each FLC node = $c + q\Delta$.
Surrogate objective: Expected # FLC nodes ×
$(c + q\Delta)$.
• Clearly a lower bound for FBM.
• But the FLC objective is neither monotone
nor submodular.
109
109. Algorithm 1 – SPReadGRD
• Greedily selects B seeds that maximize the marginal
spread
• Ignore the welfare objective
• PRIMA+ is used to do the seed selection
• Given fixed $S_A$, PRIMA+ selects $S_B$ such that
• $\sigma(S_B \cup S_A) \ge \left(1 - \frac{1}{e} - \epsilon\right) \sigma(S_B^* \cup S_A)$
12/13/22 CUHK-Shenzhen, China 110
110
110. Analyzing SpreadGRD
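• Given $S$, for the welfare function $\rho$ the following holds:
• $c \cdot \sigma(S) \le \rho(S) \le (c + q\Delta) \cdot \sigma(S)$
• SPRGRD therefore has the following bound:
$\rho(S_A \cup S_B) \ge c \cdot \sigma(S_A \cup S_B) \ge c \cdot \left(1 - \frac{1}{e} - \epsilon\right) \cdot \sigma(S_A \cup S^*) \ge \frac{c}{q\Delta + c} \left(1 - \frac{1}{e} - \epsilon\right) \rho(S_A \cup S^*)$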
12/13/22 CUHK-Shenzhen, China 111
111
111. Algorithm 2 – Sandwich
• Assume a tattler diffusion model
• A node influences its neighbors with every item in the
awareness set
• $\rho_U(S) \ge \rho(S)$
• $\rho_U(\cdot)$ is monotone and submodular
12/13/22 CUHK-Shenzhen, China 112
112
112. Algorithm 2 – Sandwich
• Assume a tattler diffusion model
• $\rho_U(S) \ge \rho(S)$
• Assume a diffusion model with $q = 0$
• $\rho_L(S) \le \rho(S)$
• $\rho_L(\cdot)$ is monotone and submodular
12/13/22 CUHK-Shenzhen, China 113
113
113. Algorithm 2 – Sandwich
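• Assume a tattler diffusion model
• $\rho_U(S) \ge \rho(S)$
• Assume a diffusion model with $q = 0$
• $\rho_L(S) \le \rho(S)$
• Using sandwich
• Let $S_{sand} = \arg\max_{S' \in \{S_L, S, S_U\}} \rho(S')$
• $\rho(S_{sand}) \ge \max\left\{ \frac{\rho(S_U)}{\rho_U(S_U)},\ \frac{\rho_L(S^*)}{\rho(S^*)} \right\} \left(1 - \frac{1}{e}\right) \rho(S^*)$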
12/13/22 CUHK-Shenzhen, China 114
114
114. Algorithm 3 - NetRewGRD
Item A
Item B
First
Level
Competition
• Extends state-of-the-art sampling to the
welfare objective
• Reverse reachable trees
• Recursive weight update using a
linear pass
• Scales for large networks
12/13/22 CUHK-Shenzhen, China
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
115
115. Experiments
• Baselines considered:
• COEX: Maximizes co-adoptions of both items
• TDEM: Maximizes welfare based on leaning scores
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].
12/13/22 CUHK-Shenzhen, China
116
117. Sample of Results – Running Time
12/13/22 CUHK-Shenzhen, China 118
118
118. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
•Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 119
119. Misinformation Mitigation – Prior Art
• Influence Blocking
• Temporal aspects ignored or not differentiated
• Focus on scalability
[Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW
2011],
[He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive
linear threshold model. SDM 2012],
[Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social
networks. ICDE 2017],
[Tong, Wu, Guo et al. An efficient randomized algorithm for rumor blocking in online social
networks. IEEE TNSE 2017],
[Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018],
[Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in
Social Networks. ICDT 2020].
12/13/22 CUHK-Shenzhen, China 120
120. Temporal Aspects of Propagation
[Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018]
Together these have important consequences for effective seed set selection
[Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew
Research Center 2016]
Misinformation spreads faster, farther, and wider than truth! Adoption decisions
have varying lengths
12/13/22 CUHK-Shenzhen, China 121
121. Temporal Aspects of Propagation
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
• Associate meeting probabilities with each edge
• User reaction times sampled from a data-driven distribution
t = 0 t = 2 t = 3 t = 6
12/13/22 CUHK-Shenzhen, China
Adoption decisions of five of the nodes are uncontested.
The remaining node faces a tie, which is broken with a random permutation of the contending nodes.
[Figure annotations: F → 3; DW: [3, 6]; M → 4; Tie!]
122
122. Misinformation Mitigation Problem
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB
2022]
Reward function $\rho(\cdot)$ measures effectiveness of mitigation
P1 is not submodular!
P1: Given fake seeds $S_F$ and reward function $\rho(\cdot)$,
find a seed set that maximizes the expected reward
12/13/22 CUHK-Shenzhen, China
Truth reaches well
before misinfo.
Truth arrives too late!
123
123. Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: Supermodular behavior arises due to joint effect of mitigation seeds, i.e. acting
alone they would not achieve the same reward.
LB: Maximum reward over singleton seed sets from $S_M$ (tight).
$S_M = \{v_1, v_2\}$
$LB = \max_{v \in \{v_1, v_2\}} \rho(S_F, \{v\})$
12/13/22 CUHK-Shenzhen, China 124
124. Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Simple candidate: drop meeting events and enforce dominant tie-breaking.
Tighter UB: remove meeting events on edges that can be traversed by both sides.
$S_M = \{v_1, v_2\}$
12/13/22 CUHK-Shenzhen, China 125
125. Importance Sampling
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: only nodes reached
by the misinformation are
eligible for reward.
Idea: only sample roots from
nodes that misinfo campaign
reaches → tighter bounds!
RDR sets: weighted analog to
RR sets for reward probabilities
12/13/22 CUHK-Shenzhen, China 126
126. Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Two settings for selecting misinformation seeds: (1) from top-k influential users and (2) uniformly at random
12/13/22 CUHK-Shenzhen, China
Small # popular instigators. Several bots or newly created puppet accounts.
127
127. Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Reward distribution dominated by uncontested mitigation adoption
12/13/22 CUHK-Shenzhen, China 128
128. Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Mitigation seeds remain effective under simultaneous perturbation of model parameters.
12/13/22 CUHK-Shenzhen, China 129
129. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
131. Misinformation Intervention – Prior Art
• Disadvantaging posts with misleading info, deleting edges, removing nodes, … → too hard?
• No correction for wrong intervention!
[Farajtabar et al. "Fake news mitigation via point process based intervention." ICML 2017],
[Tong et al. "Gelling, and melting, large graphs by edge manipulation." CIKM 2012],
[Khalil, Dilkina, and Song. "Scalable diffusion-aware optimization of network topology." KDD 2014],
[Chen et al. "Node immunization on large graphs: Theory and algorithms." TKDE 2015],
[Medya, Silva, and Singh. "Approximate Algorithms for Data-driven Influence Limitation." TKDE 2020],
[Caraban et al. "23 ways to nudge: A review of technology-mediated nudging in human-computer interaction." SIGCHI 2019],
[Caraban, Konstantinou, and Karapanos. "The Nudge Deck: A design support tool for technology-mediated nudging." ACM Designing Interactive Systems Conference 2020],
[Bhuiyan et al. "NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges." CSCW2 2021].
132. Cost Aware Intervention
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
133. Reward Function
$R_i^{\text{int}}$: reach of item $a_i$ after intervention.
$R_i^{\text{no-int}}$: reach of item $a_i$ with no intervention.
134. Cost Aware Intervention
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
dEFEND [Shu et al. KDD 2019].
Marked Hawkes Process [Mishra et al. CIKM 2016].
135. Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
NCB-TS: Neural Contextual Bandits w/ Thompson Sampling
CB-TS: Contextual Bandits w/ Thompson Sampling
RB: (Learned) Rule based
CSC: Cost Sensitive Classification
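To make the "contextual bandits with Thompson sampling" entry concrete, here is a minimal tabular sketch. It is not the NCB-TS or CB-TS method evaluated in the paper. It assumes that contexts have already been discretized into hashable buckets, that rewards lie in [0, 1] and already account for the cost of a mistaken intervention, and it keeps an independent Beta posterior per (context, action) pair. The class name `BucketedThompsonSampler` and the action labels are hypothetical.

```python
import random


class BucketedThompsonSampler:
    """Thompson sampling with one Beta posterior per (context, action) pair."""

    def __init__(self, actions, rng):
        self.actions = list(actions)
        self.rng = rng
        self.posteriors = {}  # (context, action) -> [alpha, beta]

    def choose(self, context):
        """Draw one sample per action and play the action with the largest draw."""
        best_action, best_draw = None, -1.0
        for action in self.actions:
            alpha, beta = self.posteriors.setdefault((context, action), [1.0, 1.0])
            draw = self.rng.betavariate(alpha, beta)
            if draw > best_draw:
                best_action, best_draw = action, draw
        return best_action

    def update(self, context, action, reward):
        """Fold a reward in [0, 1] into the posterior of the played action."""
        params = self.posteriors.setdefault((context, action), [1.0, 1.0])
        params[0] += reward
        params[1] += 1.0 - reward
```

A typical loop calls `choose(context)` for each incoming item, applies the chosen action (for example "intervene" or "skip"), observes the reward, and calls `update`. Because actions are chosen by sampling from the posterior, an action that looked bad early on can still be retried, which is one way to recover from wrong interventions.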
136. Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Real-time evaluation on Twitter’s stream from 10-Oct-2020 to 10-Nov-2020.
• 5 million tweets w/ 1800 distinct English news articles
• Topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%)
Manual Evaluation
• Random sample of 750 viral and non-viral tweets
• 3 volunteers evaluated intervention
• Accuracy of 92.1%
Automated Evaluation
• Google FactCheck Claim Search API
• TiKL: That is a Known Lie
• Accuracy of 96.6%
137. • Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
138. Summary
• Efficient detection of dense subgraphs in undirected and directed graphs is useful for finding filter bubbles and groups of actors engaged in spreading misinformation.
• In mitigating filter bubbles via information campaigns, competition between viewpoints/opinions cannot be ignored.
• In mitigating misinformation, it’s critical to incorporate temporal aspects.
• In misinformation intervention, it’s important to watch your step and correct your gait in the face of mistakes.
139. Open Questions – Detection
• Integrating content analysis in going after the “right” densest subgraphs.
• Can we detect filter bubbles and groups promoting misinformation as they form?
• Longitudinal: (how) do these groups transform over time?
140. Open Questions – Countering
• Multiple campaigns of items involving partial/pure competition, complementation?
• How can we learn propagation probabilities, competition parameters, utilities from available propagation traces?
• Go beyond expected outcome? E.g., as filter bubbles or misinformation spreading occur, can we counter them?
141. Open Questions --
• Case studies reflecting the effect of mitigation campaigns on filter bubbles and misinformation diffusion.
• Integrating with claim verification and (computational) fact checking efforts.
• Incentivizing balance of adoption (in case of filter bubbles) and adoption of truth (in case of misinformation).
142. Acknowledgments
Chenhao Ma (HKU)
Farnoosh Hashemi (UBC)
Glenn Bevilacqua (UBC → Oracle)
Michael Simpson (UBC)
Prithu Banerjee (UBC → Oracle)
Reynold Cheng (HKU)
Saravanan Thirumuruganathan (QCRI, HBKU)
Xiaolin Han (HKU)
Xuemin Lin (UNSW)
Wenjie Zhang (UNSW)
Yixiang Fang (CUHK)
Wei Chen (MSRA)
Wei Lu (UBC → LinkedIn)