On A Quest for Combating
Filter Bubbles and
Misinformation
Laks V.S. Lakshmanan
University of British Columbia
Vancouver, BC, Canada
• Filter Bubbles and Echo Chambers
• Misinformation
• Detecting Densest Subgraphs – Undirected
• Detecting Densest Subgraphs – Directed
• Combating via Mitigation: A Refresher on Influence
Maximization
• Mitigating Filter Bubbles
• A User Utility Perspective
• A Network Host Utility Perspective
• Mitigating Misinformation
• Misinformation Intervention
• Summary & Open Questions
12/13/22 CUHK-Shenzhen, China 2
Prolegomenon
• What this talk is not about and will not do for you.
• Classify different kinds of “fake news”: e.g., mis/disinformation ...
• Computational Fact Checking or Claim Verification
• Offer a comprehensive solution to the filter bubble/echo chambers
or “fake news” problems.
• The scope of both stretches beyond just tech (e.g., models and
algorithms).
• Even the “tech-restricted” versions we won’t get to completely solve
today (in this talk).
Prolegomenon
• Instead, we will examine some (necessarily restricted)
models and formulations of problems.
• Offer a view of how research done in some different
contexts may inspire techniques for solving restricted
versions of the filter bubbles / echo chambers and the
misinformation problems.
• In case I missed your work, …
Not long ago, or maybe long ago …
And then came …
Which led to many great things,
but arguably also these …
Filter Bubbles and Echo Chambers
["Political polarization 1994-2017." Pew Research Center., Washington, DC October 2017].
Filter Bubble and Echo Chambers exacerbate polarization
Political Echo Chambers
● Members of densely connected groups are
likely to have the same opinions and
attitudes.
● The study focuses on opposing political echo
chambers (~250K each) on Twitter in Japan.
● Political echo chambers have denser and
more core-periphery information spreading
structures than those of most other
communities.
[Asatani et al. Dense and influential core promotion of daily viral information spread in political echo chambers. Scientific Reports 2021].
The Price of Filter Bubbles
• Filter bubbles and echo chambers can impede natural
opinion formation
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018].
• Can lead to one-sided policy decisions
[Perrone and Wieder. Pro-painkiller echo chamber shaped policy amid drug epidemic. The Center for
Public Integrity, 2016].
• And erosion of societal trust
[Nguyen. Echo chambers and epistemic bubbles. Episteme, 2020].
Misinformation
Misinformation is Not a New Problem
Economic Cost of Misinformation
Economic Impact of Misinformation
Fake news: elections. The U.S. to spend $200 million alone advancing fake news; $400 million spent globally on fake political news.
[U. of Baltimore and CHEQ. The economic cost of bad actors on the internet. Fake News 2019].
COVID-19 vaccine misinformation and disinformation costs an estimated $50 to $300 million each day.
[Bruns, Hosangadi, Trotochaud, and Sell. Johns Hopkins Center for Health Security. 2021].
Misinformation Propagation (US Politics)
● The connections between misinformation spreaders are denser than
connections between fact-checkers.
● Increasing the value of k takes us from the periphery to the denser inner
core structure.
[Figure: k-core decomposition of the pre-election retweet network. Orange = fact-checks and purple = claims.]
[Shao, Hui, Wang et al. Anatomy of an online misinformation network. PLoS ONE 2018].
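The k-core idea used on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis pipeline of the cited study: the function name `core_numbers` and the toy graph are assumptions made for the example. Repeatedly removing a minimum-degree vertex assigns each vertex its core number, and the vertices with the largest core number form the dense inner core.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    core, k = {}, 0
    while alive:
        # Linear scan for the minimum; fine for a sketch, a bucket queue is faster.
        v = min(alive, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
    return core


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
cores = core_numbers(toy_edges)
inner_core = {v for v, c in cores.items() if c == max(cores.values())}
```

On the toy graph the periphery (vertices 4 and 5) has core number 1, while the clique forms the 3-core.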
Misinformation Propagation + Bubbles (Covid-19)
● Echo-chambers with misinformed sub-communities are much denser than
those with informed sub-communities.
[Figure panels: (a) Retweet, (b) Mention, (c) Reply, (d) Retweet+Mention+Reply.]
[Memon and Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. CEUR Workshop 2020].
Detecting Densest Subgraphs – Undirected
Densest Subgraphs: Undirected
• What is a good notion of density?
• Classical: average degree: $\rho(G) = \frac{m}{n}$.
• Average #motifs/vertex: $\rho(G, \Psi) = \frac{\mu(G, \Psi)}{n}$, where $\mu(G, \Psi)$ is the #instances of Ψ (motif) in G. $\rho_{opt}$ − optimal density.
• E.g., Δ-density.
• More generally, Ψ-density for pattern Ψ (e.g., h-clique).
• Intuition: densest subgraphs may indicate echo chambers.
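To make the two density notions concrete, here is a small sketch that evaluates the edge density and the Δ-density (triangles per vertex) of an induced subgraph. The helper names and the toy graph are assumptions for illustration; exact fractions are used so the values can be compared without rounding.

```python
from fractions import Fraction
from itertools import combinations


def induced_edges(edges, subset):
    """Edges of the subgraph induced by `subset`."""
    return [(u, v) for u, v in edges if u in subset and v in subset]


def edge_density(edges, subset):
    """Number of induced edges divided by number of vertices."""
    return Fraction(len(induced_edges(edges, subset)), len(subset))


def triangle_density(edges, subset):
    """Number of induced triangles divided by number of vertices."""
    present = {frozenset(e) for e in induced_edges(edges, subset)}
    triangles = sum(
        1
        for trio in combinations(sorted(subset), 3)
        if all(frozenset(pair) in present for pair in combinations(trio, 2))
    )
    return Fraction(triangles, len(subset))


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
whole, clique = {0, 1, 2, 3, 4, 5}, {0, 1, 2, 3}
```

Here the whole graph has edge density 8/6 and Δ-density 4/6, while the clique alone scores 6/4 and 4/4, so both notions prefer the clique.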
Different notions of density.
• Clique-density.
• Pattern-density.
[Figure: an edge-densest subgraph with density = 11/7 and a Δ-densest subgraph with density = 2/4.]
k-cores and k-clique-cores
[Figure: k-cores and (k, 4)-cores of an example graph, with core numbers 0 to 3.]
(k, Ψ)-core of G – maximal subgraph where each vertex participates in ≥ k instances of Ψ.
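The (k, Ψ)-core definition can be illustrated for Ψ = triangle. The sketch below is an assumption-laden toy, not the decomposition code from the cited work: it enumerates triangles explicitly and peels the vertex with the smallest triangle-degree, which is only practical for small graphs.

```python
from collections import defaultdict


def triangle_core_numbers(edges):
    """Return {vertex: largest k such that the vertex lies in the (k, triangle)-core}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # triangles_at[v] holds every triangle (as a frozenset) that still contains v.
    triangles_at = {v: set() for v in adj}
    for u, v in edges:
        for w in adj[u] & adj[v]:
            tri = frozenset((u, v, w))
            for x in tri:
                triangles_at[x].add(tri)
    alive = set(adj)
    core, k = {}, 0
    while alive:
        v = min(alive, key=lambda x: len(triangles_at[x]))
        k = max(k, len(triangles_at[v]))
        core[v] = k
        alive.remove(v)
        # Removing v destroys every triangle it participates in.
        for tri in list(triangles_at[v]):
            for x in tri:
                triangles_at[x].discard(tri)
    return core


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
tri_cores = triangle_core_numbers(toy_edges)
```

Each clique vertex lies in three triangles, so the clique is the (3, Δ)-core, while vertices 4 and 5 lie in no triangle at all.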
Densest Subgraph Discovery
Problem: Given a graph G(V, E) and an h-clique $\Psi(V_\Psi, E_\Psi)$,
find the subgraph D with the highest h-clique density $\rho(D, \Psi)$.
Ψ can be any pattern: e.g., a 3-star, Δ, etc.
Focus of this talk: h-cliques.
SOTA¹: Densest Subgraph Discovery: Exact
• Binary search to guess the density
• Construct the flow network
• Based on guessed density and original graph
• Use max-flow algorithm to check the feasibility
• Example: l = 0, r = 1 (max triangle deg)
• α = (l + r)/2 = 0.5.
• Run time: $O\left(n \binom{d-1}{h-1} + n|\Lambda| + \min(n, |\Lambda|)^{2}\right)$, where $|\Lambda|$ is the #instances of Ψ.
¹ As of 2017.
[Mitzenmacher, Pachocki, Peng, Tsourakakis, and Xu. Scalable large near-clique detection in large-scale networks via sampling. KDD 2015].
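The binary-search-plus-max-flow framework above can be sketched for the simplest case, edge density (h = 2), using Goldberg's classical flow construction rather than the h-clique network of the cited papers. Everything here is an illustrative assumption: the function names, the sentinel nodes, and the plain Edmonds-Karp routine. The sketch assumes a simple graph with at least two vertices and one edge, and uses exact fractions so the stopping rule is sound.

```python
from collections import deque
from fractions import Fraction

SOURCE, SINK = ("source",), ("sink",)  # sentinel nodes, assumed not to clash with vertices


def source_side_of_min_cut(cap):
    """Run Edmonds-Karp on residual capacities `cap`; return nodes reachable from SOURCE."""
    while True:
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue and SINK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SINK not in parent:
            return set(parent)  # no augmenting path left: this is a min-cut side
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push


def build_network(vertices, edges, degree, guess):
    """Goldberg's network: a cut below m*n exists iff some subgraph has density > guess."""
    m = len(edges)
    cap = {x: {} for x in [SOURCE, SINK, *vertices]}

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # make sure the reverse residual arc exists

    for v in vertices:
        add_arc(SOURCE, v, Fraction(m))
        add_arc(v, SINK, m + 2 * guess - degree[v])
    for u, v in edges:
        add_arc(u, v, Fraction(1))
        add_arc(v, u, Fraction(1))
    return cap


def exact_densest_subgraph(vertices, edges):
    n, m = len(vertices), len(edges)
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    low, high, best = Fraction(0), Fraction(m), set(vertices)
    # Two distinct densities differ by at least 1/(n(n-1)), so we can stop there.
    while high - low >= Fraction(1, n * (n - 1)):
        guess = (low + high) / 2
        side = source_side_of_min_cut(build_network(vertices, edges, degree, guess)) - {SOURCE}
        if side:  # some subgraph is denser than the guess
            low, best = guess, side
        else:
            high = guess
    return best


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_vertices = [0, 1, 2, 3, 4, 5]
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
densest = exact_densest_subgraph(toy_vertices, toy_edges)
```

The network is rebuilt from the whole graph for every guess, which is exactly the redundancy that core-based pruning aims to reduce.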
DS Discovery – A Triangle Example
[Figure: flow network for a graph on vertices A, B, C, D, with source s, sink t, and four Ψ nodes. Edge capacities are the triangle degrees (0, 1, 1, 1), 3α, and +∞. Outcomes are shown for α = 0.5 and for α = 1/3.]
SOTA¹: Densest Subgraph Discovery: Approximation
• Approximation algorithm: PeelApp
• Iteratively peel the vertex w/ smallest h-clique-degree.
• Let $S_1, S_2, \ldots$ be the list of residual subgraphs generated.
• Return $S_i$ with the highest density.
• Approximation:
• The density of $S_i$ is at least $\frac{1}{|V_\Psi|} \cdot \rho_{opt} = \frac{1}{h} \cdot \rho_{opt}$.
• Running time: $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
[Tsourakakis. The k-clique densest subgraph problem. WWW 2015].
¹ As of 2017.
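The peeling scheme can be sketched for the edge case h = 2, where the h-clique-degree is the ordinary degree and the guarantee becomes 1/2. This is an illustrative sketch under that assumption; the name `peel_app` is hypothetical and a lazy-deletion heap stands in for whatever priority structure an actual implementation uses.

```python
import heapq
from collections import defaultdict
from fractions import Fraction


def peel_app(edges):
    """Greedy peeling for edge density; returns (best residual vertex set, its density)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(adj[v]) for v in adj}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    alive, m = set(adj), len(edges)
    best_density, best_size, removed = Fraction(m, len(alive)), len(alive), []
    while len(alive) > 1:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry left over from an earlier degree update
        alive.remove(v)
        removed.append(v)
        m -= degree[v]
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        density = Fraction(m, len(alive))
        if density > best_density:
            best_density, best_size = density, len(alive)
    # The best residual subgraph is what remains after the first n - best_size removals.
    keep = set(adj) - set(removed[: len(adj) - best_size])
    return keep, best_density


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
peeled_set, peeled_density = peel_app(toy_edges)
```

On this toy graph peeling happens to find the optimum; in general only the approximation factor is guaranteed.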
DSD: SOTA Limitations
• Initial bounds on α not tight.
• Size of flow network can be large: e.g., large G with
many instances of Ψ.
• Flow network built from original G each time.
• Even PeelApp does redundant work.
Can we “bound” the densest subgraph?
(k, Ψ)-core to the rescue!
Bounding Densest Subgraphs with Cores
• Theorem: G, k, Ψ as before. H a (k, Ψ)-core of G. Then:
$\frac{k}{|V_\Psi|} \le \rho(H, \Psi) \le k_{\max}$.
Special case: the $k_{\max}$-core has density in $\left[\frac{k_{\max}}{h}, k_{\max}\right]$.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
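The lower bound in the theorem follows from a short counting argument, sketched here. The symbol $V_H$ (the vertex set of H) is notation introduced only for this sketch; $\mu(H, \Psi)$ is the number of instances of Ψ in H, as before.

```latex
% Every vertex of H lies in at least k instances of \Psi, and each instance
% is counted once for each of its |V_\Psi| vertices.
\[
  |V_\Psi| \cdot \mu(H, \Psi) \;\ge\; k \, |V_H|
  \quad\Longrightarrow\quad
  \rho(H, \Psi) \;=\; \frac{\mu(H, \Psi)}{|V_H|} \;\ge\; \frac{k}{|V_\Psi|}.
\]
```

For ordinary edges ($|V_\Psi| = 2$) this is the familiar fact that a k-core has at least $k|V_H|/2$ edges.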
Bounding DSG with cores: An Example
For $k_{\max} = 2$ and a 2-core, LB = 1 and UB = 2.
[Figure: example 2-cores with $\rho = 1$ and $\rho = \frac{5}{4}, \frac{9}{6}, \frac{13}{8}, \cdots \to 2$.]
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Bounding Densest Subgraphs with Cores
• Lemma: The DSG of G must be contained in its $(\lceil \rho_{opt} \rceil, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Exact algorithm: CoreExact
• Our algorithm: CoreExact
• Follow the same framework as existing exact algorithm
• Binary search to guess the density
• Construct the flow network
• Based on guessed density and original graph
• Use max-flow algorithm to check the feasibility
• Three core-based optimization techniques
1. Tighter bounds derived from cores: $\left[\frac{k_{\max}}{|V_\Psi|}, k_{\max}\right]$
2. Build the flow network on cores
3. Locate the clique-densest subgraph in even smaller cores after each check
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Approximation Algorithms
• IncApp:
• Do a (k, Ψ)-core decomposition of G. $O\left(n \cdot \binom{d-1}{h-1}\right)$ time.
• Return the $(k_{\max}, \Psi)$-core.
• $\frac{1}{|V_\Psi|} = \frac{1}{h}$-approximation.
• Finding (repeatedly) clique-degree can be expensive for
large cliques.
• CoreApp: Heuristic to directly find $(k_{\max}, \Psi)$-core.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Approximation Algorithms
CoreApp:
1. Sort vertices of G in ↓ order of their h-clique-based core
number, using a cheaper proxy.
2. Obtain the max core & its core number k from the top-s
vertices.
3. If the max degree of remaining vertices is larger than k:
• s = 2×s, repeat 2.
• Otherwise, output the max core.
Same worst-case time complexity as IncApp and PeelApp (SOTA), but much faster in practice.
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Sample Experiment Results
As-Caida (n = 26K, m = 106K). Friendster (n = 20M, m = 106M).
[Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
Mini Case Study: Covid-19
• Covid-19 Retweets.
1,025,937 retweets involving 660,730 users.
⇒ (660,730 nodes, 835,193 edges).
• Largest connected component:
(399,962 nodes, 663,506 edges)
Courtesy: Thirumuruganathan, QCRI.
[Figure: the two densest subgraphs of the Covid-19 retweet network. Cross edges: 296.]
Densest subgraph: 86 vertices, 18-core, density: 12.5407.
Top-2 densest subgraph: 1134 vertices, 13-core, density: 10.0150.
Topics: side effects of vaccine; modes of transmission of virus; case counts in different states and countries.
Mini Case Study II: Voter Fraud 2020
Tweets on US Presidential Election 2020.
Number of nodes : 1,385,225
Number of edges : 6,631,720
Number of Tweets: 8,085,323
Size of the largest connected component:
Number of nodes: 1,356,657
Number of edges : 6,611,465
Courtesy: Thirumuruganathan, QCRI.
[Figure: two dense subgraphs of the election tweet network. Cross edges: 1385.]
1962 vertices, 91-core, density: 83.7665.
2206 vertices, 54-core, density: 50.9231.
Repeated allegations of voter fraud. Retweeting Sidney Powell’s tweet warning states against certifying the election. Quoting Trump: “dirty rolls ==> dirty polls”. Big tech is colluding with Dems to defeat Trump. Vote in person to fight against mail-in voter fraud. FBI said many military mail-in votes, all for Trump, were thrown away in a ditch in PA. Biggest voter fraud in American history. Voting machines known to be insecure. Need proof of citizenship and photo ID to prevent fraud.

Fact-checkers from AP, Politifact, & Reuters confirm -- no evidence of widespread election fraud. Experts confirm elections are secure; most of the interference comes from misinformation campaigns. GOP and Trump team are sowing disinfo. and panic. Need to protect democracy. Trump’s narrower margin wins in 2016 vs Biden’s wider ones in 2020. Debunk “Deborah Jean Christiansen’s vote is fraud” by quoting her. More former Trump aides getting infected than voter fraud cases!

Quotes of Sidney Powell’s tweet; replies that there is no evidence of widespread fraud; Biden brags about having “the most extensive and inclusive VOTER FRAUD organization in the history of American politics”; (CNN) dishonesty taxonomy of Trump rally; Philly mayor hiding info. from people. Anyone caught cheating with Voter Fraud games should be federally charged; State officials from both parties stated the election went well. Losing side refusing to recognize clear winner; weaving conspiracy theories and strangling faith and belief.
Mini Case Study III: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal
earthquake, April 2015.
• 265383 nodes.
• 3898972 edges.
• largest connected component:
• 258756 nodes.
• 3771999 edges.
Courtesy: Thirumuruganathan, QCRI.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
[Figure: the two densest subgraphs of the Nepal earthquake tweet network, connected by 129 edges.]
1463 vertices, 129-core, density: 105.328.
370 vertices, 115-core, density: 71.9378.
Topics: requests for help; info on earthquake (magnitude, distance to cities affected from capital); reports on damage and ruin.
Recent Progress on DSGs
• WWW 2020: Provide near-optimal solutions via multiple peeling. $(1 + \epsilon)$-approx within $O\left(\frac{\Delta \log n}{\rho^*} \cdot \frac{1}{\epsilon^2}\right)$, proved by [SODA 2022].
• STOC 2020: $(1 + \epsilon)$-approximation on dynamic graphs, with $O(\log^4 n \cdot \epsilon^{-6})$ per edge insertion/deletion.
• WWW 2020: Define and find minimal DSGs. Minimal: no proper subgraph is a DSG.
• SODA 2022: A flow-based $(1 + \epsilon)$-approx algo with $\tilde{O}\left(\frac{m}{\epsilon}\right)$.
[Boob, Gao, Peng et al. Flowless: Extracting densest subgraphs without flow computations. WWW 2020].
[Sawlani and Wang. Near-optimal fully dynamic densest subgraph. STOC 2020].
[Chang and Qiao. Deconstruct Densest Subgraphs. WWW 2020].
[Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].
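The "multiple peeling" idea can be sketched as follows, in the spirit of the Greedy++ procedure from the Flowless paper. This is an illustrative sketch, not the authors' code: the function name, the `rounds` parameter, and the linear-scan minimum are assumptions. Each round peels the vertex minimising its accumulated load plus its current degree, and the load carries over between rounds.

```python
from collections import defaultdict
from fractions import Fraction


def greedy_plus_plus(edges, rounds=3):
    """Iterated peeling for edge density; returns (best vertex set seen, its density)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    load = {v: 0 for v in adj}  # persists across rounds
    best_set, best_density = set(adj), Fraction(len(edges), len(adj))
    for _ in range(rounds):
        alive = set(adj)
        degree = {v: len(adj[v]) for v in adj}
        m = len(edges)
        while len(alive) > 1:
            v = min(alive, key=lambda x: load[x] + degree[x])
            load[v] += degree[v]  # charge v with the edges removed together with it
            alive.remove(v)
            m -= degree[v]
            for w in adj[v]:
                if w in alive:
                    degree[w] -= 1
            density = Fraction(m, len(alive))
            if density > best_density:
                best_set, best_density = set(alive), density
    return best_set, best_density


# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 hanging off it.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
gpp_set, gpp_density = greedy_plus_plus(toy_edges)
```

With `rounds=1` and all loads at zero this reduces to ordinary single-pass peeling.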
Detecting Densest Subgraphs – Directed
Directed Densest Subgraphs
[Figure: example digraph on vertices a, b, c, d, e, highlighting $S^*$ and $T^*$.]
A directed densest subgraph (DDS) of a digraph
is a pair of vertex sets (S, T). Its density is
$\rho(S, T) = \frac{|E(S, T)|}{\sqrt{|S| \cdot |T|}}$
-- generalizes edge density from undirected graphs.
Problem: Find $S^*, T^*$ with max. $\rho$.
[Kannan and Vinay. Analyzing the structure of large graphs. Tech Report 1999].
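The directed density is easy to evaluate directly, and on tiny graphs the problem can be solved by exhaustive search. The sketch below does exactly that as a baseline for intuition; it is not one of the algorithms discussed in the talk, and the helper names and the five-vertex digraph are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def directed_density(edges, S, T):
    """|E(S, T)| / sqrt(|S| * |T|), counting edges that go from S into T."""
    cross = sum(1 for u, v in edges if u in S and v in T)
    return cross / sqrt(len(S) * len(T))


def nonempty_subsets(items):
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def brute_force_dds(edges):
    """Try every pair (S, T); exponential time, so only for very small graphs."""
    vertices = {x for e in edges for x in e}
    return max(
        (
            (directed_density(edges, S, T), S, T)
            for S in nonempty_subsets(vertices)
            for T in nonempty_subsets(vertices)
        ),
        key=lambda triple: triple[0],
    )


# Hypothetical toy digraph on vertices a..e.
toy_arcs = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("d", "e")]
best_rho, best_S, best_T = brute_force_dds(toy_arcs)
```

Here the best pair is S = {a, b}, T = {c, d} with 4 edges and density 4/√4 = 2.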
SOTA¹ DDS Discovery: Exact
• Repeatedly solve Max-flow, similarly to the undirected case.
• For each value of $a = \frac{|S|}{|T|}$, $0 < |S|, |T| \le n$:
• Find the max density by binary search.
• Build flow network and solve Max-flow.
• Overall time: $O(n^2 \cdot \text{Max-flow})$.
• > 2 days on ~1,200 vertices and ~2,600 edges.
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
¹ As of 2019.
SOTA DDS Discovery: Approximation
Greedy Peeling Algorithm:
• Build a bipartite graph (L, R, E) where L = R = V
• The edges all go from the L copy to the R copy
• Each time remove a node with least degree
• Report the densest subgraph among those obtained.
$O(n + m)$ time.
Approximation?
[Figure: example digraph G on vertices a, b, c, d, e.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
SOTA DDS Discovery: Approximation
• Fix [personal communication with authors].
• 2-approximation algorithm
• $O(n(n + m))$
[Figure: counterexample graph with parts of 18 and 36 vertices. KS-Approx density: 2.75; ground truth density: 6; approximation ratio $\frac{6}{2.75} = 2.18$.]
[Figure: enlarging the graph (node classes a, b, c, with a single a node) to compare the ground-truth density, the KS-Approx density, and the resulting approximation ratio.]
[Khuller and Saha. On finding dense subgraphs. ICALP 2009].
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: An Exact Algorithm
• (x, y)-core: An (S, T)-induced subgraph:
• Every node in S has outdegree ≥ x.
• Every node in T has indegree ≥ y.
• S and T not necessarily disjoint.
• H = ({a,b}, {c,d}) is a (2, 2)-core.
[Figure: example digraph on vertices a, b, c, d, e.]
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020].
[Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: Core-Exact
Theorem: The DDS of G is contained in the $\left(\frac{\rho^*}{2\sqrt{a}}, \frac{\sqrt{a} \cdot \rho^*}{2}\right)$-core.
• $a = \frac{|S^*|}{|T^*|}$ -- unknown; search through all $\frac{|S|}{|T|}$: $0 < |S|, |T| \le n$.
• $\rho^*$ -- unknown: start with good bounds and use binary search.
• E.g., lower bound = any 2-approx. solution and upper bound = 2 × lower bound.
• Still $O(n^2 \cdot \text{Max-flow})$ but much faster in practice – smaller flow graphs.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: DC-Exact
• Uses a “divide and conquer” approach.
• For a given ratio $\frac{|S|}{|T|}$, the result of binary search for the “best” (S, T) pair
gives enough info. about subranges of ratios that can be
skipped.
• Algorithm DC-Exact: $O(k \cdot \text{Max-flow})$, e.g., …
• $k \ll n^2$ in practice.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
Densest Directed Subgraph: Core-Approx
• G[S, T] – (x, y)-core of G. Then $\rho(S, T) \ge \sqrt{xy}$.
• Let $[x^*, y^*]$ be the max core-number pair, i.e., it
maximizes $xy$ among all (x, y)-cores.
• $\rho^* \le 2\sqrt{x^* y^*}$.
• ⇒ The $(x^*, y^*)$-core is a 2-approx. solution to DDS.
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
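The first bound on this slide has a two-line justification, sketched here using only the (x, y)-core definition: every vertex of S has at least x out-edges into T, and every vertex of T has at least y in-edges from S.

```latex
\[
  |E(S,T)| \ge x\,|S| \quad\text{and}\quad |E(S,T)| \ge y\,|T|
  \;\Longrightarrow\;
  |E(S,T)|^{2} \ge xy\,|S|\,|T|
  \;\Longrightarrow\;
  \rho(S,T) = \frac{|E(S,T)|}{\sqrt{|S|\,|T|}} \ge \sqrt{xy}.
\]
% Combined with \rho^* \le 2\sqrt{x^* y^*}, the (x^*, y^*)-core has density
% at least \sqrt{x^* y^*} \ge \rho^*/2, which is the 2-approximation claim.
```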
Densest Directed Subgraph: Core-Approx
• Naïve implementation: for each x, compute all (x, y)-cores,
0 < y < n, and return the $(x^*, y^*)$-core
→ $O(n(n + m))$ time.
• Can we do better?
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
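The naïve search can be sketched directly from the definition. This is an unoptimised illustration with hypothetical names, assuming x, y ≥ 1 and a digraph given as a list of distinct arcs: it recomputes each (x, y)-core from scratch by repeated pruning, which is simpler but slower than the incremental scheme the slide's bound refers to.

```python
from collections import Counter
from math import sqrt


def xy_core(arcs, x, y):
    """Arcs of the (x, y)-core for x, y >= 1 (an empty set if the core is empty)."""
    kept = set(arcs)
    while True:
        out_deg = Counter(u for u, _ in kept)
        in_deg = Counter(v for _, v in kept)
        # An arc survives only if its tail keeps out-degree >= x and its head in-degree >= y.
        pruned = {(u, v) for u, v in kept if out_deg[u] >= x and in_deg[v] >= y}
        if pruned == kept:
            return kept
        kept = pruned


def naive_core_approx(arcs):
    """Return (x*, y*, S, T, density) for the non-empty core maximising x * y."""
    max_out = max(Counter(u for u, _ in arcs).values())
    max_in = max(Counter(v for _, v in arcs).values())
    best = None
    for x in range(1, max_out + 1):
        for y in range(1, max_in + 1):
            core = xy_core(arcs, x, y)
            if core and (best is None or x * y > best[0]):
                best = (x * y, x, y, core)
    _, x, y, core = best
    S = {u for u, _ in core}
    T = {v for _, v in core}
    return x, y, S, T, len(core) / sqrt(len(S) * len(T))


# Hypothetical toy digraph on vertices a..e.
toy_arcs = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("d", "e")]
x_star, y_star, S_core, T_core, core_density = naive_core_approx(toy_arcs)
```

On the toy digraph the search returns the (2, 2)-core ({a, b}, {c, d}), whose density 2 meets the bound √(x*y*) = 2.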
Densest Directed Subgraph: Core-Approx
12/13/22 CUHK-Shenzhen, China
x
8
5
2
7
1
6
y
4
3
7
4 8
2
1 3 6
5
?
<latexit sha1_base64="9Gq6BvRBrxDcJNRZdQw3Wu8S3uk=">AAACfnicbVFdSxtBFL3Zfln7pfXRl8HQYsHGXVvQR7FCfciDhUaFJMjdyU28ZHZ2mbkrCYu/oa/tT+u/6WyMYJJeGDicM/fzpIVhL3H8txE9efrs+Yu1l+uvXr95+25j8/2Fz0unqaNzk7urFD0ZttQRFkNXhSPMUkOX6fhbrV/ekvOc258yLaif4cjykDVKoDo9L+iuN5pxK56FWgXJHDRhHufXmw3dG+S6zMiKNuh9N4kL6VfohLWhu/Ve6alAPcYRdQO0mJHvV7Np79SHwAzUMHfhWVEz9nFGhZn30ywNPzOUG7+s1eT/tG4pw6N+xbYohay+bzQsjZJc1aurATvSYqYBoHYcZlX6Bh1qCQda6DKrXZBe2KSalJZ1PqAl1shEHAbSk2TItt6qarMtJ6rNKYWbWHpQQ9la3j3lEYvfawcX7N53RzT+tJISbEmWTVgFFwet5Evr4MfX5vHJ3KA12IYd2IUEDuEYzuAcOqCB4Rf8hj8RRB+jz9H+/deoMc/ZgoWIjv4B/bfFcQ==</latexit>
?
<latexit sha1_base64="9Gq6BvRBrxDcJNRZdQw3Wu8S3uk=">AAACfnicbVFdSxtBFL3Zfln7pfXRl8HQYsHGXVvQR7FCfciDhUaFJMjdyU28ZHZ2mbkrCYu/oa/tT+u/6WyMYJJeGDicM/fzpIVhL3H8txE9efrs+Yu1l+uvXr95+25j8/2Fz0unqaNzk7urFD0ZttQRFkNXhSPMUkOX6fhbrV/ekvOc258yLaif4cjykDVKoDo9L+iuN5pxK56FWgXJHDRhHufXmw3dG+S6zMiKNuh9N4kL6VfohLWhu/Ve6alAPcYRdQO0mJHvV7Np79SHwAzUMHfhWVEz9nFGhZn30ywNPzOUG7+s1eT/tG4pw6N+xbYohay+bzQsjZJc1aurATvSYqYBoHYcZlX6Bh1qCQda6DKrXZBe2KSalJZ1PqAl1shEHAbSk2TItt6qarMtJ6rNKYWbWHpQQ9la3j3lEYvfawcX7N53RzT+tJISbEmWTVgFFwet5Evr4MfX5vHJ3KA12IYd2IUEDuEYzuAcOqCB4Rf8hj8RRB+jz9H+/deoMc/ZgoWIjv4B/bfFcQ==</latexit>
Candidates
[ ⇤
, ⇤
]
Main idea:
for each ! ≤ #, search for the
largest %;
for each % ≤ #, search for the
largest !;
&( ( ⋅ (* + ()) time.
Max equal pair: (#, #).
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
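The naïve search over cores can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`xy_core`, `density`, `naive_best_core`) are hypothetical, the core is computed by simple repeated edge peeling rather than by the cited paper's algorithm, and "(x*, y*)" is read here as the non-empty core maximizing x · y.

```python
from math import sqrt

def xy_core(edges, x, y):
    """Peel edges until every remaining source has >= x out-edges and
    every remaining target has >= y in-edges; return (S, T, kept_edges)."""
    live = set(edges)
    while True:
        out_deg, in_deg = {}, {}
        for u, v in live:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        keep = {(u, v) for u, v in live
                if out_deg[u] >= x and in_deg[v] >= y}
        if keep == live:
            break
        live = keep
    return {u for u, _ in live}, {v for _, v in live}, live

def density(S, T, edges):
    """Directed density |E(S, T)| / sqrt(|S| * |T|)."""
    if not S or not T:
        return 0.0
    e = sum(1 for u, v in edges if u in S and v in T)
    return e / sqrt(len(S) * len(T))

def naive_best_core(edges):
    """Try every (x, y) pair; keep the non-empty core maximizing x * y
    (assumed reading of the slide's (x*, y*))."""
    edges = set(edges)
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    best = None
    for x in range(1, max(out_deg.values(), default=0) + 1):
        for y in range(1, max(in_deg.values(), default=0) + 1):
            S, T, _ = xy_core(edges, x, y)
            if S and (best is None or x * y > best[0] * best[1]):
                best = (x, y, S, T)
    return best

# Toy graph: two "initiators" a, b pointing at c, d, e, plus one stray edge.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("e", "a")]
BEST = naive_best_core(EDGES)
```

On the toy graph the stray edge is peeled away and the search settles on the pair (3, 2). The double loop is what makes this version slow; the "main idea" above restricts the range of x and y that needs to be scanned.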
Sample Experiment Results: Exact Algorithms
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
Sample Experiment Results: Approx Algorithms
Up to 6 orders of magnitude faster
Datasets
MO: (~200, ~2.6K)
TC: (~1.2K, ~2.7K)
OF: (~3K, ~30K)
AD: (~6.4K, ~57K)
AM: (~400K, ~3.4M)
AR: (~3.4M, ~5.8M)
BA: (~2.1M, ~17.8M)
TW: (~52.6M, ~1.96B)
[Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs .
SIGMOD 2020].
[Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].
Better Approximation Ratio?
• Propose a new LP formulation for DDS problem
• A divide-and-conquer algorithmic framework
• An efficient $(1 + \epsilon)$-approximation algorithm, for any real $\epsilon > 0$
• An efficient exact algorithm
• Up to 3 orders of magnitude faster than the state-of-the-art exact and approximation algorithms
[Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery.
SIGMOD 2022].
Recent Progress on DDS
• Concurrent work from SODA 2022
• Gives a $(1 + \epsilon)$-approximation in $\tilde{O}(m/\epsilon)$ time via network flow for undirected graphs
• Can also be extended to directed graphs at extra time cost
• It would be interesting to compare the two algorithms empirically
[Chekuri, Quanrud, and Torres. “Densest Subgraph: Supermodularity, Iterative Peeling, and Flow.” SODA 2022].
Mini Case Study: Covid-19
• Covid-19 retweets:
1,025,937 retweets involving 660,730 users
→ (660,730 nodes, 835,193 edges).
• Largest connected component:
(399,962 nodes, 663,506 edges)
Courtesy: Thirumuruganathan, QCRI.
Directed Densest Subgraph from Covid-19
Source Nodes = 777
Target Nodes = 15
Common Nodes = 2
(5, 70)-core.
Density: 55.8826
777 nodes “influenced” by
15 “initiators”.
Vaccine side
effects,
Modes of
Transmission.
Mini Case Study II: Nepal Earthquake
• Graph constructed from cascades of tweets collected following the Nepal earthquake,
April 2015.
• 265383 nodes.
• 3898972 edges.
• largest connected component:
• 258756 nodes.
• 3771999 edges.
https://zenodo.org/record/2587475#.Ypkxmi-caFg.
Courtesy: Thirumuruganathan, QCRI.
Directed Densest Subgraph from Nepal
Source Nodes: 122637
Target Nodes: 25233
Common nodes: 20713
(1,51)-core
density: 34.309
Tens of thousands of “initiators”
and more than a hundred thousand “influenced”.
Info on damage
and requests for
help.
Combating via Mitigation: A Refresher on Influence Maximization
Propagation/Diffusion Models
• How does influence/information
travel in networks?
• Example Phenomena: infection,
product adoption, information,
opinion, rumor, etc.
• Stochastic diffusion models –
discrete/continuous time.
• How can we launch campaigns
to optimize design objectives?
[Kempe,Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
Influence Maximization
• Core optimization problem in IM: Given a diffusion model $M$, a network $G = (V, E)$, model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a seed set $S \subset V$ under budget that maximizes $\sigma_M(S)$, the expected number of adopters given initial adopters $S$ (spread).
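To make the spread objective concrete, here is a minimal Monte Carlo sketch of the expected number of adopters under the independent cascade (IC) model. The graph encoding (a dict mapping each node to a list of `(neighbor, probability)` pairs) and the function names are assumptions chosen for illustration, not something the talk prescribes.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One IC cascade: each newly active node gets a single chance to
    activate each inactive out-neighbor with the edge's probability."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def estimate_spread(graph, seeds, runs=2000, seed=0):
    """Average cascade size over `runs` simulations (an estimate of the
    expected spread of `seeds`)."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs
```

For example, `estimate_spread({"a": [("b", 0.5)]}, ["a"])` should come out close to 1.5, since the seed always counts and its neighbor is activated half the time.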
Complexity of IM
• Theorem: The IM problem is NP-hard for several major diffusion models
under both discrete time and continuous time.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
Complexity of Spread Computation
• Theorem: It is #P-hard to compute the expected spread of a node set
under major diffusion models (reduction from counting simple paths in a digraph).
[Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009].
[Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model.
ICDM 2010].
[W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
Properties of Spread Function
$\sigma(S)$ is monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.
Properties of Spread Function
$\sigma(S)$ is submodular: for $S \subset S' \subset V$ and $x \in V \setminus S'$,
$\sigma(x \mid S') \le \sigma(x \mid S)$, where
$\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the marginal gain.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
Approximation of Submodular Function
Maximization
• Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function, with $f(\emptyset) = 0$. Let $S^{Grd}$ and $S^*$ resp. be the greedy and optimal (OPT) solutions. Then
$f(S^{Grd}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$.
[Nemhauser, Wolsey, and Fisher. An analysis of the approximations for maximizing submodular set functions. Math. Prog. 1978].
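A small worked instance may help here. The sketch below runs the greedy rule on a set-coverage function (a standard monotone submodular example; the sets and names are made up for illustration) and compares it with the brute-force optimum for budget 2.

```python
from itertools import combinations
from math import e

def greedy_max(f, ground, k):
    """Pick k elements, each time adding the one with largest marginal gain."""
    chosen = []
    for _ in range(k):
        rest = [x for x in ground if x not in chosen]
        if not rest:
            break
        best = max(rest, key=lambda x: f(chosen + [x]) - f(chosen))
        chosen.append(best)
    return chosen

# Hypothetical instance: each name covers a set of items.
SETS = {"a": {1, 2, 3, 4}, "b": {1, 2, 5}, "c": {3, 4, 6}}

def coverage(names):
    """Number of distinct items covered by the chosen names."""
    covered = set()
    for name in names:
        covered |= SETS[name]
    return len(covered)

greedy_sol = greedy_max(coverage, list(SETS), 2)
greedy_value = coverage(greedy_sol)
opt_value = max(coverage(list(c)) for c in combinations(SETS, 2))
bound_holds = greedy_value >= (1 - 1 / e) * opt_value
```

Greedy first takes "a" (it covers four items) and then gains only one more item, ending at 5, while the optimum {"b", "c"} covers 6. The guarantee still holds, since 5 ≥ (1 − 1/e) · 6 ≈ 3.79.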
Approximation of Submodular Function
Maximization
• Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under
various major diffusion models, for both discrete and continuous time.
[Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
Baseline Approximation Algorithm
Monte Carlo simulations for estimating
expected spread.
Lazy Forward optimization to save useless
updates.
→ Greedy still extremely slow on large networks.
[Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance.
Cost-effective outbreak detection in networks. KDD 2007].
[Kempe, Kleinberg, and Tardos. Maximizing the spread
of influence through a social network. KDD 2003].
Reverse Influence Sampling
• A series of algorithms that guarantee a
$\left(1 - \frac{1}{e} - \epsilon\right)$-approximation to the optimal
expected spread.
• Key idea: use random reverse reachable sets
(RR-sets) to gauge the quality of (candidate) seeds.
[Borgs, Brautbar, and Chayes, Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
Reverse Reachable Sets (RR-Sets)
• An RR-set is a subgraph sample of $G$.
• Generation of RR-sets under the IC model:
– Start from a random node: RR-set = {A}.
– Sample its/their incoming edges and add the sampled neighbors: RR-set = {A, C, B, E}.
• Intuition:
– An RR-set is a sample set of nodes that can
influence node A.
[Figure: example graph on nodes A, B, C, D, E with edge probabilities 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6.]
[Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
Influence Estimation with RR-Sets
• Theorem: $\Pr[S \text{ overlaps a random RR-set}] = \frac{1}{n} \times$ expected spread of $S$, where $n = |V|$.
• Family of approx. algorithms: TIM, IMM, Stop-
and-Stare, …
[Tang et al., “Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency”, SIGMOD 2014]
[Tang et al., “Influence Maximization in Near-Linear Time: A Martingale Approach”, SIGMOD 2015]
[Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018].
[Nguyen et al., “Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks”,
SIGMOD 2016] → arXiv
[K. Huang, S. Wang, G. Bevilacqua, X. Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence
Maximization, PVLDB 2017]
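The RR-set construction and the overlap theorem can be sketched together. This is an illustrative estimator only, not TIM, IMM or Stop-and-Stare: it assumes the IC model, a graph given as a dict of `(neighbor, probability)` lists, and a fixed number of samples; all function names are hypothetical.

```python
import random

def reverse_adjacency(graph):
    """Map each node to its in-neighbors with the edge probabilities."""
    rev = {}
    for u, nbrs in graph.items():
        for v, p in nbrs:
            rev.setdefault(v, []).append((u, p))
    return rev

def random_rr_set(nodes, rev, rng):
    """Start from a random root and walk incoming edges, keeping each
    edge independently with its probability (IC model)."""
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in rev.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr

def estimate_spread_rr(graph, nodes, seeds, samples=20000, seed=0):
    """n * (fraction of sampled RR-sets that overlap the seed set)."""
    rng = random.Random(seed)
    rev = reverse_adjacency(graph)
    seed_set = set(seeds)
    hits = sum(1 for _ in range(samples)
               if random_rr_set(nodes, rev, rng) & seed_set)
    return len(nodes) * hits / samples
```

The same batch of RR-sets can be reused to score many candidate seed sets, which is what makes this family of algorithms much cheaper than re-running forward simulations for every candidate.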
What if objective is not submodular?
• Max non-decreasing
non-submodular function:
$f(S^{Grd}) \ge \frac{1}{\alpha}\left(1 - e^{-\alpha\gamma}\right) \cdot \mathrm{OPT}$, where $\gamma$ is the submodularity ratio and $\alpha$ the curvature.
[Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].
What if the objective is not submodular?
Sandwich Approximation to the rescue!
• $f$ – monotone but not submodular.
• $\mu, \nu$ – monotone and submodular, and
$\mu$ ($\nu$) lower (resp. upper) bounds $f$.
• Let $S_f$ ($S_\mu$, $S_\nu$) be the Greedy solution to
$\max_{S \subseteq V, |S| \le k} f(S)$ (resp. …) and $S_{sand} \in \{S_f, S_\mu, S_\nu\}$
be the best w.r.t. $f(\cdot)$.
Then
$f(S_{sand}) \ge \max\left\{ \frac{f(S_\nu)}{\nu(S_\nu)}, \frac{\mu(S_f^{opt})}{f(S_f^{opt})} \right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S_f^{opt})$, where $f(S_f^{opt})$ is OPT.
[Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016].
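The sandwich recipe itself is short: run greedy on the objective and on each bound, then keep whichever solution scores best on the true objective. The sketch below uses a made-up objective (coverage plus a bonus when both "a" and "b" are chosen, which breaks submodularity) with hand-picked lower and upper bounds; every name and number in it is an assumption for illustration.

```python
def greedy(obj, ground, k):
    """Standard greedy: add the element with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        rest = [x for x in ground if x not in chosen]
        if not rest:
            break
        chosen.append(max(rest, key=lambda x: obj(chosen + [x]) - obj(chosen)))
    return chosen

def sandwich(f, lower, upper, ground, k):
    """Greedy on f, on its lower bound and on its upper bound;
    return the candidate that is best with respect to f."""
    candidates = [greedy(g, ground, k) for g in (f, lower, upper)]
    return max(candidates, key=f)

# Hypothetical instance.
SETS = {"a": {1}, "b": {2}, "c": {3, 4}, "d": {5, 6}}

def cov(names):
    covered = set()
    for n in names:
        covered |= SETS[n]
    return len(covered)

def f(names):          # not submodular: bonus only when a and b co-occur
    return cov(names) + (2 if {"a", "b"} <= set(names) else 0)

def lower(names):      # submodular lower bound: drop the bonus
    return cov(names)

def upper(names):      # submodular upper bound: bonus as soon as a or b appears
    return cov(names) + (2 if {"a", "b"} & set(names) else 0)

BEST = sandwich(f, lower, upper, list(SETS), 2)
```

Here greedy on the upper bound is lured towards "a" and ends with value 3 under `f`, while greedy on `f` and on the lower bound both return {"c", "d"} with value 4, so the sandwich step keeps the latter.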
Mitigating Filter Bubbles
Filter Bubbles, Echo Chambers, and Polarization
• Selective exposure to viewpoints/issues can engender/worsen
polarization.
[Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011].
[Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015].
• Aggravated by echo chambers in social media.
[Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009].
[Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014].
[Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks.
TKDD 2019].
[Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018].
[Garimella, De Francisci Morales, Gionis, and Mathioudakis. Quantifying Controversy over Social Media. TOCS 2018].
Balancing Exposure by Connections
• Link Recommendation
[Amelkin and A. K. Singh. Fighting opinion control in social networks via link recommendation. KDD
2019].
[Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW
2018].
[Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link
Recommendation. NeurIPS 2021].
Interdisciplinary Approach
• Comprehensive solution goes beyond CS: e.g.,
Polarization Lab https://www.polarizationlab.com
• Interdisciplinary (CS, stats, sociology) approach.
• Real-life experiment by recruiting Democrat and Republican
volunteers incentivized to follow bots tweeting posts initially
aligned with their ideology but gradually from the other side of
the aisle.
• Complemented with offline tracking and study.
[Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].
Balancing via Information Campaigns
• Smart Algorithm Bursts Social Networks' "Filter
Bubbles"
• “Instead of building echo chambers, Facebook, Twitter and
company can tweak their code to broaden exposure to wider
ranges of views.”
• “… results suggest that targeting a strategic group of social
media users and feeding them the right content is more
effective for propagating diverse views through a social media
network …”
[IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].
Balancing via Information Campaigns
• Information Campaign Approach
[Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS
2018].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE
2020].
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
• Common assumptions:
• awareness = adoption.
• Adoption of opposing views is independent.
Opinions can have complex interaction
Adopted and propagated independently?!
The Liberals claim that … they can
cut Canada’s greenhouse gas
emissions by 40 to 45% below
2005 levels by 2030. They passed
a climate plan, C-12, to set legally
binding emissions targets to reach
net-zero emissions in 2050.
New Democrats supported the
Liberals’ net-zero legislation and
have set an emissions reduction
target of 50 per cent below 2005
levels by 2030.
The Conservatives opposed the
Liberals’ net-zero emissions
legislation and say their climate
plan will meet Paris climate
commitments of 30 per cent below
2005 levels by 2030.
The People’s Party platform argues
that there is “no scientific
consensus” that human activity is
driving climate change and has said
warnings of looming
environmental catastrophe are
exaggerated.
Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change
Opinions can have complex interaction
• Pure competition.
• Partial competition.
• Complementation/reinforcement.
Mitigating Filter Bubbles: A User Utility Perspective
A useful digression.
Awareness vs adoption
Higher utility!!
Awareness spreads like epidemic, but adoption depends on UTILITY
[Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].
Complementary (aka Reinforcing) Campaigns
Welfare Maximization: complementary
setting
• Problem: Given social network G = (V,E), propagation
model, item utility model, and budget vector. Find an
allocation of seed nodes to items that maximizes the
expected social welfare.
Expected sum of utilities of
itemsets adopted by users.
What does the theory say?
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
A simple greedy still works
GREEDY ALGORITHM
Does not require specific
utility-parameters as input
$\left(1 - \frac{1}{e}\right)$-approximation
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
Prefix-preserving seed selection - PRIMA
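[Figure: a seed sequence with prefixes of sizes $b_1$, $b_2$, …, $b_{max}$, annotated $\left(1 - \frac{1}{e}\right) OPT_{b_1}$, $\left(1 - \frac{1}{e}\right) OPT_{b_2}$, …, $\left(1 - \frac{1}{e}\right) OPT_{b_{max}}$, where $b_{max} = \max_i b_i$.]
Select enough samples corresponding to every
budget of the budget vector
○ Challenge: The number of samples required is not monotone in
budget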
[Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
Competing Campaigns
Welfare Maximization: competing setting
• Problem: Given social network G = (V,E), propagation
model, item utility model, budget vector, and a fixed
(partial) allocation of seed nodes to items, find an allocation
of seed nodes to items that maximizes the expected
social welfare.
Expected sum of utilities of
itemsets adopted by users.
How hard is (the) competition?
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
General case algorithm - SeqGRD
• Sort the items based on their utilities – $\{u_1 > u_2 > \cdots > u_m\}$
• Instance-dependent approximation: $\frac{u_{min}}{u_{max}} \left(1 - \frac{1}{e}\right) \cdot OPT$
• $u_{max}$ = max exp. utility of any bundle.
• $u_{min}$ = exp. min utility of any item.
[Figure: budgets $b_1$, $b_2$, …, $b_m$ (total $\sum b_i$) shown against item utilities $u_1$, $u_2$, …, $u_m$; labeled PRIMA+.]
[Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
Mitigating Filter Bubbles: A Network Host Utility Perspective
Filter bubble problem
[Figure: a feed dominated by “YAY!” posts, with a single “NAY!”.]
• Items (opinions) are complementary objective-wise
• Items (opinions) are competing propagation-wise
[Garrett. Echo chambers online?: Politically motivated selective exposure among Internet news users. Journal of computer-mediated communication 2009].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
Problem: Key Ingredients
• Competition parameter
  • After being influenced, adopt the second item w.p. $q$, $0 \le q < 1$
• (Host’s) reward of adoption is supermodular, modeling complementarity
  • $r$, for the first item
  • $r + \Delta$, for the second item, $r < \Delta$
• Expected (host) utility for a user adopting both: $r + q\Delta$
• Goal is to maximize the sum of utilities under a competition-driven diffusion
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
Filter bubble mitigation
• There is an existing bubble
• A more general setting
Item A
Problem FB Mitigation (FBM): Given graph $G = (V, E, p)$,
competition parameter $q$, $0 < q < 1$, fixed A seeds $S_A$, and
budget $k$, find B seeds $S_B$ such that $|S_B| \le k$ and the
expected welfare is maximized.
Inherent Challenges – Strike One
• FBM is neither monotone nor submodular.
• Restricted (sequential) setting: propagation of follower
doesn’t start before that of leader ends. FBM in the
sequential setting is monotone and submodular! :)
• But wait! FBM can be arbitrarily worse than $\mathrm{FBM}_{seq}$ and
vice versa! :(
Another Attempt
Item A
First
Level
Competition
Item B
• Expected reward at each FLC node = $r + q\Delta$.
Surrogate objective: Expected # FLC nodes ×
$(r + q\Delta)$.
• Clearly a lower bound for FBM.
• But the FLC objective is neither monotone
nor submodular.
Algorithm 1 – SPReadGRD
• Greedily selects B seeds that maximize the marginal
spread
• Ignore the welfare objective
• PRIMA+ is used to do the seed selection
• Given fixed $S_A$, PRIMA+ selects $S_B$ such that
  • $\sigma(S_B \cup S_A) \ge \left(1 - \frac{1}{e} - \epsilon\right) \sigma(S_B^* \cup S_A)$
Analyzing SpreadGRD
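• Given $S$, for the welfare function $\rho$ the following holds:
  • $r\,\sigma(S) \le \rho(S) \le (r + q\Delta)\,\sigma(S)$
• SpreadGRD therefore has the following bound:
$\rho(S_A \cup S_B) \ge r \cdot \sigma(S_A \cup S_B) \ge r \cdot \left(1 - \frac{1}{e} - \epsilon\right) \cdot \sigma(S_A \cup S^*) \ge \frac{r}{q\Delta + r} \left(1 - \frac{1}{e} - \epsilon\right) \rho(S_A \cup S^*)$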
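The last inequality in the chain is the least obvious step, so a short derivation may help. It writes $r$ for the first-item reward, $q$ for the competition parameter, $\sigma$ for spread and $\rho$ for welfare; these symbol names are a reading of the slide and should be treated as assumptions.

```latex
\begin{align*}
\rho(S) &\le (r + q\Delta)\,\sigma(S)
  && \text{upper bound on welfare, for any } S \\
\Rightarrow\; \sigma(S_A \cup S^*) &\ge \frac{\rho(S_A \cup S^*)}{r + q\Delta}
  && \text{take } S = S_A \cup S^* \\
\Rightarrow\; r\Bigl(1 - \tfrac{1}{e} - \epsilon\Bigr)\,\sigma(S_A \cup S^*)
  &\ge \frac{r}{r + q\Delta}\Bigl(1 - \tfrac{1}{e} - \epsilon\Bigr)\,\rho(S_A \cup S^*)
  && \text{multiply both sides by } r\bigl(1 - \tfrac{1}{e} - \epsilon\bigr)
\end{align*}
```

If $S^*$ denotes the welfare-optimal B seeds rather than the spread-optimal ones, the middle step of the chain additionally relies on the spread-optimal set having at least as much spread as any other feasible set.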
Algorithm 2 – Sandwich
• Upper bound: assume a tattler diffusion model
  • A node influences its neighbors with every item in its awareness set
  • $\rho_U(S) \ge \rho(S)$
  • $\rho_U(\cdot)$ is monotone and submodular
• Lower bound: assume the diffusion model with $q = 0$
  • $\rho_L(S) \le \rho(S)$
  • $\rho_L(\cdot)$ is monotone and submodular
• Using sandwich
  • Let $S_{sand} = \arg\max_{S' \in \{S_L, S, S_U\}} \rho(S')$
  • $\rho(S_{sand}) \ge \max\left\{ \frac{\rho(S_U)}{\rho_U(S_U)}, \frac{\rho_L(S^*)}{\rho(S^*)} \right\} \left(1 - \frac{1}{e}\right) \rho(S^*)$
Algorithm 3 - NetRewGRD
Item A
Item B
First
Level
Competition
• Extends state-of-the-art sampling to the
welfare objective
• Reverse reachable trees
• Recursive weight update using a
linear pass
• Scales for large networks
[Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
Experiments
• Baselines considered:
• COEX: Maximizes co-adoptions of both items
• TDEM: Maximizes welfare based on leaning scores
[Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020].
[Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].
Sample of Results - Quality
Sample of Results – Running Time
Mitigating Misinformation
Misinformation Mitigation – Prior Art
• Influence Blocking
• Temporal aspects ignored or not differentiated
• Focus on scalability
[Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW 2011],
[He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. SDM 2012],
[Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social networks. ICDE 2017],
[Tong, Wu, Guo et al. An efficient randomized algorithm for rumor blocking in online social networks. IEEE TNSE 2017],
[Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018],
[Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. ICDT 2020].
Temporal Aspects of Propagation
Misinformation spreads faster, farther, and wider than truth!
[Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018]
Adoption decisions have varying lengths
[Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew Research Center 2016]
Together these have important consequences for effective seed set selection
Temporal Aspects of Propagation
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
• Associate meeting probabilities with each edge
• User reaction times sampled from a data-driven distribution
t = 0 t = 2 t = 3 t = 6
Adoption decisions of !", !$, !%, !&, !' uncontested.
!( faces a tie; broken with a random permutation, e.g., !', !" .
F->3.
DW: [3,6].
M->4.
Tie!
Misinformation Mitigation Problem
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB
2022]
Reward function $\rho(\cdot)$ measures effectiveness of mitigation
P1: Given fake seeds $S_F$ and reward function $\rho(\cdot)$,
find a seed set that maximizes the expected reward
P1 is not submodular!
Truth reaches well
before misinfo.
Truth arrives too late!
Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: Supermodular behavior arises due to joint effect of mitigation seeds, i.e. acting
alone they would not achieve the same reward.
LB: Maximum reward over singleton seed sets from $S_M$ (tight).
Example: for $S_M = \{u, v\}$, $\mathrm{LB} = \max_{w \in \{u, v\}} \rho(S_F, \{w\})$
Sandwiching the Mitigation Objective
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Simple Candidate: drop meeting events and enforce dominant tie-breaking.
Tighter UB: remove meeting events on edges that can be traversed by both sides.
$S_M = \{u, v\}$
Importance Sampling
[M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022]
Observe: only nodes reached
by the misinformation are
eligible for reward.
Idea: only sample roots from
nodes that misinfo campaign
reaches → tighter bounds!
RDR sets: weighted analog to
RR sets for reward probabilities
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Two settings for selecting misinformation seeds: (1) from top-k influential users (a small number of popular instigators) and (2) uniformly at random (several bots or newly created puppet accounts)
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Reward distribution dominated by uncontested mitigation adoption
Experiments
[M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022]
Mitigation seeds remain effective under simultaneous perturbation of model parameters.
Misinformation Intervention
Intervention Challenges
Detectors are fallible
Hard vs Soft intervention
Misinformation Intervention – Prior Art
• Disadvantaging posts with misleading info, deleting
edges, removing nodes, … → too hard?
• No correction for wrong intervention!
[Farajtabar et al. Fake news mitigation via point process based intervention. ICML 2017],
[Tong et al. Gelling, and melting, large graphs by edge manipulation. CIKM 2012],
[Khalil, Dilkina, and Song. Scalable diffusion-aware optimization of network topology. KDD 2014],
[Chen, Chen, et al. Node immunization on large graphs: Theory and algorithms. TKDE 2015],
[Medya, Silva, and Singh. Approximate Algorithms for Data-driven Influence Limitation. TKDE 2020],
[Caraban et al. "23 ways to nudge: A review of technology-mediated nudging in human-computer
interaction." SIGCHI 2019],
[Caraban, Konstantinou, and Karapanos. "The Nudge Deck: A design support tool for technology-mediated
nudging." ACM Designing Interactive Systems Conference. 2020],
[Bhuiyan et al. "NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges." CSCW2
2021].
Cost Aware Intervention
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Reward Function
$R_i^{int}$: reach of item $a_i$ after intervention.
$R_i^{no\text{-}int}$: reach of item $a_i$ w/ no intervention.
[Thirumuruganathan, Simpson, L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Cost Aware Intervention
dEFEND [Shu et al. KDD 2019].
Marked Hawkes Process [Mishra et al. CIKM 2016].
Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
NCB-TS: Neural Contextual Bandits w/ Thompson Sampling
CB-TS: Contextual Bandits w/ Thompson Sampling
RB: (Learned) Rule based
CSC: Cost Sensitive Classification
Experiments
[Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
Real-time evaluation on Twitter’s stream from 10-Oct-2020 to 10-Nov-2020.
• 5 million tweets w/ 1800 distinct English news articles
• Topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%)
Manual Evaluation
• Random sample of 750 viral and non-viral
tweets
• 3 volunteers evaluated intervention
• Accuracy of 92.1%
Automated Evaluation
• Google FactCheck Claim Search API
• TiKL: That is a Known Lie
• Accuracy of 96.6%
Summary
• Efficient detection of dense subgraphs in undirected and
directed graphs is useful for finding filter bubbles and groups
of actors engaged in spreading misinformation.
• In mitigating filter bubbles via information campaigns,
competition between viewpoints/opinions cannot be ignored.
• In mitigating misinformation, it’s critical to incorporate
temporal aspects.
• In misinformation intervention, it’s important to watch your
step and correct your gait in the face of mistakes.
Open Questions – Detection
• Integrating content analysis in going after the “right”
densest subgraphs.
• Can we detect filter bubbles and groups promoting
misinformation as they form?
• Longitudinal: (how) do these groups transform over time?
Open Questions – Countering
• Multiple campaigns of items involving partial/pure
competition, complementation?
• How can we learn propagation probabilities, competition
parameters, utilities from available propagation traces?
• Go beyond expected outcome? E.g., as filter bubbles form or
misinformation spreads, can we counter them?
Open Questions --
• Case studies reflecting the effect of mitigation campaigns on
filter bubbles and misinformation diffusion.
• Integrating with claim verification and (computational) fact
checking efforts.
• Incentivizing balance of adoption (in case of filter bubbles)
and adoption of truth (in case of misinformation).
Acknowledgments
• Chenhao Ma (HKU)
• Farnoosh Hashemi (UBC)
• Glenn Bevilacqua (UBC→Oracle)
• Michael Simpson (UBC)
• Prithu Banerjee (UBC→Oracle)
• Reynold Cheng (HKU)
• Saravanan Thirumuruganathan (QCRI, HBKU)
• Xiaolin Han (HKU)
• Xuemin Lin (UNSW)
• Wenjie Zhang (UNSW)
• Yixiang Fang (CUHK)
• Wei Chen (MSRA)
• Wei Lu (UBC→LinkedIn)
Thank you! (நன்றி!)

  • 1. On A Quest for Combating Filter Bubbles and Misinformation Laks V.S. Lakshmanan University of British Columbia Vancouver, BC, Canada
  • 2. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 2
  • 3. Prolegomenon • What this talk is not about and will not do for you. • Classify different kinds of “fake news”: e.g., mis/disinformation ... • Computational Fact Checking or Claim Verification • Offer a comprehensive solution to the filter bubble/echo chambers or “fake news” problems. • The scope of both stretch beyond just tech (e.g., models and algorithms). • Even the “tech-restricted” versions we won’t get to completely solve today (in this talk). 12/13/22 CUHK-Shenzhen, China 3
  • 4. Prolegomenon • Instead, we will examine some (necessarily restricted) models and formulations of problems. • Offer a view of how research done in some different contexts may inspire techniques for solving restricted versions of the filter bubbles / echo chambers and the misinformation problems. • In case I missed your work, … 12/13/22 CUHK-Shenzhen, China 4
  • 5. Not long ago, or maybe long ago … 12/13/22 CUHK-Shenzhen, China 5
  • 6. And then came … 12/13/22 CUHK-Shenzhen, China but arguably also these … Which led to many great things 6
  • 8. •Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 8
  • 9. ["Political polarization 1994-2017." Pew Research Center., Washington, DC October 2017]. Filter Bubble and Echo Chambers exacerbate polarization 12/13/22 CUHK-Shenzhen, China 9
  • 10. Filter Bubble and Echo Chambers exacerbate polarization ["Political polarization 1994-2017." Pew Research Center., Washington, DC October 2017]. 12/13/22 CUHK-Shenzhen, China 10
  • 11. Political Echo Chambers ● Members of densely connected groups are likely to have the same opinions and attitudes. ● Study focus on opposing political echo chambers (~250K each) on Twitter in Japan. ● Political echo chambers have denser and more core-periphery information spreading structures than those of most other communities. 12/13/22 CUHK-Shenzhen, China [Asatani et al. Dense and influential core promotion of daily viral information spread in political echo chambers. Scientific Reports 2021]. 11
  • 12. The Price of Filter Bubbles • Filter bubbles and echo chambers can impede natural opinion formation [Musco, Musco, and Tourakakis. Minimizing polarization and disagreement in social networks. WWW 2018]. • Can lead to one-sided policy decisions [Perrone and Wieder. Pro-painkiller echo chamber shaped policy amid drug epidemic. The Center for Public Integrity, 2016]. • And erosion of societal trust [Nguyen. Echo chambers and epistemic bubbles. Episteme, 2020]. 12/13/22 CUHK-Shenzhen, China 12
  • 13. • Filter Bubbles and Echo Chambers •Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 13
  • 14. Misinformation is Not a New Problem 12/13/22 CUHK-Shenzhen, China 14
  • 15. Economic Cost of Misinformation 12/13/22 CUHK-Shenzhen, China 15
  • 16. Economic Impact of Misinformation 12/13/22 CUHK-Shenzhen, China FAKE NEWS: ELECTIONS THE U.S. TO SPEND $200 MILLION ALONE ADVANCING FAKE NEWS $400 MILLION SPENT GLOBALLY ON FAKE POLITICAL NEWS COVID-19 Vaccine Misinformation and Disinformation Costs an Estimated $50 to $300 Million Each Day [Bruns, Hosangadi, Trotochaud, and Sell. Johns Hopkins Center for Health Security. 2021]. [U. of Baltimore and CHEQ. The economic cost of bad actors on the internet. Fake News 2019]. 16
  • 17. Misinformation Propagation (US Politics) ● The connections between misinformation spreaders are denser than connections between fact-checkers. ● Increasing the value of k takes us from the periphery to the denser inner core structure. 12/13/22 CUHK-Shenzhen, China k-Core decomposition of the pre-Election retweet network. Orange = fact- checks and purple = claims. [Shao, Hui, Wang et al. Anatomy of an online misinformation network. PLoS ONE 2018]. 18
  • 18. Misinformation Propagation + Bubbles (Covid-19) ● Echo-chambers with misinformed sub-communities are much denser than those with informed sub-communities. 12/13/22 CUHK-Shenzhen, China [Memon and Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset. CEUR Workshop 2020]. (a) Retweet (b) Mention (c) Reply (d) Retweet+Mention+Reply 19
  • 19. • Filter Bubbles and Echo Chambers • Misinformation •Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 20
  • 20. Densest Subgraphs: Undirected • What is a good notion of density? • Classical: average degree: ! " = $ % . • Average #motifs/vertex: ' ", Ψ = * +,, % . '-./ − optimal density. • E.g., Δ-density. • More generally, Ψ-density for pattern Ψ (e.g., h-clique). • Intuition: densest subgraphs may indicate echo chambers. 12/13/22 CUHK-Shenzhen, China #instances of Ψ (motif) in G. 21
  • 21. Different notions of density. 12/13/22 CUHK-Shenzhen, China -densest subgraph. density = 11/7. -densest subgraph. density = 2/4. • Clique-density. • Pattern-density. 22
  • 22. k-cores and k-clique-cores 12/13/22 CUHK-Shenzhen, China 3 2 1, 0 <latexit sha1_base64="h+a8v17wh/Dw4VGrNfEaZ6hpP7Q=">AAAB9XicbVDLSgNBEJyNrxhfUY9eBoPgxbArQT0GvHiMYB6QrGF20kmGzGOZmVXDkv/w4kERr/6LN//GSbIHTSxoKKq66e6KYs6M9f1vL7eyura+kd8sbG3v7O4V9w8aRiWaQp0qrnQrIgY4k1C3zHJoxRqIiDg0o9H11G8+gDZMyTs7jiEUZCBZn1FinXQ/6ohIPaVnE6o0mG6x5Jf9GfAyCTJSQhlq3eJXp6doIkBayokx7cCPbZgSbRnlMCl0EgMxoSMygLajkggwYTq7eoJPnNLDfaVdSYtn6u+JlAhjxiJynYLYoVn0puJ/Xjux/aswZTJOLEg6X9RPOLYKTyPAPaaBWj52hFDN3K2YDokm1LqgCi6EYPHlZdI4LwcX5cptpVStZHHk0RE6RqcoQJeoim5QDdURRRo9o1f05j16L9679zFvzXnZzCH6A+/zB+i2kr8=</latexit> k-cores <latexit sha1_base64="9hFQByLqhvsg+DyYKvGpEOrpZNE=">AAACBHicbVDLSgNBEJyNrxhfUY+5DAYhgoZdCeox4MVjBPOA7BJmJ51kyOzMMjMrhiUHL/6KFw+KePUjvPk3Th4HTSxoKKq66e4KY860cd1vJ7Oyura+kd3MbW3v7O7l9w8aWiaKQp1KLlUrJBo4E1A3zHBoxQpIFHJohsPrid+8B6WZFHdmFEMQkb5gPUaJsVInXygNT7FvFCOiz+HEj0L5kJ6NqVSgO/miW3anwMvEm5MimqPWyX/5XUmTCIShnGjd9tzYBClRhlEO45yfaIgJHZI+tC0VJAIdpNMnxvjYKl3ck8qWMHiq/p5ISaT1KAptZ0TMQC96E/E/r52Y3lWQMhEnBgSdLeolHBuJJ4ngLlNADR9ZQqhi9lZMB0QRamxuORuCt/jyMmmcl72LcuW2UqxW5nFkUQEdoRLy0CWqohtUQ3VE0SN6Rq/ozXlyXpx352PWmnHmM4foD5zPHyBll8E=</latexit> (k, 4)-cores 0 1 2, 3 (", $)-core of G – maximal subgraph where each vertex participates in ≥ ' instances of Ψ. 23
  • 23. Densest Subgraph Discovery 12/13/22 CUHK-Shenzhen, China Problem: Given a graph G(V, E) and an h-clique Ψ "#, %# , find the subgraph D with the highest h-clique density & ', Ψ . Ψ can be any pattern: e.g., a 3-star, Δ, etc. Focus of this talk: h-cliques. 24
  • 24. SOTA1 : Densest Subgraph Discovery: Exact • Binary search to guess the density • Construct the flow network • Based on guessed density and original graph • Use max-flow algorithm to check the feasibility • Example: ! = 0, % = 1 (max triangle deg) • α= (l+r)/2=0.5. • Run time: ' ( ) * − 1 ℎ − 1 + ) Λ + min ), Λ 2 . 1 As of 2017. 12/13/22 CUHK-Shenzhen, China [Mitzenmacher, Pachocki, Peng, Tourakakis, and Xu. Scalable large near-clique detection in large-scale networks via sampling. KDD 2015]. #instances of Ψ. ⇒⇒ 25
  • 25. A DS Discovery – A Triangle Example 12/13/22 CUHK-Shenzhen, China B C D s t Ψ" Ψ# Ψ$ Ψ% 0 1 1 1 3& 3& 3& 3& +∞ +∞ +∞ +∞ +∞ +∞ +∞ +∞ 1 1 1 Flow network. If ) = 0.5 If ) = 1/3 ⇐ 26
  • 26. SOTA1 Densest Subgraph Discovery: Approximation • Approximation algorithm: PeelApp • Iteratively peel the vertex w/ smallest h-clique-degree. • Let !", !$, … be the list of residual subgraphs generated. • Return !& with the highest density. • Approximation: • The density of S is at least " '( ⋅ *+,- = " / ⋅ *012. • Running time: time. 12/13/22 CUHK-Shenzhen, China <latexit sha1_base64="iHkLEsdke5bqZTUfsJFWe3g6ats=">AAACBHicbVDLSsNAFJ34rPUVddnNYBHqoiWRoi5cFNy4s4J9QBPKZDJph05mwsxEKKELN/6KGxeKuPUj3Pk3TtsstPXAhcM593LvPUHCqNKO822trK6tb2wWtorbO7t7+/bBYVuJVGLSwoIJ2Q2QIoxy0tJUM9JNJEFxwEgnGF1P/c4DkYoKfq/HCfFjNOA0ohhpI/Xt0m2FezgUGmZh1fXwUAhF4LDqTk5h3y47NWcGuEzcnJRBjmbf/vJCgdOYcI0ZUqrnOon2MyQ1xYxMil6qSILwCA1Iz1COYqL8bPbEBJ4YJYSRkKa4hjP190SGYqXGcWA6Y6SHatGbiv95vVRHl35GeZJqwvF8UZQyqAWcJgJDKgnWbGwIwpKaWyEeIomwNrkVTQju4svLpH1Wc89r9bt6uXGVx1EAJXAMKsAFF6ABbkATtAAGj+AZvII368l6sd6tj3nripXPHIE/sD5/AEI0lo0=</latexit> O(n · ✓ d 1 h 1 ◆ ) [Tsourakakis. The k-clique densest subgraph problem. WWW 2015]. 1 As of 2017. 27
  • 27. DSD: SOTA Limitations • Initial bounds on $\alpha$ are not tight. • The flow network can be large, e.g., for a large G with many instances of $\Psi$. • The flow network is rebuilt from the original G each time. • Even PeelApp does redundant work. Can we “bound” the densest subgraph? The $(k, \Psi)$-core to the rescue!
  • 28. Bounding Densest Subgraphs with Cores • Theorem: Let G, k, $\Psi$ be as before, and let H be a $(k, \Psi)$-core of G. Then $\frac{k}{|V_\Psi|} \le \rho(H, \Psi) \le k_{max}$. Special case: the $(k_{max}, \Psi)$-core has density in $\left[\frac{k_{max}}{h}, k_{max}\right]$. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
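To make the core bound concrete, here is a small sketch for the edge case h = 2, where $|V_\Psi| = 2$ and a $(k, \Psi)$-core is an ordinary k-core. It computes core numbers by repeated minimum-degree removal and then checks that the density of the $k_{max}$-core lies in $[k_{max}/2, k_{max}]$. The helper names and the toy graph are assumptions made for illustration.

```python
def core_numbers(adj):
    """Core number of every vertex, via repeated min-degree removal."""
    degree = {v: len(nb) for v, nb in adj.items()}
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        v = min(alive, key=lambda u: degree[u])
        k = max(k, degree[v])  # core numbers never decrease along the peel
        core[v] = k
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
    return core


def kmax_core_density(adj):
    """Return (k_max, edge density of the k_max-core)."""
    core = core_numbers(adj)
    k_max = max(core.values())
    members = {v for v, c in core.items() if c == k_max}
    # Each undirected edge inside the core is seen from both endpoints.
    edges = sum(1 for v in members for w in adj[v] if w in members) // 2
    return k_max, edges / len(members)


# Toy graph: a 4-clique {0, 1, 2, 3} plus a pendant vertex 4 attached to 3.
toy_adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
k_max, density = kmax_core_density(toy_adj)
```

Here the 3-core is the 4-clique with density 1.5, which sits at the lower end of the interval [3/2, 3].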
  • 29. Bounding DSG with Cores: An Example. For $k_{max} = 2$ and a 2-core, LB = 1 and UB = 2. Example densities: $\rho = 1$, and $\rho = \frac{5}{4}, \frac{9}{6}, \frac{13}{8}, \dots \to 2$. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 30. Bounding Densest Subgraphs with Cores • Lemma: The DSG of G must be contained in its $(\lceil \rho_{opt} \rceil, \Psi)$-core. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 31. Exact algorithm: CoreExact • Follows the same framework as the existing exact algorithm: binary search to guess the density; construct the flow network from the guessed density and the graph; use a max-flow algorithm to check feasibility. • Adds three core-based optimizations: 1. Tighter bounds derived from cores, $\left[\frac{k_{max}}{|V_\Psi|}, k_{max}\right]$. 2. Build the flow network on cores rather than on the original graph. 3. Locate the clique-densest subgraph in even smaller cores after each feasibility check. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 32. Approximation Algorithms • IncApp: • Do a $(k, \Psi)$-core decomposition of G, in $O\left(n \cdot \binom{d-1}{h-1}\right)$ time. • Return the $(k_{max}, \Psi)$-core. • This is a $\frac{1}{|V_\Psi|} = \frac{1}{h}$-approximation. • Repeatedly computing clique-degrees can be expensive for large cliques. • CoreApp: a heuristic to find the $(k_{max}, \Psi)$-core directly. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 33. Approximation Algorithms. CoreApp: 1. Sort the vertices of G in decreasing order of their h-clique-based core number, using a cheaper proxy. 2. Obtain the max core, and its core number, from the current top-ranked prefix of vertices. 3. If the max degree of the remaining vertices is larger than that core number, double the prefix size and repeat step 2; otherwise, output the max core. Same worst-case time complexity as IncApp and PeelApp (SOTA), but much faster in practice. [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 34. Sample Experiment Results. [Figure: results on As-Caida (n = 26K, m = 106K) and Friendster (n = 20M, m = 106M).] [Fang, Yu, Cheng, L., and Lin. PVLDB 2019].
  • 35. Mini Case Study: Covid-19 • Covid-19 retweets: 1,025,937 retweets involving 660,730 users ⇒ (660,730 nodes, 835,193 edges). • Largest connected component: (399,962 nodes, 663,506 edges). Courtesy: Thirumuruganathan, QCRI.
  • 36. Densest subgraph: 86 vertices, 18-core, density 12.5407. Top-2 densest subgraph: 1134 vertices, 13-core, density 10.0150. Cross edges: 296. Topics: side effects of vaccine; modes of transmission of virus; case counts in different states and countries.
  • 37. Mini Case Study II: Voter Fraud 2020. Tweets on the US Presidential Election 2020. Number of nodes: 1,385,225. Number of edges: 6,631,720. Number of tweets: 8,085,323. Largest connected component: 1,356,657 nodes, 6,611,465 edges. Courtesy: Thirumuruganathan, QCRI.
  • 38. [Figure: two dense subgraphs.] First: 1962 vertices, 91-core, density 83.7665. Second: 2206 vertices, 54-core, density 50.9231. Cross edges: 1385.
  • 39. Repeated allegations of voter fraud. Retweets of Sidney Powell’s tweet warning states against certifying the election. Quoting Trump: “dirty rolls ==> dirty polls”. Big tech is colluding with Dems to defeat Trump. Vote in person to fight against mail-in voter fraud. FBI said many military mail-in votes, all for Trump, were thrown away in a ditch in PA. Biggest voter fraud in American history. Voting machines known to be insecure. Need proof of citizenship and photo ID to prevent fraud. Fact-checkers from AP, Politifact, & Reuters confirm: no evidence of widespread election fraud. Experts confirm elections are secure; most of the interference comes from misinformation campaigns. GOP and Trump team are sowing disinfo. and panic. Need to protect democracy. Trump’s narrower winning margins in 2016 vs. Biden’s wider ones in 2020. Debunking “Deborah Jean Christiansen’s vote is fraud” by quoting her. More former Trump aides getting infected than voter fraud cases! Quotes of Sidney Powell’s tweet; replies that there is no evidence of widespread fraud; Biden brags about having “the most extensive and inclusive VOTER FRAUD organization in the history of American politics”; (CNN) dishonesty taxonomy of Trump rally; Philly mayor hiding info. from people. Anyone caught cheating with voter fraud games should be federally charged; state officials from both parties stated the election went well. Losing side refusing to recognize clear winner; weaving conspiracy theories and strangling faith and belief.
  • 40. Mini Case Study III: Nepal Earthquake • Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015. • 265,383 nodes, 3,898,972 edges. • Largest connected component: 258,756 nodes, 3,771,999 edges. Courtesy: Thirumuruganathan, QCRI. https://zenodo.org/record/2587475#.Ypkxmi-caFg.
  • 41. [Figure: two dense subgraphs.] First: 1463 vertices, 129-core, density 105.328. Second: 370 vertices, 115-core, density 71.9378. Cross edges: 129. Topics: requests for help; info on the earthquake (magnitude, distance from the capital to the cities affected); reports on damage and ruin.
  • 42. Recent Progress on DSGs • WWW 2020: near-optimal solutions via multiple rounds of peeling; a $(1 + \epsilon)$-approximation within $O\left(\frac{\Delta \log n}{\rho^* \cdot \epsilon^2}\right)$ iterations, as proved in SODA 2022. • STOC 2020: $(1 + \epsilon)$-approximation on dynamic graphs, with $O(\log^4 n \cdot \epsilon^{-6})$ time per edge insertion/deletion. • WWW 2020: define and find minimal DSGs (minimal: no proper subgraph is a DSG). • SODA 2022: a flow-based $(1 + \epsilon)$-approximation algorithm with $\tilde{O}\left(\frac{m}{\epsilon}\right)$ running time. [Boob, Gao, Peng et al. Flowless: Extracting densest subgraphs without flow computations. WWW 2020]. [Sawlani and Wang. Near-optimal fully dynamic densest subgraph. STOC 2020]. [Chang and Qiao. Deconstruct Densest Subgraphs. WWW 2020]. [Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].
  • 43. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected •Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 44
  • 44. Directed Densest Subgraphs. A directed densest subgraph (DDS) of a digraph is a pair of vertex sets $(S, T)$. Its density is $\rho(S, T) = \frac{|E(S, T)|}{\sqrt{|S| \cdot |T|}}$, which generalizes edge density from undirected graphs. Problem: Find $(S^*, T^*)$ with maximum $\rho$. [Figure: example digraph on vertices a–e with $S^*$ and $T^*$ marked.] [Kannan and Vinay. Analyzing the structure of large graphs. Tech Report 1999].
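The density definition can be checked directly on a small example. The sketch below assumes the digraph is given as a list of (source, target) pairs; the function name `directed_density` and the four-edge toy graph are hypothetical and not taken from the slide's figure.

```python
import math


def directed_density(edges, S, T):
    """rho(S, T) = |E(S, T)| / sqrt(|S| * |T|) for a digraph edge list."""
    S, T = set(S), set(T)
    if not S or not T:
        return 0.0
    # E(S, T): edges that start in S and end in T (duplicates ignored).
    crossing = sum(1 for u, v in set(edges) if u in S and v in T)
    return crossing / math.sqrt(len(S) * len(T))


# Hypothetical digraph: every vertex in {a, b} points to every vertex in {c, d}.
toy_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
rho_full = directed_density(toy_edges, {"a", "b"}, {"c", "d"})
rho_half = directed_density(toy_edges, {"a"}, {"c", "d"})
```

With S = {a, b} and T = {c, d} all four edges count, giving 4/√4 = 2; shrinking S to {a} gives 2/√2. S and T are allowed to overlap, as in the definition.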
  • 45. SOTA (as of 2019) DDS Discovery: Exact • Repeatedly solve max-flow, similarly to the undirected case. • For each value of $a = \frac{|S|}{|T|}$ with $0 < |S|, |T| \le n$: • Find the max density by binary search. • Build the flow network and solve max-flow. • Overall time: $O(n^2 \cdot T_{maxflow})$. • More than 2 days on a graph with ~1,200 vertices and ~2,600 edges. [Khuller and Saha. On finding dense subgraphs. ICALP 2009].
  • 46. SOTA DDS Discovery: Approximation. Greedy Peeling Algorithm: • Build a bipartite graph $(L, R, E)$ where $L = R = V$. • All edges go from the $L$ copy to the $R$ copy. • Each time, remove a node with the least degree. • Report the densest subgraph among those obtained. $O(m + n)$ time. Approximation? [Figure: example digraph G on vertices a–e.] [Khuller and Saha. On finding dense subgraphs. ICALP 2009].
  • 47. SOTA DDS Discovery: Approximation. [Figure: counterexample graph built from groups of a-, b-, and c-nodes (instances with 18 and 36 vertices shown). KS-Approx density: 2.75; ground-truth density: 6; approximation ratio $\frac{6}{2.75} = 2.18$. A second panel enlarges the graph and compares the ground-truth and KS-Approx densities again.] • Fix [personal communication with authors]: • 2-approximation algorithm • $O(n(n + m))$ time. [Khuller and Saha. On finding dense subgraphs. ICALP 2009]. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 48. Densest Directed Subgraph: An Exact Algorithm • $(x, y)$-core: an (S, T)-induced subgraph in which: • Every node in S has outdegree ≥ $x$. • Every node in T has indegree ≥ $y$. • S and T are not necessarily disjoint. • Example: H = ({a, b}, {c, d}) is a (2, 2)-core. [Figure: example digraph on vertices a–e.] [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
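One simple way to obtain the maximal $(x, y)$-core is to repeatedly discard sources with too few out-edges into T and targets with too few in-edges from S until nothing changes. The sketch below assumes an edge-list input and recomputes degrees in every round for clarity rather than speed; the function name `xy_core` and the toy digraph are illustrative assumptions, since the edges of the slide's figure are not recoverable.

```python
def xy_core(edges, x, y):
    """Maximal (S, T) with out-degree >= x on S and in-degree >= y on T."""
    edges = set(edges)
    S = {u for u, _ in edges}
    T = {v for _, v in edges}
    changed = True
    while changed:
        changed = False
        # Degrees are counted only over edges that go from S to T.
        out_deg = {u: 0 for u in S}
        in_deg = {v: 0 for v in T}
        for u, v in edges:
            if u in S and v in T:
                out_deg[u] += 1
                in_deg[v] += 1
        drop_S = {u for u in S if out_deg[u] < x}
        drop_T = {v for v in T if in_deg[v] < y}
        if drop_S or drop_T:
            S -= drop_S
            T -= drop_T
            changed = True
    return S, T


# Hypothetical digraph: a and b both point to c and d; c also points to e.
toy_digraph = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "e")]
core_S, core_T = xy_core(toy_digraph, 2, 2)
```

On this input the (2, 2)-core is ({a, b}, {c, d}), mirroring the example on the slide; its density is 4/√4 = 2.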
  • 49. Densest Directed Subgraph: Core-Exact. Theorem: The DDS of G is contained in the $\left(\frac{\rho^*}{2\sqrt{a}}, \frac{\sqrt{a} \cdot \rho^*}{2}\right)$-core. • $a = \frac{|S^*|}{|T^*|}$ is unknown: search through all $\frac{i}{j}$ with $0 < i, j \le n$. • $\rho^*$ is unknown: start with good bounds and use binary search. • E.g., lower bound = any 2-approx. solution and upper bound = 2 × lower bound. • Still $O(n^2 \cdot T_{maxflow})$, but much faster in practice because the flow graphs are smaller. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 50. Densest Directed Subgraph: DC-Exact • Uses a “divide and conquer” approach. • For a given ratio $a$, the result of the binary search for the “best” (S, T) pair gives enough information about subranges of ratios that can be skipped. • Algorithm DC-Exact: $O(k \cdot T_{maxflow})$, e.g., … • $k \ll n^2$ in practice. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 51. Densest Directed Subgraph: Core-Approx • Let G[S, T] be an $(x, y)$-core of G. Then $\rho(S, T) \ge \sqrt{xy}$. • Let $[x^*, y^*]$ be the max core-number pair, i.e., the pair that maximizes $xy$ among all $(x, y)$-cores. • $\rho^* \le 2\sqrt{x^* y^*}$. • ⇒ The $(x^*, y^*)$-core is a 2-approx. solution to DDS. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 52. Densest Directed Subgraph: Core-Approx • Naïve implementation: for each $x$, compute all $(x, y)$-cores, $0 < y < n$, and return the $(x^*, y^*)$-core ⇒ $O(n(n + m))$ time. • Can we do better? [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 53. Densest Directed Subgraph: Core-Approx. [Figure: grid of $(x, y)$ core-number pairs, x and y each ranging over 1–8, with the candidates for $[x^*, y^*]$ highlighted.] Max equal pair: $(\gamma, \gamma)$. Main idea: for each $x \le \gamma$, search for the largest $y$; for each $y \le \gamma$, search for the largest $x$; $O(\sqrt{m} \cdot (n + m))$ time. [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Ma et al. On Densest Subgraph Discovery. TODS 2021].
  • 54. Sample Experiment Results: Exact Algorithms. Up to 6 orders of magnitude faster. Datasets: MO (~200, ~2.6K); TC (~1.2K, ~2.7K); OF (~3K, ~30K); AD (~6.4K, ~57K); AM (~400K, ~3.4M). [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020].
  • 55. Sample Experiment Results: Approx Algorithms. Up to 6 orders of magnitude faster. Datasets: MO (~200, ~2.6K); TC (~1.2K, ~2.7K); OF (~3K, ~30K); AD (~6.4K, ~57K); AM (~400K, ~3.4M); AR (~3.4M, ~5.8M); BA (~2.1M, ~17.8M); TW (~52.6M, ~1.96B). [Ma, Fang, Cheng, L., Zhang, and Lin. Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs. SIGMOD 2020]. [Bahmani, Kumar, Vassilvitskii. Densest Subgraph in Streaming and MapReduce. VLDB 2012].
  • 56. Better Approximation Ratio? • Propose a new LP formulation for the DDS problem • A divide-and-conquer algorithmic framework • An efficient $(1 + \epsilon)$-approximation algorithm, where $\epsilon$ is any positive real number • An efficient exact algorithm • Up to 3 orders of magnitude faster than the state-of-the-art exact and approximation algorithms [Ma, Fang, Cheng, L., and Han. A Convex-Programming Approach for Efficient Directed Densest Subgraph Discovery. SIGMOD 2022].
  • 57. Recent Progress on DDS • Concurrent work from SODA 2022 • Gives a $(1 + \epsilon)$-approximation in $\tilde{O}\left(\frac{m}{\epsilon}\right)$ time via network flow for undirected graphs • Can also be extended to directed graphs at extra time cost • It would be interesting to compare the two algorithms empirically [Chekuri, Quanrud, and Torres. Densest Subgraph: Supermodularity, Iterative Peeling, and Flow. SODA 2022].
  • 58. Mini Case Study: Covid-19 • Covid-19 retweets: 1,025,937 retweets involving 660,730 users ⇒ (660,730 nodes, 835,193 edges). • Largest connected component: (399,962 nodes, 663,506 edges). Courtesy: Thirumuruganathan, QCRI.
  • 59. Directed Densest Subgraph from Covid-19. Source nodes = 777. Target nodes = 15. Common nodes = 2. (5, 70)-core. Density: 55.8826. 777 nodes “influenced” by 15 “initiators”. Topics: vaccine side effects, modes of transmission.
  • 60. Mini Case Study II: Nepal Earthquake 12/13/22 CUHK-Shenzhen, China • Graph constructed from cascades of tweets collected following the Nepal earthquake, April 2015. • 265383 nodes. • 3898972 edges. • largest connected component: • 258756 nodes. • 3771999 edges. https://zenodo.org/record/2587475#.Ypkxmi-caFg. Courtesy: Thirumuruganathan, QCRI. 61
  • 61. Directed Densest Subgraph from Nepal. Source nodes: 122,637. Target nodes: 25,233. Common nodes: 20,713. (1, 51)-core. Density: 34.309. Tens of thousands of “initiators” and more than a hundred thousand “influenced”. Topics: info on damage and requests for help.
  • 62. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed •Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 63
  • 63. Propagation/Diffusion Models • How does influence/information travel in networks? • Example phenomena: infection, product adoption, information, opinion, rumor, etc. • Stochastic diffusion models, in discrete or continuous time. • How can we launch campaigns to optimize design objectives? [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003]. [W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
  • 64. Influence Maximization • Core optimization problem in IM: Given a diffusion model M, a network G = (V, E), model parameters (e.g., edge propagation probabilities), and problem parameters (e.g., budget), find a seed set $S \subset V$ under the budget that maximizes $\sigma_M(S)$, the expected number of adopters given initial adopters S (the spread).
  • 65. Complexity of IM • Theorem: The IM problem is NP-hard for several major diffusion models under both discrete time and continuous time. 12/13/22 CUHK-Shenzhen, China [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003]. 66
  • 66. Complexity of Spread Computation • Theorem: It is #P-hard to compute the expected spread of a node set under major diffusion models (cf. counting simple paths in a digraph). [Chen, Wang, and Yang. Efficient influence maximization in social networks. KDD 2009]. [Chen, Yuan, and Zhang. Scalable influence maximization in social networks under the linear threshold model. ICDM 2010]. [W. Chen, L., and C. Castillo. Information and Influence Propagation in Social Networks. Morgan-Claypool 2013].
  • 67. Properties of Spread Function. $\sigma(S)$ is monotone: $S \subseteq S' \implies \sigma(S) \le \sigma(S')$.
  • 68. Properties of Spread Function. $\sigma(S)$ is submodular: $S \subset S' \subset V,\ x \in V \setminus S' \implies \sigma(x \mid S') \le \sigma(x \mid S)$, where $\sigma(x \mid S) := \sigma(S \cup \{x\}) - \sigma(S)$ is the marginal gain. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
  • 69. Approximation of Submodular Function Maximization • Theorem: Let $f : 2^V \to \mathbb{R}_{\ge 0}$ be a monotone submodular function, with $f(\emptyset) = 0$. Let $S^{Grd}$ and $S^*$ resp. be the greedy and optimal solutions. Then $f(S^{Grd}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$, where $f(S^*)$ = OPT. [Nemhauser, Wolsey, and Fisher. An analysis of the approximations for maximizing submodular set functions. Math. Prog. 1978].
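The greedy solution in the theorem is built by repeatedly adding the element with the largest marginal gain. Below is a minimal sketch under a cardinality budget k, using set coverage as a stand-in monotone submodular function; the names `greedy_max` and `coverage` and the toy sets are assumptions made for illustration.

```python
def greedy_max(ground_set, f, k):
    """Pick up to k elements, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for x in ground_set:
            if x in chosen:
                continue
            gain = f(chosen + [x]) - f(chosen)
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no element has positive marginal gain
            break
        chosen.append(best)
    return chosen


def coverage(sets):
    """Coverage function: number of items covered by the selected sets."""
    def f(selection):
        covered = set()
        for name in selection:
            covered |= sets[name]
        return len(covered)
    return f


toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 2}}
f_cov = coverage(toy_sets)
picked = greedy_max(sorted(toy_sets), f_cov, 2)
```

With k = 2 the greedy picks C first (gain 4) and then A (gain 3), covering all 7 items. In general the greedy value is only guaranteed to be within a factor 1 − 1/e of the optimum.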
  • 70. Approximation of Submodular Function Maximization • Theorem: The spread function $\sigma(\cdot)$ is monotone and submodular under various major diffusion models, for both discrete and continuous time. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
  • 71. Baseline Approximation Algorithm. Monte Carlo simulations for estimating the expected spread. Lazy Forward optimization to avoid useless updates. ⇒ Greedy is still extremely slow on large networks. [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, and Glance. Cost-effective outbreak detection in networks. KDD 2007]. [Kempe, Kleinberg, and Tardos. Maximizing the spread of influence through a social network. KDD 2003].
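As a rough illustration of the Monte Carlo step, the sketch below simulates the independent cascade (IC) model and averages the number of activated nodes over many runs. The graph encoding (a dict from node to a list of (neighbor, probability) pairs), the function names, and the toy graphs are assumptions for this example; Lazy Forward is not shown.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One IC cascade; returns the number of nodes active at the end."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets one chance per out-edge.
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, runs=10000, seed=0):
    """Average cascade size over independent simulations."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, rng) for _ in range(runs))
    return total / runs


# Hypothetical graphs: a certain chain a -> b -> c, and a single coin-flip edge.
chain = {"a": [("b", 1.0)], "b": [("c", 1.0)]}
coin = {"a": [("b", 0.5)]}
spread_chain = estimate_spread(chain, {"a"})
spread_coin = estimate_spread(coin, {"a"})
```

The chain always activates all three nodes, while the coin-flip graph has expected spread 1.5, which the estimate approaches as the number of runs grows. Greedy seed selection calls such an estimator for every candidate in every round, which is why it scales poorly.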
  • 72. Reverse Influence Sampling • A series of algorithms that guarantee a $\left(1 - \frac{1}{e} - \epsilon\right)$-approximation to the optimal expected spread. • Key idea: use random reverse reachable sets (RR-sets) to gauge the quality of (candidate) seeds. [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
  • 73. Reverse Reachable Sets (RR-Sets) • An RR-set is a sample subgraph of G. • Example of RR-set generation under the IC model. [Figure: digraph on nodes A–E with edge probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.] [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
  • 74. Reverse Reachable Sets (RR-Sets) • An RR-set is a sample subgraph of G. • Example of RR-set generation under the IC model: start from a random node. RR-set = {A}. [Figure: digraph on nodes A–E with edge probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.] [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
  • 75. Reverse Reachable Sets (RR-Sets) • An RR-set is a subgraph sample of G. • Generation of RR-sets under the IC model: start from a random node; sample its/their incoming edges; add the sampled neighbors. RR-set = {A, C, B, E}. • Intuition: an RR-set is a sample set of nodes that can influence node A. [Figure: digraph on nodes A–E with edge probabilities 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.4.] [Borgs, Brautbar, and Chayes. Maximizing Social Influence in Nearly Optimal Time. SODA 2014].
  • 76. Influence Estimation with RR-Sets • Theorem: Pr[S overlaps a random RR-set] = $\frac{1}{n}$ × expected spread of S. • Family of approx. algorithms: TIM, IMM, Stop-and-Stare, … [Tang et al. Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency. SIGMOD 2014]. [Tang et al. Influence Maximization in Near-Linear Time: A Martingale Approach. SIGMOD 2015]. [Chen et al. An issue in the Martingale Analysis of the Influence Maximization Algorithm IMM. arXiv 2018]. [Nguyen et al. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. SIGMOD 2016 → arXiv]. [K. Huang, S. Wang, G. Bevilacqua, X. Xiao, and L. Revisiting the Stop-and-Stare Algorithms for Influence Maximization. PVLDB 2017].
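The identity above suggests a simple estimator: sample many RR-sets, count the fraction hit by S, and multiply by n. The sketch below does this for the IC model. The input encoding (`in_edges` maps a node to its incoming (neighbor, probability) pairs), the function names, and the toy chain are assumptions for illustration; algorithms such as TIM and IMM add careful sample-size control that is not shown here.

```python
import random


def random_rr_set(nodes, in_edges, rng):
    """Sample one RR-set under IC: reverse search from a random root."""
    root = rng.choice(nodes)
    rr = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        # Keep each incoming edge (u -> v) independently with probability p.
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr


def estimate_spread_ris(nodes, in_edges, seeds, samples=20000, seed=0):
    """n * (fraction of sampled RR-sets that intersect the seed set)."""
    rng = random.Random(seed)
    seeds = set(seeds)
    hits = sum(
        1 for _ in range(samples) if seeds & random_rr_set(nodes, in_edges, rng)
    )
    return len(nodes) * hits / samples


# Hypothetical chain a -> b -> c with certain edges, stored as incoming edges.
chain_nodes = ["a", "b", "c"]
chain_in = {"b": [("a", 1.0)], "c": [("b", 1.0)]}
ris_a = estimate_spread_ris(chain_nodes, chain_in, {"a"})
ris_c = estimate_spread_ris(chain_nodes, chain_in, {"c"})
```

Every RR-set in this chain contains a, so the estimate for {a} is exactly 3; only RR-sets rooted at c contain c, so the estimate for {c} is close to 1.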
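  • 77. What if the objective is not submodular? • Maximizing a non-decreasing, non-submodular function $f$: the greedy solution satisfies $f(S^{\mathrm{Grd}}) \ge \frac{1}{\alpha}\left(1 - e^{-\alpha\gamma}\right)\mathrm{OPT}$, where $\alpha$ is the curvature and $\gamma$ the submodularity ratio of $f$. [Bian, Buhmann, Krause, and Tschiatschek. Guarantees … Applications. PMLR 2017].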
  • 80. What if the objective is not submodular? Sandwich approximation to the rescue! [Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016]. • $f$: monotone but not submodular. • $\mu, \nu$: monotone and submodular, and $\mu$ ($\nu$) lower (resp. upper) bounds $f$. • Let $S_f$ ($S_\mu$, $S_\nu$) be the Greedy solution to $\max_{S \subseteq V,\ |S| \le k} f(S)$ (resp. …), and let $S_{\mathrm{sand}} \in \{S_f, S_\mu, S_\nu\}$ be the best w.r.t. $f(\cdot)$. Then
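  • 81. What if the objective is not submodular? Sandwich approximation to the rescue! [Lu, Chen, and L. From Competition to complementarity: … Maximization. PVLDB 2016]. $f(S_{\mathrm{sand}}) \ge \max\left\{ \frac{f(S_\nu)}{\nu(S_\nu)},\ \frac{\mu(S_f^{\mathrm{opt}})}{f(S_f^{\mathrm{opt}})} \right\} \cdot \left(1 - \frac{1}{e}\right) \cdot f(S_f^{\mathrm{opt}})$, where $f(S_f^{\mathrm{opt}})$ is OPT.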
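The sandwich strategy is short enough to sketch directly: run greedy on the lower bound, on the objective itself, and on the upper bound, then keep whichever solution scores best under the true objective. The code below is a hedged illustration; the function names and the toy objective (`f`, `lower`, `upper`, and the weights) are invented for the example and are not taken from the cited paper.

```python
def greedy(objective, ground_set, k):
    """Plain greedy: add the element with the largest positive marginal gain."""
    chosen = set()
    for _ in range(k):
        best, best_gain = None, 0
        for v in sorted(ground_set - chosen):      # sorted for determinism
            gain = objective(chosen | {v}) - objective(chosen)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            break
        chosen.add(best)
    return chosen


def sandwich(f, lower, upper, ground_set, k):
    """Return the best (under f) of the greedy solutions for lower, f, upper."""
    candidates = [greedy(g, ground_set, k) for g in (lower, f, upper)]
    return max(candidates, key=f)


# Toy instance (hypothetical): f adds a bonus when "a" and "b" are both chosen,
# which makes it monotone but not submodular.
LOW_W = {"a": 1, "b": 2, "c": 3}
UP_W = {"a": 6, "b": 5, "c": 4}


def lower(s):
    return sum(LOW_W[v] for v in s)


def upper(s):
    return sum(UP_W[v] for v in s)


def f(s):
    return lower(s) + (3 if {"a", "b"} <= s else 0)


best = sandwich(f, lower, upper, {"a", "b", "c"}, 2)
```

On this toy instance, greedy on `f` alone settles on {"b", "c"}, while the solution obtained from the upper bound, {"a", "b"}, scores higher under `f`; taking the best of the three is what the bound on the slide analyzes.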
  • 82. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 83
  • 83. Filter Bubbles, Echo Chambers, and Polarization • Selective exposure to viewpoints/issues can engender/worsen polarization. [Pariser. The filter bubble: What the Internet is hiding from you. Penguin, 2011]. [Bakshy, Messing, and Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science 2015]. • Aggravated by echo chambers in social media. [Garrett. Echo chambers online?: Politically motivated selective exposure among internet news users. JCMC 2009]. [Akoglu. Quantifying political polarity based on bipartite opinion networks. ICWSM 2014]. [Amelkin, Singh, and Bogdanov. A Distance Measure for the Analysis of Polar Opinion Dynamics in Social Networks. TKDD 2019]. [Chen, Lijffijt, and De Bie. Quantifying and Minimizing Risk of Conflict in Social Media. KDD 2018]. [Garimella, De Francisci Morales, Gionis, and Mathioudakis. Quantifying Controversy on Social Media. ACM Trans. Soc. Comput. 2018].
  • 84. Balancing Exposure by Connections • Link Recommendation [Amelkin and Singh. Fighting opinion control in social networks via link recommendation. KDD 2019]. [Musco, Musco, and Tsourakakis. Minimizing polarization and disagreement in social networks. WWW 2018]. [Zhu, Bao, and Zhang. Minimizing Polarization and Disagreement in Social Networks via Link Recommendation. NeurIPS 2021].
  • 85. Interdisciplinary Approach • A comprehensive solution goes beyond CS: e.g., Polarization Lab https://www.polarizationlab.com • Interdisciplinary (CS, stats, sociology) approach. • Real-life experiment: Democrat and Republican volunteers were recruited and incentivized to follow bots whose tweets were initially aligned with their ideology but gradually came from the other side of the aisle. • Complemented with offline tracking and study. [Bail. Breaking the Social Media Prism. Princeton Univ. Press. 2021].
  • 86. Balancing via Information Campaigns • Smart Algorithm Bursts Social Networks' "Filter Bubbles" • “Instead of building echo chambers, Facebook, Twitter and company can tweak their code to broaden exposure to wider ranges of views.” • “… results suggest that targeting a strategic group of social media users and feeding them the right content is more effective for propagating diverse views through a social media network …” [IEEE Spectrum Jan 2021. Featuring research of Aslay, Matakos, Galbrun, and Gionis. TKDE 2020].
  • 87. Balancing via Information Campaigns • Information Campaign Approach [Garimella, Gionis, Parotsidis, and Tatti. Balancing information exposure in social networks. NeurIPS 2018]. [Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020]. [Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020]. • Common assumptions: • Awareness = adoption. • Adoption of opposing views is independent.
  • 88. Opinions can have complex interaction • Adopted and propagated independently?! • The Liberals claim that … they can cut Canada’s greenhouse gas emissions by 40 to 45% below 2005 levels by 2030. They passed a climate plan, C-12, to set legally binding emissions targets to reach net-zero emissions in 2050. • New Democrats supported the Liberals’ net-zero legislation and have set an emissions reduction target of 50 per cent below 2005 levels by 2030. • The Conservatives opposed the Liberals’ net-zero emissions legislation and say their climate plan will meet Paris climate commitments of 30 per cent below 2005 levels by 2030. • The People’s Party platform argues that there is “no scientific consensus” that human activity is driving climate change and has said warnings of looming environmental catastrophe are exaggerated. Source: https://newsinteractives.cbc.ca/elections/federal/2021/party-platforms/#section-climate-change
  • 89. Opinions can have complex interaction • Pure competition.
  • 90. Opinions can have complex interaction • Partial competition.
  • 91. Opinions can have complex interaction • Complementation/reinforcement.
  • 92. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles •A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions A useful digression. 12/13/22 CUHK-Shenzhen, China 93
  • 93. Awareness vs adoption • Higher utility!! • Awareness spreads like an epidemic, but adoption depends on UTILITY. [Kalish. A new product adoption model with price advertising and uncertainty, Management Science 1985].
  • 94. Complementary (aka Reinforcing) Campaigns
  • 95. Welfare Maximization: complementary setting • Problem: Given social network G = (V,E), propagation model, item utility model, and budget vector, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.
  • 96. What does the theory say? [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
  • 97. A simple greedy still works • Greedy algorithm • Does not require specific utility parameters as input • $\left(1 - \frac{1}{e}\right)$-approximation [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
  • 98. Prefix-preserving seed selection - PRIMA [Figure: seed prefixes for budgets $b_1$, $b_2$, …, $b_{\max}$, labeled $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_1}$, $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_2}$, and $\left(1 - \frac{1}{e}\right)\mathrm{OPT}_{b_{\max}}$; $b_{\max} = \max_i b_i$.] • Select enough samples corresponding to every budget of the budget vector. ○ Challenge: the number of samples required is not monotone in budget. [Banerjee, Chen, and L. Maximizing Welfare … Diffusion Model. SIGMOD 2019].
  • 100. Welfare Maximization: competing setting • Problem: Given social network G = (V,E), propagation model, item utility model, budget vector, and a fixed (partial) allocation of seed nodes to items, find an allocation of seed nodes to items that maximizes the expected social welfare, i.e., the expected sum of utilities of itemsets adopted by users.
  • 101. How hard is (the) competition? [Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021].
  • 102. General case algorithm - SeqGRD [Banerjee, Chen, and L. Maximizing Social Welfare in a Competitive Diffusion Model. PVLDB 2021]. • Instance-dependent approximation: $\frac{u_{\min}}{u_{\max}}\left(1 - \frac{1}{e}\right)\mathrm{OPT}$, where $u_{\max}$ = max exp. utility of any bundle and $u_{\min}$ = exp. min utility of any item. • Sort the items based on their utilities: $\{u_1 > u_2 > \cdots > u_m\}$. [Figure: items with utilities $u_1, u_2, \ldots, u_m$; label: PRIMA+.]
  • 103. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization •Mitigating Filter Bubbles • A User Utility Perspective •A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 104
  • 104. Filter bubble problem [Figure: twelve "YAY!" labels and a single "NAY!".] • Items (opinions) are complementary objective-wise. • Items (opinions) are competing propagation-wise. [Garrett. Echo chambers online?: Politically motivated selective exposure among Internet news users. Journal of computer-mediated communication 2009]. [Aslay, Matakos, Galbrun, and Gionis. Maximizing the Diversity of Exposure in a Social Network. TKDE 2020].
  • 105. Problem: Key Ingredients • Competition parameter: after being influenced, adopt the second item w.p. $q$, $0 \le q < 1$. • (Host’s) reward of adoption is supermodular, which models complementarity: $a$ for the first item; $a + \Delta$ for the second item, $a < \Delta$. • Expected (host) utility for a user adopting both: $a + q\Delta$. • Goal is to maximize the sum of utilities under a competition-driven diffusion. [Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
  • 106. Filter bubble mitigation • There is an existing bubble. • A more general setting. [Figure label: Item A.] Problem FB Mitigation (FBM): Given graph $G = (V, E, p)$, competition parameter $q$, $0 < q < 1$, fixed A seeds $S_A$, and budget $b$, find B seeds $S_B$ such that $|S_B| \le b$ and the expected welfare is maximized.
  • 107. Inherent Challenges – Strike One • FBM is neither monotone nor submodular. • Restricted (sequential) setting: propagation of the follower doesn’t start before that of the leader ends. FBM in the sequential setting is monotone and submodular! ☺ • But wait! FBM can be arbitrarily worse than $\mathrm{FBM}_{\mathrm{seq}}$ and vice versa! ☹
  • 108. Another Attempt [Figure labels: Item A, Item B, First Level Competition (FLC).] • Expected reward at each FLC node = $a + q\Delta$. Surrogate objective: expected # FLC nodes × $(a + q\Delta)$. • Clearly a lower bound for FBM. • But the FLC objective is neither monotone nor submodular.
  • 109. Algorithm 1 – SPReadGRD • Greedily selects B seeds that maximize the marginal spread. • Ignores the welfare objective. • PRIMA+ is used to do the seed selection. • Given fixed $S_A$, PRIMA selects $S_B$ such that $\sigma(S_B \cup S_A) \ge \left(1 - \frac{1}{e} - \epsilon\right)\sigma(S_B^* \cup S_A)$.
  • 110. Analyzing SpreadGRD • Given $S$, for the welfare function $\rho$ the following holds: $a\,\sigma(S) \le \rho(S) \le (a + q\Delta)\,\sigma(S)$. • SPRGRD therefore has the following bound: $\rho(S_A \cup S_B) \ge a \cdot \sigma(S_A \cup S_B) \ge a \cdot \left(1 - \frac{1}{e} - \epsilon\right) \cdot \sigma(S_A \cup S^*) \ge \frac{a}{q\Delta + a}\left(1 - \frac{1}{e} - \epsilon\right)\rho(S_A \cup S^*)$.
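To get a feel for how the final factor in this chain behaves, it can help to plug in numbers. The helper below assumes the reading a/(a + qΔ) · (1 − 1/e − ε), with a the first-item reward, Δ the extra reward, q the competition parameter, and ε the sampling error; the symbol names and the function `spreadgrd_ratio` are assumptions chosen for this illustration.

```python
import math


def spreadgrd_ratio(a, delta, q, eps):
    """Worst-case welfare ratio a / (a + q * delta) * (1 - 1/e - eps)."""
    return a / (a + q * delta) * (1.0 - 1.0 / math.e - eps)


# With q = 0 the welfare equals a * spread, so the ratio reduces to the
# usual influence maximization guarantee (1 - 1/e - eps).
no_competition = spreadgrd_ratio(a=1.0, delta=2.0, q=0.0, eps=0.1)

# As q * delta grows relative to a, the guarantee weakens.
with_competition = spreadgrd_ratio(a=1.0, delta=2.0, q=0.5, eps=0.1)
```

The comparison makes the instance dependence visible: ignoring the welfare objective costs little when the second-item bonus qΔ is small relative to a, and more as it grows.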
  • 111. Algorithm 2 – Sandwich • Assume a tattler diffusion model: a node influences its neighbors with every item in its awareness set. • $\rho_U(S) \ge \rho(S)$ • $\rho_U(\cdot)$ is monotone and submodular.
  • 112. Algorithm 2 – Sandwich • Assume a tattler diffusion model: $\rho_U(S) \ge \rho(S)$. • Assume a diffusion model with $q = 0$: $\rho_L(S) \le \rho(S)$. • $\rho_L(\cdot)$ is monotone and submodular.
  • 113. Algorithm 2 – Sandwich • Assume a tattler diffusion model: $\rho_U(S) \ge \rho(S)$. • Assume a diffusion model with $q = 0$: $\rho_L(S) \le \rho(S)$. • Using sandwich: let $S_{\mathrm{sand}} = \arg\max_{S \in \{S_L, S_\rho, S_U\}} \rho(S)$. Then $\rho(S_{\mathrm{sand}}) \ge \max\left\{ \frac{\rho(S_U)}{\rho_U(S_U)},\ \frac{\rho_L(S^*)}{\rho(S^*)} \right\} \left(1 - \frac{1}{e}\right) \rho(S^*)$.
  • 114. Algorithm 3 - NetRewGRD [Figure labels: Item A, Item B, First Level Competition.] • Extends state-of-the-art sampling to the welfare objective. • Reverse reachable trees. • Recursive weight update using a linear pass. • Scales to large networks. [Banerjee. Welfare maximization… influence. PhD Thesis. UBC. 2022].
  • 115. Experiments • Baselines considered: • COEX: Maximizes co-adoptions of both items • TDEM: Maximizes welfare based on leaning scores [Tu, Aslay, and Gionis. Co-exposure maximization in online social networks. NeurIPS 2020]. [Aslay, Matakos, Galbrun, and Gionis. Maximizing the diversity of exposure in a social network. TKDE 2020].
  • 116. Sample of Results - Quality
  • 117. Sample of Results – Running Time
  • 118. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective •Mitigating Misinformation • Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 119
  • 119. Misinformation Mitigation – Prior Art • Influence Blocking • Temporal aspects ignored or not differentiated • Focus on scalability [Budak, Agrawal, and El Abbadi. Limiting the spread of misinformation in social networks. WWW 2011], [He, Song, Chen, and Jiang. Influence blocking maximization in social networks under the competitive linear threshold model. SDM 2012], [Song, Hsu, and Lee. Temporal influence blocking: Minimizing the effect of misinformation in social networks. ICDE 2017], [Tong, Wu, Guo et al. An efficient randomized algorithm for rumor blocking in online social networks. IEEE TNSE 2017], [Tong, Du, and Wu. On misinformation containment in online social networks. NeurIPS 2018], [Simpson, Srinivasan, and Thomo. Reverse Prevention Sampling for Misinformation Mitigation in Social Networks. ICDT 2020].
  • 120. Temporal Aspects of Propagation • Misinformation spreads faster, farther, and wider than truth! [Vosoughi, Roy, and Aral. The spread of true and false news online. Science 2018] • Adoption decisions have varying lengths. [Mitchell, Stocking, and Matsa. Long-form reading shows signs of life in our mobile news world. Pew Research Center 2016] • Together these have important consequences for effective seed set selection.
  • 121. Temporal Aspects of Propagation [M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022] • Associate meeting probabilities with each edge. • User reaction times sampled from a data-driven distribution. [Figure: propagation snapshots at t = 0, t = 2, t = 3, t = 6.] Adoption decisions of $v_1, v_2, v_3, v_4, v_5$ are uncontested. $v_6$ faces a tie, broken with a random permutation, e.g., $(v_5, v_1)$. [Figure annotations: F → 3. DW: [3,6]. M → 4. Tie!]
  • 122. Misinformation Mitigation Problem [M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022] • Reward function $\rho(\cdot)$ measures effectiveness of mitigation. • P1: Given fake seeds $S_F$ and reward function $\rho(\cdot)$, find a seed set that maximizes the expected reward. • P1 is not submodular! [Figure annotations: Truth reaches well before misinfo. Truth arrives too late!]
  • 123. Sandwiching the Mitigation Objective [M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022] • Observe: supermodular behavior arises due to the joint effect of mitigation seeds, i.e., acting alone they would not achieve the same reward. • LB: maximum reward over singleton seed sets from $S_M$ (tight). Example: $S_M = \{v_1, v_2\}$, $\mathrm{LB} = \max_{v \in \{v_1, v_2\}} \rho(S_F, \{v\})$.
  • 124. Sandwiching the Mitigation Objective [M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022] • Simple candidate: drop meeting events and enforce dominant tie-breaking. • Tighter UB: remove meeting events on edges that can be traversed by both sides. Example: $S_M = \{v_1, v_2\}$.
  • 125. Importance Sampling [M. Simpson, F. Hashemi, and L. Misinformation Mitigation under Differential Propagation Rates and Temporal Penalties. VLDB 2022] • Observe: only nodes reached by the misinformation are eligible for reward. • Idea: only sample roots from nodes that the misinfo campaign reaches → tighter bounds! • RDR sets: weighted analog to RR sets for reward probabilities.
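The root-restriction idea can be illustrated with a deliberately simplified sketch: compute which nodes the misinformation seeds can reach at all, and draw sample roots only from those. This is an assumption-laden simplification, not the RDR construction from the cited paper: reachability here ignores edge probabilities and timing, roots are drawn uniformly rather than with importance weights, fake seeds themselves are excluded from the eligible set, and the names `reachable_from` and `sample_roots` are hypothetical.

```python
import random
from collections import deque


def reachable_from(out_edges, seeds):
    """Nodes reachable from `seeds` along directed edges (probabilities ignored)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in out_edges.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def sample_roots(out_edges, fake_seeds, num_samples, rng):
    """Draw roots uniformly from non-seed nodes the misinformation can reach."""
    eligible = sorted(reachable_from(out_edges, fake_seeds) - set(fake_seeds))
    if not eligible:
        return []
    return [rng.choice(eligible) for _ in range(num_samples)]


# Hypothetical toy graph: misinformation starting at "f" can reach "a" and "b",
# but never "c" or "d", so no samples are spent on roots "c" or "d".
toy_out_edges = {"f": ["a"], "a": ["b"], "c": ["d"]}
roots = sample_roots(toy_out_edges, {"f"}, 5, random.Random(1))
```

Restricting roots this way avoids spending samples on nodes that can never earn reward, which is the intuition behind the tighter bounds claimed on the slide.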
  • 126. Experiments [M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022] Two settings for selecting misinformation seeds: (1) from top-k influential users (small # popular instigators) and (2) uniformly at random (several bots or newly created puppet accounts).
  • 127. Experiments [M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022] Reward distribution dominated by uncontested mitigation adoption.
  • 128. Experiments [M. Simpson, F. Hashemi, and L. Misinformation mitigation under differential propagation rates and temporal penalties. VLDB 2022] Mitigation seeds remain effective under simultaneous perturbation of model parameters.
  • 129. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation •Misinformation Intervention • Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 130
  • 130. Intervention Challenges • Detectors are fallible • Hard vs soft intervention
  • 131. Misinformation Intervention – Prior Art • Disadvantaging posts with misleading info, deleting edges, removing nodes, … → too hard? • No correction for wrong intervention! [Farajtabar et al. Fake news mitigation via point process based intervention. ICML 2017], [Tong et al. Gelling, and melting, large graphs by edge manipulation. CIKM 2012], [Khalil, Dilkina, and Song. Scalable diffusion-aware optimization of network topology. KDD 2014], [Chen et al. Node immunization on large graphs: Theory and algorithms. TKDE 2015], [Medya, Silva, and Singh. Approximate Algorithms for Data-driven Influence Limitation. TKDE 2020], [Caraban et al. 23 ways to nudge: A review of technology-mediated nudging in human-computer interaction. SIGCHI 2019], [Caraban, Konstantinou, and Karapanos. The Nudge Deck: A design support tool for technology-mediated nudging. ACM Designing Interactive Systems Conference. 2020], [Bhuiyan et al. NudgeCred: Supporting News Credibility Assessment on Social Media Through Nudges. CSCW2 2021].
  • 132. Cost Aware Intervention [Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021]
  • 133. Reward Function • $r_i^{\mathrm{int}}$: reach of item $a_i$ after intervention. • $r_i^{\mathrm{no\text{-}int}}$: reach of item $a_i$ w/ no intervention.
  • 134. Cost Aware Intervention [Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021] • dEFEND [Shu et al. KDD 2019]. • Marked Hawkes Process [Mishra et al. CIKM 2016].
  • 135. Experiments [Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021] • NCB-TS: Neural Contextual Bandits w/ Thompson Sampling • CB-TS: Contextual Bandits w/ Thompson Sampling • RB: (Learned) Rule based • CSC: Cost Sensitive Classification
  • 136. Experiments [Thirumuruganathan, Simpson, and L. To intervene or not to intervene: cost based intervention for combating fake news. SIGMOD 2021] Real-time evaluation on Twitter’s stream from 10-Oct-2020 to 10-Nov-2020: • 5 million tweets w/ 1800 distinct English news articles • Topics include Politics (32%), Healthcare (26%), Entertainment (30%), Misc. (12%). Manual evaluation: • Random sample of 750 viral and non-viral tweets • 3 volunteers evaluated intervention • Accuracy of 92.1%. Automated evaluation: • Google FactCheck Claim Search API • TiKL: That is a Known Lie • Accuracy of 96.6%.
  • 137. • Filter Bubbles and Echo Chambers • Misinformation • Detecting Densest Subgraphs – Undirected • Detecting Densest Subgraphs – Directed • Combating via Mitigation: A Refresher on Influence Maximization • Mitigating Filter Bubbles • A User Utility Perspective • A Network Host Utility Perspective • Mitigating Misinformation • Misinformation Intervention •Summary & Open Questions 12/13/22 CUHK-Shenzhen, China 138
  • 138. Summary • Efficient detection of dense subgraphs in undirected and directed graphs is useful for finding filter bubbles and groups of actors engaged in spreading misinformation. • In mitigating filter bubbles via information campaigns, competition between viewpoints/opinions cannot be ignored. • In mitigating misinformation, it’s critical to incorporate temporal aspects. • In misinformation intervention, it’s important to watch your step and correct your gait in the face of mistakes.
  • 139. Open Questions – Detection • Integrating content analysis in going after the “right” densest subgraphs. • Can we detect filter bubbles and groups promoting misinformation as they form? • Longitudinal: (how) do these groups transform over time?
  • 140. Open Questions – Countering • Multiple campaigns of items involving partial/pure competition, complementation? • How can we learn propagation probabilities, competition parameters, and utilities from available propagation traces? • Go beyond expected outcome? E.g., can we counter filter bubbles or misinformation spread as they occur?
  • 141. Open Questions -- • Case studies reflecting the effect of mitigation campaigns on filter bubbles and misinformation diffusion. • Integrating with claim verification and (computational) fact checking efforts. • Incentivizing balance of adoption (in case of filter bubbles) and adoption of truth (in case of misinformation). 12/13/22 CUHK-Shenzhen, China 142
  • 142. Acknowledgments • Chenhao Ma (HKU) • Farnoosh Hashemi (UBC) • Glenn Bevilacqua (UBC → Oracle) • Michael Simpson (UBC) • Prithu Banerjee (UBC → Oracle) • Reynold Cheng (HKU) • Saravanan Thirumuruganathan (QCRI, HBKU) • Xiaolin Han (HKU) • Xuemin Lin (UNSW) • Wenjie Zhang (UNSW) • Yixiang Fang (CUHK) • Wei Chen (MSRA) • Wei Lu (UBC → LinkedIn)