Can x2vec Save Lives?
Automatic Mental Health Classification in Online
Settings Using Graph and Language Embeddings
Alexander Ruch, MPH MA
PhD Student, Cornell University
amr442@cornell.edu ~ alexruch.weebly.com
The study was supported by the US NSF (1756822) and NIH (R25HD079352)
Background
● Analyzing massive multimodal graphs is complex and resource-intensive (RAM)
● ML approaches to SNA circumvent many of these issues via online learning
while retaining graphs’ relational attributes and clustering propensity
○ Many extend the word2vec embedding architecture to preserve homophily,
structural equivalence, and edge context (Goyal and Ferrara 2017; Mikolov et al. 2013)
● Online communities’ language dynamics (e.g., norms, conformity, innovation)
correlate with users’ interaction patterns and “life cycles” (Danescu-Niculescu-Mizil et al. 2013)
● Document embeddings effectively measure language similarity in documents,
classes of documents, and authors of documents (Le and Mikolov 2014)
● Few researchers, however, have tested how graph and document embeddings
may be combined to analyze behavior and language dynamics together over
massive networks of millions of nodes and edges (cf., Bail 2016)
Questions
● How well can graph embeddings predict where users post submissions?
● How well can document embeddings predict where users make posts?
● Does integrating graph and document embeddings generate the best
predictions of where users post submissions or are they better separate?
● How correlated are graph and document embeddings?
Goal
● Can these methods help us predict individuals at risk of suicide?
Overview
1. sampling, processing
2. metapath2vec graph embeddings
3. doc2vec document embeddings
4. similarities, prediction tasks
Sampling and Processing
Population:
490M submissions,
4.3T comments,
66M authors,
27M subreddits
Timeframe: June 2005 – June 2018
Data source: https://files.pushshift.io/reddit/
Reddit Data Sample
● Main SW sample: 10M nodes with 45.3M edges (= largest component)
○ 6.6M observations collected starting from 5K SW author seeds
■ 700K submission authors (= 1% of Reddit authors)
■ 1.3M submissions
■ 1.6M comment authors
■ 6.5M comments
■ 21K subreddits
● Complement samples: 35.5M nodes with 190M edges (= largest component)
○ 6.6M observations from main sample
○ 7.0M observations from a subsample of mental health subreddits
○ 7.6M observations from a subsample of self-help subreddits
○ 5.8M observations randomly selected across all subreddits
Total-degree: x̄ (sd)
SW: 9.1 (0.63)
MH: 9.0 (0.59)
SH: 9.0 (0.56)
R: 9.4 (0.64)
metapath2vec graph embedding
How close are authors and subreddits
over a network’s interaction space?
Dong et al. (2017) metapath2vec
Embedding multi-relational networks has unique challenges from their many types of nodes
and edges, which limits the feasibility of conventional network embedding techniques
metapath2vec uses metapath-based random walks to sample nodes’ heterogeneous
neighborhoods and embeds nodes using a heterogeneous skip-gram model (cf. word2vec)
metapath2vec++ enables simultaneous modeling of structural and contextual correlations
Both models outperform state-of-the-art embedding models
in many network mining tasks, including node classification,
similarity search, and clustering
Strong results are often achieved with very little data (5%)
MP2V Sampling
and Embedding
Sampling Graphs with Biased Random Walks
Random walks are computationally efficient in terms of space and time:
● Storing nodes’ immediate neighbors is O(|E|)
● Retrieving nodes’ neighbors is then O(|V|)
● Storing the interconnections between nodes’ neighbors is O(a²|V|), where a
is the graph’s average degree and is usually small for real-world networks
Preprocessing transition probabilities makes walking from nodes O(1)
Writing walks’ real-time sampling results to disc instead of RAM saves memory
Since walks are independent, the sampler can be parallelized with multiprocessing
to greatly enhance speed or to run multiple samplers over different metapaths
∴ metapath2vec is ~8 times more efficient than SBM and requires much less RAM
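The biased-walk sampling above can be sketched in miniature. Below is an illustrative metapath-guided walker over a toy heterogeneous graph; the node names, the dict-of-lists storage, and the `metapath_walk` helper are all hypothetical stand-ins, not the project's actual pipeline:

```python
import random

# Toy heterogeneous graph: node -> neighbor list, plus a node-type map.
# All names here are illustrative only.
neighbors = {
    "r/SW": ["s1", "s2"], "r/dep": ["s3"],
    "s1": ["r/SW", "a1"], "s2": ["r/SW", "a2"], "s3": ["r/dep", "a1"],
    "a1": ["s1", "s3"], "a2": ["s2"],
}
node_type = {"r/SW": "subreddit", "r/dep": "subreddit",
             "s1": "submission", "s2": "submission", "s3": "submission",
             "a1": "author", "a2": "author"}

# The metapath from the deck: it starts and ends on the same node type,
# so it can be repeated cyclically for longer walks.
METAPATH = ["subreddit", "submission", "author", "submission", "subreddit"]

def metapath_walk(start, metapath, length, rng=random):
    """Walk `length` nodes, only stepping to neighbors whose type matches
    the next position in the (cyclically repeated) metapath."""
    walk = [start]
    i = 0
    while len(walk) < length:
        i += 1
        # len(metapath) - 1 because the endpoint type repeats when cycling
        want = metapath[i % (len(metapath) - 1)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == want]
        if not candidates:
            break  # dead end for this metapath
        walk.append(rng.choice(candidates))
    return walk

print(metapath_walk("r/SW", METAPATH, 9))
```

Because each walk depends only on the (static) adjacency structure, walks like these are trivially independent, which is what makes the multiprocessing parallelization mentioned above possible.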
Sampled nodes using metapath
subreddit → submission → author → submission → subreddit
● Goal: extract similarities between subreddits via the authors who post in them
Walked from each node 1000 times for a length of 100 steps
Embedded subreddit and author nodes appearing ≥5 times in 128 dimensions
using a neighborhood size of 7 and a negative sampling rate of 5
Result: embedding vectors for 1.8M subreddits and authors
≠ 10M due to minimum appearance thresholds and skipping submission/comment nodes
Total sampling and processing time < 1 day; mp2v model file size = 0.9 GB
MP2V Sampling and Embedding
Similarity between SuicideWatch and ...
Depression: 0.83 Advice: 0.74
Anxiety: 0.82 socialanxiety: 0.74
Mentalhealth: 0.75 selfharm: 0.73
AskDocs: 0.75 Needafriend: 0.72
MMFB: 0.74 StopSelfHarm: 0.72
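Scores like these are cosine similarities between nodes' embedding vectors. A minimal numpy sketch of the ranking step, with toy 4-dimensional vectors and invented values standing in for the real 128-dimensional embeddings:

```python
import numpy as np

# Toy embedding table (4-d for readability; the deck uses 128-d).
# Vector values are invented for illustration.
emb = {
    "SuicideWatch": np.array([0.9, 0.1, 0.0, 0.2]),
    "depression":   np.array([0.8, 0.2, 0.1, 0.2]),
    "Anxiety":      np.array([0.7, 0.3, 0.0, 0.1]),
    "funny":        np.array([0.0, 0.1, 0.9, 0.4]),
}

def cosine(u, v):
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = emb["SuicideWatch"]
ranked = sorted(
    ((name, cosine(query, vec)) for name, vec in emb.items()
     if name != "SuicideWatch"),
    key=lambda kv: -kv[1],
)
for name, sim in ranked:
    print(f"{name}: {sim:.2f}")
```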
doc2vec document embedding
How similar is the language in authors’ submissions
to language that’s common in different subreddits?
Background: DBOW & DM doc2vec Models
Random subreddit submissions
x̄ (sd): DBOW = 0.45 (0.15); DM = 0.40 (0.09)
SuicideWatch submissions
x̄ (sd): DBOW = 0.77 (0.07); DM = 0.49 (0.06)
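The x̄ (sd) figures above summarize each submission vector's cosine similarity to a subreddit's vector. A numpy sketch of that summary, using synthetic random vectors as stand-ins for trained DBOW/DM output (the dimensions, counts, and noise scale are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for doc2vec output: one subreddit vector and 200 submission
# vectors per group. Real vectors would come from a trained model.
subreddit_vec = rng.normal(size=128)
docs_on_topic = subreddit_vec + 0.5 * rng.normal(size=(200, 128))  # SW-like
docs_random = rng.normal(size=(200, 128))                          # random

def cos_to(target, docs):
    """Cosine similarity of each row of `docs` to `target`."""
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    t = target / np.linalg.norm(target)
    return d @ t

for label, docs in [("on-topic", docs_on_topic), ("random", docs_random)]:
    sims = cos_to(subreddit_vec, docs)
    print(f"{label}: mean={sims.mean():.2f} sd={sims.std():.2f}")
```

As on the slide, on-topic submissions sit much closer to the subreddit vector (higher mean, lower spread) than random ones.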
doc2vec Similarities: SW’s 15 nearest neighbors
DMM similarities to SuicideWatch:
depression, 0.99
depressed, 0.98
depression_help, 0.98
Suicide_help, 0.97
getting_over_it, 0.96
Prevent_Suicide, 0.96
mentalhealth, 0.96
sad, 0.96
MMFB, 0.95
SanctionedSuicide, 0.95
venting, 0.95
suicidenotes, 0.95
mentalillness, 0.95
ptsd, 0.94
BPD, 0.94
DBOW similarities to SuicideWatch:
depression, 0.96
MMFB, 0.93
depression_help, 0.92
whatsbotheringyou, 0.92
depressed, 0.92
Suicide_help, 0.91
sad, 0.91
suicidenotes, 0.91
offmychest, 0.91
SanctionedSuicide, 0.90
getting_over_it, 0.90
venting, 0.90
mentalhealth, 0.89
selfhelp, 0.89
Vent, 0.89
Similarities: doc2vec vs metapath2vec
Correlation of embedding distances to SW
       MP2V  DBOW  DM    D2V_x̄
MP2V   1.00  0.23  0.15  0.22
DBOW   0.23  1.00  0.59  0.93
DM     0.15  0.59  1.00  0.83
D2V_x̄  0.22  0.93  0.83  1.00
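A matrix like this correlates, across subreddits, each method's per-subreddit distance to SW. A numpy sketch with synthetic distance vectors, constructed so the language-based measures share structure while the graph-based one is mostly independent (all values and the sample size are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # hypothetical number of subreddits compared to SuicideWatch

# Synthetic stand-ins for three per-subreddit distance-to-SW vectors.
latent = rng.normal(size=n)
dbow = latent + 0.5 * rng.normal(size=n)   # shares structure with DM
dm = latent + 0.8 * rng.normal(size=n)
mp2v = 0.2 * latent + rng.normal(size=n)   # weakly related, like the slide

corr = np.corrcoef([mp2v, dbow, dm])  # 3x3 Pearson correlation matrix
print(np.round(corr, 2))
```

The weak MP2V-vs-doc2vec correlation on the slide is what motivates combining the two: each carries information the other lacks.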
Prediction Task:
Will an author post in SuicideWatch?
Model trained/tested with an 80/20 split (n=8610/2153)
= subsamples of the embedding data to balance training
● Training/test split of SW authors = 4060/1015
● Covariates = 128 MP2V embedding positions
Testing accuracy = 69%
Quite a few false-positives (25%) and false-negatives (38%)
Overall: not bad for only including unsupervised positional
data based on network connections
Logistic Regression: MP2V only
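The deck cites scikit-learn (Pedregosa et al. 2011), so a minimal sketch of the MP2V-only task looks like the following. The synthetic 128-dimensional "embedding positions", class sizes, and separability are invented for illustration; only the 80/20 split and the 128-covariate design follow the slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic stand-in for 128-d MP2V author embeddings: SW authors are
# shifted along a few dimensions so the classes are partly separable.
n_per_class, dim = 1000, 128
shift = np.zeros(dim)
shift[:8] = 1.0
X = np.vstack([rng.normal(size=(n_per_class, dim)) + shift,  # SW authors
               rng.normal(size=(n_per_class, dim))])         # non-SW authors
y = np.array([1] * n_per_class + [0] * n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.2f}")
```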
Logistic Regression: D2V only
Model trained/tested with an 80/20 split (n=8610/2153)
= subsamples of the embedding data to balance training
● Training/test split of SW authors = 4060/1015
● Covariates = DBOW and DM distances to SW
Testing accuracy = 76%
Fewer false-positives (21%); still many false-negatives (27%)
Overall: surprisingly good results for only two covariates
Logistic Regression: MP2V+D2V
Model trained/tested with a 3/97 split (n=9,865/283,638)
= subsampled training data to balance training
● Training/test split of SW authors = 4073/1002
● Covariates = 128 MP2V embedding positions + DBOW
and DMM distances to SW
Testing accuracy = 90%
Few false-positives (10%) and false-negatives (12%)
∴ graph & document embeddings work very well together
● Users’ behavior and language are both important,
especially for reducing false-positives/false-negatives
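Combining the two feature sets amounts to column-wise concatenation of the 128 MP2V positions with the two doc2vec distance covariates. A small numpy sketch with hypothetical shapes and random placeholder values:

```python
import numpy as np

rng = np.random.default_rng(3)
n_authors = 100  # hypothetical sample size

mp2v = rng.normal(size=(n_authors, 128))       # graph-embedding positions
dbow_dist = rng.uniform(size=(n_authors, 1))   # DBOW distance to SW
dm_dist = rng.uniform(size=(n_authors, 1))     # DM distance to SW

# Combined design matrix: 128 + 2 = 130 covariates per author.
X = np.hstack([mp2v, dbow_dist, dm_dist])
print(X.shape)  # (100, 130)
```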
[Figure: confusion matrices for the models above; ŷ = SW author vs. ŷ = not SW author]
Next Steps
● Better compare graph/document embeddings
○ Predict membership in other subreddits (e.g., depression, Anxiety, stopdrinking)
○ Determine where/when one type of embedding helps more than the other
● Reveal differences in membership among similar subreddits
○ For example, between alcoholicsanonymous, AlAnon, cripplingalcoholism, stopdrinking, addiction
● Use embeddings to predict the presentation of psychiatric attributes in posts
○ Use multi-label neural networks to predict suicidality, depression, anxiety, substance abuse, etc.
● Discover users’ “emotional arcs” before/after posting in SW (Reagan et al. 2016)
○ How do users’ paths to posting in SW differ, and how do paths leaving SW differ over time?
● Test social influence, social contagion, and other social dynamics
○ Use DeepInf to analyze and visualize neighbors’ social influence over time (Qiu et al. 2018)
Questions/Comments?
Special thanks to Drs. Jennifer Ruch, Michael Macy, David Mimno, Lillian Lee, Christopher Bail, and Thomas Gilovich for
feedback and support on parts of this project. Thanks as well to Seunghyun Kim, Lillyan Pan, Hannah Lee, Helen Sun,
James Zou, Gary Zhuge, Jeffrey Tsang, Bryan Min, Juliana Hong, Yejeong Choi, Cornell’s Social Dynamics Laboratory,
Cornell’s Computational Social Science Reading Group, Duke’s NAC, NSF, and NIH for assistance and funding support.
Citations
Bail (2016) “Combining natural language processing and network analysis…”
Danescu-Niculescu-Mizil et al. (2013) “No Country for Old Members”
Dong et al. (2017) “metapath2vec: Scalable Representation Learning for Heterogeneous Networks”
Goyal and Ferrara (2017) “Graph Embedding Techniques, Applications, and Performance: A Survey”
Le and Mikolov (2014) “Distributed Representations of Sentences and Documents”
Mikolov et al. (2013) “Distributed Representations of Words and Phrases and their Compositionality”
Pedregosa et al. (2011) “Scikit-learn: Machine Learning in Python”
Peixoto (2014) “The graph-tool python library” (graph-tool.skewed.de/)
Reagan et al. (2016) “The emotional arcs of stories are dominated by six basic shapes”
Qiu et al. (2018) “DeepInf: Social Influence Prediction with Deep Learning”
metapath2vec code
metapath2vec original code: https://ericdongyx.github.io/metapath2vec/m2v.html
● This repository contains Dong et al.’s scripts to sample and embed graphs
stellargraph: https://github.com/stellargraph/stellargraph
● Please note that stellargraph runs on networkx, which is extremely memory
inefficient and slow compared to graph-tool
● stellargraph works well with small/moderate graphs, but you should use Dong
et al.’s original code for large/massive graphs (especially for sampling)
Sampling Process
SuicideWatch (main sample)
24,281 (n ≥ 20): get SW submission authors
20% sample → 4,948 (= distinct authors)
777,243 (= 94 subm/auth): get SW authors’ submissions
20% sample → 155,646 (= 31 subm/auth)
9,611,359 (= 1942 coms-subms/auth): get SW authors’ comments & com-subm info
20% sample → 1,415,357 (= 286 distinct coms-subms/auth)
447,579,856: get all comments to all submissions
2,109,393: get SW authors’ info
1% sample → 4,475,200: get non-SW authors info
6,584,593 (= 2,109,393 + 4,475,200): final sample count
Mental Health
7,035,904 (= 2,449,045 + 4,586,859): from 20 MH subreddit seeds
Selfhelp
7,576,231 (= 2,621,663 + 4,954,568): from 10 SH subreddit seeds
Random
5,880,113 (= 1,791,641 + 4,088,472): from a simple random sample of 5000 distinct user seeds
Metapath Examples and Complexity
# author to subr (via subm or comm)
["author", "submission", "subreddit", "submission", "author"],  # subm to same subr
["author", "comment", "submission", "subreddit", "submission", "comment", "author"],  # comm to same subr
# submission to submission (via subr or auth)
["submission", "subreddit", "submission"],  # subm to same subr
["submission", "author", "submission"],  # subm by same auth
# comment to comment (via subm or auth)
["comment", "submission", "comment"],  # comm to same subm
["comment", "author", "comment"],  # comm by same auth
# subreddit to subreddit
["subreddit", "submission", "author", "submission", "subreddit"],  # subr by same auth via subm
["subreddit", "submission", "comment", "author", "comment", "submission", "subreddit"]  # subr by same auth via comm
Estimated complexity:
SBM: O((V ln² V + E) × MCMC_sweeps) → SBM(V=10M, E=45M, sweeps=15) ≈ 40B (1 sweep ≈ 2.7B)
HSBM: O((V ln² V + E × blocks) × MCMC_sweeps) → HSBM(V=10M, E=45M, blocks=20, sweeps=15) ≈ 52B
MP2V: O((V_seeds × walks × walk_length) + ((V_sampled − mp2v_window) × mp2v_iter)) → MP2V(V=10M, E=45M, iter=15) ≈ 2B
∴ MP2V is ~8 times more computationally efficient
Same graph: SFDP vs MP2V
metapath2vec vs metapath2vec++ Results
Clinical Diagnostic Criteria: Risk Factor Keywords