Detecting HIV at-risk MSM in San Diego through
Social Networks
Digital Epidemiology
Current Relevance of HIV
● 33.4 million cases.
● A second growth phase of HIV has already been reported in some countries.
● HIV prevention efforts need to intensify, which is difficult.
How can technology help?
● Philosophical question: Can social networks help identify users at high risk of HIV infection?
● Goal of project: Characterize HIV-vulnerable populations by extracting user sentiments from social networks like Twitter.
History & Related work
● Epidemiology - Hippocrates, 400 B.C. -> Digital Epidemiology - Marcel Salathé et al., 2012.
● Unraveling Abstinence and Relapse: Smoking Cessation Reflected in Social Media - Dr. Elizabeth Murnane, CHI 2014.
● Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes - Sean D. Young et al., Elsevier Preventive Medicine, 2014.
Data source
● 210 notable social networks, from 43 Things to Zooppa.
● Twitter was chosen because of the results published in earlier studies.
● Programmatic access to tweets using the Streaming API.
○ Sample Hose (~4200 tweets/min)
○ Filter Hose (~40 tweets/min)
○ Fire Hose (~420000 tweets/min)
Data collection
● Streaming API
● MongoDB
○ Tweets
○ HIV Corpus
○ HIV Corpus cleaned
○ Related tweets/users
● Neo4j
Data classification & cleaning
● Classification
○ Filter tweets based on a pre-defined set of HIV risk words.
○ Five Risk Buckets : Drug, SexVenues, STI, Sex, Homosexual.
● Cleaning
○ Keep or discard tweets based on co-occurring words.
○ Manually reviewed the classified tweets to create the lists, with Dr. Nella Green’s help.
○ Exception and Inclusion lists for every HIV risk word.
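The classification and cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the bucket contents, the exception words, and the inclusion words are all made-up placeholders, and the rule that an inclusion list requires at least one co-occurring word is an assumption about how the lists were applied.

```python
import re

# Hypothetical risk words per bucket (the deck names the buckets, not the words).
RISK_BUCKETS = {
    "Drug": {"meth", "coke"},
    "SexVenues": {"loft"},
}

# Hypothetical co-occurrence lists, one per risk word.
# Exception list: discard the match if any of these words co-occur.
EXCEPTIONS = {"coke": {"diet", "pepsi", "zero"}}
# Inclusion list: keep the match only if at least one of these words co-occurs.
INCLUSIONS = {"loft": {"club", "tonight"}}


def tokenize(text):
    """Lowercase the tweet and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def classify(text):
    """Return {bucket: [risk words]} for the risk words that survive cleaning."""
    tokens = tokenize(text)
    hits = {}
    for bucket, words in RISK_BUCKETS.items():
        for word in sorted(words & tokens):
            if EXCEPTIONS.get(word, set()) & tokens:
                continue  # an exception word co-occurs: discard
            required = INCLUSIONS.get(word)
            if required and not (required & tokens):
                continue  # no inclusion word co-occurs: discard
            hits.setdefault(bucket, []).append(word)
    return hits
```

A tweet that returns an empty dictionary would be dropped from the cleaned HIV corpus; anything else keeps its bucket labels for later linking to ontology nodes.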
Why Graph DB?
● Twitter’s deeply associative data can be easily modeled.
● Most use cases correspond to analyzing sub-structures and
connectedness. Queries on a graph are much faster than join bombs in
relational data models.
● We use Neo4j, a mature and scalable native graph store with good support.
Property graph Data Model
Nodes
1. USER
2. TWEET
3. HASHTAG
4. URL
5. FOLLOWER_USER
6. ONTOLOGY_BUCKET
7. ONTOLOGY_INSTANCE
Edges
1. FOLLOWS
2. TWEETED
3. MENTIONED_IN
4. IS_REPLY_FOR
5. RETWEET_FOR
6. HAS_HASHTAG
7. HAS_URL
8. HAS_RISK_WORD
9. INSTANCE_OF
Migration from MongoDB to Neo4j
● Using Python with the py2neo library.
● Modular scripts
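One way to picture a migration step is a function that turns a stored tweet document into (source, relationship, target) triples matching the data model. The sketch below is only that: it assumes the MongoDB documents keep Twitter's standard JSON fields (`id_str`, `user.screen_name`, `entities`, `in_reply_to_status_id_str`), the function name is hypothetical, and the actual py2neo calls that would write the triples to Neo4j are left out because the deck does not show them.

```python
def tweet_to_triples(doc):
    """Map one tweet document to (source, relationship, target) triples.

    Nodes are (label, key) tuples using the labels from the data model.
    """
    tweet = ("TWEET", doc["id_str"])
    user = ("USER", doc["user"]["screen_name"])
    triples = [(user, "TWEETED", tweet)]

    entities = doc.get("entities", {})
    for tag in entities.get("hashtags", []):
        triples.append((tweet, "HAS_HASHTAG", ("HASHTAG", tag["text"].lower())))
    for link in entities.get("urls", []):
        triples.append((tweet, "HAS_URL", ("URL", link["url"])))
    for mention in entities.get("user_mentions", []):
        triples.append((("USER", mention["screen_name"]), "MENTIONED_IN", tweet))

    # A reply points at the tweet it answers, as in the IS_REPLY_FOR queries.
    parent_id = doc.get("in_reply_to_status_id_str")
    if parent_id:
        triples.append((tweet, "IS_REPLY_FOR", ("TWEET", parent_id)))
    return triples
```

Keeping the mapping separate from the database writes fits the "modular scripts" idea: the same triples could be checked in isolation before any load runs.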
And.. this is what we got!
[Figure: example property graph]
Nodes shown:
ONTOLOGY_BUCKET {id: "DrugBucket"}
ONTOLOGY_BUCKET {id: "SexVenues"}
ONTOLOGY_INSTANCE {id: "Meth"}
ONTOLOGY_INSTANCE {id: "Coke"}
ONTOLOGY_INSTANCE {id: "The Loft"}
USER {name: "Bob"}
USER {name: "Alice"}
FOLLOWER_USER {name: "Eve"}
FOLLOWER_USER {name: "Fred"}
TWEET {text: "Hello World! I like meth! #drugs http://t.co/ran1"}
TWEET {text: "@Alice:Want some coke? I am at the loft"}
HASHTAG {name: "drugs"}
URL {name: "http://t.co/ran1"}
Edge labels shown: INSTANCE_OF, HAS_RISK_WORD, TWEETED, HAS_URL, HAS_HASHTAG, FOLLOWS, RETWEET_FOR, IS_REPLY_FOR, MENTIONED_IN
Current Results
Conversations among users..
“How many conversations are happening among the drug bucket users alone, the sex bucket users alone, and across drug bucket users and sex bucket users?”
MATCH p = (n:ONTOLOGY_BUCKET {id: 'DrugBucket'})-[r]-(m:ONTOLOGY_INSTANCE)
  -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
WHERE NOT (t)-[:IS_REPLY_FOR]->(:TWEET)
RETURN count(DISTINCT t)
Queries
Output:
8 (1692 ms)
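To make the pattern in the query easier to follow, here is a small in-memory sketch of the same idea: a conversation is counted when its root tweet (one that replies to nothing) carries a risk word from the bucket and has a reply chain at least two levels deep. The function names and the input shapes (a child-to-parent dictionary and a set of flagged tweet ids) are assumptions made for illustration; the project itself ran the Cypher query.

```python
from collections import defaultdict


def max_reply_depth(root, children):
    """Length of the longest reply chain hanging below `root`."""
    best, stack = 0, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        stack.extend((child, depth + 1) for child in children.get(node, ()))
    return best


def count_conversations(reply_to, flagged, min_depth=2):
    """Count flagged root tweets with a reply chain of at least `min_depth`.

    reply_to: {reply tweet id: id of the tweet it answers}
    flagged:  ids of tweets linked to a risk word in the bucket of interest
    """
    children = defaultdict(list)
    for child, parent in reply_to.items():
        children[parent].append(child)
    # Roots mirror the query's "NOT (t)-[:IS_REPLY_FOR]->(:TWEET)" condition.
    roots = [tweet for tweet in flagged if tweet not in reply_to]
    return sum(1 for root in roots if max_reply_depth(root, children) >= min_depth)
```

Here `min_depth=2` plays the role of the `*2..` bound on IS_REPLY_FOR in the query.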
Conversations among users..
“How many conversations are happening among the drug bucket users alone, the sex bucket users alone, and across drug bucket users and sex bucket users?”
MATCH p = (n:ONTOLOGY_BUCKET)-[r]-(m:ONTOLOGY_INSTANCE)
  -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
WHERE n.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
AND NOT (t)-[:IS_REPLY_FOR]->(:TWEET)
RETURN count(DISTINCT t);
Output:
20 (2350 ms)
Conversations among users..
“How many conversations are happening among the drug bucket users alone, the sex bucket users alone, and across drug bucket users and sex bucket users?”
MATCH p1 = (n:ONTOLOGY_BUCKET)-[r]-(m:ONTOLOGY_INSTANCE)
  -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
  -[r3]-(o:ONTOLOGY_INSTANCE)-[r4]-(p:ONTOLOGY_BUCKET {id: 'DrugBucket'})
WHERE n.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
AND NOT (t)-[:IS_REPLY_FOR]->(:TWEET)
RETURN count(DISTINCT t);
Output:
2 (207952 ms)
Conversations among users..
“How many conversations are happening among the drug bucket users alone, the sex bucket users alone, and across drug bucket users and sex bucket users?”
MATCH p1 = (n:ONTOLOGY_BUCKET {id: 'DrugBucket'})-[r]-(m:ONTOLOGY_INSTANCE)
  -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
  -[r3]-(o:ONTOLOGY_INSTANCE)-[r4]-(p:ONTOLOGY_BUCKET)
WHERE p.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
AND NOT (t)-[:IS_REPLY_FOR]->(:TWEET)
RETURN count(DISTINCT t);
Output:
1 (234202 ms)
Finding most referred users..
“List users in descending order of referral counts”
MATCH p = (u:USER)-[r:MENTIONED_IN]->()
RETURN u.name, count(p) AS num_mentions
ORDER BY num_mentions DESC LIMIT 5;
Output:
+--------------+--------------+
| u.name       | num_mentions |
+--------------+--------------+
| "cc7764343d" | 261          |
| "972b1707f7" | 256          |
| "9be7e77265" | 235          |
| "8dc5aaf21a" | 232          |
| "e1095646aa" | 220          |
+--------------+--------------+
(172 ms)
Finding most referred users..
“List users in descending order of referral counts”
MATCH p = (u:USER)-[r:MENTIONED_IN]->(t)
WHERE NOT (t)-[:IS_REPLY_FOR]->(:TWEET)
RETURN u.name, count(p) AS num_mentions
ORDER BY num_mentions DESC LIMIT 5;
Output:
+--------------+--------------+
| u.name       | num_mentions |
+--------------+--------------+
| "00f4edeac2" | 28           |
| "8987f033aa" | 16           |
| "e6e67c5cef" | 10           |
| "fdf2ce82fd" | 6            |
| "86609dbd6e" | 5            |
+--------------+--------------+
(198 ms)
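The two mention rankings differ only in whether mentions inside reply tweets are counted. A rough in-memory equivalent, assuming the mentions are available as (user, tweet id) pairs and the replies as a set of tweet ids (both shapes are assumptions for this sketch), might look like this:

```python
from collections import Counter


def top_mentioned(mentions, reply_tweets=frozenset(), limit=5):
    """Rank users by how many tweets mention them.

    mentions:     iterable of (user name, tweet id) pairs, one per MENTIONED_IN edge
    reply_tweets: tweet ids to ignore, e.g. tweets that are replies
    """
    counts = Counter(
        user for user, tweet in mentions if tweet not in reply_tweets
    )
    return counts.most_common(limit)
```

Calling it with an empty `reply_tweets` corresponds to the first ranking, and passing the set of reply tweets corresponds to the second.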
Forbidden substructure
Topics of interest around a hub..
“What are the main topics in the discussions among people who are at a one-hop following distance from their sub-graph’s hubs?”
MATCH (n:USER)<-[r:FOLLOWS*1..]-(m)
OPTIONAL MATCH (m)-[r1:TWEETED]->(t:TWEET)-[o]->(p:ONTOLOGY_INSTANCE)-[q]->(s:ONTOLOGY_BUCKET {id: "DrugBucket"})
WITH count(t) AS count, n AS hub
WHERE count >= 2
MATCH (o:ONTOLOGY_BUCKET)<-[r2*2..2]-(t1:TWEET)
  <-[:TWEETED]-(neighbour:USER)-[r3:FOLLOWS]-(hub)
RETURN o.id, hub.name, count(t1)
ORDER BY count(t1) DESC LIMIT 5
Output:
+--------------+--------------+-----------+
| o.id         | hub.name     | count(t1) |
+--------------+--------------+-----------+
| "SexBucket"  | "b4f30295f9" | 1         |
| "DrugBucket" | "b4f30295f9" | 1         |
+--------------+--------------+-----------+
(589 ms)
Two most consulted drug users..
“Real-world data suggests that many homosexual (MSM) people use drugs or psychostimulants. Identify the two drug bucket users who are most consulted by homosexual people on Twitter.”
MATCH (o:ONTOLOGY_BUCKET {id: "DrugBucket"})
  <-[ri1:INSTANCE_OF]-(oi1:ONTOLOGY_INSTANCE)
  <-[rhr1:HAS_RISK_WORD]-(t1:TWEET)
  <-[rt1:TWEETED]-(drug:USER)-[:MENTIONED_IN]->
  (t:TWEET)<-[rt2:TWEETED]-(homosex:USER)
  -[rt3:TWEETED]->(t2:TWEET)-[rhr2:HAS_RISK_WORD]
  ->(oi2:ONTOLOGY_INSTANCE)-[ri2:INSTANCE_OF]
  ->(o1:ONTOLOGY_BUCKET {id: "HomosexualTermsBucket"})
RETURN drug.name, count(DISTINCT t)
ORDER BY count(DISTINCT t) DESC
LIMIT 2
Output:
+--------------+-------------------+
| drug.name    | count(DISTINCT t) |
+--------------+-------------------+
| "748d9dc913" | 26                |
| "5a74f759b8" | 13                |
+--------------+-------------------+
(13825 ms)
Proximity of drug bucket users..
“How close are drug bucket users to other homosexual bucket users in terms of proximity in the social graph?”
MATCH p =
  (o1:ONTOLOGY_BUCKET {id: "HomosexualTermsBucket"})<-[ri1:INSTANCE_OF]-(oi1:ONTOLOGY_INSTANCE)
  <-[rrw1:HAS_RISK_WORD]-(t1:TWEET)<-[rt1:TWEETED]-(u1:USER)
  -[r:FOLLOWS*1..3]->(u2:USER)-[rt2:TWEETED]->(t2:TWEET)
  -[rrw2:HAS_RISK_WORD]->(oi2:ONTOLOGY_INSTANCE)-[ri2:INSTANCE_OF]->
  (o2:ONTOLOGY_BUCKET {id: "DrugBucket"})
RETURN u1.name, length(p), count(u2)
ORDER BY length(p)
Output:
+--------------+-----------+-----------+
| u1.name      | length(p) | count(u2) |
+--------------+-----------+-----------+
| "1b0056b07a" | 7         | 4         |
| "0c384be19a" | 7         | 2         |
+--------------+-----------+-----------+
(260 ms)
Graph substructures
Get me all the social subgraphs
which have central nodes
Shortest paths vs. diameter between users
● Finding user-connected components
○ Perform BFS traversal and add a property ‘subgraph’ for each node
○ Forbidden substructure - Users can be connected via ontology
buckets or ontology instances
● Neo4j Java Traversal Framework API Code Snippet
Traverser traverser = db.traversalDescription()
.breadthFirst()
.relationships(RelTypes.TWEETED)
.relationships(RelTypes.FOLLOWS)
.relationships(RelTypes.IS_REPLY_FOR)
.relationships(RelTypes.MENTIONED_IN)
.evaluator(Evaluators.excludeStartPosition())
.uniqueness(Uniqueness.NODE_GLOBAL).traverse(n);
Eliminate forbidden
substructure
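The component-labelling idea can be sketched without Neo4j as a breadth-first search over an edge list that only follows the four relationship types used in the traversal, so that users are never joined merely because they share an ontology bucket or instance. The edge-list input format and the function name are assumptions for this illustration; the project used the Java Traversal Framework shown above.

```python
from collections import deque

# Relationship types the traversal is allowed to follow.
ALLOWED = {"TWEETED", "FOLLOWS", "IS_REPLY_FOR", "MENTIONED_IN"}


def label_subgraphs(edges):
    """Assign a 'subgraph' number to every node reachable via allowed edges.

    edges: iterable of (source, relationship type, target) triples.
    Returns {node: subgraph number}; ontology-only nodes get no label.
    """
    adjacency = {}
    for source, rel_type, target in edges:
        if rel_type not in ALLOWED:
            continue  # skip HAS_RISK_WORD, INSTANCE_OF, etc.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    labels = {}
    next_label = 0
    for start in sorted(adjacency):
        if start in labels:
            continue
        next_label += 1
        labels[start] = next_label
        queue = deque([start])
        while queue:  # standard BFS over one connected component
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in labels:
                    labels[neighbour] = next_label
                    queue.append(neighbour)
    return labels
```

The returned number corresponds to the `subgraph` property that later queries group on.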
Shortest paths vs. diameter between users
Find average shortest path between any 2 users in a connected component
and compare it to diameter of the connected component
MATCH (n:USER) WITH n.subgraph AS subGraphNum, count(n) AS c
WHERE c >= 7
WITH collect(subGraphNum) AS collectionSG
MATCH p = shortestPath((s:USER)-[:FOLLOWS|MENTIONED_IN|TWEETED|IS_REPLY_FOR*..]-(d:USER))
WHERE s.subgraph = d.subgraph AND s.subgraph IN collectionSG AND length(p) > 1
RETURN s.subgraph, sum(length(p))/count(p), max(length(p)),
  ((sum(length(p))/count(p))*1.0)/max(length(p))
ORDER BY ((sum(length(p))/count(p))*1.0)/max(length(p)) DESC
Shortest paths vs. diameter between users
Output:
+------------+-------------------------+----------------+------------------------------------------------+
| s.subgraph | sum(length(p))/count(p) | max(length(p)) | ((sum(length(p))/count(p))*1.0)/max(length(p)) |
+------------+-------------------------+----------------+------------------------------------------------+
| 2431       | 2                       | 2              | 1.0                                            |
| 23         | 3                       | 4              | 0.75                                           |
| 6024       | 3                       | 4              | 0.75                                           |
| 671        | 3                       | 4              | 0.75                                           |
| 1737       | 3                       | 4              | 0.75                                           |
| 1264       | 3                       | 4              | 0.75                                           |
| 2136       | 3                       | 4              | 0.75                                           |
| 1742       | 3                       | 4              | 0.75                                           |
| 7152       | 3                       | 4              | 0.75                                           |
| 5650       | 3                       | 4              | 0.75                                           |
| 1          | 9                       | 15             | 0.6                                            |
| 4195       | 3                       | 5              | 0.6                                            |
| 8038       | 2                       | 6              | 0.3333333333333333                             |
+------------+-------------------------+----------------+------------------------------------------------+
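For readers who want to see the ratio computed by hand, the sketch below derives the average shortest-path length, the diameter (longest shortest path), and their ratio for one connected component using plain BFS. The adjacency-dictionary input and the function names are assumptions. The averages in the table are whole numbers, which is consistent with integer division in the query; this sketch uses floating-point division instead, so its averages would not be truncated.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def path_stats(adjacency, min_length=1):
    """Return (average shortest path, diameter, average / diameter).

    adjacency:  {node: set of neighbours} for one connected component
    min_length: ignore pairs closer than this (the query uses length(p) > 1)
    """
    lengths = []
    for source in adjacency:
        for target, d in bfs_distances(adjacency, source).items():
            # `source < target` counts each unordered pair once.
            if source < target and d >= min_length:
                lengths.append(d)
    average = sum(lengths) / len(lengths)
    diameter = max(lengths)
    return average, diameter, average / diameter
```

A ratio near 1.0 means most pairs are about as far apart as the farthest pair, while a low ratio points to a component with a few long chains attached to a tightly connected core.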
Some interesting substructures!
Even more interesting substructures!
Verified accounts
Getting geo-local sentiments
What are the people hanging around Sex Venues
talking about?
Commonly discussed topics around Sex
Venues
● Some tweets are geotagged
○ Neo4j Spatial plugin to create spatial index on tweets
● Find tweets tweeted near a specific Sex Venue
○ Perform a withinDistance query for the coordinates of the sex venue
● Are these tweets talking about specific topics?
○ Topic modeling - LDA (Gensim) on tweets
Commonly discussed topics around Sex
Venues
Find what topic HIV risk users are talking about the most, around a particular
Sex Venue.
REST API Code Snippet
import json
import requests

headers = {'content-type': 'application/json'}
url = "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance"
payload = {
    "layer": "geom",
    "pointX": -117.161324,  # longitude of the venue
    "pointY": 32.710671,    # latitude of the venue
    "distanceInKm": 2
}
r = requests.post(url, data=json.dumps(payload), headers=headers)
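As a point of comparison for the plugin call, a within-distance filter can be approximated in plain Python with the haversine great-circle formula. This is an illustrative stand-in under stated assumptions, not a description of how the Neo4j Spatial plugin computes distance; the function names, the spherical Earth radius, and the `{tweet id: (lat, lon)}` input are all choices made for the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(tweets, center, radius_km):
    """Return ids of geotagged tweets within `radius_km` of `center`.

    tweets: {tweet id: (latitude, longitude)}
    center: (latitude, longitude) of the venue
    """
    lat0, lon0 = center
    return [
        tweet_id
        for tweet_id, (lat, lon) in tweets.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
```

Note the argument order: the REST payload passes longitude as `pointX` and latitude as `pointY`, while this sketch takes (latitude, longitude) pairs.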
● Cleaning tweets - remove mentions and URLs
● Stop word list - NLTK library
● Gensim - corpora and LDA libraries
● Free parameters
○ Number of topics - 2, 3, 4
○ Distance radius for the ‘withinDistance’ query - 2, 5, 10 km
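The cleaning step that precedes topic modeling might look roughly like the sketch below. The stop-word set here is a tiny hypothetical stand-in for the NLTK list, and the regular expressions are assumptions about what "remove mentions, URLs" meant in practice.

```python
import re

# Hypothetical stand-in for a full stop-word list such as NLTK's.
STOP_WORDS = {"a", "an", "the", "i", "am", "at", "is", "to", "and", "of", "in", "some"}

# Matches @mentions and http(s) links.
MENTION_OR_URL = re.compile(r"@\w+|https?://\S+")


def clean_tweet(text):
    """Lowercase a tweet, drop mentions/URLs, and remove stop words."""
    stripped = MENTION_OR_URL.sub(" ", text.lower())
    tokens = re.findall(r"[a-z']+", stripped)
    return [token for token in tokens if token not in STOP_WORDS]
```

The resulting token lists are the kind of input that Gensim's `corpora.Dictionary` and LDA model expect; that step is not shown here, and the number of topics would be one of the free parameters listed above.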
LDA on Tweets found around Sex Venues
LDA on Colocated Tweets - Results
Topic #1 gay, san, diego, queen, flicks, glass, amp, dont, coke, get
Topic #2 gay, san, diego, ca, glass, amp, cheers, flicks, bourbon, happy
Drug & Homosexual bucket: coke, glass, dope, gay, queen, …
Sex Venues Bucket: groovy laid back amp nasty, cheers, flicks, bourbon street, club san diego, pecs, …
Interesting patterns..
“What is the longest conversation thread among any set of users?”
MATCH p = (n:TWEET)<-[r:IS_REPLY_FOR*]-(m:TWEET) RETURN p ORDER BY length(p) DESC LIMIT 1
205 nodes
(157179 ms)
Result - Longest Conversation
Challenges
● Data collection
○ Sampled data (1%)
○ Twitter API rate limit per user - 15 calls per 15 minutes.
○ Collecting users who have favorited a tweet.
○ Extracting conversations/retweet chains associated with a tweet.
● Data classification and cleaning
○ Working with microblogs.
○ Iterative process.
● Restricted visualization for Neo4j
○ Hard to decipher patterns in graph.
Future
● More representative dataset - Firehose API
● Innovative Data Visualizations to visualize evolving graphs
● Machine Learning for better HIV risk tweets classification.
○ Mechanical Turk for labeling
○ Logistic Regression for classification
● SD Primary Infection Cohort - Overlaying real-world HIV infection graph
on top of an enriched social network
Conclusion
● Structured approach to model social networks and derive insights from
networks like Twitter. Best practices in collecting and managing Twitter
data for social networks analysis.
● Current results - Graph queries to derive intuitions on factors that
influence HIV risk behaviour.
● Vision for the future.
Thanks!

More Related Content

What's hot

Mapping Tweets to Conference Talks: A Goldmine for Semantics
Mapping Tweets to Conference Talks: A Goldmine for SemanticsMapping Tweets to Conference Talks: A Goldmine for Semantics
Mapping Tweets to Conference Talks: A Goldmine for SemanticsMilan Stankovic
 
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th PlenarysmartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
Mark Wilkinson
 
Conducting Twitter Reserch
Conducting Twitter ReserchConducting Twitter Reserch
Conducting Twitter Reserch
Kim Holmberg
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologEleanor Howe
 
Independent Study_Final Report
Independent Study_Final ReportIndependent Study_Final Report
Independent Study_Final ReportShikha Swami
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Deepak K
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
Michel Dumontier
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for Libraries
Crossref
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
Michel Dumontier
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Aadil Hussaini
 
Scholarly Communication for Bioinformatics Students
Scholarly Communication for Bioinformatics StudentsScholarly Communication for Bioinformatics Students
Scholarly Communication for Bioinformatics Students
Philip Bourne
 
The State of Open Research Data
The State of Open Research DataThe State of Open Research Data
The State of Open Research Data
Ross Mounce
 
Dt35682686
Dt35682686Dt35682686
Dt35682686
IJERA Editor
 
NLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-TextNLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-Text
DataWorks Summit/Hadoop Summit
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
c.titus.brown
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
Mark Wilkinson
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Mark Wilkinson
 
The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference LinkingHerbert Van de Sompel
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
IRJET Journal
 

What's hot (20)

Mapping Tweets to Conference Talks: A Goldmine for Semantics
Mapping Tweets to Conference Talks: A Goldmine for SemanticsMapping Tweets to Conference Talks: A Goldmine for Semantics
Mapping Tweets to Conference Talks: A Goldmine for Semantics
 
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th PlenarysmartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
 
B.3.5
B.3.5B.3.5
B.3.5
 
Conducting Twitter Reserch
Conducting Twitter ReserchConducting Twitter Reserch
Conducting Twitter Reserch
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
 
Independent Study_Final Report
Independent Study_Final ReportIndependent Study_Final Report
Independent Study_Final Report
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
 
CrossRef Technical Information for Libraries
CrossRef Technical Information for LibrariesCrossRef Technical Information for Libraries
CrossRef Technical Information for Libraries
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report
 
Scholarly Communication for Bioinformatics Students
Scholarly Communication for Bioinformatics StudentsScholarly Communication for Bioinformatics Students
Scholarly Communication for Bioinformatics Students
 
The State of Open Research Data
The State of Open Research DataThe State of Open Research Data
The State of Open Research Data
 
Dt35682686
Dt35682686Dt35682686
Dt35682686
 
NLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-TextNLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-Text
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
 
The SFX Framework for Context-Sensitive Reference Linking
The SFX Framework for  Context-Sensitive Reference LinkingThe SFX Framework for  Context-Sensitive Reference Linking
The SFX Framework for Context-Sensitive Reference Linking
 
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
A Paper on Web Data Segmentation for Terrorism Detection using Named Entity R...
 

Similar to Social Networks analysis to characterize HIV at-risk populations - Progress and Status.

Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
Werner Leyh
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
ManishReddy706923
 
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
Kemele M. Endris
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
Paolo Missier
 
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
National Information Standards Organization (NISO)
 
IRJET - Suicidal Text Detection using Machine Learning
IRJET -  	  Suicidal Text Detection using Machine LearningIRJET -  	  Suicidal Text Detection using Machine Learning
IRJET - Suicidal Text Detection using Machine Learning
IRJET Journal
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
ijtsrd
 
Detection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
Detection and Analysis of Twitter Trending Topics via Link-Anomaly DetectionDetection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
Detection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
IJERA Editor
 
Data science unit3
Data science unit3Data science unit3
Data science unit3
varshakumar21
 
Modern Association Rule Mining Methods
Modern Association Rule Mining MethodsModern Association Rule Mining Methods
Modern Association Rule Mining Methods
ijcsity
 
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
GUANGYUAN PIAO
 
F017433947
F017433947F017433947
F017433947
IOSR Journals
 
Nov1 webinar intro_slides v
Nov1 webinar intro_slides vNov1 webinar intro_slides v
Nov1 webinar intro_slides v
SC CTSI at USC and CHLA
 
Prediction of User Rare Sequential Topic Patterns of Internet Users
Prediction of User Rare Sequential Topic Patterns of Internet UsersPrediction of User Rare Sequential Topic Patterns of Internet Users
Prediction of User Rare Sequential Topic Patterns of Internet Users
IRJET Journal
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Carole Goble
 
LIBER Webinar: 23 Things About Research Data Management
LIBER Webinar: 23 Things About Research Data ManagementLIBER Webinar: 23 Things About Research Data Management
LIBER Webinar: 23 Things About Research Data Management
LIBER Europe
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
Tom Plasterer
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
Tom Plasterer
 

Similar to Social Networks analysis to characterize HIV at-risk populations - Progress and Status. (20)

Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
Interlinking Standardized OpenStreetMap Data and Citizen Science Data in the ...
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
 
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
 
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
A Customisable Pipeline for Continuously Harvesting Socially-Minded Twitter U...
 
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
Social Networks analysis to characterize HIV at-risk populations - Progress and Status.

  • 1. Detecting HIV at-risk MSM in San Diego through Social Networks Digital Epidemiology
  • 2. Current Relevance of HIV
    ● 33.4 million cases.
    ● A second growth phase of HIV has already been reported in some countries.
    ● HIV prevention efforts need to be intensified, which is difficult.
  • 3. How can technology help?
    ● Philosophical question: Can social networks help in identifying users at high risk of HIV infection?
    ● Goal of project: Characterize HIV-vulnerable populations by extracting user sentiments from social networks like Twitter.
  • 4. History & Related work
    ● Epidemiology - Hippocrates, 400 B.C. -> Digital Epidemiology - Marcel Salathe et al., 2012.
    ● Unraveling Abstinence and Relapse: Smoking Cessation Reflected in Social Media - Dr. Elizabeth Murnane, CHI 2014.
    ● Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes - Sean D. Young et al., Elsevier Preventive Medicine, 2014.
  • 5. Data source
    ● 210 notable social networks, from 43 Things to Zooppa.
    ● Twitter was chosen because of the results published in earlier studies.
    ● Programmatic access to tweets using the Streaming API.
      ○ Sample Hose (~4200 tweets/min)
      ○ Filter Hose (~40 tweets/min)
      ○ Fire Hose (~420000 tweets/min)
  • 6. Data collection
    ● Streaming API
    ● MongoDB
      ○ Tweets
      ○ HIV Corpus
      ○ HIV Corpus cleaned
      ○ Related tweets/users
    ● Neo4j
  • 7. Data classification & cleaning
    ● Classification
      ○ Filter tweets based on a pre-defined set of HIV risk words.
      ○ Five risk buckets: Drug, SexVenues, STI, Sex, Homosexual.
    ● Cleaning
      ○ Keep or discard tweets based on co-occurring words.
      ○ Lists were created by manually going through classified tweets with Dr. Nella Green’s help.
      ○ Exception and Inclusion lists for every HIV risk word.
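The classification and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the word lists are hypothetical placeholders (the curated lists are not shown in the slides), and the rule semantics are an assumption, namely that an exception word co-occurring with a risk word discards the match, while a non-empty inclusion list requires at least one of its words to co-occur.

```python
import re

# Hypothetical risk words per bucket; the real lists were curated manually.
RISK_BUCKETS = {
    "Drug": {"meth", "coke"},
    "SexVenues": {"the loft"},
}
# Hypothetical co-occurrence rules, keyed by risk word.
EXCEPTIONS = {"coke": {"diet", "pepsi", "cola"}}   # discard if any co-occurs
INCLUSIONS = {"the loft": {"tonight", "club"}}      # keep only if one co-occurs


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(text):
    """Return {bucket: [risk words]} for risk words that survive cleaning."""
    tokens = tokenize(text)
    token_set = set(tokens)
    normalized = " ".join(tokens)
    hits = {}
    for bucket, words in RISK_BUCKETS.items():
        for word in sorted(words):
            # Whole-word (or whole-phrase) match against the normalized tweet.
            if not re.search(r"\b" + re.escape(word) + r"\b", normalized):
                continue
            if token_set & EXCEPTIONS.get(word, set()):
                continue  # an exception word co-occurs: discard
            required = INCLUSIONS.get(word)
            if required and not (token_set & required):
                continue  # none of the required context words co-occur
            hits.setdefault(bucket, []).append(word)
    return hits
```

With these placeholder lists, a tweet mentioning "diet coke" is dropped from the Drug bucket, while "the loft" only counts as a venue when a context word such as "club" appears alongside it.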
  • 8. Why Graph DB?
    ● Twitter’s deeply associative data can be easily modeled.
    ● Most use cases correspond to analyzing sub-structures and connectedness. Queries on a graph are much faster than join bombs in relational data models.
    ● We use Neo4j, a mature and scalable native graph store with good support.
  • 9. Property graph Data Model
    Nodes
      1. USER
      2. TWEET
      3. HASHTAG
      4. URL
      5. FOLLOWER_USER
      6. ONTOLOGY_BUCKET
      7. ONTOLOGY_INSTANCE
    Edges
      1. FOLLOWS
      2. TWEETED
      3. MENTIONED_IN
      4. IS_REPLY_FOR
      5. RETWEET_FOR
      6. HAS_HASHTAG
      7. HAS_URL
      8. HAS_RISK_WORD
      9. INSTANCE_OF
  • 19. Migration from MongoDB to Neo4j
    ● Using Python with the Py2neo library.
    ● Modular scripts.
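One way such a migration script might be structured is to turn each stored tweet document into parameterized Cypher `MERGE` statements. The sketch below is an assumption-laden illustration rather than the authors' script: it only builds the statements and leaves execution through a driver such as Py2neo out, the `id` property on `TWEET` is hypothetical, the `$param` placeholder style is the current Cypher syntax (older Neo4j releases used `{param}`), and the document fields follow the Twitter v1.1 tweet JSON layout.

```python
def tweet_to_cypher(doc):
    """Translate one tweet document into a list of (cypher, params) pairs."""
    tid = doc["id_str"]
    statements = [(
        "MERGE (u:USER {name: $user}) "
        "MERGE (t:TWEET {id: $tid}) SET t.text = $text "
        "MERGE (u)-[:TWEETED]->(t)",
        {"user": doc["user"]["screen_name"], "tid": tid, "text": doc["text"]},
    )]
    entities = doc.get("entities", {})
    for tag in entities.get("hashtags", []):
        statements.append((
            "MATCH (t:TWEET {id: $tid}) MERGE (h:HASHTAG {name: $name}) "
            "MERGE (t)-[:HAS_HASHTAG]->(h)",
            {"tid": tid, "name": tag["text"]},
        ))
    for link in entities.get("urls", []):
        statements.append((
            "MATCH (t:TWEET {id: $tid}) MERGE (l:URL {name: $name}) "
            "MERGE (t)-[:HAS_URL]->(l)",
            {"tid": tid, "name": link["url"]},
        ))
    for mention in entities.get("user_mentions", []):
        statements.append((
            "MATCH (t:TWEET {id: $tid}) MERGE (m:USER {name: $name}) "
            "MERGE (m)-[:MENTIONED_IN]->(t)",
            {"tid": tid, "name": mention["screen_name"]},
        ))
    parent = doc.get("in_reply_to_status_id_str")
    if parent:
        statements.append((
            "MATCH (t:TWEET {id: $tid}) MERGE (p:TWEET {id: $parent}) "
            "MERGE (t)-[:IS_REPLY_FOR]->(p)",
            {"tid": tid, "parent": parent},
        ))
    return statements
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-running a script over the same MongoDB collection would not duplicate users, hashtags or URLs.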
  • 20. And.. this is what we got!
  • 21. Data Model
    Nodes in the example graph
      ONTOLOGY_BUCKET {id: "DrugBucket"}
      ONTOLOGY_BUCKET {id: "SexVenues"}
      ONTOLOGY_INSTANCE {id: "Meth"}
      ONTOLOGY_INSTANCE {id: "Coke"}
      ONTOLOGY_INSTANCE {id: "The Loft"}
      USER {name: "Bob"}
      USER {name: "Alice"}
      FOLLOWER_USER {name: "Eve"}
      FOLLOWER_USER {name: "Fred"}
      TWEET {text: "Hello World! I like meth! #drugs http://t.co/ran1"}
      TWEET {text: "@Alice:Want some coke? I am at the loft"}
      HASHTAG {name: "drugs"}
      URL {name: "http://t.co/ran1"}
    Edge types in the example graph
      INSTANCE_OF, HAS_RISK_WORD, TWEETED, HAS_URL, HAS_HASHTAG, FOLLOWS, RETWEET_FOR, IS_REPLY_FOR, MENTIONED_IN
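The data model can also be written down as a small schema check. The sketch below is illustrative only: the edge directions are read off the queries in this deck, `RETWEET_FOR` is assumed to run tweet-to-tweet like `IS_REPLY_FOR`, and which user wrote which tweet is a guess, since the flattened diagram does not say. The short node keys (`t1`, `t2`) are hypothetical.

```python
# Edge type -> (allowed source labels, allowed target labels).
SCHEMA = {
    "TWEETED": ({"USER"}, {"TWEET"}),
    "MENTIONED_IN": ({"USER"}, {"TWEET"}),
    "FOLLOWS": ({"USER", "FOLLOWER_USER"}, {"USER"}),
    "IS_REPLY_FOR": ({"TWEET"}, {"TWEET"}),
    "RETWEET_FOR": ({"TWEET"}, {"TWEET"}),  # direction assumed
    "HAS_HASHTAG": ({"TWEET"}, {"HASHTAG"}),
    "HAS_URL": ({"TWEET"}, {"URL"}),
    "HAS_RISK_WORD": ({"TWEET"}, {"ONTOLOGY_INSTANCE"}),
    "INSTANCE_OF": ({"ONTOLOGY_INSTANCE"}, {"ONTOLOGY_BUCKET"}),
}

# Part of the example graph; node keys are hypothetical short names.
NODES = {
    "Bob": "USER", "Alice": "USER", "Eve": "FOLLOWER_USER",
    "t1": "TWEET", "t2": "TWEET", "drugs": "HASHTAG",
    "Meth": "ONTOLOGY_INSTANCE", "Coke": "ONTOLOGY_INSTANCE",
    "DrugBucket": "ONTOLOGY_BUCKET",
}
EDGES = [
    ("Alice", "TWEETED", "t1"),       # authorship assumed
    ("t1", "HAS_HASHTAG", "drugs"),
    ("t1", "HAS_RISK_WORD", "Meth"),
    ("Bob", "TWEETED", "t2"),         # authorship assumed
    ("Alice", "MENTIONED_IN", "t2"),  # t2 contains "@Alice"
    ("t2", "HAS_RISK_WORD", "Coke"),
    ("Meth", "INSTANCE_OF", "DrugBucket"),
    ("Coke", "INSTANCE_OF", "DrugBucket"),
    ("Eve", "FOLLOWS", "Alice"),
]


def invalid_edges(nodes, edges):
    """Return the edges whose endpoint labels do not fit SCHEMA."""
    bad = []
    for src, rel, dst in edges:
        sources, targets = SCHEMA.get(rel, (set(), set()))
        if nodes.get(src) not in sources or nodes.get(dst) not in targets:
            bad.append((src, rel, dst))
    return bad
```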
  • 24. Conversations among users
    “How many conversations are happening among the drug bucket users alone, sex bucket users alone, and across drug bucket users and sex bucket users?”
    Query (conversations whose root tweet carries a drug bucket risk word):
      MATCH p=((n:ONTOLOGY_BUCKET {id: 'DrugBucket'})-[r]-(m:ONTOLOGY_INSTANCE)
              -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET))
      WHERE NOT (t)-[:`IS_REPLY_FOR`]->(:`TWEET`)
      RETURN count(DISTINCT t)
    Output: 8 (1692 ms)
  • 25. Conversations among users (same question)
    Query (conversations whose root tweet carries a sex-related bucket risk word):
      MATCH p=((n:ONTOLOGY_BUCKET)-[r]-(m:ONTOLOGY_INSTANCE)
              -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET))
      WHERE n.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
        AND NOT (t)-[:`IS_REPLY_FOR`]->(:`TWEET`)
      RETURN count(DISTINCT t);
    Output: 20 (2350 ms)
  • 26. Conversations among users (same question)
    Query (root tweet in a sex-related bucket, with a tweet in its reply chain in the drug bucket):
      MATCH p1=((n:ONTOLOGY_BUCKET)-[r]-(m:ONTOLOGY_INSTANCE)
               -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
               -[r3]-(o:ONTOLOGY_INSTANCE)-[r4]-(p:ONTOLOGY_BUCKET {id: 'DrugBucket'}))
      WHERE n.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
        AND NOT (t)-[:`IS_REPLY_FOR`]->(:`TWEET`)
      RETURN count(DISTINCT t);
    Output: 2 (207952 ms)
  • 27. Conversations among users (same question)
    Query (root tweet in the drug bucket, with a tweet in its reply chain in a sex-related bucket):
      MATCH p1=((n:ONTOLOGY_BUCKET {id: 'DrugBucket'})-[r]-(m:ONTOLOGY_INSTANCE)
               -[r1]-(t:TWEET)<-[r2:IS_REPLY_FOR*2..]-(t1:TWEET)
               -[r3]-(o:ONTOLOGY_INSTANCE)-[r4]-(p:ONTOLOGY_BUCKET))
      WHERE p.id IN ["HomosexualTermsBucket", "STIBucket", "SexBucket", "SexVenues"]
        AND NOT (t)-[:`IS_REPLY_FOR`]->(:`TWEET`)
      RETURN count(DISTINCT t);
    Output: 1 (234202 ms)
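The counting logic of these conversation queries can be mirrored in plain Python. This is a sketch under assumptions, not the project's code: `reply_to` is a hypothetical mapping from a reply tweet to the tweet it answers, and `flagged` is a hypothetical set of tweets that carry a risk word from the bucket of interest. A conversation is counted when its root tweet is flagged, is not itself a reply, and has a reply chain at least two hops deep, matching `IS_REPLY_FOR*2..`.

```python
def count_conversations(reply_to, flagged, min_depth=2):
    """Count flagged root tweets with a reply chain of at least min_depth hops."""
    children = {}
    for child, parent in reply_to.items():
        children.setdefault(parent, []).append(child)

    def depth(tweet):
        # Longest chain of replies hanging below this tweet, in hops.
        return max((1 + depth(c) for c in children.get(tweet, [])), default=0)

    return sum(
        1
        for tweet in flagged
        if tweet not in reply_to and depth(tweet) >= min_depth
    )
```

For example, with `reply_to = {"b": "a", "c": "b", "e": "d"}` and `flagged = {"a", "b", "d"}`, only `a` counts: `b` is itself a reply, and `d` has a single reply.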
  • 28. Finding most referred users
    “List users in the descending order of referral counts”
    Query:
      MATCH p=((u:USER)-[r:MENTIONED_IN]->())
      RETURN u.name, count(p) AS num_mentions
      ORDER BY num_mentions DESC
      LIMIT 5;
    Output:
      +-----------------------------+
      | u.name       | num_mentions |
      +-----------------------------+
      | "cc7764343d" | 261          |
      | "972b1707f7" | 256          |
      | "9be7e77265" | 235          |
      | "8dc5aaf21a" | 232          |
      | "e1095646aa" | 220          |
      +-----------------------------+
      (172 ms)
  • 29. Finding most referred users
    “List users in the descending order of referral counts”
    Forbidden substructure: tweets that are replies to another tweet are excluded.
    Query:
      MATCH p=((u:USER)-[r:MENTIONED_IN]->(t))
      WHERE NOT (t)-[:`IS_REPLY_FOR`]->(:`TWEET`)
      RETURN u.name, count(p) AS num_mentions
      ORDER BY num_mentions DESC
      LIMIT 5;
    Output:
      +-----------------------------+
      | u.name       | num_mentions |
      +-----------------------------+
      | "00f4edeac2" | 28           |
      | "8987f033aa" | 16           |
      | "e6e67c5cef" | 10           |
      | "fdf2ce82fd" | 6            |
      | "86609dbd6e" | 5            |
      +-----------------------------+
      (198 ms)
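The same ranking can be sketched outside the database with a counter. The input shapes here are assumptions for illustration: `mentions` is a hypothetical list of `(user, tweet_id)` pairs, one per `MENTIONED_IN` edge, and `reply_tweets` is a hypothetical set of tweet ids that are replies, which plays the role of the forbidden substructure filter.

```python
from collections import Counter


def top_mentioned(mentions, reply_tweets=frozenset(), limit=5):
    """Rank users by mention count, skipping mentions inside reply tweets."""
    counts = Counter(
        user for user, tweet in mentions if tweet not in reply_tweets
    )
    return counts.most_common(limit)
```

Passing an empty `reply_tweets` corresponds to the first query; passing the set of reply tweets corresponds to the second one.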
  • 30. Topics of interest around a hub
    “What are the main topics in the discussions among people who are at a one-hop following distance from their sub-graph’s hubs?”
    Query:
      MATCH (n:USER)<-[r:FOLLOWS*1..]-(m)
      OPTIONAL MATCH (m)-[r1:TWEETED]->(t:TWEET)-[o]->(p:ONTOLOGY_INSTANCE)-[q]->(s:ONTOLOGY_BUCKET {id: "DrugBucket"})
      WITH COUNT(t) AS count, n AS hub
      WHERE count >= 2
      MATCH (o:ONTOLOGY_BUCKET)<-[r2*2..2]-(t1:TWEET)<-[TWEETED]-(neighbour:USER)-[r3:FOLLOWS]-(hub)
      RETURN o.id, hub.name, count(t1)
      ORDER BY count(t1) DESC
      LIMIT 5
    Output:
      +-----------------------------------------+
      | o.id         | hub.name     | count(t1) |
      +-----------------------------------------+
      | "SexBucket"  | "b4f30295f9" | 1         |
      | "DrugBucket" | "b4f30295f9" | 1         |
      +-----------------------------------------+
      (589 ms)
  • 31. Two most consulted drug users
    “The real world data tells us that lots of homosexual (MSM) people consume drugs or psycho-stimulants. Identify two drug bucket users who are most consulted by homosexual people on Twitter”
    Query:
      MATCH (o:ONTOLOGY_BUCKET {id:"DrugBucket"})
            <-[ri1:INSTANCE_OF]-(oi1:ONTOLOGY_INSTANCE)
            <-[rhr1:HAS_RISK_WORD]-(t1:TWEET)
            <-[rt1:TWEETED]-(drug:USER)-[MENTIONED_IN]->
            (t:TWEET)<-[rt2:TWEETED]-(homosex:USER)
            -[rt3:TWEETED]->(t2:TWEET)-[rhr2:HAS_RISK_WORD]
            ->(oi2:ONTOLOGY_INSTANCE)-[ri2:INSTANCE_OF]
            ->(o1:ONTOLOGY_BUCKET {id:"HomosexualTermsBucket"})
      RETURN drug.name, count(DISTINCT t)
      ORDER BY count(DISTINCT t) DESC
      LIMIT 2
    Output:
      +----------------------------------+
      | drug.name    | count(DISTINCT t) |
      +----------------------------------+
      | "748d9dc913" | 26                |
      | "5a74f759b8" | 13                |
      +----------------------------------+
      (13825 ms)
  • 32. Proximity of drug bucket users
    “How close are drug bucket users to other homosexual bucket users in terms of proximity in the social graph?”
    Query:
      MATCH p = (o1:ONTOLOGY_BUCKET {id: "HomosexualTermsBucket"})
                <-[ri1:INSTANCE_OF]-(oi1:ONTOLOGY_INSTANCE)
                <-[rrw1:HAS_RISK_WORD]-(t1:TWEET)
                <-[rt1:TWEETED]-(u1:USER)-[r:FOLLOWS*1..3]->(u2:USER)
                -[rt2:TWEETED]->(t2:TWEET)-[rrw2:HAS_RISK_WORD]->(oi2:ONTOLOGY_INSTANCE)
                -[ri2:INSTANCE_OF]->(o2:ONTOLOGY_BUCKET {id: "DrugBucket"})
      RETURN u1.name, length(p), count(u2)
      ORDER BY length(p)
    Output:
      +--------------------------------------+
      | u1.name      | length(p) | count(u2) |
      +--------------------------------------+
      | "1b0056b07a" | 7         | 4         |
      | "0c384be19a" | 7         | 2         |
      +--------------------------------------+
      (260 ms)
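To read the `length(p)` column: the matched path has three fixed edges on each side (`INSTANCE_OF`, `HAS_RISK_WORD`, `TWEETED`) plus the variable `FOLLOWS` segment, so a length of 7 corresponds to a single `FOLLOWS` hop. The sketch below illustrates the bounded search with a plain breadth-first search; `follows` is a hypothetical adjacency mapping from a user to the users they follow, and the helper names are made up for this example.

```python
from collections import deque


def follow_distance(follows, source, targets, max_hops=3):
    """Fewest FOLLOWS hops (1..max_hops) from source to any target, else None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops > 0 and user in targets:
            return hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for followed in follows.get(user, ()):
            if followed not in seen:
                seen.add(followed)
                queue.append((followed, hops + 1))
    return None


def cypher_path_length(follow_hops):
    """Path length as reported by length(p): 3 edges + FOLLOWS hops + 3 edges."""
    return 3 + follow_hops + 3
```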
  • 33. Graph substructures
    “Get me all the social subgraphs which have central nodes”
  • 34. Shortest paths vs. diameter between users
    ● Finding user-connected components
      ○ Perform a BFS traversal and add a property ‘subgraph’ to each node.
      ○ Forbidden substructure: users connected only via ontology buckets or ontology instances must not count as connected.
    ● Neo4j Java Traversal Framework API
    Code snippet (traversing only the relationship types listed eliminates the forbidden substructure):
      // db is the graph database handle and n the start node of the traversal
      Traverser traverser = db.traversalDescription()
          .breadthFirst()
          .relationships(RelTypes.TWEETED)
          .relationships(RelTypes.FOLLOWS)
          .relationships(RelTypes.IS_REPLY_FOR)
          .relationships(RelTypes.MENTIONED_IN)
          .evaluator(Evaluators.excludeStartPosition())
          .uniqueness(Uniqueness.NODE_GLOBAL)
          .traverse(n);
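The component-labelling step can be sketched in Python as a breadth-first search that ignores ontology edges. This is an illustrative stand-in for the Java traversal, not a port of it: `edges` is a hypothetical list of `(source, relationship type, target)` triples, edges are treated as undirected (an assumption), and the returned mapping plays the role of the ‘subgraph’ property.

```python
from collections import deque

# Only these relationship types may connect users; ontology edges
# (HAS_RISK_WORD, INSTANCE_OF) are skipped as the forbidden substructure.
USER_LEVEL = {"TWEETED", "FOLLOWS", "IS_REPLY_FOR", "MENTIONED_IN"}


def label_subgraphs(edges):
    """Return {node: subgraph id} for components over user-level edges."""
    adjacency = {}
    for src, rel, dst in edges:
        if rel not in USER_LEVEL:
            continue
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    subgraph = {}
    next_id = 0
    for start in adjacency:
        if start in subgraph:
            continue
        next_id += 1
        subgraph[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in subgraph:
                    subgraph[neighbour] = next_id
                    queue.append(neighbour)
    return subgraph
```

Two users who both tweeted the same risk word end up in different subgraphs unless a follow, mention, or reply links them.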
  • 35. Shortest paths vs. diameter between users
    Find the average shortest path between any 2 users in a connected component and compare it to the diameter of the connected component.
    Query:
      MATCH (n:USER)
      WITH n.subgraph AS subGraphNum, count(n) AS c
      WHERE c >= 7
      WITH collect(subGraphNum) AS collectionSG
      MATCH p=shortestPath((s:USER)-[:FOLLOWS|MENTIONED_IN|TWEETED|IS_REPLY_FOR*..]-(d:USER))
      WHERE s.subgraph=d.subgraph AND s.subgraph IN collectionSG AND length(p)>1
      RETURN s.subgraph, sum(length(p))/count(p), max(length(p)),
             ((sum(length(p))/count(p))*1.0)/max(length(p))
      ORDER BY ((sum(length(p))/count(p))*1.0)/max(length(p)) DESC
  • 36. Shortest paths vs. diameter between users
    Output:
      +--------------------------------------------------------------------------------------------------------+
      | s.subgraph | sum(length(p))/count(p) | max(length(p)) | ((sum(length(p))/count(p))*1.0)/max(length(p)) |
      +--------------------------------------------------------------------------------------------------------+
      | 2431       | 2                       | 2              | 1.0                                            |
      | 23         | 3                       | 4              | 0.75                                           |
      | 6024       | 3                       | 4              | 0.75                                           |
      | 671        | 3                       | 4              | 0.75                                           |
      | 1737       | 3                       | 4              | 0.75                                           |
      | 1264       | 3                       | 4              | 0.75                                           |
      | 2136       | 3                       | 4              | 0.75                                           |
      | 1742       | 3                       | 4              | 0.75                                           |
      | 7152       | 3                       | 4              | 0.75                                           |
      | 5650       | 3                       | 4              | 0.75                                           |
      | 1          | 9                       | 15             | 0.6                                            |
      | 4195       | 3                       | 5              | 0.6                                            |
      | 8038       | 2                       | 6              | 0.3333333333333333                             |
      +--------------------------------------------------------------------------------------------------------+
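The ratio in the last column can be reproduced on a toy graph. The sketch below is an assumption-based illustration: `adjacency` is a hypothetical undirected adjacency mapping, `users` lists the user nodes of one component, and the average uses integer division because Cypher divides two integers that way, which is why the averages in the table are whole numbers. Each unordered pair is counted once here, which leaves the average and the maximum unchanged.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def compactness(adjacency, users):
    """Return (average, diameter, ratio) over user pairs more than one hop apart."""
    users = list(users)
    lengths = []
    for i, source in enumerate(users):
        dist = bfs_distances(adjacency, source)
        for target in users[i + 1:]:
            if dist.get(target, 0) > 1:  # mirrors length(p) > 1
                lengths.append(dist[target])
    if not lengths:
        return None
    average = sum(lengths) // len(lengths)  # integer division, as in the query
    diameter = max(lengths)
    return average, diameter, average / diameter
```

A ratio near 1.0 means typical user pairs are about as far apart as the most distant pair; a low ratio, as for subgraph 8038, suggests a mostly tight component with a few long paths.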
  • 39. Even more interesting substructures!
  • 41. Getting geo-local sentiments
    “What are the people hanging around Sex Venues talking about?”
  • 42. Commonly discussed topics around Sex Venues
    ● Some tweets are geotagged
      ○ Neo4j Spatial plugin to create a spatial index on tweets
    ● Find tweets tweeted near a specific Sex Venue
      ○ Perform a withinDistance query for the coordinates of the sex venue
    ● Are these tweets talking about specific topics?
      ○ Topic modeling: LDA (Gensim) on tweets
  • 43. Commonly discussed topics around Sex Venues
    Find what topic HIV risk users are talking about the most around a particular Sex Venue.
    REST API code snippet:
      import json
      import requests

      # Ask the Neo4j Spatial plugin for geometries within 2 km of the venue
      headers = {'content-type': 'application/json'}
      url = "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance"
      payload = {
          "layer": "geom",
          "pointX": -117.161324,  # longitude
          "pointY": 32.710671,    # latitude
          "distanceInKm": 2,
      }
      r = requests.post(url, data=json.dumps(payload), headers=headers)
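For intuition about what a within-distance search selects, the filter can be approximated without a spatial index. The sketch below is a stand-in, not the plugin's implementation: it assumes great-circle (haversine) distance on a spherical Earth, and the `lon`/`lat` keys on each tweet are hypothetical field names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(tweets, lon, lat, radius_km):
    """Keep the geotagged tweets that lie within radius_km of the point."""
    return [
        t for t in tweets
        if haversine_km(t["lon"], t["lat"], lon, lat) <= radius_km
    ]
```

A linear scan like this is fine for a few thousand geotagged tweets; a spatial index earns its keep when the layer is large or the query runs repeatedly.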
  • 44. LDA on Tweets found around Sex Venues
    ● Cleaning tweets: remove mentions and URLs
    ● Stop word list: NLTK library
    ● Gensim: corpora and LDA libraries
    ● Free parameters
      ○ Number of topics: 2, 3, 4
      ○ Distance radius for the ‘withinDistance’ query: 2, 5, 10 km
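The preprocessing and the parameter sweep listed above might look roughly like this. It is a sketch with stated assumptions: the stop word set is a tiny hypothetical stand-in for the NLTK list, and the regular expressions are one plausible way to drop mentions and URLs. The resulting token lists are the kind of input one would pass to Gensim's `corpora.Dictionary` and `models.LdaModel`; that step is left out so the example runs with the standard library alone.

```python
import re
from itertools import product

# Hypothetical stand-in for a full stop word list.
STOP_WORDS = {"the", "a", "i", "am", "at", "some", "is", "to", "and"}


def clean_tweet(text):
    """Strip URLs and mentions, lowercase, tokenize, and drop stop words."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


# Every (number of topics, radius in km) combination from the slide.
PARAMETER_GRID = list(product([2, 3, 4], [2, 5, 10]))
```

For instance, `clean_tweet("@Alice: Want some coke? I am at the loft http://t.co/ran1")` yields `["want", "coke", "loft"]`, and the grid contains nine combinations to evaluate.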
  • 45. LDA on Colocated Tweets - Results
    Topic #1: gay, san, diego, queen, flicks, glass, amp, dont, coke, get
    Topic #2: gay, san, diego, ca, glass, amp, cheers, flicks, bourbon, happy
    Drug & Homosexual bucket: coke glass dope gay queen ...
    Sex Venues Bucket: groovy laid back amp nasty cheers flicks bourbon street club san diego pecs ...
  • 46. Interesting patterns
    “What is the longest conversation thread among any set of users?”
    Query:
      MATCH p = (n:TWEET)<-[r:IS_REPLY_FOR*]-(m:TWEET)
      RETURN p
      ORDER BY length(p) DESC
      LIMIT 1
    Output: 205 nodes (157179 ms)
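Outside the database, the longest thread can be found by walking each reply up to its root. This is a minimal sketch, assuming a hypothetical `reply_to` mapping from a reply tweet to the tweet it answers; it returns the chain root first. A thread of 205 nodes, as reported above, contains 204 `IS_REPLY_FOR` hops.

```python
def longest_thread(reply_to):
    """Return the longest reply chain as a list of tweets, root first."""
    best = []
    for tweet in reply_to:
        chain = [tweet]
        seen = {tweet}
        # Follow parent links upward; the seen set guards against cycles.
        while chain[-1] in reply_to and reply_to[chain[-1]] not in seen:
            parent = reply_to[chain[-1]]
            chain.append(parent)
            seen.add(parent)
        if len(chain) > len(best):
            best = chain
    return best[::-1]
```

Because every reply has at most one parent, starting from each reply and climbing is enough; no variable-length pattern expansion is needed.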
  • 47. Result - Longest Conversation
  • 48. Challenges
    ● Data collection
      ○ Sampled data (1%).
      ○ Twitter API rate limit per user: 15 calls per 15 minutes.
      ○ Collecting users who have favorited a tweet.
      ○ Extracting conversations/retweet chains associated with a tweet.
    ● Data classification and cleaning
      ○ Working with microblogs.
      ○ Iterative process.
    ● Restricted visualization for Neo4j
      ○ Hard to decipher patterns in the graph.
  • 49. Future
    ● More representative dataset: Firehose API
    ● Innovative data visualizations to visualize evolving graphs
    ● Machine learning for better classification of HIV risk tweets
      ○ Mechanical Turk for labeling
      ○ Logistic regression for classification
    ● SD Primary Infection Cohort: overlaying the real-world HIV infection graph on top of an enriched social network
  • 50. Conclusion
    ● Structured approach to model social networks and derive insights from networks like Twitter. Best practices in collecting and managing Twitter data for social network analysis.
    ● Current results: graph queries to derive intuitions on factors that influence HIV risk behaviour.
    ● Vision for the future.