SlideShare a Scribd company logo
1 of 31
Download to read offline
Rhea: Adaptively Sampling Authoritative
Content from Social Activity Streams
Panagiotis Liakos - Alexandros Ntoulas - Alex Delis
University of Athens, Greece
IEEE BigData 2017
December 11th-14th, 2017 - Boston, MA
UoA Panagiotis Liakos Rhea-• Motivation 2/26
500 million tweets
sent each day!
UoA Panagiotis Liakos Rhea-• Motivation 2/26
Motivation
Mining social activity in real-time is valuable for numerous
applications:
opinion mining
content recommendation
emerging news detection
Processing the full activity stream of a social network is prohibitive:
storage
computational cost
UoA Panagiotis Liakos Rhea-• Motivation 3/26
Motivation
Mining social activity in real-time is valuable for numerous
applications:
opinion mining
content recommendation
emerging news detection
Processing the full activity stream of a social network is prohibitive:
storage
computational cost
Not all content is useful:
90% of tweets is conversational or spam!
Workaround: take a sample of the social
activity and use it to feed into applications!
UoA Panagiotis Liakos Rhea-• Motivation 3/26
Motivation
Mining social activity in real-time is valuable for numerous
applications:
opinion mining
content recommendation
emerging news detection
Processing the full activity stream of a social network is prohibitive:
storage
computational cost
Not all content is useful:
90% of tweets is conversational or spam!
Workaround: take a sample of the social
activity and use it to feed into applications!
Our approach:
Sample the content published by authorities
UoA Panagiotis Liakos Rhea-• Motivation 3/26
Related Work
Social Activity Stream Sampling:
White-lists of users [GSB+12, WLP+12, GZB+13, ZBG+16].
Focus is mainly on Twitter.
Our approach is adaptive and does not rely on static white-lists.
Authoritative users in Online Social Networks:
Network attributes [ZAA07, JA07, ACD+08, PC11, BBC+13].
We focus on streams, not networks.
UoA Panagiotis Liakos Rhea-• Related Work 4/26
Contribution
We propose Rhea:
A sampling algorithm for authoritative content that forms a
network of authorities as it processes a social activity stream,
and samples only the activity of the top-K authoritative users.
We build on:
Network-based measures and their Our findings on the disadvantages
adaptation in a streaming setting of white-list approaches
We outperform contemporary approaches with regard to
precision, recall, and ranking accuracy!
UoA Panagiotis Liakos Rhea-• Contribution 5/26
Network-based measures
UoA Panagiotis Liakos Rhea-• Network-based measures 6/26
Network of Authorities from Social Activity
UoA Panagiotis Liakos Rhea-• Network-based measures 7/26
Network of Authorities from Social Activity
UoA Panagiotis Liakos Rhea-• Network-based measures 7/26
Ranking the Authorities
z-score: Zhang, Ackerman and Adamic, WWW 2007
Builds on positive and negative predictors of expertise:
z(u) = a(u)−q(u)
√
a(u)+q(u)
where, a(u) is the number of questions u has answered
and q(u) is the number of questions u has asked.
UoA Panagiotis Liakos Rhea-• Network-based measures 8/26
Ranking the Authorities
We propose auth-value:
A measure for a wide range of social networking sites:
auth(u) = in(u)−out(u)
√
in(u)+out(u)
where, in(u) is the weighted in-degree of u in the network of authorities
and out(u) is her respective weighted out-degree.
UoA Panagiotis Liakos Rhea-• Network-based measures 9/26
Our findings on
White-List approaches
UoA Panagiotis Liakos Rhea-• White-lists 10/26
Limitations of Static Lists of Authorities
Rank
October 2009 November 2009 December 2009
user u auth(u) user u auth(u) user u auth(u)
1 justinbieber 393.885 justinbieber 448.815 justinbieber 433.185
2 donniewahlberg 358.286 donniewahlberg 249.988 nickjonas 249.558
3 tweetmeme 263.103 revrunwisdom 242.807 revrunwisdom 222.571
4 revrunwisdom 237.964 tweetmeme 195.379 donniewahlberg 202.996
5 mashable 229.650 addthis 186.282 tweetmeme 183.603
6 addthis 212.325 ddlovato 181.720 jonasbrothers 182.882
7 ddlovato 204.910 luansantanaevc 167.514 addthis 181.403
8 jordanknight 191.045 jordanknight 167.197 omgfacts 154.136
9 jonasbrothers 175.054 jonasbrothers 165.520 mashable 153.616
10 lilduval 174.616 mashable 164.496 johncmayer 147.241
User rankings vary across different months
White-lists can be unstable and
quickly become out-of-date
UoA Panagiotis Liakos Rhea-• White-lists 11/26
Limitations of Static Lists of Authorities
0.4
0.5
0.6
0.7
0.8
0.9
1
0 250 500 750 1000
Precision@K
K (authorities)
Sept. 2009 & Oct. 2009
Sept. 2009 & Nov. 2009
Sept. 2009 & Dec. 2009
UoA Panagiotis Liakos Rhea-• White-lists 12/26
Limitations of Static Lists of Authorities
0.4
0.5
0.6
0.7
0.8
0.9
1
0 250 500 750 1000
Precision@K
K (authorities)
Sept. 2009 & Oct. 2009
Sept. 2009 & Nov. 2009
Sept. 2009 & Dec. 2009
We need an adaptive algorithm!
UoA Panagiotis Liakos Rhea-• White-lists 12/26
Rhea: “She who flows”
Museum of Fine Arts,
Boston
UoA Panagiotis Liakos Rhea-• Rhea 13/26
Rhea: Three Challenges
1 Maintaining user information
may be costly in terms of both memory & CPU
2 Ranking users
may require reckoning in multiple measures
3 Many elements we opt to include may be irrelevant
UoA Panagiotis Liakos Rhea-• Rhea 14/26
Maintaining User Information
Count-Min sketch:
+ct
+ct
+ct
+ct
h1
h2
hd
...
it d
w
count
Reducing the processing overhead through sampling:
We apply a Bernoulli sampling scheme [PJC+15].
UoA Panagiotis Liakos Rhea-• Rhea 15/26
Ranking Authorities
We need to know at any time the top-K users by auth(u):
Algorithm 1: put(Top-K-Heap, key, value)
input : A Top-K-Heap structure and a key associated with a value to be
inserted in the Top-K-Heap.
output : The updated Top-K-Heap.
1 begin
2 if Top-K-Heap.size() < K then
3 if Top-K-Heap.contains(key) then
4 Top-K-Heap.replace(key, value);
5 else
6 Top-K-Heap.push(key, value);
7 else
8 if Top-K-Heap.contains(key) then
9 Top-K-Heap.replace(key, value);
10 else if value > Top-K-Heap.peek().value() then
11 Top-K-Heap.pop();
12 Top-K-Heap.push(key, value);
13 return Top-K-Heap;
UoA Panagiotis Liakos Rhea-• Rhea 16/26
Filtering-out Non-relevant Activity
While processing the stream, we may deem as an authority
a user that temporarily appears to be one.
We lose in precision!
Post-processing step:
The sample is much smaller than the stream: ˆS S
We re-examine the elements of the sample and
filter-out the activity of users not in the Top-K-Heap
UoA Panagiotis Liakos Rhea-• Rhea 17/26
Rhea
Forming the network of
authorities
Sampling the stream
Removing irrelevant content
Algorithm 2: Rhea(S, K, p)
input : A stream S, a parameter K > 0 and a probability p ∈ (0, 1].
output : A set ˆS ⊂ S containing elements whose respective users are likely to
be among the top-K w.r.t. to the auth-value.
begin
T op-K-heap ← ∅;
CMSin ← ∅;
CMSout ← ∅;
foreach s ∈ S do
if random(0, 1] < p then
(in, out) ← extractIndicators(s.message) ;
CMSin[in]+ = 1 ;
CMSout[out]+ = 1 ;
authuser ←
CMSin[s.user]−CMSout[s.user]
CMSin[s.user]+CMSout[s.user]
;
if authuser > T op-K-heap.low() then
T op-K-heap.put(user, authuser);
ˆS.put(s);
foreach s ∈ ˆS do
if s.user /∈ T op-K-heap then
ˆS.remove(s);
return ˆS;
UoA Panagiotis Liakos Rhea-• Rhea 18/26
Experimental Evaluation
Dataset:
1 467 million tweets from 20 million users of Twitter
2 263, 540 answers to 83, 423 questions posted by 26, 752 users of
StackOverflow
Questions:
1 How does Rhea compare against white-list based sampling in
terms of F1-score?
2 Is Rhea able to assess the ranking relevance of the sampled
documents?
3 What is the impact of the parameters involved in the execution
of Rhea?
UoA Panagiotis Liakos Rhea-• Exeriments 19/26
F1-score
0
0.2
0.4
0.6
0.8
1
0 250 500 750 1000
F1-score
K (authorities)
Rhea (T)
WhiteList (T)
0
0.2
0.4
0.6
0.8
1
0 250 500 750 1000
F1-score
K (authorities)
Rhea (SO)
WhiteList (SO)
UoA Panagiotis Liakos Rhea-• Exeriments 20/26
Normalized Discounted Cumulative Gain
0.4
0.5
0.6
0.7
0.8
0.9
1
0 250 500 750 1000
NDCG
K (authorities)
Rhea (T)
WhiteList (T)
0.4
0.5
0.6
0.7
0.8
0.9
1
0 250 500 750 1000
NDCG
K (authorities)
Rhea (SO)
WhiteList (SO)
UoA Panagiotis Liakos Rhea-• Exeriments 21/26
Impact of Parameters
Varying the Value of Probability p:
Using a sample of 20% of S we achieve performance almost as
good as that of using S.
Using p = 0.2 instead of p = 1 greatly reduces processing time.
Removing Filtering Step:
Over 25 p.p. for K = 1, 000 and is never less than 10 p.p. for
any K examined.
UoA Panagiotis Liakos Rhea-• Exeriments 22/26
Conclusion
Rhea is the 1st adaptive algorithm for sampling
authoritative content from social activity streams.
We exposed the dynamic nature of the task.
We introduced a measure to identify authoritative users.
Rhea employs several techniques to achieve significantly
improved performance with regard to recall, precision, and
ranking accuracy.
UoA Panagiotis Liakos Rhea-• Conclusion 23/26
References I
[ACD+
08] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne.
Finding high-quality content in social media.
In Proc. of the Int. Conf. on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA,
February 11-12, 2008, pages 183–194, 2008.
[BBC+
13] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, and Giuliano Vesci.
Choosing the right crowd: expert finding in social networks.
In Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pages
637–648, 2013.
[GSB+
12] Saptarshi Ghosh, Naveen Kumar Sharma, Fabr´ıcio Benevenuto, Niloy Ganguly, and P. Krishna Gummadi.
Cognos: crowdsourcing search for topic experts in microblogs.
In The 35th Int. ACM SIGIR Conf. on research and development in Information Retrieval, SIGIR ’12,
Portland, OR, USA, August 12-16, 2012, pages 575–590, 2012.
[GZB+
13] Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Kumar Sharma, Niloy Ganguly,
and P. Krishna Gummadi.
On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream.
In 22nd ACM Int. Conf. on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA,
October 27 - November 1, 2013, pages 1739–1744, 2013.
[JA07] Pawel Jurczyk and Eugene Agichtein.
Discovering authorities in question answer communities by using link analysis.
In Proc. of the 16th ACM Conf. on Information and Knowledge Management, CIKM 2007, Lisbon,
Portugal, November 6-10, 2007, pages 919–922, 2007.
[PC11] Aditya Pal and Scott Counts.
Identifying topical authorities in microblogs.
In Proc. of the 4th International Conference on Web Search and Web Data Mining, WSDM 2011, Hong
Kong, China, February 9-12, 2011, pages 45–54, 2011.
UoA Panagiotis Liakos Rhea-• References 24/26
References II
[PJC+
15] Deepan Subrahmanian Palguna, Vikas Joshi, Venkatesan T. Chakaravarthy, Ravi Kothari, and L. Venkata
Subramaniam.
Analysis of sampling algorithms for twitter.
In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July
25-31, 2015, pages 967–973, 2015.
[WLP+
12] Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson, and Markus Strohmaier.
It’s not in their tweets: Modeling topical expertise of twitter users.
In 2012 Int. Conf. on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 Int. Conf. on Social
Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 91–100, 2012.
[ZAA07] Jun Zhang, Mark S. Ackerman, and Lada A. Adamic.
Expertise networks in online communities: structure and algorithms.
In Proc. of the 16th Int. Conf. on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12,
2007, pages 221–230, 2007.
[ZBG+
16] Muhammad Bilal Zafar, Parantapa Bhattacharya, Niloy Ganguly, Saptarshi Ghosh, and Krishna P.
Gummadi.
On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs.
In Proc. of the 19th ACM Conf. on Computer-Supported Cooperative Work & Social Computing, CSCW
2016, San Francisco, CA, USA, February 27 - March 2, 2016, pages 437–450, 2016.
UoA Panagiotis Liakos Rhea-• References 25/26
thank you!
for further details email me at:
p.liakos@di.uoa.gr
UoA Panagiotis Liakos Rhea-• Contact 26/26

More Related Content

Similar to Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams

Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
Self-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipesSelf-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipesApache StreamPipes
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataTop-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataVito Ostuni
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDLT Solutions
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonJo-fai Chow
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...Adel Sabour
 
EvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source SoftwareEvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source Softwarebpupadhyaya
 
Nyc web perf-final-july-23
Nyc web perf-final-july-23Nyc web perf-final-july-23
Nyc web perf-final-july-23Dan Boutin
 
HIT3328 - Chapter04 - Complex Interactions
HIT3328 - Chapter04 - Complex InteractionsHIT3328 - Chapter04 - Complex Interactions
HIT3328 - Chapter04 - Complex InteractionsYhal Htet Aung
 
061211 Agu Aq Datasystem1
061211 Agu Aq Datasystem1061211 Agu Aq Datasystem1
061211 Agu Aq Datasystem1Rudolf Husar
 
Social Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsSocial Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsDenis Parra Santander
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
The NOAA Big Data Project: Public-Private Partnerships at Scale
The NOAA Big Data Project: Public-Private Partnerships at ScaleThe NOAA Big Data Project: Public-Private Partnerships at Scale
The NOAA Big Data Project: Public-Private Partnerships at ScaleAmazon Web Services
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 TutorialRim Moussa
 

Similar to Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams (20)

Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
Self-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipesSelf-Service IoT Data Analytics with StreamPipes
Self-Service IoT Data Analytics with StreamPipes
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open DataTop-N Recommendations from Implicit Feedback leveraging Linked Open Data
Top-N Recommendations from Implicit Feedback leveraging Linked Open Data
 
cikm14
cikm14cikm14
cikm14
 
E05312426
E05312426E05312426
E05312426
 
Decision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great DataDecision Ready Data: Power Your Analytics with Great Data
Decision Ready Data: Power Your Analytics with Great Data
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
 
EvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source SoftwareEvalOSS : A Framework to Evaluate Open Source Software
EvalOSS : A Framework to Evaluate Open Source Software
 
Nyc web perf-final-july-23
Nyc web perf-final-july-23Nyc web perf-final-july-23
Nyc web perf-final-july-23
 
HIT3328 - Chapter04 - Complex Interactions
HIT3328 - Chapter04 - Complex InteractionsHIT3328 - Chapter04 - Complex Interactions
HIT3328 - Chapter04 - Complex Interactions
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
061211 Agu Aq Datasystem1
061211 Agu Aq Datasystem1061211 Agu Aq Datasystem1
061211 Agu Aq Datasystem1
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Social Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsSocial Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender Systems
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
The NOAA Big Data Project: Public-Private Partnerships at Scale
The NOAA Big Data Project: Public-Private Partnerships at ScaleThe NOAA Big Data Project: Public-Private Partnerships at Scale
The NOAA Big Data Project: Public-Private Partnerships at Scale
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 Tutorial
 

Recently uploaded

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 

Recently uploaded (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams

  • 1. Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams Panagiotis Liakos - Alexandros Ntoulas - Alex Delis University of Athens, Greece IEEE BigData 2017 December 11th-14th, 2017 - Boston, MA
  • 2. UoA Panagiotis Liakos Rhea-• Motivation 2/26
  • 3. 500 million tweets sent each day! UoA Panagiotis Liakos Rhea-• Motivation 2/26
  • 4. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 5. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost Not all content is useful: 90% of tweets is conversational or spam! Workaround: take a sample of the social activity and use it to feed into applications! UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 6. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost Not all content is useful: 90% of tweets is conversational or spam! Workaround: take a sample of the social activity and use it to feed into applications! Our approach: Sample the content published by authorities UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 7. Related Work Social Activity Stream Sampling: White-lists of users [GSB+12, WLP+12, GZB+13, ZBG+16]. Focus is mainly on Twitter. Our approach is adaptive and does not rely on static white-lists. Authoritative users in Online Social Networks: Network attributes [ZAA07, JA07, ACD+08, PC11, BBC+13]. We focus on streams, not networks. UoA Panagiotis Liakos Rhea-• Related Work 4/26
  • 8. Contribution We propose Rhea: A sampling algorithm for authoritative content that forms a network of authorities as it processes a social activity stream, and samples only the activity of the top-K authoritative users. We build on: Network-based measures and their Our findings on the disadvantages adaptation in a streaming setting of white-list approaches We outperform contemporary approaches with regard to precision, recall, and ranking accuracy! UoA Panagiotis Liakos Rhea-• Contribution 5/26
  • 9. Network-based measures UoA Panagiotis Liakos Rhea-• Network-based measures 6/26
  • 10. Network of Authorities from Social Activity UoA Panagiotis Liakos Rhea-• Network-based measures 7/26
  • 11. Network of Authorities from Social Activity UoA Panagiotis Liakos Rhea-• Network-based measures 7/26
  • 12. Ranking the Authorities z-score: Zhang, Ackerman and Adamic, WWW 2007 Builds on positive and negative predictors of expertise: z(u) = a(u)−q(u) √ a(u)+q(u) where, a(u) is the number of questions u has answered and q(u) is the number of questions u has asked. UoA Panagiotis Liakos Rhea-• Network-based measures 8/26
  • 13. Ranking the Authorities We propose auth-value: A measure for a wide range of social networking sites: auth(u) = in(u)−out(u) √ in(u)+out(u) where, in(u) is the weighted in-degree of u in the network of authorities and out(u) is her respective weighted out-degree. UoA Panagiotis Liakos Rhea-• Network-based measures 9/26
  • 14. Our findings on White-List approaches UoA Panagiotis Liakos Rhea-• White-lists 10/26
  • 15. Limitations of Static Lists of Authorities Rank October 2009 November 2009 December 2009 user u auth(u) user u auth(u) user u auth(u) 1 justinbieber 393.885 justinbieber 448.815 justinbieber 433.185 2 donniewahlberg 358.286 donniewahlberg 249.988 nickjonas 249.558 3 tweetmeme 263.103 revrunwisdom 242.807 revrunwisdom 222.571 4 revrunwisdom 237.964 tweetmeme 195.379 donniewahlberg 202.996 5 mashable 229.650 addthis 186.282 tweetmeme 183.603 6 addthis 212.325 ddlovato 181.720 jonasbrothers 182.882 7 ddlovato 204.910 luansantanaevc 167.514 addthis 181.403 8 jordanknight 191.045 jordanknight 167.197 omgfacts 154.136 9 jonasbrothers 175.054 jonasbrothers 165.520 mashable 153.616 10 lilduval 174.616 mashable 164.496 johncmayer 147.241 User rankings vary across different months White-lists can be unstable and quickly become out-of-date UoA Panagiotis Liakos Rhea-• White-lists 11/26
  • 16. Limitations of Static Lists of Authorities 0.4 0.5 0.6 0.7 0.8 0.9 1 0 250 500 750 1000 Precision@K K (authorities) Sept. 2009 & Oct. 2009 Sept. 2009 & Nov. 2009 Sept. 2009 & Dec. 2009 UoA Panagiotis Liakos Rhea-• White-lists 12/26
  • 17. Limitations of Static Lists of Authorities 0.4 0.5 0.6 0.7 0.8 0.9 1 0 250 500 750 1000 Precision@K K (authorities) Sept. 2009 & Oct. 2009 Sept. 2009 & Nov. 2009 Sept. 2009 & Dec. 2009 We need an adaptive algorithm! UoA Panagiotis Liakos Rhea-• White-lists 12/26
  • 18. Rhea: “She who flows” Museum of Fine Arts, Boston UoA Panagiotis Liakos Rhea-• Rhea 13/26
  • 19. Rhea: Three Challenges 1 Maintaining user information may be costly in terms of both memory & CPU 2 Ranking users may require reckoning in multiple measures 3 Many elements we opt to include may be irrelevant UoA Panagiotis Liakos Rhea-• Rhea 14/26
  • 20. Maintaining User Information Count-Min sketch: +ct +ct +ct +ct h1 h2 hd ... it d w count Reducing the processing overhead through sampling: We apply a Bernoulli sampling scheme [PJC+15]. UoA Panagiotis Liakos Rhea-• Rhea 15/26
  • 21. Ranking Authorities We need to know at any time the top-K users by auth(u): Algorithm 1: put(Top-K-Heap, key, value) input : A Top-K-Heap structure and a key associated with a value to be inserted in the Top-K-Heap. output : The updated Top-K-Heap. 1 begin 2 if Top-K-Heap.size() < K then 3 if Top-K-Heap.contains(key) then 4 Top-K-Heap.replace(key, value); 5 else 6 Top-K-Heap.push(key, value); 7 else 8 if Top-K-Heap.contains(key) then 9 Top-K-Heap.replace(key, value); 10 else if value > Top-K-Heap.peek().value() then 11 Top-K-Heap.pop(); 12 Top-K-Heap.push(key, value); 13 return Top-K-Heap; UoA Panagiotis Liakos Rhea-• Rhea 16/26
  • 22. Filtering-out Non-relevant Activity While processing the stream, we may deem as an authority a user that temporarily appears to be one. We lose in precision! Post-processing step: The sample is much smaller than the stream: ˆS S We re-examine the elements of the sample and filter-out the activity of users not in the Top-K-Heap UoA Panagiotis Liakos Rhea-• Rhea 17/26
  • 23. Rhea Forming the network of authorities Sampling the stream Removing irrelevant content Algorithm 2: Rhea(S, K, p) input : A stream S, a parameter K > 0 and a probability p ∈ (0, 1]. output : A set ˆS ⊂ S containing elements whose respective users are likely to be among the top-K w.r.t. to the auth-value. begin T op-K-heap ← ∅; CMSin ← ∅; CMSout ← ∅; foreach s ∈ S do if random(0, 1] < p then (in, out) ← extractIndicators(s.message) ; CMSin[in]+ = 1 ; CMSout[out]+ = 1 ; authuser ← CMSin[s.user]−CMSout[s.user] CMSin[s.user]+CMSout[s.user] ; if authuser > T op-K-heap.low() then T op-K-heap.put(user, authuser); ˆS.put(s); foreach s ∈ ˆS do if s.user /∈ T op-K-heap then ˆS.remove(s); return ˆS; UoA Panagiotis Liakos Rhea-• Rhea 18/26
  • 24. Experimental Evaluation Dataset: 1 467 million tweets from 20 million users of Twitter 2 263, 540 answers to 83, 423 questions posted by 26, 752 users of StackOverflow Questions: 1 How does Rhea compare against white-list based sampling in terms of F1-score? 2 Is Rhea able to assess the ranking relevance of the sampled documents? 3 What is the impact of the parameters involved in the execution of Rhea? UoA Panagiotis Liakos Rhea-• Exeriments 19/26
  • 25. F1-score 0 0.2 0.4 0.6 0.8 1 0 250 500 750 1000 F1-score K (authorities) Rhea (T) WhiteList (T) 0 0.2 0.4 0.6 0.8 1 0 250 500 750 1000 F1-score K (authorities) Rhea (SO) WhiteList (SO) UoA Panagiotis Liakos Rhea-• Exeriments 20/26
  • 26. Normalized Discounted Cumulative Gain 0.4 0.5 0.6 0.7 0.8 0.9 1 0 250 500 750 1000 NDCG K (authorities) Rhea (T) WhiteList (T) 0.4 0.5 0.6 0.7 0.8 0.9 1 0 250 500 750 1000 NDCG K (authorities) Rhea (SO) WhiteList (SO) UoA Panagiotis Liakos Rhea-• Exeriments 21/26
  • 27. Impact of Parameters Varying the Value of Probability p: Using a sample of 20% of S we achieve performance almost as good as that of using S. Using p = 0.2 instead of p = 1 greatly reduces processing time. Removing Filtering Step: Over 25 p.p. for K = 1, 000 and is never less than 10 p.p. for any K examined. UoA Panagiotis Liakos Rhea-• Exeriments 22/26
  • 28. Conclusion Rhea is the 1st adaptive algorithm for sampling authoritative content from social activity streams. We exposed the dynamic nature of the task. We introduced a measure to identify authoritative users. Rhea employs several techniques to achieve significantly improved performance with regard to recall, precision, and ranking accuracy. UoA Panagiotis Liakos Rhea-• Conclusion 23/26
  • 29. References I [ACD+ 08] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proc. of the Int. Conf. on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008, pages 183–194, 2008. [BBC+ 13] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, and Giuliano Vesci. Choosing the right crowd: expert finding in social networks. In Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 637–648, 2013. [GSB+ 12] Saptarshi Ghosh, Naveen Kumar Sharma, Fabr´ıcio Benevenuto, Niloy Ganguly, and P. Krishna Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In The 35th Int. ACM SIGIR Conf. on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012, pages 575–590, 2012. [GZB+ 13] Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Kumar Sharma, Niloy Ganguly, and P. Krishna Gummadi. On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream. In 22nd ACM Int. Conf. on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 1739–1744, 2013. [JA07] Pawel Jurczyk and Eugene Agichtein. Discovering authorities in question answer communities by using link analysis. In Proc. of the 16th ACM Conf. on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 919–922, 2007. [PC11] Aditya Pal and Scott Counts. Identifying topical authorities in microblogs. In Proc. of the 4th International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, pages 45–54, 2011. UoA Panagiotis Liakos Rhea-• References 24/26
  • 30. References II [PJC+ 15] Deepan Subrahmanian Palguna, Vikas Joshi, Venkatesan T. Chakaravarthy, Ravi Kothari, and L. Venkata Subramaniam. Analysis of sampling algorithms for twitter. In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 967–973, 2015. [WLP+ 12] Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson, and Markus Strohmaier. It’s not in their tweets: Modeling topical expertise of twitter users. In 2012 Int. Conf. on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 Int. Conf. on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 91–100, 2012. [ZAA07] Jun Zhang, Mark S. Ackerman, and Lada A. Adamic. Expertise networks in online communities: structure and algorithms. In Proc. of the 16th Int. Conf. on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 221–230, 2007. [ZBG+ 16] Muhammad Bilal Zafar, Parantapa Bhattacharya, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi. On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs. In Proc. of the 19th ACM Conf. on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016, pages 437–450, 2016. UoA Panagiotis Liakos Rhea-• References 25/26
  • 31. thank you! for further details email me at: p.liakos@di.uoa.gr UoA Panagiotis Liakos Rhea-• Contact 26/26