IC2S2, Amsterdam, July 2019
Felix Victor Münch, Ben Thies, Cornelius Puschmann, Axel Bruns
We release the code and present results of a successful experiment with an adapted version of a network sampling method, the so-called rank degree method, which we modified to create a sample of the most influential accounts in the German-speaking Twitter follow network.
Walking Through Twitter: Sampling a Language-Based Follow NetworkFelix Victor Münch
AoIR Flashpoint Symposium, Urbino, 24. June 2019
Felix Victor Münch, Ben Thies, Cornelius Puschmann, Axel Bruns
We present first results of a successful experiment with an adapted version of a network sampling method, the so-called rank degree method, which we modified to create a sample of the most influential accounts in the German-speaking Twitter follow network.
What makes a bot a bot? Exploring benign automation on TwitterFelix Victor Münch
Presented at Digikomm 2019 in Berlin.
Abstract
We present a study on bot detection and its interpretation by assessing the different types of automation that one of the most popular methods for bot detection, Botometer (https://botometer.iuni.iu.edu/), detects. The study is based on the first project to assess the prevalence, influence, and roles of automated accounts in a Twitter follow network on a national scale: the German-speaking Twittersphere. This work in progress allows us to analyse the long-term structural role, impact, and possible audience of bots beyond the context of single events and topics.
Embeddings-Based Clustering for Target Specific StancesAmmar Rashed
We propose an unsupervised user stance detection method to capture fine grained divergences in a community across various topics. We employ pre-trained universal sentence encoders to represent users based on the content of their tweets on a particular topic. User vectors are projected onto a lower dimensional space using UMAP, then clustered using HDBSCAN. Our method performs better than previous approaches on two datasets in different domains, achieving precision and recall scores ranging between 0.89 and 0.97. We compiled a dataset of more than 300k tweets about UEFA Super Cup's 2019 final, and tagged 12k users as Liverpool FC or Chelsea FC fans. We utilized our method to analyze the stances of Twitter users noting a correlation between user stances towards various polarizing issues. We used the resultant clusters to quantify the polarization in various topics, and analyze the semantic divergence between clusters.
Walking Through Twitter: Sampling a Language-Based Follow NetworkFelix Victor Münch
AoIR Flashpoint Symposium, Urbino, 24. June 2019
Felix Victor Münch, Ben Thies, Cornelius Puschmann, Axel Bruns
We present first results of a successful experiment with an adapted version of a network sampling method, the so-called rank degree method, which we modified to create a sample of the most influential accounts in the German-speaking Twitter follow network.
What makes a bot a bot? Exploring benign automation on TwitterFelix Victor Münch
Presented at Digikomm 2019 in Berlin.
Abstract
We present a study on bot detection and its interpretation by assessing the different types of automation that one of the most popular methods for bot detection, Botometer (https://botometer.iuni.iu.edu/), detects. The study is based on the first project to assess the prevalence, influence, and roles of automated accounts in a Twitter follow network on a national scale: the German-speaking Twittersphere. This work in progress allows us to analyse the long-term structural role, impact, and possible audience of bots beyond the context of single events and topics.
Embeddings-Based Clustering for Target Specific StancesAmmar Rashed
We propose an unsupervised user stance detection method to capture fine grained divergences in a community across various topics. We employ pre-trained universal sentence encoders to represent users based on the content of their tweets on a particular topic. User vectors are projected onto a lower dimensional space using UMAP, then clustered using HDBSCAN. Our method performs better than previous approaches on two datasets in different domains, achieving precision and recall scores ranging between 0.89 and 0.97. We compiled a dataset of more than 300k tweets about UEFA Super Cup's 2019 final, and tagged 12k users as Liverpool FC or Chelsea FC fans. We utilized our method to analyze the stances of Twitter users noting a correlation between user stances towards various polarizing issues. We used the resultant clusters to quantify the polarization in various topics, and analyze the semantic divergence between clusters.
Marius Ivanovas, Head of Global Performance Division, Httpool
Amplify user acquisition and retention with Twitter. Performance based platform insights that will help to boost your results. Case-studies and examples.
Trend Detection and Analysis on Twitter
Leveraging Twitter's Streaming API to collect over 80 million tweets and extracting trends.
Project for first project paper during master studies.
Together with Lukas Masuch and Benjamin Raethlein
by Lukas Masuch, Henning Muszynski and Benjamin Raethlein
The popularity of social media services has increased exponentially in the last few years. The combination of big social data and powerful analytical technologies makes it possible to gain highly valuable insights that otherwise might not be accessible. The Twitter Analyzer comprises several components to collect, analyze and visualize Twitter data. Therefore, we explored various related technologies to implement this tool. We collected about 38 million english tweets related to various and analyzed those data with machine learning techniques to compute the respective sentiment and detect common topics. Furthermore, we visualized the results using varying visualization techniques to emphasize different aspects such as a wordcloud, several chart-types and geospatial visualizations. Used technologies: MongoDB, Python, Twython, Python NLTK, wordcloud2.js, wordfreq, amCharts, Google BigQuery, Google Cloud Storage, CartoDB, EtcML
Attend2trend: Attention Model for Real-Time Detecting and Forecasting of Tren...Ahmed Saleh
Knowing what is increasing in popularity is important to researchers, news organizations, auditors, government entities and more. In particular, knowledge of trending topics provides us with information about what people are attracted to and what they think is noteworthy. Yet detecting trending topics from a set of texts is a difficult task, requiring detectors to learn trending patterns while simultaneously making predictions.
In this paper, we propose a deep learning model architecture for the challenging task of trend detection and forecasting. The model architecture aims to learn and attend to the trending values' patterns. Our preliminary results show that our model detects the trending topics with a high accuracy.
In social networks, where users send messages to each other, the issue of what triggers communication between unrelated users arises: does communication between previously unrelated users depend on friend-of-a-friend type of relationships, common interests, or other factors? In this work, we study the problem of predicting directed communication
intention between two users. Link prediction is similar to communication intention in that it uses network structure for prediction. However, these two problems exhibit fundamental
differences that originate from their focus. Link prediction uses evidence to predict network structure evolution, whereas our focal point is directed communication initiation between
users who are previously not structurally connected. To address this problem, we employ topological evidence in conjunction to transactional information in order to predict communication intention. It is not intuitive whether methods that work well for
link prediction would work well in this case. In fact, we show in this work that network or content evidence, when considered separately, are not sufficiently accurate predictors. Our novel approach, which jointly considers local structural properties of users in a social network, in conjunction with their generated content, captures numerous interactions, direct and indirect, social and contextual, which have up to date been considered independently. We performed an empirical study to evaluate our method using an extracted network of directed @-messages sent between users of a corporate microblogging service, which resembles Twitter. We find that our method outperforms state of the art techniques for link prediction. Our findings have implications for a wide range of social web applications, such as contextual expert recommendation for Q&A, new friendship relationships creation, and targeted content delivery.
ML&AI APPROACH TO USER UNDERSTANDING ECOSYSTEM AT VCCORP Applications to News...Tuan Hoang
Introduce the ML&AI approach to user understanding at VCCORP with many applications, including news distribution, recommendation engine in news, e-commerce, and the advertising technology
Invited talk at Session on Semantic Knowledge for Commodity Computing, at Microsoft Research Faculty Summit 2011, July 19-20, 2011, Redmond, WA. http://research.microsoft.com/en-us/events/fs2011/default.aspx
Associated video at: https://youtu.be/HKqpuLiMXRs
Extracting Social Network Data and Multimedia Communications from Social Medi...Shalin Hai-Jew
This presentation provides an overview of some of the data extractions that may be achieved on social media platforms using their respective APIs and a free open-source tool (NodeXL).
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Marius Ivanovas, Head of Global Performance Division, Httpool
Amplify user acquisition and retention with Twitter. Performance based platform insights that will help to boost your results. Case-studies and examples.
Trend Detection and Analysis on Twitter
Leveraging Twitter's Streaming API to collect over 80 million tweets and extracting trends.
Project for first project paper during master studies.
Together with Lukas Masuch and Benjamin Raethlein
by Lukas Masuch, Henning Muszynski and Benjamin Raethlein
The popularity of social media services has increased exponentially in the last few years. The combination of big social data and powerful analytical technologies makes it possible to gain highly valuable insights that otherwise might not be accessible. The Twitter Analyzer comprises several components to collect, analyze and visualize Twitter data. Therefore, we explored various related technologies to implement this tool. We collected about 38 million english tweets related to various and analyzed those data with machine learning techniques to compute the respective sentiment and detect common topics. Furthermore, we visualized the results using varying visualization techniques to emphasize different aspects such as a wordcloud, several chart-types and geospatial visualizations. Used technologies: MongoDB, Python, Twython, Python NLTK, wordcloud2.js, wordfreq, amCharts, Google BigQuery, Google Cloud Storage, CartoDB, EtcML
Attend2trend: Attention Model for Real-Time Detecting and Forecasting of Tren...Ahmed Saleh
Knowing what is increasing in popularity is important to researchers, news organizations, auditors, government entities and more. In particular, knowledge of trending topics provides us with information about what people are attracted to and what they think is noteworthy. Yet detecting trending topics from a set of texts is a difficult task, requiring detectors to learn trending patterns while simultaneously making predictions.
In this paper, we propose a deep learning model architecture for the challenging task of trend detection and forecasting. The model architecture aims to learn and attend to the trending values' patterns. Our preliminary results show that our model detects the trending topics with a high accuracy.
In social networks, where users send messages to each other, the issue of what triggers communication between unrelated users arises: does communication between previously unrelated users depend on friend-of-a-friend type of relationships, common interests, or other factors? In this work, we study the problem of predicting directed communication
intention between two users. Link prediction is similar to communication intention in that it uses network structure for prediction. However, these two problems exhibit fundamental
differences that originate from their focus. Link prediction uses evidence to predict network structure evolution, whereas our focal point is directed communication initiation between
users who are previously not structurally connected. To address this problem, we employ topological evidence in conjunction to transactional information in order to predict communication intention. It is not intuitive whether methods that work well for
link prediction would work well in this case. In fact, we show in this work that network or content evidence, when considered separately, are not sufficiently accurate predictors. Our novel approach, which jointly considers local structural properties of users in a social network, in conjunction with their generated content, captures numerous interactions, direct and indirect, social and contextual, which have up to date been considered independently. We performed an empirical study to evaluate our method using an extracted network of directed @-messages sent between users of a corporate microblogging service, which resembles Twitter. We find that our method outperforms state of the art techniques for link prediction. Our findings have implications for a wide range of social web applications, such as contextual expert recommendation for Q&A, new friendship relationships creation, and targeted content delivery.
ML&AI APPROACH TO USER UNDERSTANDING ECOSYSTEM AT VCCORP Applications to News...Tuan Hoang
Introduce the ML&AI approach to user understanding at VCCORP with many applications, including news distribution, recommendation engine in news, e-commerce, and the advertising technology
Invited talk at Session on Semantic Knowledge for Commodity Computing, at Microsoft Research Faculty Summit 2011, July 19-20, 2011, Redmond, WA. http://research.microsoft.com/en-us/events/fs2011/default.aspx
Associated video at: https://youtu.be/HKqpuLiMXRs
Extracting Social Network Data and Multimedia Communications from Social Medi...Shalin Hai-Jew
This presentation provides an overview of some of the data extractions that may be achieved on social media platforms using their respective APIs and a free open-source tool (NodeXL).
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Mining Influencers in the German Twittersphere – Mapping a Language-Based Follow Network
1. Mining Influencers in the
German Twittersphere
Mapping a Language-Based Follow Network
Felix Victor Münch1
, Ben Thies1
, Cornelius Puschmann1
, Axel Bruns2
1
Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI), Germany
2
Digital Media Research Centre (DMRC), Queensland University of Technology (QUT), Australia
IC2
S2
, July 2019, Amsterdam
2. Problem
‘I can’t get no population …’
● follow/friend networks of
Online Social Networks (OSNs)
arguably main predictor for
content exposure (despite
sponsored content and
algorithmic sorting of
timelines)
● APIs too restrictive for
collection of comprehensive
follow networks
3. Opportunities
Possible Research Questions
● “Are there filter bubbles/echo
chambers?”
● “Are there social and/or topical
communities/issue publics?”
● “Who can spread which content
most efficiently?”
● “Can we predict the spread of
(fake) news?”
● “Who let the bots out? And
where?”
5. The Australian Twittersphere
Bruns, A., Moon, B., Münch, F. V., & Sadkowsky, T. (2017). The Australian Twittersphere in 2016:
mapping the follower/followee network. Social Media + Society, 3(4).
https://doi.org/10.1177/2056305117748162
Münch, F. V. (2019). Measuring the Networked Public – Exploring Network Science Methods
for Large Scale Online Media Studies. Queensland University of Technology.
https://doi.org/10.5204/thesis.eprints.125543
7. Our adaptation of the ‘rank degree’ method
Based on: Salamanos, N., Voudigari, E., & Yannakoudakis, E. J. (2017). Deterministic graph exploration for efficient graph sampling. Social
Network Analysis and Mining, 7(1), 24. https://doi.org/10.1007/s13278-017-0441-6
Bottom: Original graph without walked edges. Starting nodes (seeds) are drawn randomly (1) and
walker move to their friend with the highest in-degree (2-6). Walked edges get removed/‘burned’.
Top: Current sample at each step. Walked (and symmetric) edges are added to sample.
1 2 3 4 5 6
8. Our adaptation of the ‘rank degree’ method
Main adaptations (amongst others) mostly due to API restrictions:
● Undirected → Directed
● Fewer walkers (200)
● Walkers do not collapse when ending up on the same node
● Last 5000 friends only
● ‘Degree’ stays constant and is equal to follower count reported by Twitter API
● Only collect edges to accounts with German as their interface language (not possible
anymore due to API changes ¯_(ツ)_/¯ … we work on a solution.)
12. Coverage
Distribution of public accounts with > 1 friend in the test sample over the percentage of their friends that
can be found in the influencer sample (left, filtered for in-degree >= 1, leaving 199,180 accounts) / baseline
sample (right, same size, randomly drawn from German accounts in global dataset)
13. Coverage
Ranked distribution of the percentage of ‘friends’ of accounts in a random sample
(n=1000, filtered for having at least 2 ‘friends’) found in our influencer sample (excl. nodes
with in-degree 0) and a random sample of the same size (181k accounts)
14. Reach
Ranked distribution of the percentage of accounts reached in a random sample (n=1000,
filtered for having at least 2 ‘friends’) by accounts in our influencer sample (excl. nodes
with in-degree 0) and by accounts in a random sample of the same size (181k accounts)
16. Community detection with
infomap
3-core of our sample network;
coloured by communities detected
with the infomap community
detection algorithm;
node size represents Page Rank
18. Tagged Community graph Community graph of communities
in the 3-core of our sample with
over 300 accounts, at least 80 active
accounts during the examined time
frame, and edges with a weight of at
least 150; edge width represents
weight; edge direction follows
clockwise curvature; edges coloured
by source node; node size represents
the number of accounts in each
community
19. Outlook
● Adaptation to API changes
● Bootstrapping the seed pool
● Other language-based spheres
● Topical mining
● Other community detection methods
● Bot detection
22. Interesting 🤔
Münch, F. V. (2019).
Measuring the Networked Public – Exploring Network
Science Methods for Large Scale Online Media Studies.
Queensland University of Technology.
https://doi.org/10.5204/thesis.eprints.125543
23. Coverage
Count, mean, standard deviation, minimum, quartiles, and maximum of the number of
friends and the percentages of friends in the influencer and baseline sample for public
accounts in the test sample with at least 2 friends.
n = 597 number of friends
percent of friends in
influencer sample
percent of friends in
baseline sample
mean 57 40 0.54
std 160 30 2.7
min 2 0 0
25% 7 11 0
50% 18 40 0
75% 42 65 0
max 1988 100 50
24. Twitter is not representative for general population – but for itself
https://www.pewinternet.org/2019/04/24/sizing-up-twitter-users/
25. Dominance of an active and influential elite
https://www.pewinternet.org/2019/04/24/sizing-up-twitter-users/