Presented at Digikomm 2019 in Berlin.
Abstract
We present a study on bot detection and its interpretation, assessing the different types of automation detected by Botometer (https://botometer.iuni.iu.edu/), one of the most popular bot detection methods. The study is based on the first project to assess the prevalence, influence, and roles of automated accounts in a Twitter follow network on a national scale: the German-speaking Twittersphere. This work in progress allows us to analyse the long-term structural role, impact, and possible audience of bots beyond the context of single events and topics.
What makes a bot a bot? Exploring benign automation on Twitter
1. What makes a bot a bot? Exploring benign automation on Twitter
Dr. Felix Victor Münch¹; Ben Thies, B.A.¹; Dr. Cornelius Puschmann¹; Dr. Axel Bruns²
¹ Leibniz Institute for Media Research | Hans-Bredow-Institut (HBI), Germany
² Digital Media Research Centre (DMRC), Queensland University of Technology (QUT), Australia
4. Background
● Most Twitter-related bot research focuses on activity, examining tweets, hashtags, keywords, @-mentions, etc.*
● Most research focuses on malign automation, sometimes even assumes that most automation is malign, and often cannot estimate its impact
This research focuses on the Twitter follow network, takes a nuanced approach to automation, and assesses bot impact via the follow network
* https://www.pewinternet.org/2018/04/09/bots-in-the-twittersphere/
5. Research Goals
● provide an overview of the prevalence, influence, and roles of automated accounts in the German-speaking Twittersphere
● develop and test tools/methods that enable such analyses for other languages as well
● lay some groundwork for a theory of social automation by cataloguing suspected bots
6. Method
1. Create a language-based (German) Twitter follow network sample of the most influential accounts (method preprint: https://bit.ly/twitterwalk)
2. Use Botometer to estimate automation probabilities and apply a probability threshold of 0.75 to mark an account as a suspected bot
3. Manually check 100 suspected bots, prioritised by PageRank, and code them inductively
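Steps 2 and 3 can be sketched as a small filter-and-rank routine. This is a minimal illustration, not the project's released code: it assumes Botometer scores and PageRank values have already been computed and stored in plain dictionaries keyed by account ID (no Botometer API is called), and it assumes the 0.75 threshold is inclusive (score >= 0.75), which the slide does not specify. The function name `select_for_coding` is hypothetical.

```python
def select_for_coding(bot_scores, pagerank, threshold=0.75, n=100):
    """Return up to n suspected bots, ordered by descending PageRank.

    bot_scores: dict account_id -> automation probability (assumed precomputed).
    pagerank:   dict account_id -> PageRank in the sampled follow network.
    threshold:  accounts scoring >= threshold count as suspected bots
                (the inclusive comparison is an assumption of this sketch).
    """
    suspected = [acc for acc, score in bot_scores.items() if score >= threshold]
    # Most central suspected bots first; accounts without a PageRank value go last.
    suspected.sort(key=lambda acc: pagerank.get(acc, 0.0), reverse=True)
    return suspected[:n]


# Toy values for illustration only, not data from the study.
scores = {"a": 0.9, "b": 0.5, "c": 0.8, "d": 0.75}
ranks = {"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}
print(select_for_coding(scores, ranks, n=2))  # ['c', 'd']
```

Ranking by PageRank before coding means the limited manual effort goes to the suspected bots with the largest potential audience in the follow network.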
8. Our adaptation of the ‘rank degree’ method
More details in our preprint: http://bit.ly/twitterwalk
Bottom: Original graph without walked edges. Starting nodes (seeds) are drawn randomly (1), and walkers move to their friend with the highest in-degree (2-6). Walked edges are removed ('burned').
Top: Current sample at each step. Walked (and symmetric) edges are added to the sample.
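The walk shown in the figure can be sketched in plain Python. This is a simplified, hypothetical illustration rather than the released implementation (see the preprint for the actual method). Assumptions: the graph is a dict mapping each account to the set of accounts it follows ("friends"), in-degree is taken from the full input graph, ties are broken by the larger account ID, and a walker whose remaining friends have all been burned simply stops.

```python
import random
from collections import Counter


def in_degrees(graph):
    """Count incoming edges (followers) per account in {account: set(friends)}."""
    counts = Counter()
    for friends in graph.values():
        counts.update(friends)
    return counts


def rank_degree_walk(graph, n_seeds, n_steps, seed=0):
    """Simplified rank-degree-style walk over a follow graph (hypothetical sketch).

    Returns the set of sampled directed edges (follower, friend).
    """
    rng = random.Random(seed)
    indeg = in_degrees(graph)  # in-degree in the full input graph
    remaining = {u: set(friends) for u, friends in graph.items()}  # unburned edges
    walkers = rng.sample(sorted(graph), n_seeds)  # (1) draw random seeds
    sample = set()
    for _ in range(n_steps):  # (2-6) each walker moves once per step
        next_walkers = []
        for u in walkers:
            friends = remaining[u] if u in remaining else set()
            if not friends:
                continue  # all of u's out-edges are burned: this walker stops
            # Move to the friend with the highest in-degree (ties: larger ID).
            v = max(friends, key=lambda f: (indeg[f], f))
            sample.add((u, v))
            if u in graph.get(v, ()):
                sample.add((v, u))  # keep the symmetric edge as well
            friends.discard(v)  # burn the walked edge
            next_walkers.append(v)
        walkers = next_walkers
    return sample


follows = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}
print(sorted(rank_degree_walk(follows, n_seeds=4, n_steps=1)))
# [('a', 'c'), ('b', 'c'), ('c', 'a'), ('d', 'c')]
```

Moving towards high in-degree friends biases the sample towards widely followed (influential) accounts, while burning walked edges keeps walkers from looping over the same connections.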
9. Coverage
Distribution of public accounts with more than one friend in the test sample by the percentage of their friends that can be found in the influencer sample (left; filtered for in-degree >= 1, leaving 199,180 accounts) and in the baseline sample (right; same size, randomly drawn from German accounts in the global dataset)
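The coverage measure behind this figure can be illustrated as follows: for each test-sample account with more than one friend, compute the percentage of its friends that appear in a given sample, then bin those percentages. This is a hypothetical sketch; the data structures and the names `coverage_percentages` and `histogram` are assumptions, and the in-degree filter used for the left panel is not reproduced.

```python
def coverage_percentages(test_friends, sample_accounts):
    """Percentage of each test account's friends that occur in a sample.

    test_friends:    dict account -> set of accounts it follows (test sample).
    sample_accounts: set of account IDs in the influencer or baseline sample.
    Accounts with one friend or fewer are skipped ("> 1 friend" in the caption).
    """
    result = {}
    for account, friends in test_friends.items():
        if len(friends) <= 1:
            continue
        covered = len(friends & sample_accounts)
        result[account] = 100.0 * covered / len(friends)
    return result


def histogram(percentages, bin_width=10):
    """Count accounts per coverage bin (0-10 %, 10-20 %, ..., 90-100 %)."""
    counts = [0] * (100 // bin_width)
    for p in percentages.values():
        idx = min(int(p // bin_width), len(counts) - 1)  # 100 % goes in the last bin
        counts[idx] += 1
    return counts


# Toy data for illustration only.
test = {"x": {"a", "b", "c", "d"}, "y": {"a"}, "z": {"a", "e"}}
cov = coverage_percentages(test, {"a", "b", "c"})
print(cov)  # {'x': 75.0, 'z': 50.0}
```

Running the same computation once against the influencer sample and once against an equally sized random baseline yields the two distributions compared in the figure.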
10. (A Sample of) The German Twittersphere
● ~1 million accounts, 1.6 million edges
● 6 months of collection using the API authentication keys of 10 collaborators
● Figure: 3-core of the sample network, coloured by communities detected with Infomap. Node size indicates PageRank
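The 3-core shown in the figure is the maximal subgraph in which every account has at least three connections inside that subgraph. A minimal peeling sketch follows; it assumes degree is counted on the undirected view of the follow network (a connection in either direction counts once), which the slide does not state, and it does not reproduce the Infomap community detection or the PageRank-based node sizing.

```python
from collections import defaultdict


def k_core(edges, k=3):
    """Return the node set of the k-core of an undirected view of `edges`.

    edges: iterable of directed (follower, friend) pairs; direction is ignored
           and self-loops are dropped (assumptions of this sketch).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    alive = set(neighbours)
    # Repeatedly remove nodes with fewer than k neighbours that are still alive.
    stack = [n for n in alive if len(neighbours[n]) < k]
    while stack:
        n = stack.pop()
        if n not in alive:
            continue  # already removed via an earlier push
        alive.remove(n)
        for m in neighbours[n]:
            if m in alive:
                neighbours[m].discard(n)
                if len(neighbours[m]) < k:
                    stack.append(m)
    return alive


# A fully connected group of four plus one pendant account.
toy = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
       ("b", "d"), ("c", "d"), ("e", "a")]
print(sorted(k_core(toy)))  # ['a', 'b', 'c', 'd']
```

Restricting the visualisation to the 3-core removes loosely attached peripheral accounts, which keeps the community structure of the core readable.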
12. Tagged Community graph
Community graph of communities in the 3-core of our sample with over 300 accounts, at least 80 active accounts during the examined time frame, and edges with a weight of at least 150; edge width represents weight; edge direction follows clockwise curvature; edges coloured by source node; node size represents the number of accounts in each community
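The filters in this caption can be sketched as aggregating account-level follow edges into weighted edges between communities and then applying the three thresholds. This is a hypothetical sketch: the community assignment (produced by Infomap in the study) and the set of accounts active during the time frame are taken as given inputs, edge weight is assumed to be the number of follow edges between two communities, edges within a community are ignored, and the function name is illustrative.

```python
from collections import Counter


def community_graph(edges, community, active,
                    min_size=300, min_active=80, min_weight=150):
    """Aggregate account edges into a filtered, weighted community graph.

    edges:     iterable of directed (follower, friend) account pairs.
    community: dict account -> community label (e.g. from Infomap).
    active:    set of accounts active during the examined time frame.
    Returns (kept communities as {label: size}, {(src, dst): weight}).
    """
    sizes = Counter(community.values())
    active_counts = Counter(community[a] for a in active if a in community)
    # "over 300 accounts" and "at least 80 active accounts"
    kept = {c: n for c, n in sizes.items()
            if n > min_size and active_counts[c] >= min_active}
    weights = Counter()
    for u, v in edges:
        cu, cv = community.get(u), community.get(v)
        if cu in kept and cv in kept and cu != cv:
            weights[(cu, cv)] += 1  # assumed weight: number of follow edges
    # "edges with a weight of at least 150"
    filtered = {pair: w for pair, w in weights.items() if w >= min_weight}
    return kept, filtered


# Toy data with lowered thresholds, for illustration only.
comm = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Z"}
act = {"a", "c", "d", "e"}
follow_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("c", "a"), ("e", "a")]
kept, filtered = community_graph(follow_edges, comm, act,
                                 min_size=1, min_active=1, min_weight=2)
print(kept, filtered)
```

Keeping the edge direction in the `(src, dst)` keys preserves the information that the caption encodes through clockwise edge curvature and source-node colouring.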
13. Step 2: Use Botometer¹ to identify suspected bots
¹ https://botometer.iuni.iu.edu
25. Preliminary results
● Prevalence of (suspected) automated accounts in line with prior analyses¹
● Low centrality of automated accounts (usually outside central clusters)
● Automated accounts are primarily located outside of politics and news clusters
● Most automated accounts are benign
● Botometer: according to human ratings, the false-positive rate is higher than the selected threshold suggests (58% vs. the 75% threshold set)
● But: a good approach to inductively explore account categories and test detection methods
Overall, the negative impact of automated accounts seems rather low in our sample.
¹ Varol, O., Ferrara, E., Davis, C. A., Menczer, F., & Flammini, A. (2017). Online Human-Bot Interactions: Detection, Estimation, and Characterization. In International AAAI Conference on Web and Social Media. Retrieved from https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587/14817
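A Botometer score threshold of 0.75 does not by itself guarantee that 75% of the flagged accounts are actually automated; that has to be checked against human codes. The sketch below shows the arithmetic for comparing the share of human-confirmed bots among flagged accounts with the threshold. The function name is hypothetical, and the labels in the example are made up for illustration, not the study's data.

```python
def confirmation_rates(human_codes):
    """Share of flagged accounts that human coders confirmed as automated.

    human_codes: dict account -> True if coders judged it automated, else False.
    Returns (confirmed_share, false_positive_share) as fractions in [0, 1];
    both are computed among flagged accounts only.
    """
    if not human_codes:
        raise ValueError("no coded accounts")
    confirmed = sum(1 for is_bot in human_codes.values() if is_bot)
    confirmed_share = confirmed / len(human_codes)
    return confirmed_share, 1.0 - confirmed_share


# Made-up codes for four flagged accounts.
codes = {"a": True, "b": False, "c": True, "d": True}
print(confirmation_rates(codes))  # (0.75, 0.25)
```

Because only flagged accounts are coded, this yields a precision-style measure; estimating missed bots (false negatives) would additionally require coding accounts below the threshold.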
26. (Selected) Limitations / What to do next? / Outlook
Sampling
This study:
● Method still experimental
● Missing peripheral automated accounts with low impact ('bot swarms')
Other studies (activity/keyword-based or random sampling):
● Impact/activity assessment problematic
● 'SEO problem': bots optimise for keywords
In general:
● Population unknown
● No concept of representativeness in networks/media environments with highly skewed distributions of degree/attention
Bot detection
This study:
● Needs triangulation with other automated tools
● Needs follow-up with multiple coders
Other studies:
● 'Bad bot' stigma: quiet assumption that automation is malign
● Too much trust in automated tools
In general:
● Automated tools often developed and trained without a working theory of social automation
● Binary concept of 'botness' instead of a multidimensional approach to (semi-)automation