Socialbots are software programs that mimic human users on social networks. This document describes research modeling the susceptibility of users to interactions with socialbots. The researchers conducted experiments on data from a socialbot challenge to predict which users would interact with bots and how susceptible users were. Features like network position, linguistic style, and behavioral patterns were used. Users who were more social, communicative, emotionally expressive, and active on Twitter were found to be more likely to interact with socialbots. Predicting susceptibility levels was challenging with the available data.
There are lots of tools emerging that appear to give us wonderful statistics and data about Twitter and it’s hard to know which data we actually want and how we want to receive it.
As Twitter's API has been undergoing a few changes recently, we wanted to give an overview of the information that you can still get from the platform itself and then provide some guidance on the best way to measure the data.
There are four main areas of Twitter data:
1. User data - relates to the user who posted the message.
2. Friend and follower data - relates to the relationship a user has to other users.
3. Tweet data - all the details and content relating to a particular tweet.
4. Places and geographic data - the geographic and location-based aspects relating to a person or tweet.
To measure the data there are also four main measurements that we use to understand the impact of activities on Twitter:
1. Impressions - the aggregate number of exposures to a message across all users.
2. Reach - number of unique users exposed to a message.
3. Frequency - number of times each unique user reached is exposed to a message.
4. Relevancy - reach to specific demographics.
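The relationship between the first three measurements can be sketched in a few lines of Python. The exposure data below is a hypothetical toy example, not output from any real Twitter tool:

```python
# Each event records that a user was exposed to a given message.
exposures = [
    ("alice", "msg1"), ("bob", "msg1"), ("alice", "msg1"),
    ("carol", "msg1"), ("bob", "msg1"),
]

impressions = len(exposures)                   # total exposures, aggregated
reach = len({user for user, _ in exposures})   # unique users exposed
frequency = impressions / reach                # average exposures per user reached

print(impressions, reach, round(frequency, 2))  # 5 impressions, 3 reached, 1.67 each
```

Note that frequency is simply impressions divided by reach, which is why reporting any two of the three determines the third.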
When it comes to the ROI of these messages it's important to think about how they compare to your other channels in terms of reach and impressions.
Take a look at the presentation below - we hope it helps to reveal some of the Twitter data you can access and ways in which you might go about measuring it.
The document summarizes a research study that examined how the background knowledge of audiences on Twitter can help analyze the semantics of messages in Twitter streams. The researchers collected data from different Twitter streams over time, selected audiences for the streams, and estimated the background knowledge of audiences in different ways. They then evaluated how well the background knowledge helped predict hashtags of future tweets. The results showed the audience of a stream can provide useful knowledge, and streams with stable, interconnected communities tended to have more useful audiences.
Datascience Introduction, WebSci Summer School 2014 (Claudia Wagner)
This document provides an overview of key concepts in data science and statistical analysis. It discusses the different activities involved in a typical data science project, including data collection, preparation, analysis, visualization, and preservation. Various data types and scales of measurement are defined. Common statistical and machine learning techniques are explained, such as clustering, dimensionality reduction, and regression. Potential biases and issues in data collection and analysis are also addressed. The document aims to give readers a well-rounded introduction to the data science process and some important statistical concepts.
It’s not in their tweets: Modeling topical expertise of Twitter users (Claudia Wagner)
This document discusses a study on modeling the topical expertise of Twitter users. It finds that:
1) User lists and bio information are most useful for humans to judge the expertise of Twitter users, while tweets and retweets are less useful.
2) Topic distributions based on user lists best reflect the underlying topics of a user and are most useful for classifying users into topical categories.
3) While all user data provides some topical information, lists are the most distinct and useful for building models of users' topical expertise.
This document discusses using online recipe data to analyze spatiotemporal dietary patterns. It proposes analyzing whether food preferences are more similar between geographically close regions than distant ones, and how weekday and season impact users' diets. The data comes from the server logs of Austria's most popular online recipe platform from 2012-2013, containing around 180,000 recipes and 1,700 regions. Next steps discussed include obtaining more recipe data, estimating nutritional values, and modeling the dynamics and spreading of food preference popularity over time, regions, weekdays and seasons.
When politicians talk: Assessing online conversational practices of political... (Claudia Wagner)
Politicians communicate differently on social media based on their party and country. Studies have found cross-party and cross-country differences in how politicians tag, retweet, and mention others depending on their cultural focus, similarity, and methods of cultural reproduction. Politicians must understand these dynamics to effectively communicate their messages.
Towards Maximising Cross-Community Information Diffusion (Václav Belák)
The document summarizes research on maximizing information diffusion across online communities. It aims to spread a message across an information flow network by targeting influential communities. The researchers define methods to measure community impact and propose targeting based on impact and entropy. Evaluation on two datasets shows their impact focus approach outperforms others for small numbers of targeted communities and seed users, achieving diffusion of information to 80% of users or communities.
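The paper's exact impact and entropy measures are not reproduced here, but the general idea of ranking communities by how much, and how evenly, they push information to other communities can be sketched as follows. The flow counts and community names are hypothetical:

```python
import math

# Hypothetical inter-community flows: flows[c] maps community c to counts
# of information items it has passed to each other community.
flows = {
    "c1": {"c2": 30, "c3": 10, "c4": 10},
    "c2": {"c1": 5},
    "c3": {"c1": 20, "c2": 20, "c4": 20},
}

def impact(c):
    """Total outgoing flow: how much information the community pushes outward."""
    return sum(flows[c].values())

def entropy(c):
    """Shannon entropy of the outgoing flow distribution: how evenly the
    community spreads its information across other communities."""
    total = impact(c)
    probs = [v / total for v in flows[c].values()]
    return -sum(p * math.log2(p) for p in probs)

# Rank candidate seed communities by impact, breaking ties with entropy.
ranked = sorted(flows, key=lambda c: (impact(c), entropy(c)), reverse=True)
print(ranked)  # c3 pushes the most flow, then c1, then c2
```

A high-impact, high-entropy community is an attractive seeding target because it both transmits a lot and reaches many distinct communities rather than one.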
This document discusses the implications of Web 2.0 and social media for IT organizations and employees. It defines concepts like blogs, microblogs, social networks, social bookmarking, wikis and RSS. It notes that employees were not traditionally trained to use these tools. The document also discusses how Web 2.0 principles can be applied within companies in an "Enterprise 2.0" model. It lists trends in IT like openness and socialization. The workshop assignment is for participants to consider how these tools and concepts apply to their work and to identify implications for their IT department and information sharing.
The document discusses the 10/3 Instructional Model which focuses on (1) power standards that are the most essential standards students need to learn, (2) using Web 2.0 tools like blogs, wikis and podcasts to make content more engaging for students, and (3) preparing students for future technologies like constant connectivity, 3D environments, cloud computing and simulations through integrating practices like mobile apps and online simulations.
The 10/3 Instructional Model focuses on power standards, web 2.0 technology tools, and aspects of future web 3.0. It prioritizes essential standards and recommends aligning courses to 10 modules focusing on power standards. It encourages using proprietary and freeware web 2.0 collaboration tools to engage students with content. While web 3.0 is undefined, the model suggests preparing students by integrating mobile apps, cloud computing, simulations and other emerging web 3.0 technologies into instructional design and practice.
Evan introduces himself and his work on open source microblogging software projects. He created Laconica, an open source PHP/MySQL microblogging platform that implements the Twitter API. He also developed the OpenMicroblogging protocol for distributed microblogging across servers and the StatusNet software-as-a-service platform. His goal is for open microblogging software to allow enterprises, brands, and communities to have decentralized alternatives to centralized platforms like Twitter.
Semantic Technology Solutions For Gov 2 0 Citizen-Friendly Recovery.Gov and D... (ajmalik)
The document discusses how semantic technologies can be used to build citizen-friendly versions of Recovery.gov and Data.gov in line with the Obama administration's goals of transparency, openness, and collaboration. It begins with an introduction to semantic technologies and how they can model knowledge and add intelligence. It then discusses how these technologies, combined with cloud computing and Web 2.0 approaches, can enhance sites like Recovery.gov and Data.gov to better serve citizens. The presentation concludes with a call to action to demonstrate semantic solutions for these sites.
Web 2.0 Measurement: Open Government Innovations Conference (Andrew Krzmarzick)
Presentation delivered at the Open Government and Innovations (OGI) Conference in Washington, DC, on July 22, 2009. Outlines the ways in which government has measured its web presence in a "1.0" context, including an overview of the measurement activities conducted by Brookings Institution, Foresee, Forrester and the e-Government Act of 2002.
2020 Social Workshop on Social Media Strategy for CXOs (2020 Social)
The document outlines an agenda for a 2020 Social Workshop on social media strategy for CXOs. The workshop consists of 4 sessions: Introduction, Strategy, Tactics, and Wrap-Up. Session 1 provides an introduction to social technologies and how they are changing people and society. It discusses various social platforms and how to understand them. Session 2 focuses on social media strategy, including how marketing is evolving from a TV-centric model to a community-driven approach. Key concepts around building online communities and scaling passion are also presented.
Invited talk at Future Networked Technologies / FIT-IT research calls opening... (Paolo Massa)
The document summarizes research on trust in recommender systems and social networks. It discusses using trust networks and trust metrics to power trust-aware recommender systems. It also describes research on modeling the social networks of Wikipedia based on interactions between users. Promising directions discussed are leveraging existing open web data and real user data when studying topics like trust in IT systems and semantic systems/services.
Disease spread in small-size directed networks (Marco Pautasso)
Why small-size networks? They are good models for regional horticultural networks spreading plant diseases such as Phytophthora ramorum. Main result: a lower epidemic threshold for scale-free networks with a positive correlation between in- and out-degree.
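The kind of directed-network spread studied here can be illustrated with a minimal susceptible-infected simulation. The toy trade network below (nurseries selling to gardens) and the simple one-shot transmission rule are illustrative assumptions, not the paper's model:

```python
import random

# Hypothetical directed trade network: edges point from seller to buyer,
# the direction a plant disease travels with traded stock.
edges = {
    "nursery_a": ["garden_1", "garden_2", "nursery_b"],
    "nursery_b": ["garden_3"],
    "garden_1": [], "garden_2": [], "garden_3": [],
}

def simulate(p_transmit, seed=0):
    """Discrete SI spread: each newly infected node gets one chance to
    infect each out-neighbour, with probability p_transmit."""
    rng = random.Random(seed)
    infected = {"nursery_a"}
    frontier = ["nursery_a"]
    while frontier:
        node = frontier.pop()
        for nbr in edges[node]:
            if nbr not in infected and rng.random() < p_transmit:
                infected.add(nbr)
                frontier.append(nbr)
    return len(infected)

# Outbreak size grows with transmission probability; high-out-degree
# nodes (here nursery_a) dominate the spread.
print(simulate(0.2), simulate(0.9))
```

Averaging outbreak sizes over many random seeds and sweeping `p_transmit` is the usual way to locate an epidemic threshold empirically on such a network.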
This document discusses detecting spam comments on YouTube videos using machine learning techniques. It analyzes YouTube comments datasets using logistic regression, AdaBoost, decision trees and random forest algorithms. Neural networks achieved the highest accuracy of 91.65% for spam detection, an improvement of around 18% over existing approaches. The document outlines the methodology, including preprocessing, feature selection and extraction, and model building. It discusses the results and screenshots of a developed system for classifying YouTube comments as spam or not spam. In conclusion, machine learning techniques can effectively detect spam comments, though spammers may adapt over time to evade detection.
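The study's pipeline uses logistic regression, tree ensembles, and neural networks; as a hedged stdlib-only sketch of the same classify-the-comments shape, here is a bag-of-words Naive Bayes classifier on toy data (the comments are invented, not from the study's dataset):

```python
import math
from collections import Counter

# Toy labelled comments (hypothetical).
train = [
    ("check out my channel for free gift cards", "spam"),
    ("subscribe now and win a free iphone", "spam"),
    ("great video really enjoyed the explanation", "ham"),
    ("the part about feature selection was useful", "ham"),
]

# Feature extraction: per-class word frequency counts.
counts = {"spam": Counter(), "ham": Counter()}
for text, label in train:
    counts[label].update(text.split())

def classify(text):
    """Naive Bayes with add-one smoothing over a bag-of-words model."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        score = math.log(0.5)  # uniform class prior
        for word in text.split():
            score += math.log((c[word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("win a free gift card now"))  # -> spam
```

A real system would swap in a stronger model and a proper train/test split, but the preprocessing, feature extraction, and prediction stages described in the document map directly onto these steps.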
This document discusses visualization for software analytics and identifies three key trends: 1) developers moving from solo coders to social coders, 2) software development shifting from code-centric to data-centric, and 3) visualization becoming ubiquitous rather than standalone. It provides examples of visualizations for software design, code, dynamic behavior, architecture, and human activities. It discusses how visualization can provide insights, support tasks, and communicate knowledge. It also outlines opportunities and challenges for visual analytics and ubiquitous visualization in software engineering.
DevOps and the cloud: all hail the (developer) king - Daniel Bryant, Steve Poole (JAXLondon_Conference)
1) The document discusses the rise of microservices and DevOps approaches in application development and deployment. It notes both the promises and challenges of these approaches, including increased complexity and the need for new tooling.
2) It describes lessons learned from early adoption of microservices, such as the problems that can arise from shared data stores and monolithic upgrades.
3) The document advocates for a "safety first" mindset with DevOps, emphasizing the importance of security, compliance, and understanding where data is located in cloud environments.
JAXLondon 2015 "DevOps and the Cloud: All Hail the (Developer) King" (Daniel Bryant)
Last year we talked about DevOps, what it was, why it was important and how to get started. Boy, was it scary. Now we’re wiser. More battle-scarred. The scale of the challenge for application writers exploiting cloud and DevOps is clearer, but so is the path forward. Understanding the DevOps approach is important, but equally you must understand specific deployment technologies: how to exploit them and how they affect the design of applications. Whether creating simple applications or sophisticated microservice architectures, many of the challenges are the same.
Presented at JAXLondon 2015 with Steve Poole
Social media for PR - Communications - Success measurement (Jose Sanchez)
This document provides guidance on measuring the success of social media campaigns through defining goals, key metrics, tracking tools, and ongoing optimization. Key steps include choosing metrics like followers, engagement, and sharing to track awareness, participation, and advocacy; using tools to monitor metrics and populate dashboards; and analyzing outcomes to see if goals were met and how the strategy can be improved. Measuring social media performance helps ensure it effectively meets communications objectives.
Social media for PR Communications - Success measurement plan (Jose Sanchez)
This document provides guidance on measuring the success of a social media campaign after its launch. It begins by defining social media measurement as the objective tracking, monitoring, collection, measurement and analysis of quantitative and qualitative data generated by participants to optimize social media tools, tactics and services. Key terms are defined, such as likes, shares, clicks and followers. The benefits of social media measurement are outlined, including optimizing campaigns. Methods of measurement are described, like setting goals, choosing metrics, using monitoring tools and dashboards, and ongoing optimization. Costs, reliability and examples of success are also discussed.
2011 Wintel Targeted Attacks and a Post-Windows Environment APT Toolset (Kurt Baumgartner)
This document discusses targeted cyber attacks in 2011 and beyond. It describes how attackers have invested in developing tools and techniques to infiltrate systems using exploits, malware, and social engineering. Specific examples from 2011 are provided, including exploits of Adobe Flash and Reader. The document also discusses how attacks may evolve in a post-Windows environment as mobile devices and tablets become more prominent targets. It suggests attackers may increasingly focus on platforms like Android and iOS, and looks at early research into exploiting HTML5 features.
The document discusses the evolution of the web from read-only to read-write and participation through user-generated content and social media. It defines social media as people having conversations online and outlines how users interact by posting, sharing, tagging, and commenting on various types of content. The document also discusses emerging technologies like mobile social web, telepresence, and crowdsourcing as well as laws governing the growth of networks and bandwidth.
Web 2.0 refers to new uses and perspectives of the web that emphasize user participation, openness, and network effects. It includes applications like wikis, blogs, RSS feeds, podcasts, social bookmarking, social networks, and media sharing sites. While disruptive to traditional healthcare values of control and privacy, Web 2.0 could harness the collective intelligence of users to share information in new ways. The presentation discusses examples of Web 2.0 applications and how they differ from traditional websites, and considers opportunities and challenges of adopting more participatory approaches in healthcare.
This document analyzes gender inequalities in Wikipedia by measuring how notable men and women are presented and how professions are described. It finds that while coverage of notable men and women is generally good compared to external sources, women local heroes are covered less than expected. It also finds linguistic bias and structural differences in how men and women are depicted. Regarding professions, it finds gender-neutral descriptions are rare, professions dominated by women often refer mainly to men, and gender differences exist in descriptions of notable men and women in those fields. It concludes automatic tools and guidelines are needed to support editors in addressing some of these inequalities.
This document summarizes research on gender inequality in Wikipedia. It finds that women have greater odds of being omitted from Wikipedia coverage compared to men. Structural analysis also reveals asymmetry, with links from male-related pages being less likely to link to female pages. Text analysis finds biographies of women disproportionately focus on relationships while biographies of men focus more on accomplishments. The researchers call for understanding how algorithms may propagate social biases and for improving how women are portrayed on Wikipedia. Tools like WikiWho and WikiVis are introduced to analyze article interaction graphs and their evolution over time. Overall questions remain about the causes of bias and whether the Wikipedia community is improving in addressing issues of representation.
WWW2014 Semantic Stability in Social Tagging StreamsClaudia Wagner
The document discusses measuring semantic stability in social tagging systems. It proposes a new approach called Rank Biased Overlap (RBO) to measure semantic stability over time. RBO compares the ranked lists of tags assigned to resources at different times to determine the overlap weighted by rank. The study finds that tag streams on Twitter, Delicious, and LibraryThing become semantically stable after 1,000-2,000 tag assignments based on RBO scores. Random tagging processes result in slower and lower levels of stability. Simulations show that combinations of background knowledge and imitation among users leads to the fastest stabilization. The approach allows comparing stabilization across systems and identifying semantically stable streams to extract shared semantic knowledge.
Welcome 1st Computational Social Science Workshop 2013 at GESISClaudia Wagner
This document summarizes a workshop on computational social science (CSS) held at GESIS - Leibniz Institute for the Social Sciences. The workshop aimed to explore how research at GESIS contributes to the CSS field and find collaboration opportunities. It began with network analysis presentations and continued with sessions on social science support and applications of CSS. The program included talks on predicting negative links on social networks, political discussions on Twitter, generating networks, and exploring dietary patterns from web data. Working groups then formed to discuss specific topics, with the goal of planning future collaborations between GESIS researchers applying computational methods to social science questions.
The Impact of Socialbots in Online Social NetworksClaudia Wagner
The document summarizes a study that examined the ability of socialbots on Twitter to influence link creation between users. Socialbots were able to increase total link creation by 40% in the first experimental phase and 20% in the second. However, most of the new links (around 1/3) were still recommended by human users, while only 6-12% were recommended by socialbots. Around 50% of new links could not be explained by the data, suggesting other external factors influenced link creation as well. The study aimed to measure socialbots' impact while controlling for confounding variables.
Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online ...Claudia Wagner
This document analyzes factors that impact attention levels in online forum posts. It finds:
1. Attention patterns differ between communities and most are local rather than global.
2. Factors that initiate discussions often differ from those generating lengthy replies.
3. Purpose and specificity of a community's topic impacts attention, with supportive communities driven by different factors than informational ones.
This document discusses how pragmatic metadata, or data about how data is used, can support the generation of semantic metadata for user models. It presents an experiment using different topic modeling algorithms, including LDA and Dirichlet Multinomial Regression, to learn topics from user posts and annotations. Models incorporated pragmatic metadata like authorship and reply relationships. Evaluation showed models using pragmatic user metadata like replies had better predictive performance on future user posts than baselines without metadata. The results indicate pragmatic metadata can help generate semantic topic annotations for users and posts.
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
The document discusses topic models like Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM). LDA is a generative probabilistic model that can discover topics in a collection of documents and represent documents as mixtures over latent topics. CTM extends LDA by allowing topic correlations, since LDA assumes topic independence. CTM models topic proportions using a logistic normal distribution rather than LDA's Dirichlet distribution, allowing dependencies between topics.
Topic models are probabilistic models for discovering the underlying semantic structure of a document collection based on a hierarchical Bayesian analysis. Latent Dirichlet allocation (LDA) is a commonly used topic model that represents documents as mixtures of topics and topics as distributions over words. LDA uses Gibbs sampling to estimate the posterior distribution over topic assignments given the words in each document.
Knowledge Acquisition from Social Awareness StreamsClaudia Wagner
The document discusses analyzing social awareness streams (SAS) to extract knowledge and emerging ontological structures. It proposes developing a SAS analyzer system to study what knowledge can be acquired from SAS using controlled experiments. Preliminary experiments aim to observe if semantics emerge from different types of SAS aggregations. Initial results indicate hashtag streams are more robust than user list streams and network transformations reveal latent conceptual structures.
The document discusses different types of social media streams and how their structures and emerging semantics are influenced by the type of stream aggregation. It presents a network-theoretic model of social awareness streams and analyzes hashtag streams, user list streams, and user directory streams on Twitter. Preliminary results show that hashtag streams demonstrate more robust latent conceptual structures than user list streams. Stream aggregation type affects both structural properties and emerging semantics.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillLizaNolte
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Socialbots www2012
1. When socialbots attack: Modeling susceptibility of users in online social networks
Claudia Wagner, Silvia Mitter, Christian Körner, Markus Strohmaier
Lyon, 16.4.2012
2. What are socialbots?
A socialbot is a piece of software that controls a user account in an online social network and passes itself off as a human being.
3. Danger of socialbots
Social Engineering
Gaining access to secure objects by exploiting human psychology rather than using hacking techniques
Harvest private user data such as email addresses, phone numbers, and other personal data that have monetary value
Spread Misinformation
Ratkiewicz et al. describe the use of Twitter bots to run smear campaigns during the 2010 U.S. midterm elections.
J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy: mapping the spread of astroturf in microblog streams. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW '11.
4. Danger of socialbots
Snowball effects
Boshmaf et al. show that Facebook can be infiltrated by socialbots sending friend requests: 102 socialbots, 6 weeks, 3,517 friend requests and 2,079 infections
Average reported acceptance rate: 59.1%, up to 80% depending on how many mutual friends the socialbots had with the infiltrated users
Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
5. Experimental Setup
How likely will she be infected by a bot?
Is she a bot? Who is a bot? Whom shall we eliminate?
Whom shall we protect to avoid large-scale infiltration due to snowball effects?
src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
6. Experimental Setup
Two-stage approach
Predict infections (binary classification task): Who is susceptible to bot attacks, i.e. who gets infected?
Predict infection level (regression task): How susceptible is a user, i.e. how often does a user interact with bots?
Dataset: Social Bot Challenge 2011
7. Social Bot Challenge 2011
Competition organized by Tim Hwang
Aim was to develop socialbots that persuade 500 randomly selected Twitter users (targets) to interact with them
Targets have a topic in common: cats
Teams got points if targets replied to, mentioned, retweeted or followed their lead bot
Teams had 14 days to develop their socialbots
The game started on Jan 23rd 2011 (day 1) and ended on Feb 5th 2011 (day 14)
On Jan 30th (day 8) the teams were allowed to update their codebase
9. Feature Engineering
How likely will this user become infected?
User Network
Behavior
Content
10. Network Features
3 directed networks: follow, retweet and interaction (retweet, reply, mention and follow) network
Hub and authority score (HITS)
A high-authority node has many incoming edges from nodes with a high hub score
A high-hub node has many outgoing edges to nodes with a high authority score
In-degree and out-degree
Clustering coefficient: number of actual links between the neighbors of a node divided by the number of possible links between them
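The network measures above can be illustrated with a minimal stdlib-Python sketch on a hypothetical toy edge list (the edges and node names below are invented for illustration; the study computed these on its Twitter follow, retweet and interaction networks):

```python
from collections import defaultdict

# Hypothetical directed toy network (e.g. "a follows b").
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a")]

out_nb = defaultdict(set)  # node -> nodes it links to
in_nb = defaultdict(set)   # node -> nodes linking to it
for u, v in edges:
    out_nb[u].add(v)
    in_nb[v].add(u)
nodes = set(out_nb) | set(in_nb)

in_deg = {n: len(in_nb[n]) for n in nodes}
out_deg = {n: len(out_nb[n]) for n in nodes}

def clustering(n):
    """Actual directed links among n's neighbours divided by possible links."""
    nb = out_nb[n] | in_nb[n]
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for u in nb for v in nb if u != v and v in out_nb[u])
    return links / (k * (k - 1))

# HITS power iteration: a node's authority comes from the hubs pointing
# at it; its hub score comes from the authorities it points to.
hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}
for _ in range(50):
    auth = {n: sum(hub[u] for u in in_nb[n]) for n in nodes}
    norm = sum(auth.values()) or 1.0
    auth = {n: s / norm for n, s in auth.items()}
    hub = {n: sum(auth[v] for v in out_nb[n]) for n in nodes}
    norm = sum(hub.values()) or 1.0
    hub = {n: s / norm for n, s in hub.items()}

print(in_deg["a"], out_deg["a"])      # → 2 2
print(round(clustering("a"), 3))      # → 0.167
```

In practice one would use a graph library's HITS implementation; the hand-rolled iteration here just makes the hub/authority definitions on the slide concrete.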
11. Behavioral Features
Informational Coverage
Conversational Coverage
Question Coverage
Social Diversity
Informational Diversity
Temporal Diversity
Lexical Diversity
Topical Diversity
C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Proc. of the Semantic Search 2010 Workshop, April 2010.
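The slides do not give exact formulas for the diversity measures, but the speaker notes describe social diversity as how evenly a user's communication is spread across partners. A plausible sketch, assuming an entropy-based definition (my assumption, not stated in the deck), is normalized Shannon entropy over communication partners:

```python
import math
from collections import Counter

def social_diversity(partners):
    """Normalized Shannon entropy over a user's communication partners.
    1.0 = communication spread evenly over many users;
    0.0 = all communication focused on a single user.
    NOTE: entropy is an assumed stand-in for the paper's definition."""
    counts = Counter(partners)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))

# Even spread over three partners vs. one dominant partner.
print(social_diversity(["bob", "carol", "dave"]))            # → 1.0
print(round(social_diversity(["bob"] * 9 + ["carol"]), 2))
```

The same normalized-entropy pattern could be applied per day-of-week for temporal diversity or per topic for topical diversity.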
12. Linguistic Features
LIWC uses a word-count strategy searching for over 2300 words
Words have previously been categorized into over 70 linguistic dimensions:
standard language categories (e.g., articles, prepositions, pronouns including first person singular, first person plural, etc.)
psychological processes (e.g., positive and negative emotion categories, cognitive processes such as use of causation words, self-discrepancies)
relativity-related words (e.g., time, verb tense, motion, space)
traditional content dimensions (e.g., sex, death, home, occupation)
J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547-577, 2003.
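The LIWC dictionary itself is proprietary, but the word-count strategy it uses can be sketched with a hypothetical mini-lexicon (the category words below are illustrative stand-ins, not the real LIWC dictionary):

```python
import re

# Hypothetical mini-lexicon standing in for LIWC's ~2300-word dictionary.
# A word may belong to several categories, as in LIWC itself.
LEXICON = {
    "negate": {"no", "not", "never"},
    "posemo": {"love", "nice", "sweet"},
    "negemo": {"hate", "kill", "sad"},
    "death":  {"bury", "kill", "dead"},
}

def liwc_features(text):
    """Fraction of the tweet's words falling into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1
    return {cat: sum(w in vocab for w in words) / n
            for cat, vocab in LEXICON.items()}

feats = liwc_features("I will never kill my cat, I love her")
print(feats["negate"], feats["death"])
```

Each tweet (or a user's concatenated tweets) thus yields one numeric feature per category, which is how the linguistic features feed into the classifiers.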
13. Feature Computation
For all targets we computed the features using all tweets they authored during the challenge (up to the point in time when they became infected) and a snapshot of the follow network recorded on the 26th of January (day 4)
We only used targets which became susceptible at day 7 or later
Features therefore do not contain any future information (such as tweets or social relations which were created after a user became infected)
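The no-future-information rule amounts to a simple timestamp cutoff per user; a minimal sketch, with invented record fields (`user`, `time`, `text`) standing in for the real dataset schema:

```python
from datetime import datetime

# Hypothetical records; infected_at marks a user's first bot interaction
# (absent for non-susceptible users).
tweets = [
    {"user": "u1", "time": datetime(2011, 1, 24), "text": "hello"},
    {"user": "u1", "time": datetime(2011, 1, 31), "text": "after infection"},
]
infected_at = {"u1": datetime(2011, 1, 30)}

def usable_tweets(user, tweets, infected_at):
    """Keep only tweets authored before the user's infection, so that
    features never leak future information into the prediction task."""
    cutoff = infected_at.get(user)
    return [t for t in tweets
            if t["user"] == user and (cutoff is None or t["time"] < cutoff)]

print(len(usable_tweets("u1", tweets, infected_at)))  # → 1
```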
14. Predict Infections
Binary classification of users into susceptible and non-susceptible
Train 6 classifiers on 97 features
Compare classifiers via 10-fold cross-validation
Balanced dataset
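The 10-fold cross-validation protocol can be sketched as follows. The single-feature threshold classifier and the synthetic balanced data are deliberate toys (the study trained six real classifiers, including generalized boosted regression, on 97 features); only the fold-splitting and averaging mirror the evaluation described on the slide:

```python
import random

random.seed(0)
# Toy balanced dataset: (feature, label). Susceptible users (label 1)
# tend to have a higher "activity" feature than non-susceptible (0).
data = [(random.gauss(1.0, 1.0), 1) for _ in range(50)] + \
       [(random.gauss(-1.0, 1.0), 0) for _ in range(50)]
random.shuffle(data)

def ten_fold_accuracy(data, k=10):
    """Split into k folds; train on k-1, test on the held-out fold."""
    fold = len(data) // k
    accs = []
    for i in range(k):
        test = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        # Toy classifier: threshold at the midpoint of the class means.
        m1 = sum(x for x, y in train if y == 1) / sum(y for _, y in train)
        m0 = sum(x for x, y in train if y == 0) / sum(1 - y for _, y in train)
        thr = (m0 + m1) / 2
        accs.append(sum((x > thr) == bool(y) for x, y in test) / len(test))
    return sum(accs) / k

print(round(ten_fold_accuracy(data), 2))
```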
17. Predict Level of Infection
Which factors are correlated with users' susceptibility score?
Susceptibility score: counts the number of interactions between a target and any lead bot
Method: regression trees
Can handle strongly nonlinear relationships with high-order interactions and different variable types
Fit the model to 75% of the susceptible users
18. Predicting Levels of Susceptibility
Users who
• use more negation words (e.g. not, never, no),
• tweet more regularly (i.e. have a high temporal balance), and
• use more words related to the topic death (e.g. bury, kill)
tend to interact more often with bots.
[Regression tree figure: root split on negemo at 0.40068, then splits on temp_bal at 0.37025 and death at −0.16389; leaves: Node 4 (n = 25), Node 5 (n = 7), Node 6 (n = 9), Node 7 (n = 15).]
19. Predicting Levels of Susceptibility
Rank correlation of hold-out users given their real susceptibility level and their predicted susceptibility level (Kendall τ up to 0.45)
Goodness of fit (R² up to 0.3)
Potential reasons: the dataset is too small (we only had 81 susceptible users; 61% of them had level 1, 17% had level 2, 10% had level 3, and very few users had more than 3 interactions)
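The Kendall τ rank correlation used to compare real and predicted susceptibility levels can be sketched via pairwise concordance counting (this is the simple τ-a variant, which ignores ties; the paper does not say which variant was used, and production code would use a library implementation such as scipy's):

```python
from itertools import combinations

def kendall_tau(real, pred):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs.
    A pair of users is concordant if both rankings order them the same way."""
    pairs = list(combinations(range(len(real)), 2))
    conc = disc = 0
    for i, j in pairs:
        sign = (real[i] - real[j]) * (pred[i] - pred[j])
        if sign > 0:
            conc += 1
        elif sign < 0:
            disc += 1
    return (conc - disc) / len(pairs)

# Toy susceptibility levels: prediction mostly preserves the ordering.
real = [1, 1, 2, 3, 5]
pred = [1, 2, 2, 4, 4]
print(kendall_tau(real, pred))  # → 0.7
```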
20. Summary & Conclusions
Approach to identify susceptible users
Features of all three types contributed to the identification
Users are more likely to be susceptible if
they are emotional Meformers
they use Twitter mainly for communicating
their communication is not focused on a small circle of friends
they are social and active (i.e., interact with many others)
21. Summary & Conclusions
Active Twitter users are more susceptible
They are more likely to see the messages/requests of socialbots
But we expected that they would develop some skill at distinguishing socialbots from humans by using Twitter frequently
Predicting users' susceptibility score is difficult
More data and further experiments are required
22. Future Work
Repeating experiments on larger datasets
Taxonomy of socialbot strategies:
Massive numbers of con-messages (brute force)
Manipulation of messages through false retweets (changing pro-messages to con-messages)
Diverting attention by adding con-hashtags to pro-hashtags
Susceptibility of users to different strategies
23. Emotional Meformers who are active, communicative and social are more likely to be infected
THANK YOU
claudia.wagner@joanneum.at
http://claudiawagner.info
src: http://adobeairstream.com/green/a-natural-predicament-sustainability-in-the-21st-century/
Editor's Notes
What makes a socialbot different from self-declared bots is that they hide the fact that they are robots and usually try to pursue a variety of latent goals, such as spreading information or influencing users. Tim Hwang defined a socialbot as a machine with social impact.
And finally, recent research has shown that socialbots are extremely dangerous due to snowball effects: the more users a bot has infected in a network, the easier it can infect new users in that network. Boshmaf et al. conducted a very controversial experiment in which they set up a network of 102 Facebook bots which sent friend requests to others within a time period of 6 weeks. Their results show how a network of bots can infect Facebook users. Interestingly, the average acceptance rate of friend requests was 59.1%, which depends on how many mutual friends the socialbots had with the infiltrated users, and can increase up to 80%.
So what can we do to prevent large-scale infiltrations due to socialbot attacks? The traditional approach is to try to identify bots and eliminate them. In our work we suggest a complementary approach which aims to identify users who are most susceptible to socialbot attacks. We wanted to know if these users show special characteristics.
To answer this question we use a 2-stage approach. First we aim to identify users who are susceptible to bot attacks in general, i.e., users who became affected. We were interested in whether these users show any specific characteristics or if they are average users like you and me.
In our experiment we used data from the Social Bot Challenge 2011, which is a competition that was organized by...
The dataset which we got contained all tweets which were published by the targets and bots during the challenge and snapshots of the follow network between these users at different points in time. The figure shows how many users became susceptible at which day. One can see that most targets became susceptible at day 1. One possible explanation is the auto-follow feature which some of the targets might have used.
Since we were interested in the factors that impact whether a user gets infected or not, we first had to design features that describe potential factors. In our work we used 3 different types of features: features based on user networks, features based on users' tweeting behavior, and features based on the linguistics of users' tweet content.
For the network features we created 3 different types of user networks from our dataset and computed the following measures on these 3 networks.
Coverage-based measures describe e.g. how many messages of a user contain links, are conversational, or contain question marks. Diversity-based measures describe e.g. with how many different users a user communicates and how evenly distributed a user's communication efforts are across these users. A user who communicates with many users equally much would have a high social diversity, while a user who tends to communicate with a small circle of friends has a low social diversity.
Linguistic Inquiry and Word Count (LIWC). By mapping words in tweets to these 2300 words one gets linguistic annotations of tweets, which we used as features.
We computed our features for each target user based on all tweets the target user had authored during the challenge up to the point when he became infected. That means we did not take into account any information from after a user was already infected, which is important since we want to predict infections. Therefore we need to ensure that we do not take any future information into account which could falsify our results. For the follow-network-based features we used a snapshot from day 4; all our sample users became susceptible at day 7 or later.
So our first aim was to identify users who are likely to become infected. That means we had a binary classification problem and our aim was to differentiate susceptible from non-susceptible users. We balanced our dataset, compared 6 classifiers, and conducted a 10-fold cross-validation. Our results show that a generalized boosted regression classifier performed best. Therefore we used this classifier to further inspect which variables were most useful for differentiating between...
We used the best performing classification model to further inspect which features were most useful for differentiating between... One can see from this slide that the most important feature is the out-degree of a user node in the interaction network. It is interesting to note that the top 3 features contain one network feature, one linguistic feature and one behavioral feature, which shows that all 3 types of features seem to contribute to our task. The ROC curve plots the true positive rate vs. the false positive rate. Ideal would be an area under the ROC curve of 1.
We further inspected the feature distributions of the top 20 features for each user class (i.e. susceptible and non-susceptible) to gain further insights into how the features of susceptible users are distributed and how different their distributions are from those of non-susceptible users. Best network feature: out-degree in the interaction network, i.e. users who actively create interactions with others are more likely to become infected. Best linguistic feature: verbs and present tense. Best behavioral feature: conversational variety and coverage.
After having identified users who will become infected during an attack, we also want to predict their level of infection: i.e., does the user interact just once with the bot, or do they develop a close friendship relation? That means the aim of our second task is to predict how often they interacted with a bot. To address this question we used regression trees since they can handle...
By fitting the model to our dataset we learned the following tree structure, which shows which features and thresholds are used internally by the model. The leaves show the distribution of the susceptibility score of the users who were used as samples for this branch. From this tree structure we can see that…
To assess the quality of this model we measured the rank correlation of hold-out users given their real susceptibility score and their predicted susceptibility scores. The correlation coefficient was pretty low and so was the R-squared value of the model. One potential reason for that is the size of our dataset and that we did not have many samples of users who had lengthy discussions with bots.
So let me start concluding my talk. What I have presented today is an approach to identify susceptible users. We have introduced a variety of features which can capture characteristics of users who are indeed more susceptible to bot attacks than others.
The fact that active Twitter users are more susceptible is on the one hand not really surprising since... But on the other hand it is surprising, since one would expect that active users develop some sort of skill to differentiate between...
We hope that our research will not only inform modern social media security systems but also support the development of good socialbots which are e.g. used to increase the fitness level of a community.