This document summarizes a study that analyzed the influence of Twitter users by tracking the spread of 74 million URLs shared over two months in 2009. The study found that while the largest information cascades tended to be generated by users who had many followers and had been influential in the past, it was difficult to reliably predict which specific users or URLs would generate large cascades. The study concluded that to effectively harness word-of-mouth diffusion on social media, it is best to target many ordinary or potential influencers and rely on average effects, rather than trying to individually predict the most influential users or content.
This document analyzes emerging trends on Twitter to better understand how they can be characterized. It develops a taxonomy of trends from a large Twitter dataset within a specific geographic area. It identifies key dimensions to categorize trends, such as social network features, time signatures, and textual features. Analyzing these computed features across categories reveals significant differences that advance the understanding of trends on Twitter and how they can be used for applications serving local communities.
The Design of an Online Social Network Site for Emergency Management: A One-S... (guest636475b)
Web 2.0 is creating new opportunities for communication and collaboration. Part of this explosion is the increase in popularity and use of Social Network Sites (SNSs) for general and domain-specific use. In the emergency domain there are a number of websites, wikis, SNSs, etc. but they stand as silos in the field, unable to allow for cross-site collaboration. In this paper we describe ongoing design science research to develop and refine guiding principles for developing an SNS that will bring together emergency domain professionals in a “one-stop-shop.” We surveyed emergency professionals who study crisis information systems, to ascertain potential functionalities of such an SNS. Preliminary results suggest that there is a need for the envisioned SNS. Future research will continue to explore possible solutions to issues addressed in this paper.
This document summarizes a study that quantifies information overload on social media platforms using data from Twitter. The study models social media users as information processing systems that receive information in queues and process it at certain rates. By analyzing timestamps of tweets received and forwarded, the study estimates users' information processing behaviors and limits. Key findings include evidence that most users have processing limits of around 30 tweets/hour, and that overloaded users take longer to process information and prioritize tweets from select sources. The study also finds that information overload reduces the effectiveness of information spreading on social issues.
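The study's queueing view of users can be sketched as follows. This is an illustrative stand-in for the paper's estimation procedure, not its actual code; the 30 tweets/hour default reflects the limit the summary reports.

```python
def processing_rate(forward_times):
    """Estimate a user's processing rate (items/hour) from a sorted list of
    forward timestamps, expressed in hours: items forwarded per unit time."""
    if len(forward_times) < 2:
        return 0.0
    span = forward_times[-1] - forward_times[0]
    return (len(forward_times) - 1) / span if span > 0 else float("inf")

def is_overloaded(incoming_rate, limit=30.0):
    """Flag a user whose incoming tweet rate exceeds their processing limit
    (~30 tweets/hour per the study's reported finding)."""
    return incoming_rate > limit
```

A user who forwards five tweets over four hours processes at 1 tweet/hour; the same user receiving 45 tweets/hour would be flagged as overloaded.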
This document provides information about the Asia Triple Helix Society Summer Seminar to be held on June 25, 2014 in Daegu, South Korea. It will be held in conjunction with the 2014 World Conference for Public Administration. The seminar will include two panels on topics related to social media, big data, and North Korea, as well as corporate networks and entrepreneurial universities. Details are provided on the keynote speakers, panelists and their topics, dates and deadlines for abstracts and papers, location, sponsors, and contacts. The document outlines the full program agenda with titles, speakers and respondents for each presentation slot.
This document provides an overview of a dissertation examining power and control within the social media site Twitter. It discusses three key areas that will be analyzed: 1) the technical control of networked interaction through applying theories of Foucault, Galloway, and others, 2) the nature of networked conversation by establishing a model of interaction and analyzing its consequences, and 3) social structures within the site by proposing a social structure model and assessing its impact on information behavior. The introduction establishes Twitter as a case study for understanding power in a digital context due to its open and public nature. It proposes a model understanding power as involving technical limitations/freedoms, social/cultural capital, and structural bias.
MDS 2011 Paper: An Unsupervised Approach to Discovering and Disambiguating So... (Carlton Northern)
This document presents an unsupervised approach to discover and disambiguate social media profiles for a large group of individuals, such as employees or university students. The approach uses a combination of search engine queries, semantic web queries, and directly polling social media sites to discover potential profiles. It then applies heuristics involving keyword matching, community structure analysis, and extracting semantic and profile features to disambiguate the true profiles from false positives. The approach was tested on a set of 2016 university computer science student logins, achieving a precision of 0.863 and F-measure of 0.654 at discovering their real social media profiles from a ground truth data set.
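The keyword-matching heuristic and the reported scores can be sketched as follows. The scoring function is a hypothetical toy, not the paper's method; `f_measure` is the standard harmonic mean used to report results such as the 0.654 figure.

```python
def profile_score(candidate, person, cohort_keywords):
    """Toy disambiguation heuristic: count name-token overlap between the
    known person and a candidate profile, plus cohort-keyword hits in the bio."""
    name_tokens = set(person["name"].lower().split())
    display_tokens = set(candidate["display_name"].lower().split())
    score = len(name_tokens & display_tokens)
    bio = candidate.get("bio", "").lower()
    score += sum(1 for kw in cohort_keywords if kw.lower() in bio)
    return score

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)
```

For example, a candidate whose display name matches both name tokens and whose bio mentions one cohort keyword scores 3; higher-scoring candidates are kept as the true profile.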
The document discusses the emergence of data-driven science and computational social science. It covers several key areas:
- The growth of computational approaches and use of digital tools to manage large datasets in social science research.
- Debate around the role of theory and whether big data means the "end of theory". While data can provide insights, context from experts is still needed.
- The development of new research areas like data science, computational social science, and webometrics that utilize digital methods and focus on analyzing online data.
- Challenges in the field including uneven global development of data skills and divides between computational and non-computational researchers.
This document discusses word-of-mouth (WOM) communication and the key players that influence change through WOM networks. It examines three theories on influentials: the influentials hypothesis, which argues that a few key individuals influence others; the everyone hypothesis, which states that everyone can play the role of influencer; and the hub hypothesis, which focuses on well-connected individuals. While each theory provides insights, the reality is likely a combination. The document also explores why WOM can be difficult for marketers to access, such as the challenges of measuring offline conversations. It concludes by identifying areas for further research, such as better defining influential roles and comparing online and offline hubs.
This document summarizes a research paper that proposes an unsupervised, lexicon-based approach for sentiment analysis of informal communication on social media. The approach uses a sentiment lexicon and rules to detect linguistic elements like negation, intensifiers, and emoticons. It can classify texts as subjective/objective and positive/negative. Experiments on three real-world social media datasets found it outperformed supervised machine learning approaches in most cases, demonstrating the effectiveness of simple, intuitive methods for sentiment analysis in some domains.
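A minimal sketch of such a lexicon-and-rules scorer is shown below, assuming a toy lexicon; the word lists and weights are illustrative, not the paper's actual resources.

```python
# Toy resources: real lexicons are far larger and empirically weighted.
LEXICON = {"good": 2, "great": 3, "bad": -2, "awful": -3, "happy": 2, "sad": -2}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "extremely": 2.0}
EMOTICONS = {":)": 1, ":(": -1, ":D": 2}

def score_text(text):
    """Sum lexicon scores over tokens, flipping polarity after a negator
    and boosting the next sentiment word after an intensifier."""
    score, negate, boost = 0.0, False, 1.0
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            boost = INTENSIFIERS[tok]
        elif tok in LEXICON or tok in EMOTICONS:
            v = LEXICON.get(tok, EMOTICONS.get(tok, 0)) * boost
            score += -v if negate else v
            negate, boost = False, 1.0
    return score
```

Texts scoring above zero are classified positive, below zero negative, and exactly zero objective, mirroring the subjective/objective and positive/negative splits described above.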
1999 ACM SIGCHI - Counting on Community in Cyberspace (Marc Smith)
This panel discusses research projects studying the formation of online communities. Each panelist presents empirical research on a different social cyber space:
1) Marc Smith studied Usenet and found islands of cooperative behavior exist, contradicting the idea it has succumbed to a "tragedy of the commons".
2) Steven Drucker analyzed graphical chat system V-Chat and found the graphical features were used extensively without direct prompts, showing why people communicate this way.
3) Barry Wellman studied residents in a wired Canadian suburb, finding how existing online services are used and what future services people want, providing insight into future connected communities.
4) Robert Kraut found that greater internet use was associated with declines in social involvement and psychological well-being among new internet users.
This document summarizes research on how characteristics of social media profiles impact perceptions of source credibility. Specifically, it examines how the number of followers and the ratio of followers to follows on Twitter profiles affect judgments of trustworthiness, competence, and goodwill. The research aims to identify factors that influence how people evaluate the credibility of information from social media sources.
2000 - CSCW - Conversation Trees and Threaded Chats (Marc Smith)
The document discusses issues with traditional chat interfaces and proposes an alternative called Threaded Chat. Traditional chat ruptures connections between turns and replies by displaying messages in order of arrival rather than conversational order. Threaded Chat aims to address this by structuring messages as threaded responses like online forums, though designed for synchronous use. A study found Threaded Chat allowed equally effective but possibly more efficient discussions than traditional chat.
An Online Social Network for Emergency Management (Connie White)
This document proposes investigating whether an online social network could help facilitate collaboration across different emergency management organizations. It discusses how social networking sites are becoming more popular tools for mass collaboration. The researchers conducted a survey of emergency management students to get preliminary feedback on using social networks for emergency coordination. The results showed strong agreement that social networks could effectively support information sharing and communication during emergencies. The researchers plan to further engage emergency professionals to understand their needs and how a social network could best serve the emergency domain.
This paper aims to analyze potential differences in the temporal patterns of misinformation diffusion compared to factual information diffusion on Twitter. Specifically, it looks at the speed of distribution and whether a lack of evenness in distribution is correlated with misinformation, building on previous research. The researchers found no strong evidence that speed of distribution is directly correlated with validity, but temporal patterns could potentially be used along with other methods to more quickly identify misinformation given the harm it can cause. Understanding how information spreads on social networks is important as both useful and harmful information can diffuse rapidly.
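One simple way to quantify the "evenness" of a diffusion's temporal pattern is the coefficient of variation of inter-arrival times; the sketch below is an illustrative measure in the spirit of the analysis, not the paper's own metric.

```python
from statistics import mean, stdev

def interarrival_cv(timestamps):
    """Coefficient of variation of inter-arrival gaps between successive
    retweets/shares (sorted timestamps). Higher values = burstier, less
    even spread; 0.0 = perfectly regular arrivals."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return 0.0
    return stdev(gaps) / mean(gaps)
```

A perfectly regular cascade (one share per hour) yields 0.0, while a burst of shares followed by a long gap yields a value well above 1, the kind of signal that could be combined with other features to flag suspect content.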
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2... (Connie White)
Detailing guidelines and safe practices for using social media across a range of emergency management applications, Social Media, Crisis Communication, and Emergency Management: Leveraging Web 2.0 Technologies supplies cutting-edge methods to help you inform the public, reduce information overload, and ultimately, save more lives.
Introduces collaborative mapping tools that can be customized to your needs
Explores free and open-source disaster management systems, such as Sahana and Ushahidi
Covers freely available social media technologies, including Facebook, Twitter, and YouTube
This document presents a framework for estimating the prevalence of deceptive reviews in online review communities. It proposes using a machine learning classifier trained to detect deceptive reviews, along with estimates of the classifier's accuracy, within a generative model. The model is used to estimate deception rates in six major review sites like Expedia and Yelp, without requiring gold-standard human annotations. It finds deception rates vary significantly between sites, with sites having lower "signaling costs" for posting reviews generally showing higher estimated deception. When sites take measures to increase costs, like filtering new reviewers, estimated deception decreases substantially.
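The paper builds a generative model around the classifier's accuracy; a classical back-of-the-envelope version of that idea is the Rogan–Gladen correction sketched below, shown here only to illustrate why accuracy estimates matter, not as the paper's actual model.

```python
def corrected_prevalence(observed_rate, sensitivity, specificity):
    """Rogan-Gladen estimator: correct the raw fraction of reviews the
    classifier flags as deceptive for its known error rates."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier must be better than chance")
    p = (observed_rate + specificity - 1) / denom
    return min(1.0, max(0.0, p))  # clamp to a valid probability
```

For instance, if a classifier with 90% sensitivity and 90% specificity flags 20% of a site's reviews, the implied true deception rate is 12.5%, not 20%.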
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ... (Sorin Adam Matei)
A theoretically-grounded learning feedback tool suite, the Visible Effort (VE) MediaWiki extension, is proposed for optimizing online group learning activities by measuring the degree of equality and the emergence of social structure in groups engaged in Computer-Mediated Collaboration (CMC). Building on social entropy theory, drawn from Shannon's Mathematical Theory of Communication, VE captures levels of CMC unevenness and group structure and visualizes them on wiki Web pages through background colors, charts, and tabular data. This visual information gives users entropic feedback on how balanced and equitable collaboration within their online group is, while helping them maintain it within optimal levels. Finally, we present the theoretical and practical implications of VE and the measures behind it, and illustrate VE's capabilities by describing a quasi-experimental teaching activity (use scenario) together with a detailed discussion of the theoretical justification, methodological underpinnings, and technological capabilities of the approach.
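The Shannon-entropy measure of collaboration evenness underlying this approach can be sketched as follows; this is a generic normalized-entropy computation, assumed here as an illustration rather than VE's exact formula.

```python
import math

def social_entropy(contributions):
    """Normalized Shannon entropy of per-member contribution counts.
    Returns 1.0 when every member contributes equally, 0.0 when one
    member does all the work."""
    total = sum(contributions)
    probs = [c / total for c in contributions if c > 0]
    if len(probs) <= 1:
        return 0.0
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(contributions))
```

A four-person wiki group with edit counts [10, 10, 10, 10] scores 1.0 (perfectly even), while [40, 0, 0, 0] scores 0.0, the kind of value VE could map onto background colors as feedback.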
This document is a thesis submitted by Zachary Shaw to the Department of Philosophy at Princeton University in partial fulfillment of the requirements for a Bachelor of Arts degree. The thesis examines how social network sites have affected concepts of personal identity. It begins with background on the development of information technologies and the shortening gap between information input and output. It then discusses theories of personal identity and how social network sites fit into modern life. The thesis argues that only the "meta-patterns view" can adequately explain how social network sites have impacted identity construction. It concludes by arguing against the idea that social network site use should be reduced and states that personal identities are attempts to recognize patterns in ever-changing information.
2009 - Connected Action - Marc Smith - Social Media Network Analysis
Review of social media network analysis of Internet social spaces such as Twitter, Flickr, email, and message boards. Network analysis and visualization of social media collections of connections.
This doctoral research explores how users make sense of and use information when seeking information from web-based resources. The research examines user behaviors and strategies as they interact with information sources to identify examples of how users collect, evaluate, understand, interpret, and integrate new information. Three empirical studies were conducted involving participants completing information seeking tasks. The results provide insights into users' information interaction strategies and sensemaking activities. Implications for the design of technologies to better support sensemaking are discussed.
2009-Social computing-Analyzing social media networks (Marc Smith)
This document summarizes research on analyzing social networks within enterprises that have adopted social media applications. Key points:
- Social media applications generate social networks as employees interact by creating connections, replying to messages, collaborating on documents, and mentioning common topics. These networks reveal insights into an organization's structure and dynamics.
- Network analysis uses metrics from graph theory to describe network properties like individual roles (e.g. discussion starters), overall shape and size, and each individual's connections. Visualizations can highlight important people, events, and subgroups.
- Early social network analysis relied on manually collected data, limiting its use. Now, automatically captured social media data creates networks without explicit surveys, providing rich new data.
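The degree-based metrics mentioned above can be sketched on a reply network as follows; the "discussion starter" ranking (many replies received, few given) is an illustrative heuristic, assumed for this example.

```python
from collections import defaultdict

def degree_centrality(edges):
    """In- and out-degree per node from directed reply edges
    (replier -> original poster)."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)

def discussion_starters(edges):
    """Rank nodes by replies received minus replies sent: a rough proxy
    for the 'discussion starter' role."""
    in_deg, out_deg = degree_centrality(edges)
    nodes = set(in_deg) | set(out_deg)
    return sorted(nodes,
                  key=lambda n: in_deg.get(n, 0) - out_deg.get(n, 0),
                  reverse=True)
```

On edges [("b", "a"), ("c", "a"), ("a", "d")], node "a" receives two replies and sends one, placing it among the top-ranked starters.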
Comprehensive Social Media Security Analysis & XKeyscore Espionage Technology (CSCJournals)
Social networks offer users many services for sharing activities, events, and ideas. Many attacks can occur on social networking websites because of the trust users place in them. This paper discusses cyber threats: we study the types of cyber threats, classify them, and offer suggestions for protecting social networking websites from a variety of attacks. We also present some anti-threat strategies and future trends.
This document summarizes a PhD research project exploring how online social media is used to support local governance. It describes a case study of a small community that uses online discussion forums to discuss local issues. Analysis identified five patterns of "governance conversation" including sharing information, providing feedback, coordinating responses, informally mediating issues, and engaging in discussion of complex problems. While no binding decisions were made online, the discussions supported governance action. The research examines how online tools can both facilitate instrumental problem-solving and support expressive deliberation that accommodates pluralism in local governance.
2010-November-8-NIA - Smart Society and Civic Culture (Marc Smith)
This document discusses how social media and social networks are enabling new forms of civic participation and collective action. It notes that citizens are increasingly using social media to find government services, engage in discussions, and measure public opinion. The document also discusses how social network analysis can be used to analyze patterns in social media networks and identify influential users. It provides an overview of various social media platforms and the types of social networks and connections that exist within them.
Peace on Facebook? Problematising social media as spaces for intergroup conta... (Paul Reilly)
The document discusses the role of social media in divided societies and peacebuilding. It summarizes theories on contact between antagonistic groups and the potential for social media to facilitate positive interactions. However, it argues that social media platforms are not neutral spaces and amplify inflammatory content. While online contact may occur, claims about reconciliation are overstated given platforms prioritize engagement over quality of interactions. Ultimately, social media can exacerbate tensions if interactions increase prejudice between groups in divided societies.
Multimode network based efficient and scalable learning of collective behavior (IAEME Publication)
This document discusses multimode network-based approaches for efficiently learning collective behavior in large social networks. It provides an overview of existing approaches for predicting collective behavior based on the behaviors of connected individuals. Specifically, it describes methods that extract social dimensions from networks to represent affiliations between actors and then apply supervised learning to determine which dimensions are informative for behavior prediction. However, existing approaches do not scale well to networks with millions of actors. The document proposes a new edge-centric clustering approach to extract sparse social dimensions, enabling the efficient handling of very large networks while maintaining predictive performance.
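The core idea of edge-centric clustering is that clustering edges (rather than nodes) lets each node belong to a small, sparse set of affiliations. A minimal sketch of that mapping step is below; it assumes the edge clustering itself has already been produced by some method and only shows how sparse social dimensions fall out of it.

```python
from collections import defaultdict

def node_dimensions(edge_clusters):
    """Map each node to the sparse set of edge-cluster ids it touches.

    edge_clusters: list of ((u, v), cluster_id) pairs, i.e. each edge
    labeled with the affiliation cluster it was assigned to. A node's
    social dimensions are simply the clusters its edges fall into."""
    dims = defaultdict(set)
    for (u, v), cid in edge_clusters:
        dims[u].add(cid)
        dims[v].add(cid)
    return dict(dims)
```

Because a node's dimension set is bounded by its degree rather than the total number of clusters, the resulting representation stays sparse even for networks with millions of actors, which is what makes the supervised learning step scalable.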
The Pew Internet and American Life Project released a report examining how consumers find and consume news via a variety of sources and media platforms.
This document discusses leveraging the DDI (Data Documentation Initiative) model for linked statistical data in the social, behavioral, and economic sciences. It outlines how the DDI was developed as an ontology, including using use cases to identify important elements to model and mapping existing DDI-XML documents to DDI-RDF. A key use case is discovering microdata connected across multiple studies based on dimensions like time, country, and subject. The document walks through examples of queries this ontology would support, such as finding questions associated with a concept or the maximum value of a variable. It concludes by identifying some open issues to address in the DDI ontology.
This document discusses word-of-mouth (WOM) communication and the key players that influence change through WOM networks. It examines three theories on influentials: the influentials hypothesis, which argues that a few key individuals influence others; the everyone hypothesis, which states that everyone can play the role of influencer; and the hub hypothesis, which focuses on well-connected individuals. While each theory provides insights, the reality is likely a combination. The document also explores why WOM can be difficult for marketers to access, such as the challenges of measuring offline conversations. It concludes by identifying areas for further research, such as better defining influential roles and comparing online and offline hubs.
This document summarizes a research paper that proposes an unsupervised, lexicon-based approach for sentiment analysis of informal communication on social media. The approach uses a sentiment lexicon and rules to detect linguistic elements like negation, intensifiers, and emoticons. It can classify texts as subjective/objective and positive/negative. Experiments on three real-world social media datasets found it outperformed supervised machine learning approaches in most cases, demonstrating the effectiveness of simple, intuitive methods for sentiment analysis in some domains.
1999 ACM SIGCHI - Counting on Community in CyberspaceMarc Smith
This panel discusses research projects studying the formation of online communities. Each panelist presents empirical research on a different social cyber space:
1) Marc Smith studied Usenet and found islands of cooperative behavior exist, contradicting the idea it has succumbed to a "tragedy of the commons".
2) Steven Drucker analyzed graphical chat system V-Chat and found the graphical features were used extensively without direct prompts, showing why people communicate this way.
3) Barry Wellman studied residents in a wired Canadian suburb, finding how existing online services are used and what future services people want, providing insight into future connected communities.
4) Robert Kraut found that greater internet use was associated with declines in
This document summarizes research on how characteristics of social media profiles impact perceptions of source credibility. Specifically, it examines how the number of followers and the ratio of followers to follows on Twitter profiles affect judgments of trustworthiness, competence, and goodwill. The research aims to identify factors that influence how people evaluate the credibility of information from social media sources.
2000 - CSCW - Conversation Trees and Threaded ChatsMarc Smith
The document discusses issues with traditional chat interfaces and proposes an alternative called Threaded Chat. Traditional chat ruptures connections between turns and replies by displaying messages in order of arrival rather than conversational order. Threaded Chat aims to address this by structuring messages as threaded responses like online forums, though designed for synchronous use. A study found Threaded Chat allowed equally effective but possibly more efficient discussions than traditional chat.
An Online Social Network for Emergency ManagementConnie White
This document proposes investigating whether an online social network could help facilitate collaboration across different emergency management organizations. It discusses how social networking sites are becoming more popular tools for mass collaboration. The researchers conducted a survey of emergency management students to get preliminary feedback on using social networks for emergency coordination. The results showed strong agreement that social networks could effectively support information sharing and communication during emergencies. The researchers plan to further engage emergency professionals to understand their needs and how a social network could best serve the emergency domain.
This paper aims to analyze potential differences in the temporal patterns of misinformation diffusion compared to factual information diffusion on Twitter. Specifically, it looks at the speed of distribution and whether a lack of evenness in distribution is correlated with misinformation, building on previous research. The researchers found no strong evidence that speed of distribution is directly correlated with validity, but temporal patterns could potentially be used along with other methods to more quickly identify misinformation given the harm it can cause. Understanding how information spreads on social networks is important as both useful and harmful information can diffuse rapidly.
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2...Connie White
Detailing guidelines and safe practices for using social media across a range of emergency management applications‚ Social Media, Crisis Communication, and Emergency Management: Leveraging Web 2.0 Technologies supplies cutting-edge methods to help you inform the public‚ reduce information overload‚ and ultimately‚ save more lives.
Introduces collaborative mapping tools that can be customized to your needs
Explores free and open-source disaster management systems‚ such as Sahana and Ushahidi
Covers freely available social media technologies—including Facebook‚ Twitter‚ and YouTube
This document presents a framework for estimating the prevalence of deceptive reviews in online review communities. It proposes using a machine learning classifier trained to detect deceptive reviews, along with estimates of the classifier's accuracy, within a generative model. The model is used to estimate deception rates in six major review sites like Expedia and Yelp, without requiring gold-standard human annotations. It finds deception rates vary significantly between sites, with sites having lower "signaling costs" for posting reviews generally showing higher estimated deception. When sites take measures to increase costs, like filtering new reviewers, estimated deception decreases substantially.
Visible Effort: A Social Entropy Methodology for Managing Computer-Mediated ...Sorin Adam Matei
A theoretically-grounded learning feedback tool suite, the Visible Effort (VE) Mediawiki extension, is proposed for optimizing online group learning activities by measuring the amount of equality and the emergence of social structure in groups that participate in Computer-Mediated Collaboration (CMC). Building on social entropy theory, drawn from Shannon’s Mathematical Theory of Communication, VE captures levels of CMC unevenness and group structure and visualizes them on wiki Web pages through background colors, charts, and tabular data. Visual information provides users entropic feedback on how balanced and equitable collaboration is within their online group are, while helping them to maintain it within optimal levels. Finally, we present the theoretical and practical implications of VE and the measures behind it, as well as illustrate VE’s capabilities by describing a quasi-experimental teaching activity (use scenario) in tandem with a detailed discussion of theoretical justification, methodological underpinning, and technological capabilities of the approach.
This document is a thesis submitted by Zachary Shaw to the Department of Philosophy at Princeton University in partial fulfillment of the requirements for a Bachelor of Arts degree. The thesis examines how social network sites have affected concepts of personal identity. It begins with background on the development of information technologies and the shortening gap between information input and output. It then discusses theories of personal identity and how social network sites fit into modern life. The thesis argues that only the "meta-patterns view" can adequately explain how social network sites have impacted identity construction. It concludes by arguing against the idea that social network site use should be reduced and states that personal identities are attempts to recognize patterns in ever-changing information.
2009 - Connected Action - Marc Smith - Social Media Network Analysis
A review of social media network analysis of Internet social spaces such as Twitter, Flickr, email, and message boards, covering network analysis and visualization of social media collections of connections.
This doctoral research explores how users make sense of and use information when seeking information from web-based resources. The research examines user behaviors and strategies as they interact with information sources to identify examples of how users collect, evaluate, understand, interpret, and integrate new information. Three empirical studies were conducted involving participants completing information seeking tasks. The results provide insights into users' information interaction strategies and sensemaking activities. Implications for the design of technologies to better support sensemaking are discussed.
2009 - Social computing - Analyzing social media networks - Marc Smith
This document summarizes research on analyzing social networks within enterprises that have adopted social media applications. Key points:
- Social media applications generate social networks as employees interact by creating connections, replying to messages, collaborating on documents, and mentioning common topics. These networks reveal insights into an organization's structure and dynamics.
- Network analysis uses metrics from graph theory to describe network properties like individual roles (e.g. discussion starters), overall shape and size, and each individual's connections. Visualizations can highlight important people, events, and subgroups.
- Early social network analysis relied on manually collected data, limiting its use. Now, automatically captured social media data creates networks without explicit surveys, providing rich new data.
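As a toy illustration of metrics over such automatically captured data, simple degree counts on a reply graph already hint at roles. The names and the degree-difference heuristic here are invented for illustration:

```python
from collections import defaultdict

# Toy reply network: (author, replied_to) pairs captured automatically
# from message threads (names are illustrative).
replies = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
           ("bob", "ann"), ("eve", "ann")]

in_deg = defaultdict(int)   # replies received
out_deg = defaultdict(int)  # replies sent
for src, dst in replies:
    out_deg[src] += 1
    in_deg[dst] += 1

people = set(in_deg) | set(out_deg)
# Discussion answerers attract many replies but send few; the
# in-minus-out degree difference is a crude first cut at role detection.
roles = {p: in_deg[p] - out_deg[p] for p in people}
most_replied_to = max(people, key=lambda p: in_deg[p])  # -> "bob"
```

Visualization layers would then highlight high-degree people like "bob" as important nodes.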
Comprehensive Social Media Security Analysis & XKeyscore Espionage Technology - CSCJournals
Social networks offer users many services for sharing activities, events, and ideas. Many attacks on social networking websites exploit the trust users place in them. This paper discusses cyber threats: we study the types of cyber threats, classify them, and offer suggestions for protecting social networking websites from a variety of attacks. We also present anti-threat strategies and future trends.
This document summarizes a PhD research project exploring how online social media is used to support local governance. It describes a case study of a small community that uses online discussion forums to discuss local issues. Analysis identified five patterns of "governance conversation" including sharing information, providing feedback, coordinating responses, informally mediating issues, and engaging in discussion of complex problems. While no binding decisions were made online, the discussions supported governance action. The research examines how online tools can both facilitate instrumental problem-solving and support expressive deliberation that accommodates pluralism in local governance.
2010-November-8-NIA - Smart Society and Civic Culture - Marc Smith
This document discusses how social media and social networks are enabling new forms of civic participation and collective action. It notes that citizens are increasingly using social media to find government services, engage in discussions, and measure public opinion. The document also discusses how social network analysis can be used to analyze patterns in social media networks and identify influential users. It provides an overview of various social media platforms and the types of social networks and connections that exist within them.
Peace on Facebook? Problematising social media as spaces for intergroup conta... - Paul Reilly
The document discusses the role of social media in divided societies and peacebuilding. It summarizes theories on contact between antagonistic groups and the potential for social media to facilitate positive interactions. However, it argues that social media platforms are not neutral spaces and amplify inflammatory content. While online contact may occur, claims about reconciliation are overstated given platforms prioritize engagement over quality of interactions. Ultimately, social media can exacerbate tensions if interactions increase prejudice between groups in divided societies.
Multimode network based efficient and scalable learning of collective behavior - IAEME Publication
This document discusses multimode network-based approaches for efficiently learning collective behavior in large social networks. It provides an overview of existing approaches for predicting collective behavior based on the behaviors of connected individuals. Specifically, it describes methods that extract social dimensions from networks to represent affiliations between actors and then apply supervised learning to determine which dimensions are informative for behavior prediction. However, existing approaches do not scale well to networks with millions of actors. The document proposes a new edge-centric clustering approach to extract sparse social dimensions, enabling the efficient handling of very large networks while maintaining predictive performance.
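The edge-centric idea can be sketched in a few lines: assign each edge to a cluster, then represent each node by the set of clusters its edges fall into, which keeps the resulting social dimensions sparse. The edge clusters below are hand-assigned for illustration; the approach described above obtains them with scalable clustering:

```python
from collections import defaultdict

# Edge-centric social dimensions: each edge belongs to one cluster,
# and a node's dimensions are the clusters of its incident edges.
# (Cluster assignments are invented for this toy example.)
edge_cluster = {("u1", "u2"): 0, ("u2", "u3"): 0,
                ("u3", "u4"): 1, ("u4", "u5"): 1}

dims = defaultdict(set)
for (a, b), cluster in edge_cluster.items():
    dims[a].add(cluster)
    dims[b].add(cluster)

# u3 bridges both communities, so it receives two dimensions:
bridging = sorted(dims["u3"])  # -> [0, 1]
```

Because each node touches only the clusters of its own edges, the node-by-dimension matrix stays sparse even for millions of actors, which is what enables the scalability claimed above.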
The Pew Internet and American Life Project released a report examining how consumers find and consume news via a variety of sources and media platforms.
This document discusses leveraging the DDI (Data Documentation Initiative) model for linked statistical data in the social, behavioral, and economic sciences. It outlines how the DDI was developed as an ontology, including using use cases to identify important elements to model and mapping existing DDI-XML documents to DDI-RDF. A key use case is discovering microdata connected across multiple studies based on dimensions like time, country, and subject. The document walks through examples of queries this ontology would support, such as finding questions associated with a concept or the maximum value of a variable. It concludes by identifying some open issues to address in the DDI ontology.
The document discusses the results of an online survey of approximately 200 health activists regarding health care companies' use of social media and the need for regulation. Some key findings include:
- Most respondents agreed that health care companies' use of social media can help people understand health issues, and that companies should participate in social media. However, many felt regulation is needed.
- A majority felt that while company content is valuable, misinformation is still frequently spread online. Most agreed companies should monitor for misinformation.
- Respondents thought upcoming FDA guidance on social media would increase or not affect companies' social media use. Many also felt the delay in guidance had increased or not changed social media use.
Technology has changed the way we live and work. We now depend on computers, smartphones, and other devices to communicate and carry out tasks. While technology has brought many benefits, it also poses new challenges in areas such as privacy, cybersecurity, and the future of work as more tasks become automated.
This NEHI report reviews current technology trends that will impact the future of chronic disease management. The report categorizes these technologies into four classes based on the strength of evidence supporting their clinical and financial benefits. The technologies reviewed are:
Extended Care eVisits
Home Telehealth
In-Car Telehealth
Medication Adherence Tools
Mobile Asthma Management Tools
Mobile Cardiovascular Tools
Mobile Clinical Decision Support
Mobile Diabetes Management Tools
Social Media Promoting Health
Tele-Stroke Care
Virtual Visits
The document discusses using a model-view-controller (MVC) architecture to manage data modeling projects. It describes using an abstract data module based on the DDI ontology with concrete modules for each project that inherit and extend the abstract module. A RESTful interface is proposed to access resources identified by URIs. The abstract data model is implemented as domain classes with attributes and relations according to MVC and can generate views, storage models, and abstract persistence APIs. Extending the DDI ontology allows projects to add custom fields while maintaining compatibility. Sharing source code and data modules between projects via version control is described.
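The inherit-and-extend pattern described above might look like this in outline. Class and field names are invented for illustration; the actual system also generates views and persistence APIs from the domain classes:

```python
class AbstractDataModule:
    """Abstract module mirroring DDI-derived fields (names illustrative)."""
    fields = {"title": str, "creator": str}

    def storage_model(self):
        # Derive a storage schema from the declared fields.
        return {name: t.__name__ for name, t in self.fields.items()}

class ProjectModule(AbstractDataModule):
    # A concrete project inherits the abstract module's fields and adds
    # custom ones while remaining compatible with the abstract model.
    fields = {**AbstractDataModule.fields, "lab_notebook_id": int}

schema = ProjectModule().storage_model()
```

Because the concrete module only extends the shared field set, generic code written against `AbstractDataModule` keeps working across projects, which is the compatibility property the approach relies on.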
Selas Turkiye - Influence And Passivity In Social Media (Excerpted) - Ziya NISANOGLU
This document discusses influence and passivity in social media. It analyzes how information propagates on Twitter and proposes an algorithm to measure user influence that accounts for both popularity and passivity. The algorithm is evaluated using a dataset of 2.5 million Twitter users and is found to better predict URL clicks and identify influential users compared to other measures like page rank or number of followers. The study finds a weak correlation between popularity and influence, with the most influential users not necessarily the most popular. Highly passive users are more likely to be spammers.
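The interplay the algorithm captures can be sketched as a HITS-style mutual recurrence between influence and passivity scores. This is a simplified toy with an invented graph, not the paper's exact update rules:

```python
def influence_passivity(edges, iters=50):
    """Toy HITS-style iteration over a forwarding graph.
    edges[(i, j)] = fraction of i's posts that user j forwarded.
    A user's influence grows with attention paid by passive users;
    a user's passivity grows with influential content they ignore."""
    nodes = {n for edge in edges for n in edge}
    inf = {n: 1.0 for n in nodes}
    pas = {n: 1.0 for n in nodes}
    for _ in range(iters):
        new_inf = {n: sum(w * pas[j] for (i, j), w in edges.items() if i == n)
                   for n in nodes}
        new_pas = {n: sum((1 - w) * inf[i] for (i, j), w in edges.items() if j == n)
                   for n in nodes}
        for scores in (new_inf, new_pas):   # L1-normalize each score vector
            s = sum(scores.values()) or 1.0
            for n in scores:
                scores[n] /= s
        inf, pas = new_inf, new_pas
    return inf, pas

edges = {("a", "b"): 0.9, ("a", "c"): 0.8, ("d", "b"): 0.1}
inf, pas = influence_passivity(edges)  # "a" comes out most influential
```

Note that "a" ranks highest despite "b" receiving the most attention, mirroring the paper's finding that popularity and influence only weakly correlate.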
Estudio Influencia Y Pasividad En Social Media - eliasvillagran
This document discusses influence and passivity in social media. It analyzes how information propagates on Twitter and proposes an algorithm to measure user influence that accounts for both popularity and passivity. The algorithm is evaluated on a dataset of 2.5 million Twitter users and is found to better predict URL clicks and identify influential users compared to other measures like page rank or number of followers. The study finds a weak correlation between popularity and influence, with the most influential users not necessarily the most popular. Highly passive users are more likely to be spammers.
Who says what to whom on Twitter? - Twitter flow - Juan Sarasua
This document discusses classifying Twitter users into categories and studying information flow between them. The authors introduce a method using Twitter Lists to classify "elite" users like celebrities, media, organizations, and bloggers, versus ordinary users. They find attention is highly concentrated among elite users, though information spreads indirectly through ordinary users as well. Different user categories and content types exhibit different information sharing and consumption patterns.
Towards Decision Support and Goal Achievement - Identifying Ac.docx - turveycharlyn
Towards Decision Support and Goal Achievement: Identifying Action-Outcome Relationships From Social Media
Emre Kıcıman
Microsoft Research
[email protected]
Matthew Richardson
Microsoft Research
[email protected]
ABSTRACT
Every day, people take actions, trying to achieve their personal, high-order goals. People decide what actions to take based on their personal experience, knowledge and gut instinct. While this leads to positive outcomes for some people, many others do not have the necessary experience, knowledge and instinct to make good decisions. What if, rather than making decisions based solely on their own personal experience, people could take advantage of the reported experiences of hundreds of millions of other people?

In this paper, we investigate the feasibility of mining the relationship between actions and their outcomes from the aggregated timelines of individuals posting experiential microblog reports. Our contributions include an architecture for extracting action-outcome relationships from social media data, techniques for identifying experiential social media messages and converting them to event timelines, and an analysis and evaluation of action-outcome extraction in case studies.
1. INTRODUCTION
While current structured knowledge bases (e.g., Freebase) contain a sizeable collection of information about entities, from celebrities and locations to concepts and common objects, there is a class of knowledge that has minimal coverage: actions. Simple information about common actions, such as the effect of eating pasta before running a marathon, or the consequences of adopting a puppy, is missing. While some of this information may be found within the free text of Wikipedia articles, the lack of a structured or semi-structured representation makes it largely unavailable for computational usage. With computing devices continuing to become more embedded in our everyday lives, and mediating an increasing degree of our interactions with both the digital and physical world, knowledge bases that can enable our computing devices to represent and evaluate actions and their likely outcomes can help individuals reason about actions and their
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
KDD'15, August 10-13, 2015, Sydney, NSW, Australia.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3664-2/15/08 ...$15.00.
DOI: http://dx.doi.org/10.1145 ...
5th International Scientific Conference "Information Science in a Period of Change: Innovative Information Services." Faculty of Journalism, Information and Book Studies, Chair of Informatology, University of Warsaw, Warsaw, 15-16 May 2017.
Detecting fake news with weak social supervision - Suresh S
This document discusses using weak social supervision for detecting fake news on social media. It defines weak supervision and describes how social media data provides a new type of weak supervision called weak social supervision. Weak social supervision can be generated from three aspects of social media - users, posts, and networks. Recent works have shown that user-based, post-based, and network-based weak social supervision can be effective for fake news detection, even with limited labeled data. Weak social supervision is a promising approach for learning tasks where labeled data is scarce.
This document discusses classifying Twitter users into categories to study information flow on the platform. The researchers introduce a method of classifying users as "elite" or "ordinary" based on their inclusion on curated lists. Elite users are further divided into media, celebrities, organizations, and bloggers. Analysis finds information flow is concentrated among elite users, though most information reaches ordinary users indirectly. Different user categories emphasize different content types, and content lifespan varies dramatically from less than a day to months.
The report concluded that information generally flows in two stages, stating that "almost half of the information originating from the media reaches the masses indirectly, through an intermediate layer of opinion leaders who, although classified as ordinary users, are better connected and more exposed to the media than their followers."
This document analyzes who produces information and who consumes it on the social media platform Twitter. It finds:
1) Elite users, such as celebrities, bloggers, and media organizations, produce around 50% of shared URLs despite being only 20,000 users out of millions. Media produces the most information but celebrities are the most followed.
2) Users tend to follow and listen to others in their same category - celebrities follow celebrities and bloggers follow bloggers. However, bloggers rebroadcast more information than other categories.
3) The "two-step flow" theory of information spreading from media to opinion leaders to the public is supported on Twitter, with elite users playing the role of opinion leaders.
Current trends of opinion mining and sentiment analysis in social networks - eSAT Publishing House
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Altmetrics are alternative metrics to traditional citations that measure the online attention and impact of scholarly works. They capture how research outputs are discussed and shared on social media and other online platforms. This document provides an introduction to altmetrics, explaining how they originated from the shift to online scholarly communication. It describes various altmetrics providers and the types of metrics they collect for different research outputs like articles, books, data, and software. The benefits of altmetrics for researchers and their role in libraries and information science are also summarized.
Recommender systems in the scope of opinion formation: a model - Marcel Blattner, PhD
1. The document proposes a model to simulate recommendation systems data featuring fat-tailed distributions of item ratings.
2. The model is based on social interactions and opinion formation on a complex network. A threshold mechanism governs whether a user is interested in an item based on their intrinsic item anticipation and influence from neighbors.
3. The model can generate various patterns observed in real recommendation systems data and provides insight into how social processes shape recommender system data.
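The threshold mechanism in point 2 might be sketched as follows. The parameter values and the linear blending rule are illustrative assumptions, not the paper's calibrated model, which runs on a complex network:

```python
def adopt(anticipation, neighbor_ratings, social_weight=0.5, threshold=0.6):
    """Threshold rule: a user engages with an item only if their
    intrinsic anticipation, blended with the mean opinion of neighbors
    who already rated it, clears a fixed threshold.
    (All parameter values are invented for illustration.)"""
    if neighbor_ratings:
        social = sum(neighbor_ratings) / len(neighbor_ratings)
        interest = (1 - social_weight) * anticipation + social_weight * social
    else:
        interest = anticipation
    return interest >= threshold

lukewarm = adopt(0.5, [])            # False: no social signal to tip it over
swayed = adopt(0.5, [0.9, 0.8])      # True: enthusiastic neighbors raise interest
```

Iterating such a rule over a network lets early adopters' opinions cascade, which is how the model reproduces the fat-tailed rating distributions noted above.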
SECUREWALL - A FRAMEWORK FOR FINE-GRAINED PRIVACY CONTROL IN ONLINE SOCIAL NETWORKS - Zac Darcy
This document summarizes a research paper that proposes a framework called SecureWall to implement fine-grained access controls on online social networks to mitigate privacy leaks. The framework combines multiple security models: the Chinese Wall policy for community privacy, the Biba model for integrity, and the Bell-LaPadula (BLP) model for confidentiality. It was implemented on a prototype social network and evaluated using information flow metrics and user surveys. Results showed the framework reduced information leakage compared with popular social networks while maintaining usability and sociability, according to the majority of user feedback.
Tfsc disc 2014 si proposal (30 June 2014) - Han Woo PARK
Technological Forecasting and Social Change Special Issue
http://www.journals.elsevier.com/technological-forecasting-and-social-change/
Special issue title
Open (Big) Data as Social Change: Triple Helix Innovation toward Government 3.0
Associated conference
The 2nd Annual Asian Hub Conference on Triple Helix and Network Sciences (DISC 2014) on Data as Social Culture: Networked Innovation and Government 3.0, to be held on December 11-13, 2014, in Daegu and Gyeongbuk (Gyeongju), Rep. of Korea.
Call for Papers: http://www.slideshare.net/hanpark/disc-2014-cfp-v3
The conference is organized by Asia Triple Helix Society (ATHS). Point of contact: Secretary to Prof. Dr. Han Woo Park (info.disc2014@gmail.com), Department of Media & Communication, YeungNam University, 214-1, Dae-dong, Gyeongsan-si, Gyeongsangbuk-do, South Korea, Zip Code 712-749.
Associate Editors: Managing Guest Editors (MGE)
Wayne Weiai Xu, Doctoral Candidate, SUNY-Buffalo, USA, weiaixu@buffalo.edu
Dr. In Ho Cho, YeungNam University, Rep. of Korea, haihabacho@gmail.com
Important Dates
DISC 2014: 11 to 13 December 2014
Full paper submission: 1 March 2015
Review & Revision period: 1 September 2015
Online Publication: 1 December 2015
* We are also open to non-conference submissions to the special issue. However, the priority will be given to papers presented at the DISC 2014 and its associated seminars.
APPLYING THE TECHNOLOGY ACCEPTANCE MODEL TO UNDERSTAND SOCIAL NETWORKING - ijcsit
This study examines individuals' participation intentions and behaviour on Social Networking Sites (SNSs). For this purpose, the Technology Acceptance Model (TAM) is utilized and extended through the addition of a "perceived social capital" construct, aiming to increase its explanatory power and predictive ability in this context. Data collected from a survey of 1100 participants and distilled to 657 usable sets was analysed via structural equation modelling to assess the predictive power of the proposed model. The model explains 56% of the variance in "Participation Intentions" and 55% of the variance in "Participation Behaviour". Behavioural intention's contribution to the model's explanatory power was the highest amongst the constructs (able to explain 28% of usage behaviour), while "Attitude" explains around 11% of SNSs usage behaviour. The findings also show that the "Perceived Social Capital" construct has a notable impact on usage behaviour; this impact comes indirectly through its direct effect on "Attitude" and "Perceived Usefulness". The contribution of "Perceived Social Capital" to the model's explanatory power was the third highest amongst the constructs, and it alone explains around 9% of SNSs usage behaviour.
The advancement of Information Technology has hastened the ability to disseminate information across the globe. In particular, recent trends in 'Social Networking' have led to a spark in personally sensitive information being published on the World Wide Web. While such socially active websites are creative tools for expressing one's personality, they also entail serious privacy concerns. Thus, social networking websites could be termed a double-edged sword. It is important for the law to keep abreast of these developments in technology. The purpose of this paper is to demonstrate the limits of extending existing laws to battle privacy intrusions on the Internet, especially in the context of social networking. It is suggested that privacy-specific legislation is the most appropriate means of protecting online privacy. In doing so, it is important to maintain a balance with the competing right of expression, the failure of which may hinder the reaping of the benefits offered by Internet technology.
This document provides guidance on developing an effective research proposal. It begins by explaining the purpose of a research proposal and defining key terms. It then discusses important qualities like being engaging, directive, unique, holistic and keen. The document outlines typical parts of a research proposal like the rationale, statement of problem, methodology and references. Examples are provided for some sections. It emphasizes qualities like clarity, structure and avoiding duplication. Overall, the document aims to help researchers effectively plan and communicate their proposed study.
{White Paper} Measuring Global Attention | AppinionsAppinions
Appinions has developed an algorithm to measure influence by tracking how attention flows through networks as people react to and spread opinions. The algorithm analyzes data from various structured and unstructured sources to build influence maps showing connections between influencers. Influence scores represent the total attention received directly and indirectly. The algorithm accounts for different types of evidence and allows influence to dissipate over several connections to model real-world dynamics more accurately than simple link-based models.
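A toy version of attention that dissipates over connections can be sketched with geometric decay: a node's score is the attention it receives directly, plus a discounted share of the attention received by those who reacted to it. The data, decay factor, and recursion depth are invented; this is not Appinions' actual algorithm:

```python
def influence_score(direct_attention, reactions, node, decay=0.5, depth=3):
    """Total attention a node receives directly plus, discounted per hop,
    attention flowing through people who reacted to its opinions.
    (Geometric-decay sketch with an invented graph.)"""
    if depth == 0:
        return 0.0
    score = direct_attention.get(node, 0.0)
    for reactor in reactions.get(node, []):
        score += decay * influence_score(direct_attention, reactions,
                                         reactor, decay, depth - 1)
    return score

direct = {"alice": 10.0, "bob": 4.0, "carol": 2.0}
reactions = {"alice": ["bob"], "bob": ["carol"]}   # bob reacted to alice, etc.
score = influence_score(direct, reactions, "alice")
```

The `decay` factor is what lets influence dissipate over several connections rather than propagating undiminished as in a simple link-counting model.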
Information Contagion through Social Media: Towards a Realistic Model of the ...Axel Bruns
Paper by Axel Bruns, Patrik Wikström, Peta Mitchell, Brenda Moon, Felix Münch, Lucia Falzon, and Lucy Resnyansky presented at the ACSPRI 2016 conference, Sydney, 19-22 July 2016.
This document provides social media guidelines and best practices for CDC employees and contractors using Facebook. It outlines the process for planning, developing, and engaging on Facebook pages including getting necessary approvals, developing branding and comment policies, and ensuring records management and archiving. It recommends keeping posts short and simple, identifying a regular posting schedule and best links, and determining an engagement strategy with fans through things like questions, contests and highlighting other social media.
17% of cell phone owners do most of their online browsing on their phone, rather than a computer or other device. Most do so for convenience, but for some their phone is their only option for online access.
86% of smartphone owners used their phone in the past month to look up information in real-time to help with daily tasks like meeting friends, solving problems, or settling arguments. A Pew Research Center survey found that 70% of cell phone owners used their phone for one of several "just-in-time" information searches in the past 30 days such as coordinating meetings, finding business information, or getting traffic updates. Younger cell phone users and smartphone owners were more likely to conduct these searches. The report provides details on the demographic differences in just-in-time mobile phone usage.
The document is a social media toolkit from the Centers for Disease Control and Prevention (CDC) that provides guidance on using social media for health communication. It covers topics such as developing a social media strategy, evaluating social media efforts, and descriptions of various social media tools including buttons/badges, image sharing, RSS feeds, podcasts, video sharing, widgets, eCards, mobile technologies, Twitter, blogs, and Facebook. It aims to help public health professionals integrate social media into their communication campaigns and activities.
The document provides guidance on writing effective social media content for the Centers for Disease Control and Prevention (CDC). It discusses the importance of understanding the target audience, applying health literacy principles, and using plain language. The document recommends segmenting audiences, avoiding jargon, writing short messages in an active voice, and choosing familiar words and measurements to improve understanding. It aims to help health communicators craft relevant and engaging social media content that promotes health literacy.
Social media, especially Facebook and YouTube, have become ubiquitous platforms that are difficult for marketers to ignore. While social media requires substantial resources to manage effectively, it can positively impact business goals when done right. Studies show social recommendations and sharing can improve customers' perceptions and experiences, and increase important metrics like click-through rates and average order size. However, privacy and trust issues still limit some customers' willingness to engage with social commerce. Marketers need more data to fully understand social media's momentum and potential.
The document summarizes a report by the McKinsey Global Institute about the growth of big data and its potential economic impact. It finds that the amount of data in the world is exploding, with companies and sensors creating trillions of bytes daily. It argues that big data is becoming essential to modern economic activity and that its proliferation means more than just a more intrusive world. The report examines the potential value big data can create for organizations and the economy, and what leaders must do to capture this value.
1. Most young physicians surveyed are employees of medical groups, especially small groups with 6 or fewer physicians. Hospital-based physicians are more likely than others to be employees of large groups or hospitals.
2. Financial factors strongly influence practice arrangements, and many aspire to ownership. While generally satisfied currently, over a quarter considered changing arrangements recently due to financial issues.
3. Physicians are highly pessimistic about the future of U.S. healthcare, chiefly due to concerns about the Affordable Care Act and perceived negative government involvement. Cynicism toward perceived prioritization of money over patients was commonly expressed.
Cincom Synchrony is a software solution that helps healthcare organizations overcome challenges from US healthcare reform through three key capabilities: 1) Intelligent Guidance that provides real-time guidance for customer interactions; 2) a unified customer view that presents holistic patient information; and 3) cross-channel continuity across communication channels. The solution aims to improve care quality and reduce costs in line with reform goals through smarter customer interactions.
This document summarizes key findings from a Pew Research Center report about digital differences and disparities in internet access. Some key points:
- While internet adoption has increased overall, one in five American adults still do not use the internet. Non-users tend to be older, lower-income, less educated, and Spanish speakers.
- Lack of a home broadband connection also persists, with four in ten American adults not having high-speed internet at home. Younger, higher-income, and more educated groups are more likely to have broadband.
- Mobile internet use is increasing access for traditionally underserved groups, but digital differences remain related to age, income, education, disability status, and other factors.
The document summarizes key findings from a survey of 253 corporate marketing decision makers regarding their use of data, digital tools, and marketing ROI measurement. Some of the main findings include:
1) While 91% of marketers believe using customer data is important, many are not collecting the right types of data like mobile or social media data, or are not sharing data effectively across organizations.
2) Marketers have widely adopted new digital tools like social media marketing but are struggling to measure their effectiveness, especially in comparing across different channels.
3) Defining and measuring marketing ROI remains a challenge, with 37% not including financial outcomes in their definition and 57% not basing budgets on ROI analysis.
The document summarizes findings from a Pew Research Center survey about search engine use in 2012. Some key findings include:
- While most users are satisfied with search engine results quality, many are concerned about personal information collection during searches and feel targeted ads and personalized results invade their privacy.
- Google remains the dominant search engine, used by 83% of respondents compared to 6% for Yahoo.
- Overall views of search engine performance are positive, though many users are unaware of how to limit personal data collection from websites.
This document discusses the MoTeCH mobile health project in Ghana which aims to improve maternal and child health outcomes using mobile phones. The project involves developing a system for community health workers to enter patient information and generate reports using basic phones. It also provides educational messages to pregnant women and new mothers based on their due dates. Early lessons indicate a need for language translation and addressing cultural beliefs, while health workers require time savings to encourage participation. The goal is to increase access to care through mobilizing both supply of and demand for health services.
The document summarizes an innovation called the OWD Service that allows patients to text their daily health readings like blood pressure and blood sugar to a monitoring system. The system analyzes the data and alerts physicians if readings breach thresholds. Physicians can access patient histories to determine necessary interventions. A trial of the service among 50 patients found that 32% improved health control and 30% required increased medical care. The innovation is being expanded through a partnership to provide localized health, wellness and nutrition messages directly to subscribers' phones by SMS.
The OneWorldDr Platform is a mobile health system that uses text and email to provide remote patient monitoring and messaging. It allows providers, patients, caregivers and communities to be connected electronically and enables things like monitoring vitals, managing chronic diseases, and sending health messages. A pilot program saw positive results, with 32% of patients improving health and 30% receiving increased care due to abnormal readings detected by the system.
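The threshold-based alerting described for the OWD Service can be illustrated with a minimal sketch. The metric names and threshold values below are assumptions for illustration only, not the service's actual configuration.

```python
# Minimal sketch of threshold-based alerting of the kind the OWD
# Service performs on texted-in readings. Metric names and threshold
# values are illustrative, not taken from the actual system.

# Per-metric (low, high) bands; readings outside the band trigger an alert.
THRESHOLDS = {
    "systolic_bp": (90, 140),   # mmHg
    "blood_sugar": (70, 180),   # mg/dL
}

def check_reading(metric, value):
    """Return an alert string if the reading breaches its threshold, else None."""
    low, high = THRESHOLDS[metric]
    if value < low:
        return f"ALERT: {metric} low ({value} < {low})"
    if value > high:
        return f"ALERT: {metric} high ({value} > {high})"
    return None
```

In a real deployment the alert would be routed to the treating physician along with the patient's history, as the summary above describes.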
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
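As a small taste of the implementation steps the presentation walks through, here is a sketch of building an Atlas Vector Search aggregation stage for use with pymongo's `collection.aggregate`. The index name, field path, and numeric settings are placeholders, not values from the presentation.

```python
# Sketch of a MongoDB Atlas $vectorSearch aggregation stage. The index
# name ("embeddings_index") and vector field ("embedding") are
# illustrative placeholders; adapt them to your own collection.

def vector_search_stage(query_vector, limit=5):
    """Build a $vectorSearch stage to pass to collection.aggregate([...])."""
    return {
        "$vectorSearch": {
            "index": "embeddings_index",   # placeholder Atlas search index name
            "path": "embedding",           # field holding the stored vectors
            "queryVector": query_vector,   # embedding of the query text
            "numCandidates": 100,          # ANN candidates to consider
            "limit": limit,                # results to return
        }
    }
```

The stage would typically be followed by a `$project` stage selecting the fields to return alongside the similarity score.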
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring you up to speed on this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users want to take full advantage of the features available on their devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect their personal devices and information.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Everyone's an Influencer: Quantifying Influence on Twitter

Eytan Bakshy∗ (University of Michigan, USA; ebakshy@umich.edu)
Jake M. Hofman (Yahoo! Research, NY, USA; hofman@yahoo-inc.com)
Winter A. Mason (Yahoo! Research, NY, USA; winteram@yahoo-inc.com)
Duncan J. Watts (Yahoo! Research, NY, USA; djw@yahoo-inc.com)
ABSTRACT

In this paper we investigate the attributes and relative influence of 1.6M Twitter users by tracking 74 million diffusion events that took place on the Twitter follower graph over a two month interval in 2009. Unsurprisingly, we find that the largest cascades tend to be generated by users who have been influential in the past and who have a large number of followers. We also find that URLs that were rated more interesting and/or elicited more positive feelings by workers on Mechanical Turk were more likely to spread. In spite of these intuitive results, however, we find that predictions of which particular user or URL will generate large cascades are relatively unreliable. We conclude, therefore, that word-of-mouth diffusion can only be harnessed reliably by targeting large numbers of potential influencers, thereby capturing average effects. Finally, we consider a family of hypothetical marketing strategies, defined by the relative cost of identifying versus compensating potential "influencers." We find that although under some circumstances the most influential users are also the most cost-effective, under a wide range of plausible assumptions the most cost-effective performance can be realized using "ordinary influencers"—individuals who exert average or even less-than-average influence.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems; J.4 [Social and Behavioral Sciences]: Sociology

General Terms
Human Factors

Keywords
Communication networks, Twitter, diffusion, influence, word of mouth marketing.

∗Part of this research was performed while the author was visiting Yahoo! Research, New York.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WSDM'11, February 9–12, 2011, Hong Kong, China.
Copyright 2011 ACM 978-1-4503-0493-1/11/02 ...$10.00.

1. INTRODUCTION

Word-of-mouth diffusion has long been regarded as an important mechanism by which information can reach large populations, possibly influencing public opinion [14], adoption of innovations [26], new product market share [4], or brand awareness [15]. In recent years, interest among researchers and marketers alike has increasingly focused on whether or not diffusion can be maximized by seeding a piece of information or a new product with certain special individuals, often called "influentials" [34, 15] or simply "influencers," who exhibit some combination of desirable attributes—whether personal attributes like credibility, expertise, or enthusiasm, or network attributes such as connectivity or centrality—that allows them to influence a disproportionately large number of others [10], possibly indirectly via a cascade of influence [31, 16].

Although appealing, the claim that word-of-mouth diffusion is driven disproportionately by a small number of key influencers necessarily makes certain assumptions about the underlying influence process that are not based directly on empirical evidence. Empirical studies of diffusion are therefore highly desirable, but historically have suffered from two major difficulties. First, the network over which word-of-mouth influence spreads is generally unobservable, hence influence is difficult to attribute accurately, especially in instances where diffusion propagates for multiple steps [29, 21]. And second, observational data on diffusion are heavily biased towards "successful" diffusion events, which by virtue of being large are easily noticed and recorded; thus inferences regarding the attributes of success may also be biased [5, 9], especially when such events are rare [8].

For both of these reasons, the micro-blogging service Twitter presents a promising natural laboratory for the study of diffusion processes. Unlike other user-declared networks (e.g. Facebook), Twitter is expressly devoted to disseminating information, in that users subscribe to broadcasts of other users; thus the network of "who listens to whom" can be reconstructed by crawling the corresponding "follower graph". In addition, because users frequently wish to share web-content, and because tweets are restricted to 140 characters in length, a popular strategy has been to use URL shorteners (e.g. bit.ly, TinyURL, etc.), which effectively tag distinct pieces of content with unique, easily identifiable tokens. Together these features allow us to track the diffusion patterns of all instances in which shortened URLs are shared on Twitter, regardless of their success, thereby addressing both the observability and sampling difficulties outlined above.
The Twitter ecosystem is also well suited to studying the role of influencers. In general, influencers are loosely defined as individuals who disproportionately impact the spread of information or some related behavior of interest [34, 10, 15, 11]. Unfortunately, however, this definition is fraught with ambiguity regarding the nature of the influence in question, and hence the type of individual who might be considered special. Ordinary individuals communicating with their friends, for example, may be considered influencers, but so may subject matter experts, journalists, and other semi-public figures, as may highly visible public figures like media representatives, celebrities, and government officials. Clearly these types of individuals are capable of influencing very different numbers of people, but may also exert quite different types of influence on them, and even transmit influence through different media. For example, a celebrity endorsing a product on television or in a magazine advertisement presumably exerts a different sort of influence than a trusted friend endorsing the same product in person, who in turn exerts a different sort of influence than a noted expert writing a review.

In light of this definitional ambiguity, an especially useful feature of Twitter is that it not only encompasses various types of entities, but also forces them all to communicate in roughly the same way: via tweets to their followers. Although it remains the case that even users with the same number of followers do not necessarily exert the same kind of influence, it is at least possible to measure and compare the influence of individuals in a standard way, by the activity that is observable on Twitter itself. In this way, we avoid the need to label individuals as either influencers or non-influencers, simply including all individuals in our study and comparing their impact directly.

We note, however, that our use of the term influencer corresponds to a particular and somewhat narrow definition of influence, specifically the user's ability to post URLs which diffuse through the Twitter follower graph. We restrict our study to users who "seed" content, meaning they post URLs that they themselves have not received through the follower graph. We quantify the influence of a given post by the number of users who subsequently repost the URL, meaning that they can be traced back to the originating user through the follower graph. We then fit a model that predicts influence using an individual's attributes and past activity and examine the utility of such a model for targeting users. Our emphasis on prediction is particularly relevant to our motivating question. In marketing, for example, the practical utility of identifying influencers depends entirely on one's ability to do so in advance. Yet in practice, it is very often the case that influencers are identified only in retrospect, usually in the aftermath of some outcome of interest, such as the unexpected success of a previously unknown author or the sudden revival of a languishing brand [10]. By emphasizing ex-ante prediction of influencers over ex-post explanation, our analysis highlights some simple but useable insights that we believe are of general relevance to word-of-mouth marketing and related activities.

The remainder of the paper is organized as follows. We review related work on modeling diffusion and quantifying influence in Section 2. In Sections 3 and 4 we provide an overview of the collected data, summarizing the structure of URL cascades on the Twitter follower graph. In Section 5, we present a predictive model of influence, in which cascade sizes of posted URLs are predicted using the individuals' attributes and average size of past cascades. Section 6 explores the relationship between content as characterized by workers on Amazon's Mechanical Turk and cascade size. Finally, in Section 7 we use our predictive model of cascade size to examine the cost-effectiveness of targeting individuals to seed content.

2. RELATED WORK

A number of recent empirical papers have addressed the matter of diffusion on networks in general, and the attributes and roles of influencers specifically. In early work, Gruhl et al [13] attempted to infer a transmission network between bloggers, given time-stamped observations of posts and assuming that transmission was governed by an independent cascade model. Contemporaneously, Adar and Adamic [1] used a similar approach to reconstruct diffusion trees among bloggers, and shortly afterwards Leskovec et al. [20] used referrals on an e-commerce site to infer how individuals are influenced as a function of how many of their contacts have recommended a product.

A limitation of these early studies was the lack of "ground truth" data regarding the network over which the diffusion was taking place. Addressing this problem, more recent studies have gathered data both on the diffusion process and the corresponding network. For example, Sun et al. [29] studied diffusion trees of fan pages on Facebook, Bakshy et al. [3] studied the diffusion of "gestures" between friends in Second Life, and Aral et al. [2] studied adoption of a mobile phone application over the Yahoo! messenger network. Most closely related to the current research is a series of recent papers that examine influence and diffusion on Twitter specifically. Kwak et al. [18] compared three different measures of influence—number of followers, page-rank, and number of retweets—finding that the ranking of the most influential users differed depending on the measure. Cha et al. [7] also compared three different measures of influence—number of followers, number of retweets, and number of mentions—and also found that the most followed users did not necessarily score highest on the other measures. Finally, Weng et al. [35] compared number of followers and page rank with a modified page-rank measure that accounted for topic, again finding that ranking depended on the influence measure.

The present work builds on these earlier contributions in three key respects. First, whereas previous studies have quantified influence either in terms of network metrics (e.g. page rank) or the number of direct, explicit retweets, we measure influence in terms of the size of the entire diffusion tree associated with each event (Kwak et al [18] also compute what they call "retweet trees" but they do not use them as a measure of influence). While related to other measures, the size of the diffusion tree is more directly associated with diffusion and the dissemination of information (Goyal et al [12], it should be noted, do introduce a similar metric to quantify influence; however, their interest is in identifying community "leaders," not on prediction.)
Second, whereas the focus of previous studies has been largely descriptive (e.g. comparing the most influential users), we are interested explicitly in predicting influence; thus we consider all users, not merely the most influential. Third, in addition to predicting diffusion as a function of the attributes of individual seeds, we also study the effects of content. We believe these differences bring the understanding of diffusion on Twitter closer to practical applications, although as we describe later, experimental studies are still required.

3. DATA

To study diffusion on Twitter, we combined two separate but related sources of data. First, over the two-month period of September 13, 2009 - November 15, 2009 we recorded all 1.03B public tweets broadcast on Twitter, excluding October 14-16, during which there were intermittent outages in the Twitter API. Of these, we extracted 87M tweets that included bit.ly URLs and which corresponded to distinct diffusion "events," where each event comprised a single initiator, or "seed," followed by some number of repostings of the same URL by the seed's followers, their followers, and so on.¹ Finally, we identified a subset of 74M diffusion events that were initiated by seed users who were active in both the first and second months of the observation period, thus enabling us to train our regression model on first-month performance in order to predict second-month performance (see Section 5). In total, we identified 1.6M seed users who seeded an average of 46.33 bit.ly URLs each. Figure 1 shows the distribution of bit.ly URL posts by seed.

[Figure 1: Probability density of number of bit.ly URLs posted per user.]

¹Our decision to restrict attention to bit.ly URLs was made predominantly for convenience, as bit.ly was at the time by far the dominant URL shortener on Twitter. Given the size of the population of users who rely on bit.ly, which is comparable to the size of all active Twitter users, it seems unlikely to differ systematically from users who rely on other shorteners.

Second, we crawled the portion of the follower graph comprising all users who had broadcast at least one URL over the same two-month period. We did this by querying the Twitter API to find the followers of every user who posted a bit.ly URL. Subsequently, we placed those followers in a queue to be crawled, thereby identifying their followers, who were then also placed in the queue, and so on. In this way, we obtained a large fraction of the Twitter follower graph comprising all active bit.ly posters and anyone connected to these users via one-way directed chains of followers. Specifically, the subgraph comprised approximately 56M users and 1.7B edges.

Consistent with previous work [7, 18, 35], both the in-degree ("followers") and out-degree ("friends") distributions are highly skewed, but the former much more so—whereas the maximum # of followers was nearly 4M, the maximum # of friends was only about 760K—reflecting the passive and one-way nature of the "follow" action on Twitter (i.e. A can follow B without any action required from B). We emphasize, moreover, that because the crawled graph was seeded exclusively with active users, it is almost certainly not representative of the entire follower graph. In particular, active users are likely to have more followers than average, in which case we would expect that the average in-degree will exceed the average out-degree for our sample—as indeed we observe. Table 1 presents some basic statistics of the distributions of the number of friends, followers and number of URLs posted per user.

Table 1: Statistics of the Twitter follower graph and seed activity

            # Followers     # Friends   # Seeds Posted
Median            85.00         82.00            11.00
Mean             557.10        294.10            46.33
Max.       3,984,000.00    759,700.00           54,890

4. COMPUTING INFLUENCE ON TWITTER

To calculate the influence score for a given URL post, we tracked the diffusion of the URL from its origin at a particular "seed" node through a series of reposts—by that user's followers, those users' followers, and so on—until the diffusion event, or cascade, terminated. To do this, we used the time each URL was posted: if person B is following person A, and person A posted the URL before B and was the only one of B's friends to post the URL, we say person A influenced person B to post the URL. As Figure 2 shows, if B has more than one friend who has previously posted the same URL, we have three choices for how to assign the corresponding influence: first, we can assign full credit to the friend who posted it first; second, we can assign full credit to the friend who posted it most recently (i.e. last); and third, we can split credit equally among all prior-posting friends. These three assignments effectively make different assumptions about the influence process: "first influence" rewards primacy, assuming that individuals are influenced when they first see a new piece of information, even if they fail to immediately act on it, during which time they may see it again; "last influence" assumes the opposite, instead attributing influence to the most recent exposure; and "split influence" assumes either that the likelihood of noticing a new piece of information, or equivalently the inclination to act on it, accumulates steadily as the information is posted by more friends.
[Figure 2: Three ways of assigning influence to multiple sources.]

Having defined immediate influence, we can then construct disjoint influence trees for every initial posting of a URL. The number of users in these influence trees—referred to as "cascades"—thus defines the influence score for every seed. See Figure 3 for some examples of cascades. To check that our results are not an artifact of any particular assumption about how individuals are influenced to repost information, we conducted our analysis for all three definitions. Although particular numerical values varied slightly across the three definitions, the qualitative findings were identical; thus for simplicity we report results only for first influence.

[Figure 3: Examples of information cascades on Twitter.]

Before proceeding, we note that our use of reposting to indicate influence is somewhat more inclusive than the convention of "retweeting" (e.g. using the terminology "RT @username"), which explicitly attributes the original user. An advantage of our approach is that we can include in our observations all instances in which a URL was reposted regardless of whether it was acknowledged by the user, thereby greatly increasing the coverage of our observations. (Since our study, Twitter has introduced a "retweet" feature that arguably increases the likelihood that reposts will be acknowledged, but does not guarantee that they will be.) However, a potential disadvantage of our definition is that it may mistakenly attribute influence to what is in reality a sequence of independent events. In particular, it is likely that users who follow each other will have similar interests and so are more likely to post the same URL in close succession than random pairs of users. Thus it is possible that some of what we are labeling influence is really a consequence of homophily [2]. From this perspective, our estimates of influence should be viewed as an upper bound.

On the other hand, there are reasons to think that our measure underestimates actual influence, as re-broadcasting a URL is a particularly strong signal of interest. A weaker but still relevant measure might be to observe whether a given user views the content of a shortened URL, implying that they are sufficiently interested in what the poster has to say that they will take some action to investigate it. Unfortunately, click-through data on bit.ly URLs are often difficult to interpret, as one cannot distinguish between programmatic unshortening events—e.g., from crawlers or browser extensions—and actual user clicks. Thus we instead relied on reposting as a conservative measure of influence, acknowledging that alternative measures of influence should also be studied as the platform matures.

Finally, we reiterate that the type of influence we study here is of a rather narrow kind: being influenced to pass along a particular piece of information. As we discuss later, there are many reasons why individuals may choose to pass along information other than the number and identity of the individuals from whom they received it—in particular, the nature of the content itself. Moreover, influencing another individual to pass along a piece of information does not necessarily imply any other kind of influence, such as influencing their purchasing behavior, or political opinion. Our use of the term "influencer" should therefore be interpreted as applying only very narrowly to the ability to consistently seed cascades that spread further than others. Nevertheless, differences in this ability, such as they do exist, can be considered a certain type of influence, especially when the same information (in this case the same original URL) is seeded by many different individuals. Moreover, the terms "influentials" and "influencers" have often been used in precisely this manner [3]; thus our usage is also consistent with previous work.

5. PREDICTING INDIVIDUAL INFLUENCE

We now investigate an idealized version of how a marketer might identify influencers to seed a word-of-mouth campaign [16], where we note that from a marketer's perspective the critical capability is to identify attributes of individuals that consistently predict influence. Reiterating that by "influence" we mean a user's ability to seed content containing URLs that generate large cascades of reposts, we therefore begin by describing the cascades we are trying to predict.

As Figure 4a shows, the distribution of cascade sizes is approximately power-law, implying that the vast majority of posted URLs do not spread at all (the average cascade size is 1.14 and the median is 1), while a small fraction are reposted thousands of times. The depth of the cascade (Figure 4b) is also right skewed, but more closely resembles an exponential distribution, where the deepest cascades can propagate as far as nine generations from their origin; but again the vast majority of URLs are not reposted at all, corresponding to cascades of size 1 and depth 0 in which the seed is the only node in the tree. Regardless of whether
Regardless of whether we study size or depth, therefore, the implication is that most events do not spread at all, and even moderately sized cascades are extremely rare.

Figure 4: (a) Frequency distribution of cascade sizes. (b) Distribution of cascade depths.

To identify consistently influential individuals, we aggregated all URL posts by user and computed individual-level influence as the logarithm of the average size of all cascades for which that user was a seed. We then fit a regression tree model [6], in which a greedy optimization process recursively partitions the feature space, resulting in a piecewise-constant function where the value in each partition is fit to the mean of the corresponding training data. An important advantage of regression trees over ordinary linear regression (OLR) in this context is that unlike OLR, which tends to fit the vast majority of small cascades at the expense of larger ones, the piecewise-constant nature of the regression tree function allows cascades of different sizes to be fit independently. The result is that the regression tree model is much better calibrated than the equivalent OLR model. Moreover, we used five-fold cross-validation [25] to terminate partitioning to prevent over-fitting. Our model included the following features as predictors:

1. Seed user attributes
   (a) # followers
   (b) # friends
   (c) # tweets
   (d) date of joining
2. Past influence of seed users
   (a) average, minimum, and maximum total influence
   (b) average, minimum, and maximum local influence,

where past local influence refers to the average number of reposts by that user’s immediate followers in the first month of the observation period, and past total influence refers to average total cascade size over the same period. Followers, friends, number of tweets, and influence (actual and past) were all log-transformed to account for their skewed distributions. We then compared predicted influence with actual influence computed from the second month of observations.

Figure 5 shows the regression tree for one of the folds. Conditions at the nodes indicate partitions of the features, where the left (right) child is followed if the condition is satisfied (violated). Leaf nodes give the predicted influence—as measured by (log) mean cascade size—for the corresponding partition. Thus, for example, the right-most leaf indicates that users with upwards of 1870 followers who had on average 6.2 reposts by direct followers (past local influence) are predicted to have the largest average total influence, generating cascades of approximately 8.7 additional posts.

Unsurprisingly, the model indicates that past performance provides the most informative set of features, although it is the local, not the total, influence that is most informative; this is likely due to the fact that most non-trivial cascades are of depth 1, so that past direct adoption is a reliable predictor of total adoption. Also unsurprisingly, the number of followers is an informative feature. Notably, however, these are the only two features present in the regression tree, enabling us to visualize influence as a function of these features, as shown in Figure 6. This result is somewhat surprising, as one might reasonably have expected that individuals who follow many others, or very few others, would be distinct from the average user. Likewise, one might have expected that activity level, quantified by the number of tweets, would also be predictive.

Figure 7 shows the fit of the regression tree model for all five cross-validation folds. The location of the circles indicates the mean predicted and actual values at each leaf of the trees, with leaves from different cross-validation folds appearing close to each other; the size of the circles indicates the number of points in each leaf, while the bars show the standard deviation of the actual values at each leaf. The model is extremely well calibrated, in the sense that the prediction of the average value at each cut of the regression tree is almost exactly the actual average (R2 = 0.98). This appearance, however, is deceiving. In fact, the model fit without averaging predicted and actual values at the leaf nodes is relatively poor (R2 = 0.34), reflecting that although large cascades tend to be driven by previously successful individuals with many followers, the extreme scarcity of such cascades means that most individuals with these attributes are not successful either. Thus, while large follower count and past success are likely necessary features for future success, they are far from sufficient.

These results place the usual intuition about influencers in perspective: individuals who have been influential in the past and who have many followers are indeed more likely to be influential in the future; however, this intuition is correct only on average. We also emphasize that these results are based on far more observational data than is typically available to marketers—in particular, we have an objective measure of influence and extensive data on past performance. Our finding that individual-level predictions of influence nevertheless remain relatively unreliable therefore strongly suggests that rather than attempting to identify exceptional individuals, marketers seeking to exploit word-of-mouth influence should instead adopt portfolio-style strategies, which target many potential influencers at once and therefore rely only on average performance [33].

6. THE ROLE OF CONTENT

An obvious objection to the above analysis is that it fails to account for the nature of the content that is being shared.
Figure 5: Regression tree fit for one of the five cross-validation folds. Leaf nodes give the predicted influence for the corresponding partition, where the left (right) child is followed if the node condition is satisfied (violated). (The root splits on log10(pastLocalInfluence + 1) < 0.1763; all subsequent splits are on past local influence or follower count, and the twelve leaf values range from 0.0124 to 0.9854.)
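A model of the kind shown in Figure 5 can be reproduced in outline with scikit-learn. This is a sketch on synthetic data, not the authors' pipeline: the paper fits CART [6] and prunes via cross-validation, which `max_leaf_nodes=12` only approximates (twelve being the number of leaves in Figure 5), and the generated features are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical per-user data standing in for the paper's features:
# follower counts, past local influence (mean reposts by direct
# followers in month 1), and mean cascade size in month 2.
rng = np.random.default_rng(0)
n = 5000
followers = rng.lognormal(3.0, 2.0, n)
past_local = np.clip(rng.lognormal(0.0, 1.0, n) - 1.0, 0.0, None)
future_mean_size = 1.0 + past_local * rng.random(n)

# Skewed quantities are log-transformed, matching the text.
X = np.column_stack([np.log10(followers + 1.0),
                     np.log10(past_local + 1.0)])
y = np.log10(future_mean_size)  # influence = log mean cascade size

# max_leaf_nodes stands in for the paper's cross-validated pruning.
model = DecisionTreeRegressor(max_leaf_nodes=12).fit(X, y)
print("training R^2:", round(model.score(X, y), 3))
```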
Figure 6: Influence as a function of past local influence and number of followers for (a) all users and (b) users with the top 25 actual influence. Each circle represents a single seed user, where the size of the circle represents that user’s actual average influence. (Panel (b) labels accounts such as britneyspears, BarackObama, cnnbrk, stephenfry, and nprnews.)
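The split points in Figure 5 and the axes in Figure 6 are on the log10(x + 1) scale applied to the skewed features, so the raw counts quoted in the text can be recovered by inverting the transform:

```python
def unlog(v):
    """Invert the log10(x + 1) transform applied to skewed features."""
    return 10 ** v - 1

# Thresholds and leaf values in Figure 5 are on the transformed scale;
# inverting them recovers the raw numbers quoted in the text.
print(round(unlog(3.272)))      # follower threshold: 1870
print(round(unlog(0.856), 1))   # past local influence: 6.2 reposts
print(round(unlog(0.9854), 1))  # right-most leaf: 8.7 posts per cascade
```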
Clearly one might expect that some types of content (e.g. YouTube videos) might exhibit a greater tendency to spread than others (e.g. news articles of specialized interest), or that even the same type of content might vary considerably in interestingness or suitability for sharing. Conceivably, one could do considerably better at predicting cascade size if in addition to knowing the attributes of the seed user, one also knew something about the content of the URL being seeded.

To test this idea, we used humans to classify the content of a sample of 1000 URLs from our study. An advantage of this approach over an automated classifier or a topic model [35] is that humans can more easily rate content on attributes like “interestingness” or “positive feeling” which are often quite difficult for a machine. A downside of using humans, however, is that the number of URLs we can classify in this way is necessarily small relative to the total sample. Moreover, because the distribution of cascade sizes is so skewed, a uniform random sample of 1000 URLs would almost certainly not contain any large cascades; thus we instead obtained a stratified sample in the following manner. First, we filtered URLs that we knew to be spam or in a language other than English. Second, we binned all remaining URLs in logarithmic bins, choosing an exponent such that we obtained ten bins in total, and the top bin contained the 100 largest cascades. Third, we sampled all 100 URLs in the top bin, and randomly sampled 100 URLs from each of the remaining bins. In this way, we ensured that our sample would reflect the full distribution of cascade sizes.

Given this sample of URLs, we then used Amazon’s Mechanical Turk (AMT) to recruit human classifiers. AMT is a system that allows one to recruit workers to do small tasks for small payments, and has been used extensively for survey and experimental research [17, 24, 23, 22] and to obtain labels for data [28, 27]. We asked the workers to go to the web page associated with the URL and answer questions about it.
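The binning and sampling steps can be sketched as follows (hypothetical data; the spam/language filter of the first step is assumed to have already been applied, and the size of the 100th-largest cascade is assumed to exceed 1):

```python
import math
import random

def stratified_sample(sizes, n_bins=10, top_n=100, per_bin=100, seed=1):
    """Log-binned stratified sample of URLs by cascade size.

    sizes maps url -> cascade size (>= 1).  The bin width is chosen so
    that the top bin starts at the size of the top_n-th largest
    cascade; all URLs in the top bin are kept, and per_bin URLs are
    drawn uniformly from each remaining bin.
    """
    rng = random.Random(seed)
    ranked = sorted(sizes, key=sizes.get, reverse=True)
    cutoff = sizes[ranked[top_n - 1]]       # size of the top_n-th cascade
    width = math.log(cutoff) / (n_bins - 1) # log-width of each bin
    bins = [[] for _ in range(n_bins)]
    for url, s in sizes.items():
        b = min(int(math.log(s) / width + 1e-9), n_bins - 1)
        bins[b].append(url)
    sample = list(bins[-1])                 # keep the whole top bin
    for b in bins[:-1]:
        sample += rng.sample(b, min(per_bin, len(b)))
    return sample

# Example: 1000 URLs with cascade sizes 1..1000; the top bin then
# holds exactly the 100 largest cascades.
sizes = {f"url{i}": i for i in range(1, 1001)}
sample = stratified_sample(sizes)
print(sum(sizes[u] >= 901 for u in sample))  # 100
```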
Specifically, we asked them to classify the site as “Spam / Not Spam / Unsure”, as “Media Sharing / Social Networking, Blog / Forum, News / Mass Media, or Other”, and then to specify one of 10 categories it fell into (see Figure 8b). We then asked them to rate how broadly relevant the site was, on a scale from 0 (very niche) to 100 (extremely broad). We also gauged their impression of the site, including how interesting they felt the site was (7-point Likert scale), how interesting the average person would find it (7-point Likert scale), and how positively it made them feel (7-point Likert scale). Finally, we asked them to indicate if they would share the URL using any of the following services: Email, IM, Twitter, Facebook, or Digg.

To ensure that our ratings and classifications were reliable, we had each URL rated at least 3 times—the average URL was rated 11 times, and the maximum was rated 20 times. If more than three workers marked the URL as a bad link or in a foreign language, it was excluded. In addition, we excluded URLs that were marked as spam by the majority of workers. This resulted in 795 URLs that we could analyze.

Figure 7: Actual vs. predicted influence for regression tree. The model assigns each seed user to a leaf in the regression tree. Points representing the average actual influence values are placed at the prediction made by each leaf, with vertical lines representing one standard deviation above and below. The size of the point is proportional to the number of diffusion events assigned to the leaf. Note that all five folds are represented here, including the fold represented in Figure 5, where proximate lines correspond to leaves from different cross-validation folds.

As Figure 9 shows, content that is rated more interesting tends to generate larger cascades on average, as does content that elicits more positive feelings. In addition, Figure 8 shows that certain types of URLs, like those associated with shareable media, tend to spread more than URLs associated with news sites, while some types of content (e.g. “lifestyle”) spread more than others.

To evaluate the additional predictive power of content, we repeat the regression tree analysis of Section 5 for this subset of URLs, adding the following content-based features:

1. Rated interestingness
2. Perceived interestingness to an average person
3. Rated positive feeling
4. Willingness to share via Email, IM, Twitter, Facebook or Digg
5. Indicator variables for type of URL (see Figure 8a)
6. Indicator variable for category of content (see Figure 8b)

Figure 10 shows the model fit including content. Surprisingly, none of the content features were informative relative to the seed user features (hence we omit the regression tree itself, which is essentially identical to Figure 5), nor was the model fit (R2 = 0.31) improved by the addition of the content features. We note that the slight decrease in fit and calibration compared to the content-free model can be attributed to two main factors: first, the training set size is orders of magnitude smaller for the content model, as we have fewer hand-labeled URLs; and second, here we are making predictions at the single-post level, which has higher variance than the user-averaged influence predicted in the content-free model.

These results are initially surprising, as explanations of success often do invoke the attributes of content to account for what is successful. However, the reason is essentially the same as above—namely that most explanations of success tend to focus only on observed successes, which invariably represent a small and biased sample of the total population of events. When the much larger number of non-successes are also included, it becomes difficult to identify content-based attributes that are consistently able to differentiate success from failure at the level of individual events.

7. TARGETING STRATEGIES

Although content was not found to improve predictive performance, it remains the case that individual-level attributes—in particular past local influence and number of followers—can be used to predict average future influence. Given this observation, a natural next question is how a hypothetical marketer might exploit available information to optimize the diffusion of information by systematically targeting certain classes of individuals. In order to answer such a question, however, one must make some assumptions regarding the costs of targeting individuals and soliciting their cooperation.

To illustrate this point we now evaluate the cost-effectiveness of a hypothetical targeting strategy based on a simple but plausible family of cost functions ci = ca + fi cf, where ca represents a fixed “acquisition cost” per individual i, and cf represents a “cost per follower” that each individual charges the marketer for each “sponsored” tweet. Without loss of generality we have assumed a value of cf = $0.01, where the choice of units is based on recent news reports of paid tweets (http://nyti.ms/atfmzx). For convenience we express the acquisition cost as a multiplier α of the per-follower cost; hence ca = αcf.
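The cost family just defined is straightforward to operationalize. A stylized sketch (hypothetical numbers; only cf = $0.01 is taken from the text) comparing influence-per-dollar, the quantity plotted in Figure 11, for a pool of ordinary users versus a single highly influential one:

```python
def cost(followers, alpha, cf=0.01):
    """Targeting cost c_i = c_a + f_i * c_f, with c_a = alpha * c_f."""
    return alpha * cf + followers * cf

def influence_per_dollar(group, alpha, cf=0.01):
    """group is a list of (influence, followers) pairs; returns the
    group's total influence divided by its total targeting cost."""
    total_influence = sum(i for i, _ in group)
    total_cost = sum(cost(f, alpha, cf) for _, f in group)
    return total_influence / total_cost

# Stylized groups: many ordinary users vs. one highly influential user.
ordinary = [(0.01, 14)] * 1000     # low influence, ~14 followers each
influential = [(8.7, 1_000_000)]   # rare large influence, many followers

for alpha in (0, 100_000):
    print(alpha,
          round(influence_per_dollar(ordinary, alpha), 5),
          round(influence_per_dollar(influential, alpha), 5))
```

When α = 0 the ordinary pool wins on influence-per-dollar, while at α = 100,000 (ca = $1,000) the acquisition cost swamps the per-follower term for small accounts and the influential user becomes the more cost-effective target, mirroring the qualitative pattern in Figure 11.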
Figure 8: (a) Average cascade size for different types of URLs. (b) Average cascade size for different categories of content. Error bars are standard errors.

Because the relative cost of targeting potential “influencers” is an unresolved empirical question, we instead explore a wide range of possible assumptions by varying α. For example, choosing α to be small corresponds to a cost function that is biased towards individuals with relatively few followers, who are cheap and numerous. Conversely, when α is sufficiently large, the acquisition cost will tilt toward targeting a small number of highly influential users, meaning users with a larger number of followers and good track records. Regardless, one must trade off between the number of followers per influencer and the number of individuals who can be targeted, where the optimal tradeoff will depend on α. To explore the full range of possibilities allowed by this family of cost functions, for each value of α we binned users according to their influence as predicted by the regression tree model and computed the average influence-per-dollar of the targeted subset for each bin.

Figure 9: (a) Average cascade size for different interest ratings. (b) Average cascade size for different ratings of positive feeling. Error bars are standard errors.

As Figure 11 shows, when α = 0—corresponding to a situation in which individuals can be located costlessly—we find that by far the most cost-effective category is to target the least influential individuals, who exert over fifteen times the influence-per-dollar of the most influential category. Although these individuals are much less influential (average influence score ≈ 0.01) than average, they also have relatively few followers (average ≈ 14), and thus are relatively inexpensive. At the other extreme, when α becomes sufficiently large—here α = 100,000, corresponding to an acquisition cost ca = $1,000—we recover the result that highly influential individuals are also the most cost-effective. Although expensive, these users will be preferred simply because the acquisition cost prohibits identifying and managing large numbers of influencers.

Finally, Figure 11 reveals that although the most cost-efficient category of influencers corresponds to increasingly influential individuals as α increases, the transition is surprisingly slow. For example, even for values of α as high as 10,000 (i.e., equivalent to ca = $100), the most cost-efficient influencers are still relatively ordinary users, who exhibit approximately average influence and connectivity.

8. CONCLUSIONS

In light of the emphasis placed on prominent individuals as optimal vehicles for disseminating information [19], the possibility that “ordinary influencers”—individuals who exert average, or even less-than-average influence—are under many circumstances more cost-effective, is intriguing. We emphasize, however, that these results are based on statistical modeling of observational data and do not imply causality. It is quite possible, for example, that content seeded by outside sources—e.g., marketers—may diffuse quite differently than content selected by users themselves. Likewise, while we have considered a wide range of possible cost functions, other assumptions about costs are certainly possible and may lead to different conclusions. For reasons such as these, our conclusions therefore ought to be viewed as hypotheses to be tested in properly designed experiments, not as verified causal statements.
Figure 10: Actual vs. predicted influence for regression tree including content.

Figure 11: Return on investment where targeting cost is defined as ci = ca + fi cf, ca = αcf, and α ∈ {0, 10, 100, 1000, 10000, 100000}.

Nevertheless, our finding regarding the relative efficacy of ordinary influencers is consistent with previous theoretical work [32] that has also questioned the feasibility of word-of-mouth strategies that depend on triggering “social epidemics” by targeting special individuals.

We also note that although Twitter is in many respects a special case, our observation that large cascades are rare is likely to apply in other contexts as well. Correspondingly, our conclusion that word-of-mouth information spreads via many small cascades, mostly triggered by ordinary individuals, is also likely to apply generally, as has been suggested elsewhere [33]. Marketers, planners, and other change agents interested in harnessing word-of-mouth influence could therefore benefit first by adopting more precise metrics of influence; second, by collecting more and better data about potential influencers over extended intervals of time; and third, by potentially exploiting ordinary influencers, where the optimal tradeoff between the number of individuals targeted and their average level of influence will depend on the specifics of the cost function in question.

9. ACKNOWLEDGMENTS

We thank Sharad Goel for helpful conversations.

10. REFERENCES

[1] E. Adar and L. A. Adamic. Tracking information epidemics in blogspace. In 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Compiegne University of Technology, France, 2005.
[2] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544, 2009.
[3] E. Bakshy, B. Karrer, and L. A. Adamic. Social influence and the diffusion of user-created content. In 10th ACM Conference on Electronic Commerce, Stanford, California, 2009. Association for Computing Machinery.
[4] F. M. Bass. A new product growth for model consumer durables. Management Science, 15(5):215–227, 1969.
[5] R. A. Berk. An introduction to sample selection bias in sociological data. American Sociological Review, 48(3):386–398, 1983.
[6] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Chapman & Hall/CRC, 1984.
[7] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In 4th Int’l AAAI Conference on Weblogs and Social Media, Washington, DC, 2010.
[8] R. M. Dawes. Everyday Irrationality: How Pseudo-Scientists, Lunatics, and the Rest of Us Systematically Fail to Think Rationally. Westview Press, 2002.
[9] J. Denrell. Vicarious learning, undersampling of failure, and the myths of management. Organization Science, 14(3):227–243, 2003.
[10] M. Gladwell. The Tipping Point: How Little Things Can Make a Big Difference. Little Brown, New York, 2000.
[11] J. Goldenberg, S. Han, D. R. Lehmann, and J. W. Hong. The role of hubs in the adoption process. Journal of Marketing, 73(2):1–13, 2009.
[12] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan. Discovering leaders from community actions. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 499–508. ACM, 2008.
[13] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. pages 491–501. ACM, New York, NY, USA, 2004.
[14] E. Katz and P. F. Lazarsfeld. Personal Influence: The Part Played by People in the Flow of Mass Communications. Free Press, Glencoe, Ill., 1955.
[15] E. Keller and J. Berry. The Influentials: One American in Ten Tells the Other Nine How to Vote, Where to Eat, and What to Buy. Free Press, New York, NY, 2003.
[16] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 2003. Association for Computing Machinery.
[17] A. Kittur, E. H. Chi, and B. Suh. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, 2008.
[18] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? pages 591–600. ACM, 2010.
[19] A. Leavitt, E. Burchard, D. Fisher, and S. Gilbert. The influentials: New approaches for analyzing influence on Twitter.
[20] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1):5, 2007.
[21] D. Liben-Nowell and J. Kleinberg. Tracing information flow on a global scale using Internet chain-letter data. Proceedings of the National Academy of Sciences, 105(12):4633, 2008.
[22] W. Mason and S. Suri. Conducting behavioral research on Amazon’s Mechanical Turk. SSRN eLibrary, 2010.
[23] W. Mason and D. J. Watts. Financial incentives and the performance of crowds. In Proceedings of the ACM SIGKDD Workshop on Human Computation, pages 77–85, 2009.
[24] S. A. Munson and P. Resnick. Presenting diverse political opinions: How and how much. pages 1457–1466. ACM, 2010.
[25] R. R. Picard and R. D. Cook. Cross-validation of regression models. Journal of the American Statistical Association, 79(387):575–583, 1984.
[26] E. M. Rogers. Diffusion of Innovations. Free Press, New York, 4th edition, 1995.
[27] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. pages 614–622. ACM, 2008.
[28] R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. 2008.
[29] E. S. Sun, I. Rosenn, C. A. Marlow, and T. M. Lento. Gesundheit! Modeling contagion through Facebook News Feed. In International Conference on Weblogs and Social Media, San Jose, CA, 2009. AAAI.
[30] B. Tomlinson and C. Cockram. SARS: Experience at Prince of Wales Hospital, Hong Kong. The Lancet, 361(9368):1486–1487, 2003.
[31] D. J. Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, U.S.A., 99:5766–5771, 2002.
[32] D. J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34:441–458, 2007.
[33] D. J. Watts and J. Peretti. Viral marketing for the real world. Harvard Business Review, May:22–23, 2007.
[34] G. Weimann. The Influentials: People Who Influence People. State University of New York Press, Albany, NY, 1994.
[35] J. Weng, E. P. Lim, J. Jiang, and Q. He. TwitterRank: Finding topic-sensitive influential twitterers. pages 261–270. ACM, 2010.