Dickinson, Thomas; Fernandez, Miriam; Thomas, Lisa; Mulholland, Paul; Briggs, Pam and Alani, Harith (2016). Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs. IADIS International Journal on WWW/Internet, 14(2) pp. 23–37.
http://oro.open.ac.uk/48678/
This document discusses scalable learning of collective behavior from social media data. It proposes an edge-centric approach to extract sparse social dimensions to address the scalability issues of existing methods. Existing methods extract dense social dimensions that cannot scale to networks with millions of actors. The proposed approach guarantees sparsity in the extracted social dimensions. This allows efficient handling of large real-world networks while maintaining comparable prediction performance to other methods.
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...BAINIDA
Subscriber Churn Prediction Model using Social Network Analysis In Telecommunication Industry โดย เชษฐพงศ์ ปัญญาชนกุล อาจารย์ ดร. อานนท์ ศักดิ์วรวิชญ์
ในงาน THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE จัดโดย คณะสถิติประยุกต์และ DATA SCIENCES THAILAND
Social network analysis & Big Data - Telecommunications and moreWael Elrifai
Social Network Analysis: Practical Uses and Implementation is a presentation that discusses social network analysis and its uses. It covers key topics such as defining social networks and social network analysis, why social network analysis is important, identifying influencers in social networks, roles in social networks, graph theory concepts used in social network analysis, calculating metrics from social networks, and recommended approaches to social network analysis. The presentation provides an overview of social network analysis concepts and their practical applications.
Big social data analytics - social network analysis Jari Jussila
This document discusses social network analysis and visualization of Twitter data using tools like Gephi. It provides steps to collect Twitter data using an API script, create a network file from the data, and calculate network metrics and visualize the network in Gephi. Key aspects covered include extracting tweet data, creating a network file with NetworkX, uploading files to PythonAnywhere to run the script, and analyzing and visualizing the resulting network in Gephi to understand information diffusion on Twitter.
Icwsm10 S MateiVisible Effort: A Social Entropy Methodology for Managing Com...guest803e6d
A theoretically-grounded learning feedback tool suite, the Visible Effort (VE) Mediawiki extension, is proposed for optimizing online group learning activities by measuring the amount of equality and the emergence of social structure in groups that participate in Computer-Mediated Collaboration (CMC). Building on social entropy theory, drawn from Shannon’s Mathematical Theory of Communication, VE captures levels of CMC unevenness and group structure and visualizes them on wiki Web pages through background colors, charts, and tabular data. Visual information provides users entropic feedback on how balanced and equitable collaboration is within their online group are, while helping them to maintain it within optimal levels. Finally, we present the theoretical and practical implications of VE and the measures behind it, as well as illustrate VE’s capabilities by describing a quasi-experimental teaching activity (use scenario) in tandem with a detailed discussion of theoretical justification, methodological underpinning, and technological capabilities of the approach.
E-government research is an interdisciplinary field that examines how public administration uses information technology to deliver public services and engage citizens. It analyzes concepts like digital, mobile, and open government. E-government research draws from disciplines like public administration, information science, and computer science. The scope has expanded from a focus on technology use to its impact on public policy and governance. There is a growing body of literature in the field indexed in the E-Government Reference Library, though the field lacks unified theories. Topics of research include e-services, e-participation, and the use of technologies like social media and open data in government.
Social Network Analysis: applications for education researchChristian Bokhove
What is your talk about?
This seminar will illustrate various social network analysis (SNA) techniques and measures and their applications to research problems in education. These applications will be illustrated from our own research utilising a range of SNA techniques.
What are the key messages of your talk?
We will cover some of the ways in which network data can be collected and utilised with other research data to examine the relationships between network measures and other attributes of individuals and organisations, and how it can be linked to other approaches in multiple methods studies.
What are the implications for practice or research from your talk?
SNA is an approach that draws from theories of social capital to study the relational ties that exist between actors or institutions in a specific context. Such ties might include learning exchanges or advice-seeking interactions. SNA techniques allow researchers to incorporate the interdependence of participants within their research questions, whereas many traditional techniques assume our participants, and their responses to our questions, are independent of one another.
More than ever, we need to learn how to harness the power of networks to tackle the complex issues we're facing as a society. Here's a quick guide to the basics of social network analysis.
Interested? Sign up at http://kumu.io
This document discusses scalable learning of collective behavior from social media data. It proposes an edge-centric approach to extract sparse social dimensions to address the scalability issues of existing methods. Existing methods extract dense social dimensions that cannot scale to networks with millions of actors. The proposed approach guarantees sparsity in the extracted social dimensions. This allows efficient handling of large real-world networks while maintaining comparable prediction performance to other methods.
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...BAINIDA
Subscriber Churn Prediction Model using Social Network Analysis In Telecommunication Industry โดย เชษฐพงศ์ ปัญญาชนกุล อาจารย์ ดร. อานนท์ ศักดิ์วรวิชญ์
ในงาน THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE จัดโดย คณะสถิติประยุกต์และ DATA SCIENCES THAILAND
Social network analysis & Big Data - Telecommunications and moreWael Elrifai
Social Network Analysis: Practical Uses and Implementation is a presentation that discusses social network analysis and its uses. It covers key topics such as defining social networks and social network analysis, why social network analysis is important, identifying influencers in social networks, roles in social networks, graph theory concepts used in social network analysis, calculating metrics from social networks, and recommended approaches to social network analysis. The presentation provides an overview of social network analysis concepts and their practical applications.
Big social data analytics - social network analysis Jari Jussila
This document discusses social network analysis and visualization of Twitter data using tools like Gephi. It provides steps to collect Twitter data using an API script, create a network file from the data, and calculate network metrics and visualize the network in Gephi. Key aspects covered include extracting tweet data, creating a network file with NetworkX, uploading files to PythonAnywhere to run the script, and analyzing and visualizing the resulting network in Gephi to understand information diffusion on Twitter.
Icwsm10 S MateiVisible Effort: A Social Entropy Methodology for Managing Com...guest803e6d
A theoretically-grounded learning feedback tool suite, the Visible Effort (VE) Mediawiki extension, is proposed for optimizing online group learning activities by measuring the amount of equality and the emergence of social structure in groups that participate in Computer-Mediated Collaboration (CMC). Building on social entropy theory, drawn from Shannon’s Mathematical Theory of Communication, VE captures levels of CMC unevenness and group structure and visualizes them on wiki Web pages through background colors, charts, and tabular data. Visual information provides users entropic feedback on how balanced and equitable collaboration is within their online group are, while helping them to maintain it within optimal levels. Finally, we present the theoretical and practical implications of VE and the measures behind it, as well as illustrate VE’s capabilities by describing a quasi-experimental teaching activity (use scenario) in tandem with a detailed discussion of theoretical justification, methodological underpinning, and technological capabilities of the approach.
E-government research is an interdisciplinary field that examines how public administration uses information technology to deliver public services and engage citizens. It analyzes concepts like digital, mobile, and open government. E-government research draws from disciplines like public administration, information science, and computer science. The scope has expanded from a focus on technology use to its impact on public policy and governance. There is a growing body of literature in the field indexed in the E-Government Reference Library, though the field lacks unified theories. Topics of research include e-services, e-participation, and the use of technologies like social media and open data in government.
Social Network Analysis: applications for education researchChristian Bokhove
What is your talk about?
This seminar will illustrate various social network analysis (SNA) techniques and measures and their applications to research problems in education. These applications will be illustrated from our own research utilising a range of SNA techniques.
What are the key messages of your talk?
We will cover some of the ways in which network data can be collected and utilised with other research data to examine the relationships between network measures and other attributes of individuals and organisations, and how it can be linked to other approaches in multiple methods studies.
What are the implications for practice or research from your talk?
SNA is an approach that draws from theories of social capital to study the relational ties that exist between actors or institutions in a specific context. Such ties might include learning exchanges or advice-seeking interactions. SNA techniques allow researchers to incorporate the interdependence of participants within their research questions, whereas many traditional techniques assume our participants, and their responses to our questions, are independent of one another.
More than ever, we need to learn how to harness the power of networks to tackle the complex issues we're facing as a society. Here's a quick guide to the basics of social network analysis.
Interested? Sign up at http://kumu.io
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
This document provides an overview of social network analysis, including what social networks are, what can be learned from analyzing social networks, and how social network analysis can be performed. Some key findings that can be uncovered include the six degrees of separation principle, the 80-20 rule of social popularity where a minority of nodes have most connections, how to identify influential nodes, and how to group similar nodes into communities. Various metrics and models are described for analyzing features like path lengths, degree distributions, ranking nodes, measuring community structure, and more. Examples of social network analysis are also provided.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
Social Network Analysis Introduction including Data Structure Graph overview. Given in Cincinnati August 18th 2015 as part of the DataSeed Meetup group.
The science of networks is becoming an increasingly important and intriguing area of study that reveals many a patterns and relationships often hidden. This presentation is about the use of SNA to study the network of the Digital Library Community
Social Network Analysis Workshop
This talk will be a workshop featuring an overview of basic theory and methods for social network analysis and an introduction to igraph. The first half of the talk will be a discussion of the concepts and the second half will feature code examples and demonstrations.
Igraph is a package in R, Python, and C++ that supports social network analysis and network data visualization.
Ian McCulloh holds joint appointments as a Parson’s Fellow in the Bloomberg School of Public health, a Senior Lecturer in the Whiting School of Engineering and a senior scientist at the Applied Physics Lab, at Johns Hopkins University. His current research is focused on strategic influence in online networks. His most recent papers have been focused on the neuroscience of persuasion and measuring influence in online social media firestorms. He is the author of “Social Network Analysis with Applications” (Wiley: 2013), “Networks Over Time” (Oxford: forthcoming) and has published 48 peer-reviewed papers, primarily in the area of social network analysis. His current applied work is focused on educating soldiers and marines in advanced methods for open source research and data science leadership.
More information about Dr. Ian McCulloh's work can be found at https://ep.jhu.edu/about-us/faculty-directory/1511-ian-mcculloh
This workshop will introduce some of the main principles and techniques of Social Network Analysis (SNA). We will use examples from organizational and social media-based networks to understand concepts such as network density, diameter, centrality measures, community detection algorithms, etc. The session will also introduce Gephi, a popular program for SNA. Gephi is a free and open-source tool that is available for both Mac and PC computers.
By the end of the session, you will develop a general understanding of what SNA is, what research questions it can help you answer, and how it can be applied to your own research. You will also learn how to use Gephi to visualize and examine networks using various layout and community detection algorithms.
Instructor’s Bio: Dr. Anatoliy Gruzd is a Canada Research Chair in Social Media Data Stewardship, Associate Professor at the Ted Rogers School of Management at Ryerson University, and Director of Research at the Social Media Lab. Anatoliy is also a Member of the Royal Society of Canada’s College of New Scholars, Artists and Scientists; a co-editor of a multidisciplinary journal on Big Data and Society; and a founding co-chair of the International Conference on Social Media and Society. His research initiatives explore how social media platforms are changing the ways in which people and organizations communicate, collaborate and disseminate information and how these changes impact the norms and structures of modern society.
What role do “power learners” play in online learning communities?@cristobalcobo
This study focusses on the role of highly active participants in online learning communities on Facebook. These people, often known as “power users” in the literature on social computing, are a common feature of a wide range of online learning groups, and are responsible not only for creating most of the content but also for getting discussion going and providing a basis for other’s participation. We test whether similar dynamics hold true in the context of online learning.
Based on a transactional dataset of almost 10,000 interactions with an online community of 32 postgraduate students who were following the same online course, we find evidence that power users also exist in the context of online learning. However, whilst they do create a lot of content, we find that they are not fundamental to keeping the group together, and in fact are less adept at creating content which generates responses than other “normal” users. This suggests that online learning communities may have different dynamics to other types of electronic community: it also suggests that design efforts should not be focused solely on attracting a small core of “power learners”. Rather, diverse types of users are needed for online learning communities to survive and prosper.
Authors:
Cristóbal Cobo, Center for Research - Ceibal Foundation, Uruguay
Monica Bulger, Data & Society Research Institute, United States
Jonathan Bright, Oxford Internet Institute, United Kingdom
Ryan den Rooijen, Oxford Internet Institute, United Kingdom
Presented at the LINC Conference (MIT, 2016) Digital Inclusion: Transforming Education through Technology.
Prof. Hendrik Speck - Social Network AnalysisHendrik Speck
The document discusses social networks and social network analysis. It provides definitions of social networks and describes how they can be visualized and analyzed. It also gives examples of popular social networks like MySpace and Facebook, and how people use different social media. Network analysis concepts like degrees of separation, Dunbar's number, and centrality measures are also introduced.
Social Network Analysis (SNA) and its implications for knowledge discovery in...ACMBangalore
Social Network Analysis (SNA) and its implications for knowledge discovery in Informal Networks- Talk by Dr Jai Ganesh, SETLabs, Infosys at Search and Social Platforms tutorial, as part of Compute 2009, ACM Bangalore
Social Media Analysis: Present and Futurematthewhurst
Current social media analysis separates content and structure analysis, missing opportunities. A unified theory is needed to better analyze social media data by considering both content and structure together. While current applications focus on marketing, a broader business intelligence approach could expand the field. Developing richer models of business domains and processes would increase the actionability and market for social media analysis tools.
This document provides an overview of social network analysis (SNA) including concepts, methods, and applications. It begins with background on how SNA originated from social science and network analysis/graph theory. Key concepts discussed include representing social networks as graphs, identifying strong and weak ties, central nodes, and network cohesion. Practical applications of SNA are also outlined, such as in business, law enforcement, and social media sites. The document concludes by recommending when and why to use SNA.
2013 NodeXL Social Media Network AnalysisMarc Smith
Social media network analysis and visualization with NodeXL - the network overview discovery and exploration add-in for Excel. Map Twitter, Facebook, email, blogs, and the web with a point and click interface within the familiar spreadsheet.
Introduction to Social Network AnalysisPatti Anklam
This document provides an overview of network analysis and its applications. It discusses the origins and history of network study in fields like graph theory and sociology. Various network patterns and metrics are described, including density, distance, centrality, and structural measures. Case studies are presented on using network analysis to understand expertise management, trust, and performance issues in organizations. The document emphasizes that network analysis can provide insights through metrics and visualization to inform important business and organizational questions.
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisIJERA Editor
Fast and Appropriate Social Network Analysis (SNA) tools ,techniques, are required to collect and classify
opinion scores on social networksites , as a grouping on wrong opinion may create problems for a society or
country . Social Network Analysis (SNA) is popular means for researcher as the number of users and groups
increasing day by day on that social sites , and a large group may influence other.In this paper, we
recommendhybrid model of opinion recommendation systems, for single user and for collective community
respectively, formed on social liking and influence network theory. By collecting thedata of user social networks
and preferenceslike, we designed aimproved hybrid prototype to imitate the social influence by like and sharing
the information among groups.The significance of this paper to analyze the suitability of ANN and Fuzzy sets
method in a hybrid manner for social web sites classifications, First, we intend to use Artificial Neural
Network(ANN)techniques in social media data classification by using some contemporary methods different
than the conventional methods of statistics and data analysis, in next we want to propagate the fuzzy approach
as a way to overcome the uncertainity that is always present in social media analysis . We give a brief overview
of the main ideas and recent results of social networks analysis , and we point to relationships between the two
social network analysis and classification approaches .This researchsuggests a hybrid classification model build
on fuzzy and artificial neural network (HFANN). Information Gain and three popular social sites are used to
collect information depicting features that are then used to train and test the proposed methods . This neoteric
approach combines the advantages of ANN and Fuzzy sets in classification accuracy with utilizing social data
and knowledge base available in the hate lexicons.
This document discusses social network analysis and its applications. It defines a social network as being composed of actors (people or groups) connected by social relationships. Social network analysis can be used to map these relationships visually using sociograms, understand information flow and community structure, and identify influential actors through metrics like centrality and betweenness. Tools like NodeXL and Gephi enable network extraction, visualization, and analysis to glean strategic insights from social networks.
The document describes a method for analyzing online social networks using actor-based network analysis and tenses predicate logic. It discusses classifying users into different actor types like connectors, mavens, and salesmen based on their behaviors and roles in spreading information. The method is demonstrated on a large Twitter dataset to analyze how the #KONY2012 topic spread. Future work is discussed to expand the approach with semantic analysis and develop an ontology to integrate actor and event analysis.
An introduction in the world of Social Network Analysis and a view on how this may help learning networks. History, data collection and several analysis techniques are shown.
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET Journal
This document describes a study that uses community detection models to identify prevalent news topics discussed on both Twitter and traditional media like BBC. It collects tweets and news articles about sports over a one-month period. Keywords are extracted from the data and a graph is constructed to represent relationships between words. Three community detection models - Girvan-Newman clustering, CLIQUE, and Louvain - are used to cluster similar content and detect communities of keywords representing news topics. The number of unique Twitter users engaged with each topic is also calculated to rank topics by user attention. The goal is to analyze how information is distributed between social and traditional media and identify emerging topics with low coverage in traditional sources.
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
Researching Social Media – Big Data and Social Media Analysis, presentation for the Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
MDS 2011 Paper: An Unsupervised Approach to Discovering and Disambiguating So...Carlton Northern
This document presents an unsupervised approach to discover and disambiguate social media profiles for a large group of individuals, such as employees or university students. The approach uses a combination of search engine queries, semantic web queries, and directly polling social media sites to discover potential profiles. It then applies heuristics involving keyword matching, community structure analysis, and extracting semantic and profile features to disambiguate the true profiles from false positives. The approach was tested on a set of 2016 university computer science student logins, achieving a precision of 0.863 and F-measure of 0.654 at discovering their real social media profiles from a ground truth data set.
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
This document provides an overview of a presentation on practical applications of social network analysis. It discusses the growth of social data, defines social network analysis, and provides several use cases. It then outlines the presentation topics which include basics of reading sociograms, refining data, and applying SNA to public sector marketing. Examples of SNA applications to specific organizations are provided. Both free and paid tools for conducting SNA are also mentioned.
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...Susan Campos
This document summarizes a literature analysis of social information contribution and consumption on social networking sites. It identifies 126 relevant articles published between 2008 and 2014. The analysis finds that most existing work focuses on social information contribution, its antecedents and favorable outcomes. Only a few studies examine contribution behaviors and the downsides of social networking site use. The analysis also categorizes the limited papers on consumption behaviors based on the characteristics of social information and identifies different underlying processes like social comparison, monitoring and browsing. The findings consolidate knowledge about social networking site usage patterns and individual well-being.
Big data analysis of news and social media contentFiras Husseini
This document summarizes research from the Intelligent Systems Laboratory at the University of Bristol on analyzing large amounts of news and social media content using computational methods. It discusses several studies, including analyzing over 400 million tweets to track public mood in the UK, extracting narrative networks from over 125,000 news articles about the 2012 US elections, comparing differences between news outlets and their topics/writing styles using machine learning, modeling the EU news media network using clustering and translation techniques, and predicting popular news articles based on their content. The research demonstrates how computational social science can reveal patterns in large datasets that were previously impossible to analyze at scale.
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
This document provides an overview of social network analysis, including what social networks are, what can be learned from analyzing social networks, and how social network analysis can be performed. Some key findings that can be uncovered include the six degrees of separation principle, the 80-20 rule of social popularity where a minority of nodes have most connections, how to identify influential nodes, and how to group similar nodes into communities. Various metrics and models are described for analyzing features like path lengths, degree distributions, ranking nodes, measuring community structure, and more. Examples of social network analysis are also provided.
Social Network Analysis Introduction including Data Structure Graph overview. Doug Needham
Social Network Analysis Introduction including Data Structure Graph overview. Given in Cincinnati August 18th 2015 as part of the DataSeed Meetup group.
The science of networks is becoming an increasingly important and intriguing area of study that reveals many a patterns and relationships often hidden. This presentation is about the use of SNA to study the network of the Digital Library Community
Social Network Analysis Workshop
This talk will be a workshop featuring an overview of basic theory and methods for social network analysis and an introduction to igraph. The first half of the talk will be a discussion of the concepts and the second half will feature code examples and demonstrations.
Igraph is a package in R, Python, and C++ that supports social network analysis and network data visualization.
Ian McCulloh holds joint appointments as a Parson’s Fellow in the Bloomberg School of Public health, a Senior Lecturer in the Whiting School of Engineering and a senior scientist at the Applied Physics Lab, at Johns Hopkins University. His current research is focused on strategic influence in online networks. His most recent papers have been focused on the neuroscience of persuasion and measuring influence in online social media firestorms. He is the author of “Social Network Analysis with Applications” (Wiley: 2013), “Networks Over Time” (Oxford: forthcoming) and has published 48 peer-reviewed papers, primarily in the area of social network analysis. His current applied work is focused on educating soldiers and marines in advanced methods for open source research and data science leadership.
More information about Dr. Ian McCulloh's work can be found at https://ep.jhu.edu/about-us/faculty-directory/1511-ian-mcculloh
This workshop will introduce some of the main principles and techniques of Social Network Analysis (SNA). We will use examples from organizational and social media-based networks to understand concepts such as network density, diameter, centrality measures, community detection algorithms, etc. The session will also introduce Gephi, a popular program for SNA. Gephi is a free and open-source tool that is available for both Mac and PC computers.
By the end of the session, you will develop a general understanding of what SNA is, what research questions it can help you answer, and how it can be applied to your own research. You will also learn how to use Gephi to visualize and examine networks using various layout and community detection algorithms.
Instructor’s Bio: Dr. Anatoliy Gruzd is a Canada Research Chair in Social Media Data Stewardship, Associate Professor at the Ted Rogers School of Management at Ryerson University, and Director of Research at the Social Media Lab. Anatoliy is also a Member of the Royal Society of Canada’s College of New Scholars, Artists and Scientists; a co-editor of a multidisciplinary journal on Big Data and Society; and a founding co-chair of the International Conference on Social Media and Society. His research initiatives explore how social media platforms are changing the ways in which people and organizations communicate, collaborate and disseminate information and how these changes impact the norms and structures of modern society.
What role do “power learners” play in online learning communities?@cristobalcobo
This study focusses on the role of highly active participants in online learning communities on Facebook. These people, often known as “power users” in the literature on social computing, are a common feature of a wide range of online learning groups, and are responsible not only for creating most of the content but also for getting discussion going and providing a basis for other’s participation. We test whether similar dynamics hold true in the context of online learning.
Based on a transactional dataset of almost 10,000 interactions with an online community of 32 postgraduate students who were following the same online course, we find evidence that power users also exist in the context of online learning. However, whilst they do create a lot of content, we find that they are not fundamental to keeping the group together, and in fact are less adept at creating content which generates responses than other “normal” users. This suggests that online learning communities may have different dynamics to other types of electronic community: it also suggests that design efforts should not be focused solely on attracting a small core of “power learners”. Rather, diverse types of users are needed for online learning communities to survive and prosper.
Authors:
Cristóbal Cobo, Center for Research - Ceibal Foundation, Uruguay
Monica Bulger, Data & Society Research Institute, United States
Jonathan Bright, Oxford Internet Institute, United Kingdom
Ryan den Rooijen, Oxford Internet Institute, United Kingdom
Presented at the LINC Conference (MIT, 2016) Digital Inclusion: Transforming Education through Technology.
Prof. Hendrik Speck - Social Network AnalysisHendrik Speck
The document discusses social networks and social network analysis. It provides definitions of social networks and describes how they can be visualized and analyzed. It also gives examples of popular social networks like MySpace and Facebook, and how people use different social media. Network analysis concepts like degrees of separation, Dunbar's number, and centrality measures are also introduced.
Social Network Analysis (SNA) and its implications for knowledge discovery in...ACMBangalore
Social Network Analysis (SNA) and its implications for knowledge discovery in Informal Networks- Talk by Dr Jai Ganesh, SETLabs, Infosys at Search and Social Platforms tutorial, as part of Compute 2009, ACM Bangalore
Social Media Analysis: Present and Futurematthewhurst
Current social media analysis separates content and structure analysis, missing opportunities. A unified theory is needed to better analyze social media data by considering both content and structure together. While current applications focus on marketing, a broader business intelligence approach could expand the field. Developing richer models of business domains and processes would increase the actionability and market for social media analysis tools.
This document provides an overview of social network analysis (SNA) including concepts, methods, and applications. It begins with background on how SNA originated from social science and network analysis/graph theory. Key concepts discussed include representing social networks as graphs, identifying strong and weak ties, central nodes, and network cohesion. Practical applications of SNA are also outlined, such as in business, law enforcement, and social media sites. The document concludes by recommending when and why to use SNA.
2013 NodeXL Social Media Network AnalysisMarc Smith
Social media network analysis and visualization with NodeXL - the network overview discovery and exploration add-in for Excel. Map Twitter, Facebook, email, blogs, and the web with a point and click interface within the familiar spreadsheet.
Introduction to Social Network AnalysisPatti Anklam
This document provides an overview of network analysis and its applications. It discusses the origins and history of network study in fields like graph theory and sociology. Various network patterns and metrics are described, including density, distance, centrality, and structural measures. Case studies are presented on using network analysis to understand expertise management, trust, and performance issues in organizations. The document emphasizes that network analysis can provide insights through metrics and visualization to inform important business and organizational questions.
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisIJERA Editor
Fast and Appropriate Social Network Analysis (SNA) tools ,techniques, are required to collect and classify
opinion scores on social networksites , as a grouping on wrong opinion may create problems for a society or
country . Social Network Analysis (SNA) is popular means for researcher as the number of users and groups
increasing day by day on that social sites , and a large group may influence other.In this paper, we
recommendhybrid model of opinion recommendation systems, for single user and for collective community
respectively, formed on social liking and influence network theory. By collecting thedata of user social networks
and preferenceslike, we designed aimproved hybrid prototype to imitate the social influence by like and sharing
the information among groups.The significance of this paper to analyze the suitability of ANN and Fuzzy sets
method in a hybrid manner for social web sites classifications, First, we intend to use Artificial Neural
Network(ANN)techniques in social media data classification by using some contemporary methods different
than the conventional methods of statistics and data analysis, in next we want to propagate the fuzzy approach
as a way to overcome the uncertainity that is always present in social media analysis . We give a brief overview
of the main ideas and recent results of social networks analysis , and we point to relationships between the two
social network analysis and classification approaches .This researchsuggests a hybrid classification model build
on fuzzy and artificial neural network (HFANN). Information Gain and three popular social sites are used to
collect information depicting features that are then used to train and test the proposed methods . This neoteric
approach combines the advantages of ANN and Fuzzy sets in classification accuracy with utilizing social data
and knowledge base available in the hate lexicons.
This document discusses social network analysis and its applications. It defines a social network as being composed of actors (people or groups) connected by social relationships. Social network analysis can be used to map these relationships visually using sociograms, understand information flow and community structure, and identify influential actors through metrics like centrality and betweenness. Tools like NodeXL and Gephi enable network extraction, visualization, and analysis to glean strategic insights from social networks.
The document describes a method for analyzing online social networks using actor-based network analysis and tenses predicate logic. It discusses classifying users into different actor types like connectors, mavens, and salesmen based on their behaviors and roles in spreading information. The method is demonstrated on a large Twitter dataset to analyze how the #KONY2012 topic spread. Future work is discussed to expand the approach with semantic analysis and develop an ontology to integrate actor and event analysis.
An introduction in the world of Social Network Analysis and a view on how this may help learning networks. History, data collection and several analysis techniques are shown.
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET Journal
This document describes a study that uses community detection models to identify prevalent news topics discussed on both Twitter and traditional media like BBC. It collects tweets and news articles about sports over a one-month period. Keywords are extracted from the data and a graph is constructed to represent relationships between words. Three community detection models - Girvan-Newman clustering, CLIQUE, and Louvain - are used to cluster similar content and detect communities of keywords representing news topics. The number of unique Twitter users engaged with each topic is also calculated to rank topics by user attention. The goal is to analyze how information is distributed between social and traditional media and identify emerging topics with low coverage in traditional sources.
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
Researching Social Media – Big Data and Social Media Analysis, presentation for the Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
MDS 2011 Paper: An Unsupervised Approach to Discovering and Disambiguating So...Carlton Northern
This document presents an unsupervised approach to discover and disambiguate social media profiles for a large group of individuals, such as employees or university students. The approach uses a combination of search engine queries, semantic web queries, and directly polling social media sites to discover potential profiles. It then applies heuristics involving keyword matching, community structure analysis, and extracting semantic and profile features to disambiguate the true profiles from false positives. The approach was tested on a set of 2016 university computer science student logins, achieving a precision of 0.863 and F-measure of 0.654 at discovering their real social media profiles from a ground truth data set.
Practical Applications for Social Network Analysis in Public Sector Marketing...Mike Kujawski
This document provides an overview of a presentation on practical applications of social network analysis. It discusses the growth of social data, defines social network analysis, and provides several use cases. It then outlines the presentation topics which include basics of reading sociograms, refining data, and applying SNA to public sector marketing. Examples of SNA applications to specific organizations are provided. Both free and paid tools for conducting SNA are also mentioned.
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...Susan Campos
This document summarizes a literature analysis of social information contribution and consumption on social networking sites. It identifies 126 relevant articles published between 2008 and 2014. The analysis finds that most existing work focuses on social information contribution, its antecedents and favorable outcomes. Only a few studies examine contribution behaviors and the downsides of social networking site use. The analysis also categorizes the limited papers on consumption behaviors based on the characteristics of social information and identifies different underlying processes like social comparison, monitoring and browsing. The findings consolidate knowledge about social networking site usage patterns and individual well-being.
Big data analysis of news and social media contentFiras Husseini
This document summarizes research from the Intelligent Systems Laboratory at the University of Bristol on analyzing large amounts of news and social media content using computational methods. It discusses several studies, including analyzing over 400 million tweets to track public mood in the UK, extracting narrative networks from over 125,000 news articles about the 2012 US elections, comparing differences between news outlets and their topics/writing styles using machine learning, modeling the EU news media network using clustering and translation techniques, and predicting popular news articles based on their content. The research demonstrates how computational social science can reveal patterns in large datasets that were previously impossible to analyze at scale.
Objectification Is A Word That Has Many Negative ConnotationsBeth Johnson
Here is an introduction to social web mining and big data:
Social web mining is the process of extracting useful information and knowledge from social media data. With the rise of big data, social media platforms are generating massive amounts of unstructured data every day in the form of posts, comments, shares, likes, etc. This user-generated data holds valuable insights about people's opinions, interests, behaviors and more.
Big data analytics provides tools and techniques to analyze this large, complex social data at scale. Social web mining applies data mining and machine learning algorithms to big social data to discover patterns and relationships. Areas of focus include sentiment analysis to understand public opinions on brands, products or issues; network analysis to map relationships and influence; and
Meliorating usable document density for online event detectionIJICTJOURNAL
Online event detection (OED) has seen a rise in the research community as it can provide quick identification of possible events happening at times in the world. Through these systems, potential events can be indicated well before they are reported by the news media, by grouping similar documents shared over social media by users. Most OED systems use textual similarities for this purpose. Similar documents, that may indicate a potential event, are further strengthened by the replies made by other users, thereby improving the potentiality of the group. However, these documents are at times unusable as independent documents, as they may replace previously appeared noun phrases with pronouns, leading OED systems to fail while grouping these replies to their suitable clusters. In this paper, a pronoun resolution system that tries to replace pronouns with relevant nouns over social media data is proposed. Results show significant improvement in performance using the proposed system.
CVPSales price per unit$75.00Variable Cost per unit$67.00Fixed C.docxdorishigh
CVPSales price per unit$75.00*Variable Cost per unit$67.00*Fixed Cost$100,000.00*Targeted Net Income$0.00*(assume 0 if you want to calculate breakeven)Calculated Volume12,500calculated* inputted by user
Social Networking Channels
Thomas Lamonte Esters
Independence University
29 September 2018
SOCIAL NETWORKING CHANNELS 1
I dislike social networking sites because of the dangerous hazards connected to it.
The ProCon article vividly describes the numerous benefits that are attached to the social networking sites such as connecting people, enhancing advertising and marketing, promoting research and education, assisting to spread information faster as compared to other media, connecting employers and employees and assisting the government to identify and prosecute criminals. These are just a few examples that the article illustrates to support the necessity of the social networking sites in the society today. According to the article, the social networking channels have significantly transformed different sectors such as businesses for the better since they can sell their products and services globally (Procon.org, 2018).
However, the detrimental effects connected with the social networking channels are also numerous and most of them may lead to permanent damage to our lives. It is very clear that the education is the backbone of our lives and also the key to success. Currently, about 69% of the American population use social media channels which is a drastic increase in the usage from 2008 where about 26% of the Americans were connected to the social media (Procon.org, 2018). Most of the social networking sites users are the youths who are in their lower grade level, colleges or even universities. The research shows that using social media when handling assignments decreases the quality of work and makes the students drop in their performance. Education is a core value to a successful life and allowing social media to intrude in the academics will be detrimental since it will lead to the production of incompetent individuals who may end up causing problems in the society (Rowell, 2015).
Moreover, the social media channels expose individuals’ to privacy problems and intrusion by any interested parties. In fact, nothing which is shared in the social media channels is private. According to the survey conducted, 81% of the people surveyed believed that social media is insecure. The government through the NSA (National Security Agencies) intrudes to people’s data and communication in social media meaning that their private information ends up in the hands of the government. Many people do not know about social media privacy settings and this means that they leave their social media accounts prone to invasion (Procon.org, 2018). Viruses such as Steck. Evl can also be propagated via the social media to cause harm to the users. Most of these viruses are spies and send users priv.
POLITICAL OPINION ANALYSIS IN SOCIAL NETWORKS: CASE OF TWITTER AND FACEBOOKIJwest
The 21st century has been characterized by an increased attention to social networks. Nowadays, going 24 hours without getting in touch with them in some way has become difficult. Facebook and Twitter, these social platforms are now part of everyday life. Thus, these social networks have become important sources to be aware of frequently discussed topics or public opinions on a current issue. A lot of people write messages about current events, give their opinion on any topic and discuss social issues more and more.
This document presents research on classifying tweets from crisis datasets into informative vs non-informative categories and humanitarian categories using multimodal approaches. The researchers analyzed text and image data from crisis datasets using models like BERT and CNNs. They then proposed a technique to fuse text and image features to classify tweets. Their multimodal fusion approach outperformed text-only and image-only baselines, demonstrating the benefit of combining modalities for crisis response assessment from social media data.
POLITICAL OPINION ANALYSIS IN SOCIAL NETWORKS: CASE OF TWITTER AND FACEBOOK dannyijwest
The 21st century has been characterized by an increased attention to social networks. Nowadays, going 24
hours without getting in touch with them in some way has become difficult. Facebook and Twitter, these
social platforms are now part of everyday life. Thus, these social networks have become important sources
to be aware of frequently discussed topics or public opinions on a current issue. A lot of people write
messages about current events, give their opinion on any topic and discuss social issues more and more.
2. Brandtzæg, P.B. (2012). Social networking sites: their users and social implications – a longitudinal study. Journal of Computer-Mediated Communication, 17 (4), 467-488
This document summarizes a study of crisis informatics and the challenges of studying crisis events in a networked world. The researchers conducted a case study of the 2007 Virginia Tech shooting crisis through both on-site interviews and analysis of online activity. They traced information dissemination across multiple online platforms and timelines to map the digital and physical responses. Key challenges included the dispersed and emergent nature of online information, determining what sites to study, and integrating digital and in-person data.
NG2S: A Study of Pro-Environmental Tipping Point via ABMsKan Yuenyong
A study of tipping point: much less is known about the most efficient ways to reach such transitions or how self-reinforcing systemic transformations might be instigated through policy. We employ an agent-based model to study the emergence of social tipping points through various feedback loops that have been previously identified to constitute an ecological approach to human behavior. Our model suggests that even a linear introduction of pro-environmental affordances (action opportunities) to a social system can have non-linear positive effects on the emergence of collective pro-environmental behavior patterns.
This document summarizes a study that compares the use of Facebook Groups and Causes as tools for cyberactivism in Chile and the Concepción area. The study aimed to determine if these applications are used for cyberactivism related to real-world issues and places in Chile. It also sought to identify trends in topics of discussion and whether there are correlations between topics discussed nationally and locally. The main findings were that Facebook is mainly used to strengthen existing social ties rather than create new connections, and that it allows for distributed organization without centralized control of information.
Suazo%2c martínez & elgueta english version2011990
This document summarizes a study that analyzed the use of Facebook Groups and Causes as tools for cyberactivism in Chile and the Concepción area. The study found that social networks mainly strengthen existing social ties rather than create new connections. It also found that Facebook's horizontal structure and lack of centralized control enables strong information sharing. Finally, the study aimed to determine if Groups or Causes were better for generating cyberactivism campaigns and setting independent social media agendas.
OntoSOC: S ociocultural K nowledge O ntology IJwest
This paper
present
s
a
sociocultural knowledge ontology (OntoSOC) modeling appro
a
ch. Ont
o-
SOC modeling appro
a
ch is based on Engeström‟s
Human Activity Theory (HAT)
.
That Theory allowed us
to identify fundamental concepts and rel
a
tionshi
ps between them. The top
-
down precess has been used to
d
efine differents sub
-
concepts. The
modeled vocabulary permits us to organise data, to facilitate in
form
a-
tion retrieval
by introducing a semantic layer in social web platform architec
ture,
we project t
o impl
e
ment.
This platform can be considered as a «
collective me
mory
»
and Participative and Distributed Info
r
mation
System
(PDIS) which will allow Cameroonian communities to share an co
-
construct knowledge on perm
a-
nent organi
z
ed activ
i
ties.
Tfsc disc 2014 si proposal (30 june2014)Han Woo PARK
Technological Forecasting and Social Change Special Issue
http://www.journals.elsevier.com/technological-forecasting-and-social-change/
Special issue title
Open (Big) Data as Social Change: Triple Helix Innovation toward Government 3.0
Associated conference
The 2nd Annual Asian Hub Conference on Triple Helix and Network Sciences (DISC 2014) on Data as Social Culture: Networked Innovation and Government 3.0, to be held on December 11-13, 2014, in Daegu and Gyeongbuk (Gyeongju), Rep. of Korea.
Call for Papers: http://www.slideshare.net/hanpark/disc-2014-cfp-v3
The conference is organized by Asia Triple Helix Society (ATHS). Point of contact: Secretary to Prof. Dr. Han Woo Park (info.disc2014@gmail.com), Department of Media & Communication, YeungNam University, 214-1, Dae-dong, Gyeongsan-si, Gyeongsangbuk-do, South Korea, Zip Code 712-749.
Associate Editors: Managing Guest Editors (MGE)
Wayne Weiai Xu, Doctoral Candidate, SUNY-Buffalo, USA, weiaixu@buffalo.edu
Dr. In Ho Cho, YeungNam University, Rep. of Korea, haihabacho@gmail.com
Important Dates
DISC 2014: 11 to 13 December 2014
Full paper submission: 1 March 2015
Review & Revision period: 1 September 2015
Online Publication: 1 December 2015
* We are also open to non-conference submissions to the special issue. However, the priority will be given to papers presented at the DISC 2014 and its associated seminars.
Social media? It's serious! Understanding the dark side of social mediaIan McCarthy
This document discusses the dark side of social media and its unintended negative consequences. It begins by noting that while research has focused on the benefits of social media, there are also significant risks to individuals, communities, firms, and society. Examples of these risks include cyberbullying, addiction, trolling, witch hunts, fake news, and privacy abuse.
The document then adapts an existing social media framework to explain how each of the seven functional building blocks of social media (conversations, sharing, presence, relationships, reputation, groups, identity) can have unintended negative consequences. For each building block, examples are given of how functions meant to connect and engage users can enable harmful behaviors like harassment, inaccurate information
Searching for patterns in crowdsourced informationSilvia Puglisi
This document introduces crowdsourcing and discusses discovering patterns in crowdsourced data. It discusses defining the context of volunteered information on the internet in order to understand relationships between data. A network model is proposed where different types of context define nodes and relationships between context determine edges. Properties of small world networks are discussed including how they could be used to model relationships between crowdsourced data and evaluate data quality. Finally, applications to search ranking, privacy and security are briefly mentioned.
What Data Can Do: A Typology of Mechanisms
Angèle Christin .
International Journal of Communication > Vol 14 (2020) , de Angèle Christin del Departamento de Comunicación de Stanford University, USA titulado "What Data Can Do: A Typology of Mechanisms". Entre otras cosas es autora del libro "Metrics at Work.
Similar to Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs (20)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847. This material reflects only the author's view and the European Commission is not responsible for any use that may be made of the information it contains.
Evaluating Platforms for Community Sensemaking: Using the Case of the Kenyan ...COMRADES project
This document describes a study that evaluated how platforms can support community sensemaking during disruptive events. The researchers conducted a scenario-based evaluation using data from Kenya's 2017 elections. Twelve students participated in the evaluation. They were given the task of mapping reports of voting incidents and irregularities from Kenya's Uchaguzi platform to assess the validity of the elections and support security forces. The goal was to examine how such a platform could aid non-mandated responders' situational understanding. Data was collected on the participants' sensemaking process to identify requirements for resilience platforms and inform future research.
Helping Crisis Responders Find the Informative Needle in the Tweet HaystackCOMRADES project
Leon Derczynski - University of Sheffield,
Kenny Meesters - TU Delft, Kalina Bontcheva - University of Sheffield, Diana Maynard- University of Sheffield
WiPe Paper – Social Media Studies
Proceedings of the 15th ISCRAM Conference – Rochester, NY, USA May 2018
An Extensible Multilingual Open Source LemmatizerCOMRADES project
This document summarizes an article that presents an open-source lemmatizer called GATE DictLemmatizer. The lemmatizer currently supports lemmatization for English, German, Italian, French, Dutch, and Spanish. It uses a combination of automatically generated lemma dictionaries from Wiktionary and the Helsinki Finite-State Transducer Technology (HFST). An evaluation shows it achieves similar or better results than the TreeTagger lemmatizer for languages with HFST support, and still provides satisfactory results for languages without HFST support. The lemmatizer and tools to generate dictionaries are made freely available as open source.
Classifying Crises-Information Relevancy with SemanticsCOMRADES project
Prashant Khare, Gregoire Burel, and Harith Alani
Knowledge Media Institute, The Open University, United Kingdom
fprashant.khare,g.burel,h.alanig@open.ac.uk
D6.2 First report on Communication and Dissemination activitiesCOMRADES project
COMRADES project was launched in January 2016 with a lifetime of 36 months and it aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
COMRADES consortium will build a next generation, intelligent resilience platform to provide high socio-technical innovation and to support community resilience in crises situations. The platform will capture and process in real-time, multilingual social information streams from distributed communities, for the purpose of identifying, aggregating, and verifying reported events at the citizen and community levels. Resilience frameworks, guidelines and best practices will be embedded into the platform design and functionality, enriched with open datasets and open source software.
The main objectives of the project is to foster social innovation during crises for safeguarding communities during critical scenarios from inaccurate, distrusted, and overhyped information, and for raising citizen and community awareness of crisis situations by providing them with filtered, validated, enriched, high quality, and actionable knowledge. Community decision-making will be assisted by automated methods for real-time, intelligent processing and linking of crowdsourced crisis information.
This document forms deliverable D6.2 “First report on communication and dissemination activities”. It outlines the dissemination and communication objectives and strategy of the reporting period and focuses on the tools and activities that were undertaken to accomplish the objectives set. The deliverable reports on dissemination tools (website, social media, press releases, newsletter issues, brochures, etc.) used from M1 to M12 to disseminate the project implementing the online and offline dissemination strategy D6.1 deliverable in M6. Also, it presents the dissemination activities that have been implemented by the partners and are foreseen in the Description of Action for WP6. It will be updated yearly during the whole du project.
It is based on, and is consistent with, the DoA and the CA, but is not a substitute for reading these documents.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/46-d6-2-first-report-on-communication-and-dissemination-activities.html
This report describes the tools developed for multilingual text processing of social media. It gives details about the linguistic approaches used, the scope of the tools, and some results of performance evaluation. WP3 focuses on developing methods to detect relevant and informative posts, as well as content with clear validity, so that these can be dealt with efficiently without the clutter of uninformative or irrelevant posts obscuring the important facts. In order to achieve these objectives, low-level linguistic processing components are first required in order to generate lexical, syntactic and semantic features of the text required by the informativeness and trustworthiness components to be developed later in the project. These tools are also required by components developed in WP4 for the detection of emergency events, modelling and matchmaking. They take as input social media and other kinds of text-based messages and posts, and produce as output additional information about the language of the message, the named entities, and syntactic and semantic information.
In this report, we first describe the suite of tools we have developed for Information Extraction from social media for English, French and German. While English is the main language of messages dealt with by the tools in this project, it is very useful to be able to both recognise and deal with messages in other languages. French and German are therefore used as examples to show the adaptability of our tools to other languages, and our multilingual components thus serve as a testbed for new language adaptation techniques with which we have experimented during the project.
Various aspects of these tools are evaluated for accuracy. Second, we describe the tools we have developed for entity disambiguation and linking from social media, for English, French and German. These ensure not only that we extract relevant instances of locations, names of people and organisations, but that we know which particular instance we are talking about since these names may potentially refer to different things. By linking to a semantic knowledge base, we ensure both disambiguation and also that we have additional knowledge (for example, the coordinates of a location). The tools are evaluated for accuracy as well as speed, since traditionally these techniques are extremely slow and cumbersome to use in real world scenarios. The tools are all made available as GATE Cloud services.
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/43-d3-1-multilingual-content-processing-methods.html
D4.1 Enriched Semantic Models of Emergency EventsCOMRADES project
This document presents a semantic model for representing emergency event data in the COMRADES project. It analyzes requirements from multiple sources, including the Ushahidi platform data structures, COMRADES tool requirements, stakeholder interviews, and crisis datasets. Based on this analysis, an Ontology Requirement Specification Document is created. Finally, an ontology model is presented and evaluated against the competency questions from the requirements analysis to ensure it meets the project needs. The model will be integrated into the COMRADES platform to semantically represent emergency event data and metadata.
SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for ...COMRADES project
Leon Derczynski and Kalina Bontcheva and Maria Liakata and Rob Procter and Geraldine Wong Sak Hoi and Arkaitz Zubiaga
Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it.RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges as well as the results achieved by participants on these challenges.
http://www.derczynski.com/sheffield/papers/rumoureval-task.pdf
Prospecting Socially-Aware Concepts and Artefacts for Designing for Community...COMRADES project
This document discusses concepts from the Socially-Aware Design approach that could help inform the design of technologies to boost community resilience for refugees. It introduces the Semiotic Onion model, which views a design problem across technical, formal, and informal sociocultural layers. It also discusses Edward Hall's Basic Building Blocks of Culture as a way to understand cultural aspects that may influence design. The document argues these concepts could help identify important values, threats, and resilience factors for refugee communities to ensure designs are aligned with their needs and context.
On Semantics and Deep Learning for Event Detection in Crisis SituationsCOMRADES project
In this paper, we introduce Dual-CNN, a semantically-enhanced deep learning model to target the problem of event detection in crisis situations from
social media data. A layer of semantics is added to a traditional Convolutional Neural Network (CNN) model to capture the contextual information that is generally scarce in short, ill-formed social media messages. Our results show that
our methods are able to successfully identify the existence of events, and event types (hurricane, floods, etc.) accurately (> 79% F-measure), but the performance of the model significantly drops (61% F-measure) when identifying fine-grained event-related information (affected individuals, damaged infrastructures, etc.).
These results are competitive with more traditional Machine Learning models, such as SVM.
http://oro.open.ac.uk/49639/1/event_detection.pdf
A Semantic Graph-based Approach for Radicalisation Detection on Social MediaCOMRADES project
This document presents a semantic graph-based approach for detecting radicalization on social media, specifically Twitter. The approach extracts semantic concepts and relations from tweets and represents them as graphs. Frequent subgraph mining is used to identify patterns that distinguish pro-ISIS and anti-ISIS stances. Classifiers are trained using these "semantic features" and are shown to outperform classifiers using only lexical, sentiment, topic and network features. The top entities and relations discussed differ between pro-ISIS and anti-ISIS users.
Behind the Scenes of Scenario-Based Training: Understanding Scenario Design a...COMRADES project
This document discusses scenario-based training exercises for disaster response organizations. It notes that current scenario design processes are often inflexible and do not adapt to how coordination emerges in real disasters. The authors observed a large humanitarian aid exercise called TRIPLEX-2016 to understand current scenario design practices. They propose an adaptive scenario generator that can select and adjust scenarios in real-time to better address both individual and collective learning goals for organizations responding to complex, uncertain disasters.
Sustainable Performance Measurement for Humanitarian Supply Chain Operations COMRADES project
WiPe Paper – Logistics and Supply-Chain Proceedings of the 14th ISCRAM Conference – Albi, France, May 2017 Tina Comes, Frédérick Bénaben, Chihab Hanachi, Matthieu Lauras, Aurélie Montarnal, eds.
http://idl.iscram.org/files/lauralagunasalvado/2017/1510_LauraLagunaSalvado_etal2017.pd
DoRES — A Three-tier Ontology for Modelling Crises in the Digital AgeCOMRADES project
Burel, Gregoire; Piccolo, Lara S. G.; Meesters, Kenny and Alani, Harith (2017). DoRES — A Three-tier Ontology for Modelling Crises in the Digital Age. In: ISCRAM 2017 Conference Proceedings, (in press).
http://oro.open.ac.uk/49285/
D2.1 Requirements for boosting community resilience in crisis situationCOMRADES project
COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises, www.comrades-project.eu) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
This deliverable reviews theories, conceptual frameworks, standards (e.g., BS 11200 and ISO 22320-2011) and indicators of community resilience. It analyses relevant use cases and reports of resilience around various type of crises.
These assessments enable the project team to present a definition of community resilience that is tailored specifically to the needs and aims of the COMRADES project, which focuses on the role of information and technology as a driver of community resilience.
The project’s stance to improve resilience can therefore be summarized by:
Continuously enhancing community resilience through ICT, instead of focusing on specific shocks or disruptive events.
Enabling a broad range of actors to acquire a relevant, consistent and coherent understanding of a stressing situation.
Empower decision makers and trigger community engagement on response and recovery efforts, including long-term mitigation and preparation.
The deliverable uses this definition and the project’s overall understanding of community resilience, as presented in the review sections, to specify initial design and functional requirements for adopting and integrating resilience procedures and methodologies into the COMRADES collective resilience platform. At the same time, we develop an evaluation framework that enables us to measure the contribution of the COMRADES resilience platform to building community resilience.
Finally, we outline three prototypical crisis scenarios that will enable the project development, test and evaluation. As such, the deliverable informs the design of the community workshops and interviews planned that will be conducted in T2.2 and T2.3, as well as the work of technology development, particularly in WP5.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
OMRADES is creating an open‐source, community resilience platform, designed by communities, for communities, to help them reconnect, respond to, and recover from crisis situations. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847 This material reflects only the authors view and the European Commission is not responsible for any use that may be made of the information it contains
OMRADES is creating an open‐source, community resilience platform, designed by communities, for communities, to help them reconnect, respond to, and recover from crisis situations. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847 This material reflects only the authors view and the European Commission is not responsible for any use that may be made of the information it contains
OMRADES is creating an open‐source, community resilience platform, designed by communities, for communities, to help them reconnect, respond to, and recover from crisis situations. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847 This material reflects only the authors view and the European Commission is not responsible for any use that may be made of the information it contains
Bharat Mata - History of Indian culture.pdfBharat Mata
Bharat Mata Channel is an initiative towards keeping the culture of this country alive. Our effort is to spread the knowledge of Indian history, culture, religion and Vedas to the masses.
Presentation by Julie Topoleski, CBO’s Director of Labor, Income Security, and Long-Term Analysis, at the 16th Annual Meeting of the OECD Working Party of Parliamentary Budget Officials and Independent Fiscal Institutions.
Presentation by Rebecca Sachs and Joshua Varcie, analysts in CBO’s Health Analysis Division, at the 13th Annual Conference of the American Society of Health Economists.
Jennifer Schaus and Associates hosts a complimentary webinar series on The FAR in 2024. Join the webinars on Wednesdays and Fridays at noon, eastern.
Recordings are on YouTube and the company website.
https://www.youtube.com/@jenniferschaus/videos
Indira awas yojana housing scheme renamed as PMAYnarinav14
Indira Awas Yojana (IAY) played a significant role in addressing rural housing needs in India. It emerged as a comprehensive program for affordable housing solutions in rural areas, predating the government’s broader focus on mass housing initiatives.
karnataka housing board schemes . all schemesnarinav14
The Karnataka government, along with the central government’s Pradhan Mantri Awas Yojana (PMAY), offers various housing schemes to cater to the diverse needs of citizens across the state. This article provides a comprehensive overview of the major housing schemes available in the Karnataka housing board for both urban and rural areas in 2024.
Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs
1. IADIS International Journal on WWW/Internet
Vol. 14, No. 2, pp. 23-37
ISSN: 1645-7641
23
DETECTING IMPORTANT LIFE EVENTS ON
TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
Thomas Dickinson1
, Miriam Fernandez1
, Lisa Thomas2
, Paul Mulholland1
,
Pam Briggs2
and Harith Alani1
1
KMI, UK
2
Northumbria University, UK
ABSTRACT
Identifying global events from social media has been the focus of much research in recent years.
However, the identification of personal life events poses new requirements and challenges that have
received relatively little research attention. In this paper we explore a new approach for life event
identification, where we expand social media posts into both semantic, and syntactic networks of
content. Frequent graph patterns are mined from these networks and used as features to enrich life-event
classifiers. Results show that our approach significantly outperforms the best performing baseline in
accuracy (by 4.48% points) and F-measure (by 4.54% points) when used to identify five major life
events identified from the psychology literature: Getting Married, Having Children, Death of a Parent,
Starting School, and Falling in Love. In addition, our results show that, while semantic graphs are
effective at discriminating the theme of the post (e.g. the topic of marriage), syntactic graphs help
identify whether the post describes a personal event (e.g. someone getting married).
KEYWORDS
Semantic Networks, Event Detection, Frequent Pattern Mining, Classification, Social Media.
1. INTRODUCTION
Social media platforms are not only the online spaces where billions of people connect and
socialise, they are also where much of their prominent life events are captured, shared, and
reflected on. Most of these platforms are exploring methods that make use of this content for
profiling, marketing, and product recommendation, which constitutes a valuable resource for
governments and organisations. More recently, several initiatives are also focusing on
exploring the value that this content has for the individuals as a means of self-reflection.
2. IADIS International Journal on WWW/Internet
24
Services such as Museum of Me, Facebook Look Back, and Google Creations aim to generate
brief automated biographies for the users based on their generated content. Although these
services and technologies are still in their infancy, they raise the need for a more accurate
identification of personal life events.
However, as opposed to the identification of large-scale events from social media (e.g.,
detection of news stories (Wayne, 2000) natural disasters (Sakaki, et al., 2010), music events
(Liu, et al., 2011) etc., which has gained much interest in recent years, the identification of
personal life events (e.g. getting married, having a child, starting school) is still largely
unexplored. This problem poses new requirements and challenges, since these events do not
receive the same amount of coverage as large-scale events, unless they involve celebrities
(Choudhury & Alani, 2014), and do not show similar geographical or temporal distributions.
In addition, the accurate identification of life events in social media requires distinguishing
between posts that mention a life event (e.g., I’m getting married in the church today) and
posts about the same theme that do not mention a life event (e.g., check out my wedding
photography business). While previous works have found that uni-grams are one of the best
features for thematically classifying posts in terms of whether or not they mention the topic of
a life event (Di Eugenio, et al., 2013) (Dickinson, et al., 2015) current methods cannot
adequately distinguish posts describing life events from posts of the same theme that do not.
In our work, we focus on identifying events from Twitter data that are regarded as some of
the most significant life events in the psychology literature (Janssen & Rubin, 2011): Getting
Married, Having Children, Starting School, Death of a Parent, and Falling in Love. To address
the above mentioned challenges we propose a novel approach based on the representation of
posts as syntactic and semantic graphs. Our hypothesis is that, by considering the
conceptualisations that emerge from social media posts (mined in our work by using WordNet
and ConceptNet), as well their syntactic structures (extracted by applying dependency
parsing), we can obtain relevant insights for a more accurate detection of personal life events.
These insights are extracted by performing frequent pattern mining over the generated graphs
and used as features to generate life event classifiers. Our results show that our approach
significantly outperforms the state of the art baseline in accuracy (by 4.48% points) and
F-measure (by 4.54% points). In addition, we can observe from our results that, while
semantic graphs are effective at discriminating the theme of the post, syntactic graphs help
identifying whether the personal event is present in the post or not. The main contributions of
this paper are:
1. Introduce a semantic and syntactic graph-based approach for identifying five
personal-life events using ConceptNet, WordNet, and Dependency Parsing
2. Mine frequent graph patterns, and serialise them as features, in a LibLINEAR
classification algorithm
3. Compare several feature reduction techniques to elicit the best performance
4. Construct tri-class classifiers to categorise posts into those about an event, those
about the theme of an event, and those that are neither
5. Compare results to the state of the art baseline, showing an average increase of 4.48%
points in accuracy, 4.54% points in F-measure
6. Discuss the advantages of using syntactic and semantic graphs for personal life event
identification
3. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
25
2. STATE OF THE ART
As mentioned earlier, much work has been done on detecting global and/or famous events,
such as natural disasters (Sakaki, et al., 2010) music events (Liu, et al., 2011), and prominent
news stories (Wayne, 2000). However, detecting life events differs from the detection of such
global events. Life events tend to focus on one particular individual, and only very few posts
from a user’s history are likely to refer to a particular life event. Few works in the literature
have attempted to address this problem, although it is becoming more prevalent. (Di Eugenio,
et al., 2013) investigated the identification of life events related to employment, and marriage.
Their approach compared several different classifiers trained with different subsets of
linguistic features, concluding that uni-grams are the best features to automatically identify
these types of events. (Li, et al., 2014) took a slightly different approach. Rather than trying to
identify a set of pre-defined life events, this approach focused on detecting life events
mentioned on Twitter by clustering posts based on congratulatory and condolence replies such
as “Congratulations”, and “Sorry to hear that”. Their approach is also based on the use of
linguistic features including sequences of words within the tweets, named entities, and topic
models for different life event categories. (Choudhury & Alani, 2014) studied the use of
interaction features to identify several events: marriage, graduation, new job, new born, and
surgery. Their approach aggregates linguistic features with interaction features (i.e, features
based on the interactions of users with their social network - comments, replies, etc.).
(Cavalin, et al., 2015) look at classifying the event “Travel” in both English and Portuguese.
Their approach looks at using several different feature sets such as uni-grams, bi-grams,
co-occurrence of n-grams in conversations, and a multiple classifier system.
Our work extends the above approaches by representing social media posts as graphs, to
capture the interdependencies of linguistic, semantic and interaction features. Although using
graphs for finding similar concepts is not a new idea (e.g., (Resnik, 1999) and (Rada, et al.,
1989)), applying it for identifying life events is vastly under researched. (Sayyadi, et al., 2009)
used graphs for news-event identification, by creating a “key graph” of key words co-
occurring within a document, where nodes represent individual keywords and edges represent
their co-occurrence. To the best of our knowledge, our work is the first one that uses a
combination of syntactic and semantic graphs for personal life event identification.
3. APPROACH
3.1 Representing Posts as Feature Graphs
Before we can extract frequent patterns from social media, we first need to represent our posts
as graph structures. An example of a full feature graph for the tweet “I had a baby” is
displayed in Figure 2. Our graphs can be placed into two specific groups: Semantic, and
Syntactic.
3.1.1 Semantic Graphs
We construct semantic graphs by extracting concepts and synsets from the posts, then
expanding into two popular semantic networks: ConceptNet, and WordNet. Our hypothesis is
that, after expansion, posts may have related nodes amongst one another. For example, the
4. IADIS International Journal on WWW/Internet
26
concepts “mother” and “father” are related by the concept “parent” within ConceptNet. By
extending these two posts into ConceptNet, /c/en/parent would then become a feature,
appearing in two posts. Figure 1 highlights this example.
In this work, we only expand 1 hop into both ConceptNet and WordNet, to limit the size of
the graphs we create. In the case of WordNet, we extend one hop using both hypernyms, and
hyponyms with an “IS_A” relationship, whereas for ConceptNet, we include all relationship
types. As concepts in ConceptNet are normalised WordNet URIs, we extract N-grams from
each post, and use ConceptNet’s URI standardisation API.1
Figure 1. Example of a Semantic Graph
3.1.2 Syntactic Graphs
Syntactic graphs can be divided into two areas: dependency graphs, and token graphs.
Dependency graphs can be constructed using dependency parsing (Covington, 2001), which
are the syntactic dependencies between words within a sentence. Dependency parsers
represent these dependencies as acyclic directed trees with a word, generally the main verb,
representing the main node, root, of the tree, and the rest of the words being dependants of the
head or other governor nodes. These trees constitute a graph, where words and their associated
part of speech (POS) are the nodes, and the edges are the syntactic dependencies between
these nodes.
Our hypothesis is that posts about the same life events may share similar syntactic
structures. In this work, we use the Stanford Neural Network Dependency parser (Chen &
Manning, 2014) to obtain these dependency trees from which syntactic graphs are generated.
Use of dependency graphs and n-grams within event detection is not uncommon (Li, et al.,
2014). However, we have not seen another approach that mines frequent sub-graphs from
syntactic graphs, in order to create feature sets for life event classifiers.
With token graphs we represent the different tokens (words) that emerge from social media
posts, with their associated POS, as nodes, and the sequence in which they appear within the
posts (i.e., one token before/after another token) as the edges. This is very similar to n-grams.
Our hypothesis is that posts about a particular life event may share common sequences of
either tokens or POS tags. We use TweetNLP (Owoputi, et al., 2013) to extract both tokens
and POS tags from the text of each post.
3.2 Mining Frequent Patterns
Once the previously described graphs have been generated from the social media posts, we
apply frequent pattern mining to the graphs associated with each type of life event. By
frequent, we mean a sub-graph appearing more than a minimum number of times n within a
1
https://github.com/commonsense/conceptnet5/wiki/API#uri-standardization
5. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
27
set of graphs. For this experiment, we set n to 2, which mirrors our baseline choice of filtering
n-grams.
To mine our frequent graphs, we use the Parallel and Sequential Mining Suites
(ParSeMiS2
) (Philippsen, et al., 2011), with the following optimisations:
· Mining Feature Sets Individually: While it is possible to combine our feature sets
into one large graph, we instead mine our feature sets individually for this experiment.
This is primarily to help reduce memory usage when mining our sub-graphs. We
discuss the possible benefits of mining combined graphs later in section 6
· CloseGraph: We detect closed graphs only, where a closed graph has no parent with
the same frequency. This has been shown to reduce the time complexity by an order of
magnitude when compared to using gSpan without this optimisation. As n is very low,
it is also a useful way to remove a large number of redundant features, as if graph A is
a sub-graph of B, then A always occurs with B, thus A is a redundant data point. We
use the CloseGraph (Yan & Han, 2003) algorithm, which is an improvement of gSpan
(Yan & Han, 2002) that mines only closed graphs.
· Limiting Sub-graph Edges: To improve performance further, we set a limit to the
maximum number of edges a sub-graph can have. Limiting the maximum size of the
subgraph significantly reduces the search cost of detecting sub-graphs from the
dataset.
· Semantic Graph Pruning: We also perform graph pruning on our semantic graphs.
Not all expansions into networks like ConceptNet would be useful and may contain
redundancies, thus increasing memory usage. For example, posts about “Having a
Child'', may contain the concept /c/en/baby, which when expanded into ConceptNet
returns /c/en/play. Within our dataset, if /c/en/play only exists as an expansion of
/c/en/baby, then /c/en/play would be redundant as a feature. Given this redundancy,
we can prune our graphs prior to frequent sub-graph mining, looking for nodes that
only ever appear as expansions of one particular node, vastly reducing our memory
complexity.
· Cyclic Graphs: In addition to pruning, we also represent our sub-graphs as cyclic,
grouping nodes with the same label together.
In Section 5.1 we assess the value of the optimisation steps above. Once frequent patterns
are mined, we use them to build event classifiers to identify the five events used in this paper.
4. EXPERIMENT SETUP
4.1 Dataset
Our dataset was collected back in December 2014, by mining Twitter’s advanced search
features, using query expansion into WordNet with several key concepts for each type of life
event. More information can be found in our previous paper (Dickinson, et al., 2015) This
dataset is composed of 12,241 tweets manually annotated through CrowdFlower. Five relevant
2
https://www2.informatik.uni-erlangen.de/EN/research/zold/ParSeMiS/index.html
6. IADIS International Journal on WWW/Internet
28
life events, common across age and culture (Janssen & Rubin, 2011), are represented in this
dataset: Starting School, Falling in Love, Getting Married, Having Children, and Death
of a Parent.
For each tweet, annotations contain answers to two questions: (i) Is this tweet about an
important life event? and (ii) Is this tweet related to a particular event theme? - where event
theme refers to one of the previously mentioned types of life events. Additionally, we
introduce a random sample of 2,000 tweets collected and internally annotated from a Twitter
sample endpoint to generate a “not about event or theme” class. This represents a class where
neither category of theme or event exists. The CrowdFlower annotations for this dataset are
publicly available.
4.2 Training Set Selection
We use the posts of the previously described dataset to generate the feature graphs (see
Section 3.1). These feature graphs are then mined, and the extracted patterns are used as
features to train classifiers for each type of life event (Death of a Parent, Having Children,
Falling in Love, Getting Married, and Starting School). For each life event, we construct a
tri-class classifier with the following labels: About Event and About Theme (+E+T), Not
About Event and About Theme (-E+T), Not About Event and Not About Theme (-E-T).
As named, +E+T contains those posts that are both about a particular event, and its given
theme, while those in -E+T contain posts that are thematically similar, but do not indicate an
event has happened. Our third class, -E-T, contains tweets extracted from Twitter’s random
sample endpoint. In each case, we balance the training set against the size of the smallest
class, with the following distributions: Getting Married (706 per class), Starting School (419
per class), Having Children (387 per class), Death of a Parent (116 per class), and Falling in
Love (114 per class).
7. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
29
Figure 2. Example of a combined semantic and syntactic graph for the text "I had a baby"
To consider if our approach is statistically significant, we generate our dataset 10 times for
each life event, so that we can perform t-tests against our baseline.
8. IADIS International Journal on WWW/Internet
30
4.3 Subgraph Feature Representation
After mining our subgraphs, we need to represent them in our classifier as a vector of features.
For simplicity, we represent each frequent pattern as a feature, whose value is either 1, when a
training instance contains that frequent pattern, or 0 when it does not.
4.4 Classifier Generation and Evaluation
Each training dataset, and combination of features (semantic ConceptNet, semantic WordNet,
syntactic), is used as input to a LibLINEAR classifier. LibLINEAR works particularly well
with very large feature sets. We also independently tested several other classifiers (J48, Naive
Bayes, and SVMS with linear, polynomial, sigmoid, and radial bias kernels), but in all cases,
LibLINEAR outperformed each in, F-Measure, time performance, and accuracy. The
generated classifier is tested using 10-fold cross validation considering Precision, Recall and
F1-measure as evaluation metrics. In section 5, we report some of our best performing feature
sets using this classification and evaluation set-up.
4.5 Baseline Selection
For our baseline, we implement the features outlined in Li et al. These included N-grams,
topic dictionaries, and entities. Entities are extracted using TextRazor3
, and for the case of
topic dictionaries, our topic models are generated slightly differently to Li et al, where we
discover 5 topics (representing our five chosen life events) across our total annotated dataset.
The top 40 words for each topic are used as our topic dictionaries, as outlined in Li et al.
Rather than treat n-grams, topics, and entities as a single baseline, for each experiment, we run
our classifiers against every permutation of the baseline features, and select the best one to
compare against our graph approach.
4.6 T-Tests
For our T-Tests, we generate 10 samples for each type of event. We create permutations of all
possible feature combinations (N-Grams, Entities, Topics, Syntactic, Semantic
ConceptNet, and Semantic WordNet), and compare against the best performing baseline
combination (mentioned in section 4.5).
Thus, the hypotheses for our t-tests are:
· H0: Graph features do not help improve the f-measure over our baseline.
· H1: By introducing our graph features, we improve our f-measure.
4.7 Feature Reduction
Due to the large number of features extracted from our pattern mining approach, we also
consider several feature reduction techniques within Weka. Our approach uses a ranker
method, where we filter our attributes via a chosen metric. In order to choose the best suitable
3
https://www.textrazor.com/
9. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
31
attribute metric, we compared the results of several attribute selection algorithms: Information
Gain, Gain Ratio, Correlation, and Symmetrical Uncertainty. We found that Gain Ratio gave a
significant improvement of 13 points for F1 score. Initially we remove any feature whose rank
is 0, then gradually reduce the number of features over 20 even intervals over the remaining
attributes.
5. RESULTS
Our results are displayed in Table 1 for all our events: Getting Married, Death of a Parent,
Having Children, Starting School, and Falling in Love. We compare the performance of the
best baseline (B) against the best feature combination. We report: F1, change in F1 to baseline
(∆ F1), Accuracy (A), Precision (P), Recall (R), and P-Value (P-Val). We observe statistical
significance when p-val < 0.05.
As can be seen from Table 1, in all cases, our graph-based methodology merged with some
of the baseline features produces a significant improvement in all measures (F1, Accuracy,
Precision, and Recall), across each of our events. Our biggest improvement is in Falling in
Love, with a 6.8% percentage point increase for F1, while our smallest is in Having Children,
with only 1.1%. However, the improvements over the baseline are all statistically significant
(p < 0.01).
Overall our improvements show a 4.54% point increase in F-Measure over our baselines.
For each classifier, the best performing baseline combination remains constant (N-Grams,
Topics, Entities), which is the full methodology outlined in Li et al. The best performing
feature combination is also the same (N-grams, Topics, Entities + WordNet and Syntactic
patterns).
Table 1. Classifier Results
Theme Feature Set F1 ∆ F1 A P R P-Val
Getting
Married
Best Baseline: Ent,Ngrm,Tpc 85.9% 86.0% 86.1% 86.0%
Con 66.5% -19.4% 67.8% 67.6% 67.8% <0.01
Ent 45.4% -40.4% 51.4% 63.1% 51.4% <0.01
Ngrm 83.7% -2.2% 84.0% 83.8% 84.0% <0.01
Syn 79.0% -6.9% 79.7% 80.8% 79.7% <0.01
Tpc 71.0% -14.9% 72.3% 71.9% 72.3% <0.01
WN 70.6% -15.3% 71.7% 71.4% 71.7% <0.01
Best Graph Only: Con, Syn,
Wn
86.3% 0.4% 86.5% 86.5% 86.5% 0.04
Con,Ent,Ngram,Syn,Tpc,Wn 90.7% 4.8% 90.8% 90.7% 90.8% <0.01
Death of a
Parent
Best Baseline: Ent,Ngrm,Tpc 86.2% 86.2% 86.7% 86.2%
Con 71.0% -15.2% 71.6% 71.5% 71.6% <0.01
Ent
Ngrm 85.4% -0.8% 85.5% 86.0% 85.5% 0.02
Syn 82.2% -4.0% 82.4% 83.9% 82.4% <0.01
Tpc 61.1% -25.1% 60.4% 67.7% 60.4% <0.01
10. IADIS International Journal on WWW/Internet
32
WN 79.4% -6.7% 79.7% 79.7% 79.7% <0.01
Best Graph Only:
Con,Syn,WN
87.8% 1.6% 87.9% 88.8% 87.9% 0.06
Ent,Ngrm,Syn,Tpc,WN 90.7% 4.5% 90.7% 91.1% 90.7% <0.01
Having
Children
Best Baseline: Ent,Ngrm,Tpc 89.8% 89.9% 90.2% 89.9%
Con 67.7% -23.1% 67.7% 67.9% 67.7% <0.01
Ent 35.5% -54.3% 41.4% 59.4% 41.4% <0.01
Ngrm 87.9% -1.9% 88.0% 88.3% 88.0% <0.01
Syn 79.7% -10.1% 80.2% 82.9% 80.2% <0.01
Tpc 72.4% -17.4% 73.1% 72.2% 73.1% <0.01
WN 73.8% -16.0% 74.7% 74.8% 74.7% <0.01
Best Graph
Only:Con,Syn,WN
86.2% -3.5% 86.5% 88.0% 86.5% <0.01
Ent,Ngrm,Syn,Tpc,WN 90.9% 1.1% 91.0% 91.7% 91.0% <0.01
Starting School
Best Baseline: Ent,Ngrm,Tpc 87.7% 87.8% 88.3% 87.8%
Con 64.9% -22.8% 66.1% 64.9% 66.1% <0.01
Ent 32.1% -55.6% 37.4% 53.4% 37.4% <0.01
Ngrm 85.0% -2.7% 85.2% 85.2% 85.2% <0.01
Syn 86.3% -1.4% 86.6% 87.9% 86.6% <0.01
Tpc 72.4% -15.3% 72.6% 72.4% 72.6% <0.01
WN 66.1% -21.6% 68.0% 67.0% 68.0% <0.01
Best Graph Only:Syn,WN 88.9% 1.2% 89.1% 90.1% 89.1% 0.025
Ent,Ngrm,Syn,Tpc,WN 93.0% 5.5% 93.0% 93.5% 93.0% <0.01
Falling in Love
Best Baseline: Ent,Ngrm,Tpc 83.7% 83.9% 84.2% 83.9%
Con 68.4% -15.4% 68.9% 69.4% 68.9% <0.01
Ent 29.5% -54.2% 34.6% 53.3% 34.6% <0.01
Ngrm 80.4% -3.3% 80.6% 81.1% 80.6% <0.01
Syn 82.1% -1.6% 82.3% 84.4% 82.3% <0.01
Tpc 62.3% -21.5% 65.1% 65.3% 65.1% <0.01
WN 70.3% -13.5% 71.3% 71.7% 71.3% 0.103
Best Graph
Only:Con,Syn,WN
86.8% 3.0% 86.9% 87.3% 86.9% 0.03
Ent,Ngrm,Syn,Tpc,WN 90.5% 6.8% 90.6% 91.2% 90.6% <0.01
5.1 Graph Optimizations
Earlier in Section 3.2, we described a number of steps we take to optimise our graph
processing steps. Here we report the outcome of applying those steps on our experiment and
results.
By detecting closed graphs only, we succeeded in significantly reducing the number of
detected sub-graphs. For example, for Getting Married, the reduction was from 1.2 million to
250k sub-graphs.
We explained that to improve computational performance, we limit the maximum number
of edges in a sub-graph to 1 for our semantic networks. For syntactic, as the graphs contain far
11. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
33
fewer nodes than our semantic graphs, we did not have to limit it. While limiting the number
of edges comes at a small cost to F-measure, the cost is limited compared to the reduction in
time. Table 2 shows some the trade off of limiting the number of edges in terms of F-measure
and time for the Death of a Parent dataset using the WordNet feature set. Time taken includes
pruning the graph as well as mining it. Note that Death of a Parent is one of our smallest
datasets. Running something similar with a larger dataset, such as Having Children, can take
several hours to obtain a result without limiting edges. This shows that the gains from this
pruning and limiting significantly outweigh the cost.
Table 2. Edge Limiting Results
Edge Limit Pruned Time Taken (S) F1 Score
No Limit True 492 0.727
No Limit False 486 0.726
1 True 18 0.706
1 False 14 0.708
After both, pruning and cyclic operations are applied, we reduce our un-pruned graphs to
26% of their original size. Table 3 shows the sizes of serialised graphs for the event Getting
Married with ConceptNet features. Without any pruning, our average graph size is around
69kb, with a total size of 132mb of all graphs loaded. After pruning, we see about a 50%
reduction in total memory. Table 3 also shows that merging our labels together to create cyclic
structures, further reduces by another 50%.
Table 3. Pruning Sizes for Getting Married, ConceptNet Graphs
Pruned Methodology Average Graph Size Total Graph Size
Pruned Cyclic 18kb 34mb
Pruned 35kb 67mb
Not Pruned Cyclic 40kb 76mb
Not Pruned 69kb 132mb
5.2 Graph Feature Analysis
In this section, we briefly discuss the performance of our graph features, showing some of our
top performing sub-graph features over our classifiers, as well as discuss how our graph
features perform individually.
Across all of our classifiers, we see an interesting split in how Gain Ratio has divided our
features. At the top of the list, we see a lot of semantic features, such as concepts, and synsets.
For example, for Getting Married, we see a number of synsnets such as officiate, and knot.
For Falling in Love, we see a large number of concepts like /c/en/spouse,
/c/en/proposing_to_woman, /c/en/wonderful_feel. Death of a Parent has some synsets such
as die, and change state (die-[Is_a]->change state). Having Children top features includes
synsets such as babys, and child. Starting School has a number of top features revolving
around school, and educational institution. Interestingly for all of these patterns, we tend to
see their associated n-grams much further down the list. This makes sense as these semantic
patterns have a higher frequency, as they are comprised of a number of different n-grams.
12. IADIS International Journal on WWW/Internet
34
As we go further down the list however, we start seeing more specific syntactical patterns.
Examples of these are shown in figures 3 and 4. As can be seen, rather than functioning as
content-based discriminators, these patterns tend to be composed of part of speech tags,
spanning both word order, and dependency relationships.
Figure 3. An example Syntactic Pattern for Falling in Love
Figure 4. An Example Syntactic Pattern for Starting School
Table 4 displays how our individual graph features are represented in our classifiers after
feature selection using Gain Ratio. This table displays:
· Class Distribution - The class distribution (About Event and Theme +E+T, Not
About Event but About Theme -E+T, Not about Event or Theme -E-T) of each
feature set (WordNet, Concept, Syntactic)
· Post Occurrence - The average occurrence of the feature set within the posts
· Ent!"# - The mean entropy for the feature set for our class distribution
· F1!"# - The mean F-Measure when only that feature set is used in classification.
From the table we can make several interesting observations. Firstly, we can see the
average performance of Concepts is quite low in comparison to our other two features. This
may help explaining why as an added feature, it did not always help improving the
13. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
35
classification performance. One explanation for concepts not always appearing in our best
classifiers might be that ConceptNet and WordNet have a very similar feature distribution. By
adding both, the available features are increased, thus making it more difficult to perform
feature selection. Another could be that Concepts may not be effective discriminators.
Another interesting point is that our semantic feature sets bias towards our classes with
themes, whereas our syntactic feature set (while still slightly biased), is more evenly
distributed. Intuitively this makes sense, as those posts that share a theme should have terms
that convey a similar meaning, thus sharing similar synsets and concepts.
We also see a difference in the proportion of posts that our features appear in. Whereas our
semantic features occur in a large percentage of our posts, our syntactic ones appear far less
often. Our semantic features also have a similar entropy value, whereas for our syntactic
features entropy is much lower. This would suggest that, while a syntactic feature may only
occur in a small proportion of the dataset, it biases heavily towards a particular class.
Interestingly, the higher average entropy for our ConceptNet features may also explain why
WordNet performs better in our results.
To summarize, our results indicate that semantic features are useful at establishing whether
or not a post is about a particular theme with frequently highly thematically biased features,
whereas our syntactic features help separate whether or not a post is about a particular event,
using infrequent and highly singular class biased features.
Table 4. Frequent Pattern Analysis
Features +E+T -E+T -E-T Occurrences Ent "# F1"#
WordNet 52.51% 43.99% 3.50% 6.49% 0.23 72.02%
Concept 51.36% 42.99% 5.65% 14.24% 0.46 67.48%
Syntactic 34.19% 48.60% 17.22% 1.19% 0.09 81.84%
6. DISCUSSION
In section 5 we discussed some of our results and observations. In this section we provide
further discussion about the proposed approach, the obtained results, and its limitations.
In this work, we focused on the identification of five of the most common life events. A
natural future expansion of this work would be to address the full set of life events listed in
(Janssen & Rubin, 2011). This would help testing the consistency of our results across other
types of events and bring our work closer to the broader goal of the detection of life events in
general.
When identifying these life events, our approach considers a classifier for each individual
event. As Li et al (Li, Ritter, Cardie, & Hovy, 2014) point out in their work, a single large
multi classifier is a more complex task compared to individual theme detection, due to the
overlap of n-grams in certain topics. However, given our observation of how our semantic
patterns work, we are planning to use these patterns for the construction of multi-class
classifiers and observe whether semantic patterns can improve the performance of multi-class
classifiers for the problem at hand.
While we have successfully identified a number of interesting frequent graph patterns,
there is room for improvement when extracting and using these patterns. Given the scalability
issues of the selected gSpan algorithm (Yan & Han, 2002) we have peformed a wide range of
14. IADIS International Journal on WWW/Internet
36
optimisations, which may be partially limiting the acuracy of our results. Our future plans
include testing newer state of the art algorithms (e.g., (Pan, Wu, Zhu, Long, & Zhang, 2015)),
which consider the feature selection problem as being part of the pattern mining discovery
phase, reducing memory and time cost. An alternative might be to consider using a more
distributed pattern-mining approach, such as the one outlined in (Bhuiyan & Hasan, 2014).
When extracting our semantic patterns we made the assumption that two concepts that are
close together in ConceptNet are similar, but this assumption may not always be valid. For
example, two concepts linked by the relationship “IsAntonym” in ConceptNet are not similar,
but opposed to each other (e.g., “life” and “death”). As future work we plan to either filter out
all negating relationships like this one, or to add a weighting system where a heuristic function
could possibly be used to calculate how close two concepts are together in the graph (Rada,
Mili, Bicknell, & Blettner, 1989). To further improve the identification and extraction of
semantic patterns our future work also considers expanding our semantic graphs into other
networks, such as DBpedia and the addition of semantic frames using FrameNet (Baker,
Fillmore, & Lowe, 1998).
We opted to serialize our frequent patterns into a key/map pair so that we can represent our
training set as a vector of binary values. However, there exist a number of alternative
classification algorithms that are designed specifically for graphs; some reliant on frequent
patterns themselves, others taking into account the structure of the graph. In terms of our
frequent patterns, we also see a number of weak infrequent ones. However, rather than
representing these patterns as independent of each other, we could consider a methodology
such as COM (Jin, Young, & Wang, 2009) where co-occurrence of disconnected low
frequency patterns is considered instead, converting them into higher frequency ones.
Regarding our selected features and results, when combined with our baseline features, our
graph features obtain F1 scores that are consistently and significantly higher than the
baselines. When looking at our top performing features, we found that patterns extracted from
ConceptNet and WordNet were regularly at the top, and tended to help discriminate between
our negative class, and our two thematic classes. However, when attempting to discriminate
theme from event, these types of patterns were less useful as each will tend to use similar
languages and concepts. Instead, we found our syntactical patterns helped to discriminate
between events and non-events.
7. CONCLUSION
We have presented a new approach for the problem of life event detection that focuses on five
major life events that are regarded as some of the most significant life events in the
psychology literature (Janssen & Rubin, 2011): Getting Married, Having Children, Death of a
Parent, Starting School, and Falling in Love. Our approach expands a Twitter post into both a
syntactic and semantic graph. This network is then mined for frequent sub patterns that can be
used as features within a classifier, filtered via their gain ratio. Our results showed consistent
and significant improvement over baselines from other work, that consider only features
extracted directly from the post (i.e., no graphs). We showed that our graph-based approach
achieves significantly better accuracy (4.48% improvement) and F-measure (4.54%
improvement) than the top performing baselines.
15. DETECTING IMPORTANT LIFE EVENTS ON TWITTER USING FREQUENT SEMANTIC AND
SYNTACTIC SUBGRAPHS
37
REFERENCES
Baker, C. F., Fillmore, C. J. & Lowe, J. B., 1998. The berkeley framenet project. s.l., s.n.
Bhuiyan, M. A. & Hasan, M. A., 2014. FSM-H: Frequent Subgraph Mining Algorithm in Hadoop. s.l.,
s.n., pp. 9-16.
Cavalin, P. R., Moyano, L. G. & Miranda, P. P., 2015. A Multiple Classifier System for Classifying Life
Events on Social Media. s.l., s.n., pp. 1332-1335.
Chen, D. & Manning, C. D., 2014. A Fast and Accurate Dependency Parser using Neural Networks.. s.l.,
s.n., pp. 740-750.
Choudhury, S. & Alani, H., 2014. Personal life event detection from social media. s.l., CEUR.
Covington, M. A., 2001. A fundamental algorithm for dependency parsing. Athens, s.n.
Di Eugenio, B., Green, N. & Subba, R., 2013. Detecting life events in feeds from twitter. s.l., s.n.
Dickinson, T. et al., 2015. Identifying Prominent Life Events on Twitter. s.l., s.n.
Janssen, S. M. J. & Rubin, D. C., 2011. Age Effects in Cultural Life Scripts. Applied Cognitive
Psychology, 25(2), pp. 291-298.
Jin, N., Young, C. & Wang, W., 2009. Graph classification based on pattern co-occurrence. Proc. Conf.
on Info. and Knowledge Man. - CIKM '09.
Li, J., Ritter, A., Cardie, C. & Hovy, E. H., 2014. Major Life Event Extraction from Twitter based on
Congratulations/Condolences Speech Acts. s.l., s.n., pp. 1997-2007.
Liu, X., Troncy, R. & Huet, B., 2011. Using social media to identify events. s.l., s.n.
Owoputi, O. et al., 2013. Improved part-of-speech tagging for online conversational text with word
clusters. s.l., s.n.
Pan, S. et al., 2015. Finding the best not the most: regularized loss minimization subgraph selection for
graph classification. Pattern Recognition, Volume 48, pp. 3783-3796.
Philippsen, M., Worlein, M., Dreweke, A. & Werth, T., 2011. Parsemis: the parallel and sequential
mining suite. Available at www2. informatik. uni-erlangen. de/EN/research/ParSeMiS.
Rada, R., Mili, H., Bicknell, E. & Blettner, M., 1989. Development and application of a metric on
semantic nets. s.l., s.n., pp. 17-30.
Resnik, P., 1999. Semantic Similarity in a Taxonomy: An Information-Based Measure and its
Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence
Research, Volume 11, pp. 95-130.
Sakaki, T., Okazaki, M. & Matsuo, Y., 2010. Earthquake Shakes Twitter Users: Real-time Event
Detection by Social Sensors. Raleigh, North Carolina USA, s.n.
Sayyadi, H., Hurst, M. & Maykov, A., 2009. Event detection and tracking in social streams. San Jose,
{CA}, s.n.
Wayne, C. L., 2000. Topic detection and tracking in English and Chinese. NY, s.n.
Yan, X. & Han, J., 2002. gspan: Graph-based substructure pattern mining. s.l., s.n., pp. 721-724.
Yan, X. & Han, J., 2003. CloseGraph: mining closed frequent graph patterns. Washington, s.n.