Leon Derczynski - University of Sheffield, Kenny Meesters - TU Delft, Kalina Bontcheva - University of Sheffield, Diana Maynard - University of Sheffield
WiPe Paper – Social Media Studies
Proceedings of the 15th ISCRAM Conference – Rochester, NY, USA May 2018
The Design of an Online Social Network Site for Emergency Management: A One-Stop Shop
Web 2.0 is creating new opportunities for communication and collaboration. Part of this explosion is the increase in popularity and use of Social Network Sites (SNSs) for general and domain-specific use. In the emergency domain there are a number of websites, wikis, SNSs, etc., but they stand as silos in the field, unable to support cross-site collaboration. In this paper we describe ongoing design science research to develop and refine guiding principles for developing an SNS that will bring together emergency domain professionals in a “one-stop-shop.” We surveyed emergency professionals who study crisis information systems to ascertain potential functionalities of such an SNS. Preliminary results suggest that there is a need for the envisioned SNS. Future research will continue to explore possible solutions to issues addressed in this paper.
Transforming Social Big Data into Timely Decisions and Actions for Crisis Mi... - Amit Sheth
Keynote @ Exploitation of Social Media for Emergency Relief and Preparedness (SMERP)
Co-located with: The Web Conference 2018 (formerly WWW)
Lyon, France. 23 April 2018
Abstract:
Crises impose massive costs on economies worldwide. Natural disasters caused a record $306 billion in damage in the U.S. in 2017. The real-time gathering of relevant data through the ubiquitous presence of mobile technologies, and the ability to disseminate it through social media, have forever changed how disaster and health crisis monitoring and response are carried out. Both traditional crisis response organizations and temporary, informal, self-organized, community-based organizations have come to rely increasingly on social media. Furthermore, the ability to collect, repurpose and reuse data from past events is helping with preparedness and planning for future events.
In this talk, I will review our extensive experience of (a) interactions with a variety of stakeholders involved in emergency response at the city, county, country and international levels, (b) research on real-time social media analysis spanning the spatio-temporal-thematic, people-content-network, and linguistic-sentiment-emotion-intent analysis dimensions, (c) the development and use of crisis-response-specific tools (location identification, demand-supply matching) and the comprehensive Twitris semantic social intelligence system (also commercialized as Cognovi Labs), and (d) a variety of real-world evaluations and real-time uses (e.g., supplying data for the Google Crisis Map during the Uttarakhand floods, rescue during the Kashmir floods, a neighborhood image map during the Chennai floods, providing information to FEMA during the Oklahoma tornadoes), disease spread and epidemiology (e.g., the spread of Zika), a metro-level multi-agency disaster preparedness exercise, and more.
https://www.cse.iitk.ac.in/users/kripa/smerp2018/SMERP-at-Web2018-keynote.pdf
Memo for the Danish Emergency Management Agency by Anna Boye Koldaas, Master of Science (MSc) student in Security Risk Management at Copenhagen University.
From drones to old-fashioned phone calls, data come from many unlikely sources. In a disaster, such as a flood or earthquake, responders will take whatever information they can get to visualise the crisis and best direct their resources. Increasingly, cities prone to natural disasters are learning to better aid their citizens by empowering their local agencies and responders with sophisticated tools to cut through the large volume and velocity of disaster-related data and synthesise actionable information.
What Data Can Do: A Typology of Mechanisms
Angèle Christin
International Journal of Communication, Vol. 14 (2020), by Angèle Christin of the Department of Communication at Stanford University, USA, titled "What Data Can Do: A Typology of Mechanisms". Among other things, she is the author of the book "Metrics at Work".
Twitris in Action - a review of its many applications Amit Sheth
Twitris is a system for Collective Social Intelligence. It has been used in a large number of applications of many types (disaster coordination, branding, epidemiology, public health, elections/politics, social movements), often in real time. This presentation gives a bird's-eye review of some of these applications, with links to explore them further.
Expelling Information of Events from Critical Public Space using Social Senso... - ijtsrd
Open infrastructure systems provide many of the services that are critical to the health, functioning, and security of society. Many of these systems, however, lack the continuous physical sensor monitoring that would allow failure events or damage to be detected. We propose the use of social sensor big data to detect these events. We focus on two main infrastructure systems, transportation and energy, and use data from Twitter streams to identify damage to bridges, highways, gas lines, and power infrastructure. Through a three-step filtering approach and assignment to geographical cells, we are able to filter out noise in this data to produce relevant geo-located tweets identifying failure events. Applying the strategy to real-world data, we demonstrate the ability of our approach to use social sensor big data to detect damage and failure events in these critical public infrastructures. Samatha P. K | Dr. Mohamed Rafi, "Expelling Information of Events from Critical Public Space using Social Sensor Big Data", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 3, Issue 5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd25350.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/25350/expelling-information-of-events-from-critical-public-space-using-social-sensor-big-data/samatha-p-k
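The three-step filtering and geo-cell assignment the abstract describes can be sketched roughly as follows. The keyword lists, cell size, and corroboration threshold are illustrative assumptions, not the paper's actual parameters:

```python
from dataclasses import dataclass

# Hypothetical tweet record; field names are illustrative, not from the paper.
@dataclass
class Tweet:
    text: str
    lat: float
    lon: float

DAMAGE_KEYWORDS = {"bridge", "highway", "gas line", "power", "outage", "collapsed"}
NOISE_MARKERS = {"giveaway", "win a free"}

def keyword_filter(tweet):
    """Step 1: keep tweets mentioning infrastructure-damage terms."""
    text = tweet.text.lower()
    return any(k in text for k in DAMAGE_KEYWORDS)

def noise_filter(tweet):
    """Step 2: drop obvious noise such as spam markers."""
    text = tweet.text.lower()
    return not any(m in text for m in NOISE_MARKERS)

def geo_cell(tweet, cell_deg=0.1):
    """Step 3: assign the tweet to a lat/lon grid cell."""
    return (int(tweet.lat // cell_deg), int(tweet.lon // cell_deg))

def detect_events(tweets, min_per_cell=2):
    """Keep cells with several corroborating tweets, suggesting a failure event."""
    cells = {}
    for t in tweets:
        if keyword_filter(t) and noise_filter(t):
            cells.setdefault(geo_cell(t), []).append(t)
    return {c: ts for c, ts in cells.items() if len(ts) >= min_per_cell}
```

Requiring multiple tweets per cell is what suppresses isolated false positives; a real deployment would tune the cell size and threshold to the event type.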
Web 2.0 Technology Building Situational Awareness: Free and Open Source Too... - Connie White
Covers ways to use web apps, smartphones, and free disaster management software like Sahana Eden, which offer agencies free and open source tools to customize and build situational awareness for their own agency or organizational needs.
OCHA Think Brief - Hashtag Standards for Emergencies - Jan Husar
POLICY AND STUDIES SERIES
These short think pieces are non-papers that present relatively new ideas that require testing and validation. The objective of the Think Brief is to generate feedback, views and advice. The Think Briefs do not necessarily represent the official views of OCHA.
Understanding Immunisation Awareness and Sentiment with Social Media - Projec... - UN Global Pulse
This multi-country study aims to track and analyse online conversations related to immunisation on social media and mainstream media in India, Kenya, Nigeria and Pakistan. Findings from the study showed that in social media, Nigerian and Pakistani politicians are active and influential in the vaccination debate and the political dimension is often referred to when discussing the failure to eradicate diseases such as polio. However, in Kenya, religious and ideological aspects were more frequently discussed. Twitter activity is primarily driven by sharing of news stories in all countries whereas Facebook focuses on the 'distrust' and 'ideals' categorisation.
Cite as: UN Global Pulse, “Understanding Immunisation Awareness and Sentiment Through Social and Mainstream Media”, Global Pulse Project Series no. 19, 2015.
Understanding Online Social Harms: Examples of Harassment and Radicalization - Amit Sheth
https://dbsec2019.cse.sc.edu/Keynote.html
Abstract: As social media permeates our daily life, there has been a sharp rise in its misuse, affecting our society at large. Specifically, harassment and radicalization have become two major problems on social media platforms, with significant implications for the well-being of individuals as well as communities. A 2017 Pew Research survey on online harassment found that 66% of adult Internet users have observed online harassment and 41% have personally experienced it. Nearly 18% of Americans have faced severe forms of harassment online, such as physical threats, harassment over a sustained period, sexual harassment or stalking. Moreover, malicious organizations (e.g., terrorist groups, or white nationalists not classified legally as terrorists but as groups with extreme ideologies) have been using social media to share their propaganda and misinformation to persuade individuals and eventually recruit them to propagate their ideology. These communications related to harassment and radicalization are complex in their language and contextual characteristics, making recognition of such narratives challenging for researchers as well as social media companies. As most existing approaches fail to capture fundamental nuances in the language of these communications, two prominent challenges have emerged: ambiguity and sparsity. Bottom-up analysis at the data level alone has been unsuccessful in revealing the actual meaning of the content. Considering the significant sensitivity of these problems and their implications at individual and community levels, a potential solution requires reliable algorithms for modeling such communications.
Our approach to understanding communications between source and target requires deciphering the unique language, semantic and contextual characteristics, including sentiment, emotion, and intention. This context-aware and knowledge-enhanced computational approach to the analysis of these narratives breaks down this long-running and complex process into contextual building blocks that acknowledge inherent ambiguity and sparsity. Based on prior empirical and qualitative research in social sciences, particularly cognitive psychology, and political science, we model this process using a combination of contextual dimensions -- e.g., for Islamist radicalization: religion, ideology, and hate -- each elucidating a degree of radicalization and highlighting independent features to render them computationally accessible.
Processing Social Media Messages in Mass Emergency: A Survey - Muhammad Imran
Millions of people use social media to share information during disasters and mass emergencies. Information available on social media, particularly in the early hours of an event when few other sources are available, can be extremely valuable for emergency responders and decision makers, helping them gain situational awareness and plan relief efforts. Processing social media content to obtain such information involves solving multiple challenges, including parsing brief and informal messages, handling information overload, and prioritizing different types of information. These challenges can be mapped to information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. This work highlights these challenges and presents state-of-the-art computational techniques for dealing with social media messages, focusing on their application to crisis scenarios.
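Three of the processing operations the survey names, filtering, classifying, and ranking, can be illustrated with a toy rule-based pipeline. Real systems would use trained classifiers; the keyword rules and category names below are invented for illustration:

```python
# Keyword rules standing in for learned models; categories are illustrative.
CATEGORY_RULES = {
    "request_for_help": {"help", "trapped", "need rescue"},
    "infrastructure": {"bridge", "road", "power"},
    "donations": {"donate", "supplies", "volunteer"},
}

def filter_relevant(messages, topic_terms):
    """Filtering: keep only messages that mention the event at all."""
    return [m for m in messages if any(t in m.lower() for t in topic_terms)]

def classify(message):
    """Classifying: assign a coarse information type to one message."""
    text = message.lower()
    for label, terms in CATEGORY_RULES.items():
        if any(t in text for t in terms):
            return label
    return "other"

def rank(messages,
         priority=("request_for_help", "infrastructure", "donations", "other")):
    """Ranking: order messages by the urgency of their category."""
    order = {label: i for i, label in enumerate(priority)}
    return sorted(messages, key=lambda m: order[classify(m)])
```

Aggregating, extracting, and summarizing would follow the same pattern as further stages downstream of this ordering step.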
Concurrent Distractions: A Cross-Cultural Study of Media Multitasking Behavior - AJHSSR Journal
ABSTRACT: As media usage continues to increase on a global scale, fueled by the proliferation of mobile devices, it facilitates the effortless behavior of media multitasking. This paradigm shift in the way in which media is consumed presents fundamental challenges for the domains of Human-Computer Interaction (HCI), education, psychology, and commerce. This technological shift introduces a new dimension that is needed when attempting to understand user interaction related to both the devices themselves and the digital platforms accessed. This study begins a process of developing an understanding of cross-cultural media multitasking habits through a survey of a large group of experimental participants. Participants from two different countries, the USA and Portugal, were surveyed. This research provides valuable insights into the increasingly common phenomenon of media multitasking and the similarities and differences between cultures when users are engaging in this activity. The study contributes to previous research in the realm of media multitasking by expanding foundational knowledge on a global scale, setting the stage for more detailed research on the predictors, outcomes, and habits of global media multitasking.
Data Science For Social Good: Tackling the Challenge of Homelessness - Anita Luthra
A talk presented at the Champions Leadership Conference Series. Leveraging data provided by New York City’s Department of Homeless Services, software vendor Tibco partnered with SumAll.Org to help tackle the societal challenge of homelessness in New York City.
Quantified Self Ideology: Personal Data becomes Big Data - Melanie Swan
A key contemporary trend emerging in big data science is the quantified self: individuals engaged in the deliberate self-tracking of any kind of biological, physical, behavioral, or transactional information, as n=1 individuals or in groups. The quantified self is one dimension of the bigger trend to integrate and apply a variety of personal information streams, including big health data (genome, transcriptome, environmentome, diseasome), quantified self data streams (biosensor, fitness, sleep, food, mood, heart rate, glucose tracking, etc.), traditional data streams (personal and family health history, prescription history) and IoT (Internet of Things) activity data streams (smart home, smart car, environmental sensors, community data). This talk looks at how personal data and group data are becoming big data as individuals and communities share, collaborate, and work with large personalized data sets using novel discovery methods such as anomaly detection and exception reporting, longitudinal baseline analysis, episodic triggers, and hierarchical machine learning.
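One of the discovery methods mentioned, longitudinal baseline analysis with anomaly detection, can be sketched for a single quantified-self stream such as daily resting heart rate. The window size and threshold here are arbitrary illustrative choices:

```python
import statistics

def anomalies(series, window=7, threshold=2.0):
    """Flag points deviating more than `threshold` standard deviations
    from a trailing baseline of the previous `window` observations.

    A minimal sketch of longitudinal baseline analysis for a
    quantified-self stream (e.g. daily resting heart rate).
    """
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.mean(baseline)
        sd = statistics.stdev(baseline)
        # Skip flat baselines (sd == 0) to avoid division by zero.
        if sd > 0 and abs(series[i] - mean) / sd > threshold:
            flagged.append(i)
    return flagged
```

A trailing (rather than global) baseline is what makes the analysis longitudinal: each person's normal range is learned from their own recent history.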
Public Health Crisis Analytics for Gender Violence - Hemant Purohit
Research-progress talk on the use of data analytics methods for one of the world's major public health crises, gender-based violence, and on campaign engagement in the initiatives of non-profit organizations.
"Big Data for Development: Opportunities and Challenges" UN Global Pulse
This White Paper is the culmination of UN Global Pulse’s research, collaborations, and consultations with experts to begin a dialogue around Big Data for Development. See: http://www.unglobalpulse.org/BigDataforDevWhitePaper
5th International Scientific Conference "Information Science (Scientific Information) in a Period of Change: Innovative Information Services". Faculty of Journalism, Information and Book Studies, Department of Informatology, University of Warsaw, Warsaw, 15-16 May 2017.
A Framework to Identify Best Practices: Social Media and Web 2.0 Technologies... - Connie White
Social media is used in a variety of domains, including emergency management. However, the question of which technologies are most appropriate for a given emergency remains open. We present a framework of dimensions of emergencies that can assist in selecting appropriate social media for an emergency situation. Social media is not a panacea, but it can be used effectively given the proper functions available from the particular services provided by each of the available Web 2.0 technologies. The main objective of this paper is to identify best practices for social media that leverage its capabilities given the complexities that coincide with events. This is a conceptual paper based on the results of preliminary studies involving group interactions with emergency professionals from various backgrounds. In addition, emergency management students who are professionals in the field were surveyed, followed by another interview soliciting information from information systems scientists. We found that each situation called forth various dimensions, where only sub-phases of a given dimension may be used depending on the task type derived from the event characteristics. This lays a foundation upon which a more formal approach can be taken to help tame the social media mania into a manageable set of ‘best practices’ from which emergencies can be managed more effectively using Web 2.0 technologies and social collaborative online tools.
On Semantics and Deep Learning for Event Detection in Crisis Situations – COMRADES project
In this paper, we introduce Dual-CNN, a semantically-enhanced deep learning model to target the problem of event detection in crisis situations from social media data. A layer of semantics is added to a traditional Convolutional Neural Network (CNN) model to capture the contextual information that is generally scarce in short, ill-formed social media messages. Our results show that our methods are able to successfully identify the existence of events, and event types (hurricane, floods, etc.) accurately (> 79% F-measure), but the performance of the model significantly drops (61% F-measure) when identifying fine-grained event-related information (affected individuals, damaged infrastructures, etc.). These results are competitive with more traditional Machine Learning models, such as SVM.
http://oro.open.ac.uk/49639/1/event_detection.pdf
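The dual-channel idea, concatenating a layer of semantic features with the word embeddings before convolution, can be sketched in a few lines of NumPy. The dimensions, random embeddings, and single filter below are illustrative stand-ins, not the actual Dual-CNN architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy message of 6 tokens: word embeddings (dim 8) plus a semantic
# channel (dim 3), e.g. concept-type features from a knowledge base.
word_emb = rng.normal(size=(6, 8))
sem_emb = rng.normal(size=(6, 3))

# The semantic layer is concatenated with the word channel, so the
# convolution sees surface and conceptual context for every token.
x = np.concatenate([word_emb, sem_emb], axis=1)  # shape (6, 11)

# A single 1-D convolutional filter of width 3 over the token axis,
# followed by ReLU and max-over-time pooling.
w = rng.normal(size=(3, 11))
conv = np.array([np.sum(x[i:i + 3] * w) for i in range(len(x) - 2)])
feat = np.maximum(conv, 0.0).max()

# Logistic output: probability that the message reports an event.
p_event = 1.0 / (1.0 + np.exp(-(0.5 * feat - 1.0)))
print(x.shape)
```

In the real model many filters of several widths are learned jointly with the classifier; the point of the sketch is only how the two channels are fused before convolution.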
Meliorating usable document density for online event detection – IJICTJOURNAL
Online event detection (OED) has seen a rise in interest in the research community, as it can provide quick identification of possible events happening in the world. Through these systems, potential events can be indicated well before they are reported by the news media, by grouping similar documents shared over social media by users. Most OED systems use textual similarity for this purpose. Similar documents that may indicate a potential event are further strengthened by the replies made by other users, thereby improving the potential of the group. However, these replies are at times unusable as independent documents, as they may replace previously appearing noun phrases with pronouns, leading OED systems to fail while grouping these replies into their suitable clusters. In this paper, a pronoun resolution system that tries to replace pronouns with relevant nouns over social media data is proposed. Results show significant improvement in performance using the proposed system.
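A minimal sketch of such pronoun substitution, assuming a naive "most recent proper-noun phrase in the parent post" heuristic rather than the paper's actual resolver:

```python
import re

PRONOUNS = {"it", "he", "she", "they", "this", "that"}

def resolve_pronouns(parent: str, reply: str) -> str:
    """Naively replace pronouns in a reply with the last proper-noun
    phrase of the parent post, so the reply can stand alone when an
    OED system clusters documents by textual similarity."""
    # Last run of capitalised words in the parent, e.g. "Hurricane Maria".
    candidates = re.findall(r"(?:[A-Z][a-z]+ ?)+", parent)
    if not candidates:
        return reply
    antecedent = candidates[-1].strip()
    return re.sub(
        r"\b(" + "|".join(PRONOUNS) + r")\b",
        antecedent,
        reply,
        flags=re.IGNORECASE,
    )

parent = "Flooding reported after Hurricane Maria hit the coast"
reply = "it is getting worse near the harbour"
print(resolve_pronouns(parent, reply))
# -> "Hurricane Maria is getting worse near the harbour"
```

A production resolver would use proper coreference resolution; the sketch only shows why replacing the pronoun makes the reply clusterable on its own.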
A Holistic Approach to Evaluating Social Media's Successful Implementation in... – Connie White
As emergency management agencies and organizations implement social media and web technology to support crisis information and communication efforts, many question whether present strategies are beneficial. This is especially true if social media is being implemented for the first time or has not been experienced in a live disaster. Studies have been conducted providing information on a variety of interactions between Social Media and Emergency Management (SMEM). However, few have taken a formal scientific approach to measurement by providing a 'Comprehensive Performance Metric.' Performance metrics need consistency while providing room for implementing unique measurement criteria for individualized efforts. We offer a research design using field studies of real-world cases, evaluating rural and metropolitan areas. The result is a set of 'Best Practices' derived through implementation. By offering a means of measuring success, SMEM can continue to evolve through a methodologically sound approach to using social media.
Classifying Crises-Information Relevancy with Semantics – COMRADES project
Prashant Khare, Gregoire Burel, and Harith Alani
Knowledge Media Institute, The Open University, United Kingdom
{prashant.khare,g.burel,h.alani}@open.ac.uk
Social Media, Crisis Communication and Emergency Management: Leveraging Web 2... – Connie White
Detailing guidelines and safe practices for using social media across a range of emergency management applications, Social Media, Crisis Communication, and Emergency Management: Leveraging Web 2.0 Technologies supplies cutting-edge methods to help you inform the public, reduce information overload, and, ultimately, save more lives.
Introduces collaborative mapping tools that can be customized to your needs
Explores free and open-source disaster management systems, such as Sahana and Ushahidi
Covers freely available social media technologies, including Facebook, Twitter, and YouTube
Emergency Management in the age of social convergence – Patrice Cloutier
Presentation on social media use in emergency management, given at the Social Media in Government Conference on Oct. 3, 2011 for the Conference Board of Canada.
Proceedings of the 5th International ISCRAM Conference – Washington, DC, USA, May 2008
F. Fiedrich and B. Van de Walle, eds.
Backchannels on the Front Lines: Emergent Uses of Social Media in the 2007 Southern California Wildfires
Jeannette Sutton1, Leysia Palen1 & Irina Shklovski2
University of Colorado, Boulder1 University of California, Irvine2
[email protected], [email protected], [email protected]
ABSTRACT
Opportunities for participation by members of the public are expanding the information arena of disaster. Social media supports "backchannel" communications, allowing for wide-scale interaction that can be collectively resourceful, self-policing, and generative of information that is otherwise hard to obtain. Results from our study of information practices by members of the public during the October 2007 Southern California Wildfires suggest that community information resources and other backchannel communications activity enabled by social media are gaining prominence in the disaster arena, despite concern by officials about the legitimacy of information shared through such means. We argue that these emergent uses of social media are pre-cursors of broader future changes to the institutional and organizational arrangements of disaster response.
Keywords
Crisis Informatics, disaster, information and communication technology, wildfire
INTRODUCTION
Disaster situations are non-routine events that result in non-routine behaviors. In times of disaster, people and organizations adapt and improvise (Wachtendorf, 2004) to suit the conditions as needs demand. Even emergency response organizations, which are strongly organized around locally- and federally-mandated protocols, adapt to accommodate the situation particulars for warning, rescue, and recovery. Indeed, in the US, the organizational structure that is activated during times of crisis is designed to be internally flexible. However, its ability to be externally flexible when interfacing with the public is in doubt (Wenger, 1990; Buck, et al, 2006; Palen and Liu, 2007). Members of the public are known by sociologists to improvise in disaster situations, and are responsible for leading important rescue and relief activities (Tierney, et al. 2001; Kendra and Wachtendorf, 2003; Palen and Liu, 2007). They leverage their own social networks to find and provide information outside the official response effort, and to make critical decisions about, for example, heeding warning and making plans to evacuate (Mileti, et al., 2006). These facts are often ignored during local and federal disaster management planning and policy implementation, with the focus almost entirely on the role of the official response and their management of public-side activities. This stance places public peer-to-peer communications as "backchannel" activity that does not have full legitimacy in the information arena of disaster. However, the increasing presence of info ...
Mapping the Risk-Utility Landscape of Mobile Data for Sustainable Development... – UN Global Pulse
The goal of this project was to determine how insights from mobile data might be used to maximum effect in support of policy planning and crisis response with minimal risk to privacy. The project aimed to determine the impact that aggregating mobile data to protect privacy has upon the utility of the data for (i) transportation planning and (ii) pandemic control and prevention. Utility of each data set was evaluated by surveying transportation experts and epidemiologists; re-identification risk for each data set was also assessed. Risk of re-identification was subsequently considered together with data utility to determine which level of aggregation is the minimum required to adequately protect individual privacy while preserving its value for policy planning and crisis response.
Cite as: “Mapping the Risk-Utility Landscape: Mobile Data for Sustainable Development & Humanitarian Action”, Global Pulse Project Series no. 18, 2015.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847. This material reflects only the author's view and the European Commission is not responsible for any use that may be made of the information it contains.
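One basic form of the privacy-preserving aggregation studied in this project can be sketched as spatial coarsening plus small-cell suppression. The grid size, threshold k, and records below are hypothetical choices for illustration, not the project's actual aggregation levels:

```python
def aggregate_with_suppression(records, cell, k=5):
    """Aggregate per-subscriber location records into coarse cells and
    suppress any cell seen by fewer than k distinct individuals, a basic
    disclosure-control step before counts are shared with planners."""
    seen = {}
    for user, location in records:
        seen.setdefault(cell(location), set()).add(user)
    return {c: len(users) for c, users in seen.items() if len(users) >= k}

# Hypothetical call-detail records: (subscriber id, (lat, lon)).
records = [
    ("u1", (1.02, 36.81)), ("u2", (1.03, 36.82)), ("u3", (1.01, 36.80)),
    ("u4", (1.04, 36.83)), ("u5", (1.05, 36.84)), ("u6", (9.50, 44.00)),
]

# A coarse 1-degree grid keeps the dense cell but drops the single-user
# cell that would carry re-identification risk.
grid = lambda loc: (int(loc[0]), int(loc[1]))
print(aggregate_with_suppression(records, grid, k=5))
# -> {(1, 36): 5}
```

The project's risk-utility question is then which grid size and threshold keep the output useful for transport planning or epidemiology while keeping the suppressed fraction, and the residual re-identification risk, acceptably low.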
Evaluating Platforms for Community Sensemaking: Using the Case of the Kenyan ... – COMRADES project
Vittorio Nespeca
TU Delft
V.Nespeca@tudelft.nl
Kenny Meesters
TU Delft
K.J.M.G.Meesters@tudelft.nl
Tina Comes
TU Delft
T.Comes@tudelft.nl
WiPe Paper – T12 - Designing for Resilience
Proceedings of the 15th ISCRAM Conference – Rochester, NY, USA May 2018
https://www.researchgate.net/publication/324162897_Evaluating_Platforms_for_Community_Sensemaking_Using_the_Case_of_the_Kenyan_Elections_Vittorio_Nespeca
An Extensible Multilingual Open Source Lemmatizer – COMRADES project
Ahmet Aker, Johann Petrak and Firas Sabbah
Department of Computer Science, University of Sheffield
Department of Information Engineering, University of Duisburg-Essen
a.aker@is.inf.uni-due.de, johann.petrak@sheffield.ac.uk
firas.sabbah@stud.uni-due.de
D6.2 First report on Communication and Dissemination activities – COMRADES project
COMRADES project was launched in January 2016 with a lifetime of 36 months and it aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
The COMRADES consortium will build a next-generation, intelligent resilience platform to provide high socio-technical innovation and to support community resilience in crisis situations. The platform will capture and process, in real time, multilingual social information streams from distributed communities, for the purpose of identifying, aggregating, and verifying reported events at the citizen and community levels. Resilience frameworks, guidelines and best practices will be embedded into the platform design and functionality, enriched with open datasets and open source software.
The main objective of the project is to foster social innovation during crises: safeguarding communities in critical scenarios from inaccurate, distrusted, and overhyped information, and raising citizen and community awareness of crisis situations by providing them with filtered, validated, enriched, high-quality, and actionable knowledge. Community decision-making will be assisted by automated methods for real-time, intelligent processing and linking of crowdsourced crisis information.
This document forms deliverable D6.2 "First report on communication and dissemination activities". It outlines the dissemination and communication objectives and strategy of the reporting period and focuses on the tools and activities that were undertaken to accomplish the objectives set. The deliverable reports on the dissemination tools (website, social media, press releases, newsletter issues, brochures, etc.) used from M1 to M12 to disseminate the project, implementing the online and offline dissemination strategy of deliverable D6.1 (M6). It also presents the dissemination activities that have been implemented by the partners and are foreseen in the Description of Action for WP6. It will be updated yearly for the whole duration of the project.
It is based on, and is consistent with, the DoA and the CA, but is not a substitute for reading these documents.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/46-d6-2-first-report-on-communication-and-dissemination-activities.html
D3.1 Multilingual Content Processing Methods – COMRADES project
This report describes the tools developed for multilingual text processing of social media. It gives details about the linguistic approaches used, the scope of the tools, and some results of performance evaluation. WP3 focuses on developing methods to detect relevant and informative posts, as well as content with clear validity, so that these can be dealt with efficiently without the clutter of uninformative or irrelevant posts obscuring the important facts. In order to achieve these objectives, low-level linguistic processing components are first required in order to generate the lexical, syntactic and semantic features of the text required by the informativeness and trustworthiness components to be developed later in the project. These tools are also required by components developed in WP4 for the detection of emergency events, modelling and matchmaking. They take as input social media and other kinds of text-based messages and posts, and produce as output additional information about the language of the message, the named entities, and syntactic and semantic information.
In this report, we first describe the suite of tools we have developed for Information Extraction from social media for English, French and German. While English is the main language of messages dealt with by the tools in this project, it is very useful to be able to both recognise and deal with messages in other languages. French and German are therefore used as examples to show the adaptability of our tools to other languages, and our multilingual components thus serve as a testbed for new language adaptation techniques with which we have experimented during the project.
Various aspects of these tools are evaluated for accuracy. Second, we describe the tools we have developed for entity disambiguation and linking from social media, for English, French and German. These ensure not only that we extract relevant instances of locations, names of people and organisations, but that we know which particular instance we are talking about since these names may potentially refer to different things. By linking to a semantic knowledge base, we ensure both disambiguation and also that we have additional knowledge (for example, the coordinates of a location). The tools are evaluated for accuracy as well as speed, since traditionally these techniques are extremely slow and cumbersome to use in real world scenarios. The tools are all made available as GATE Cloud services.
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/43-d3-1-multilingual-content-processing-methods.html
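The entity disambiguation step described above, picking which "Sheffield" a message means and getting coordinates for free from a knowledge base, can be sketched with a toy gazetteer and a crude context-overlap score. The KB entries, ids, and scoring below are illustrative, not the actual GATE components' method:

```python
# A toy gazetteer standing in for the semantic knowledge base: one
# surface form can refer to several entries, disambiguated here by
# counting shared words between context and entry description.
KB = {
    "Sheffield": [
        {"id": "Q42448", "desc": "city in South Yorkshire England",
         "coords": (53.38, -1.47)},
        {"id": "Q7492588", "desc": "village in Massachusetts United States",
         "coords": (42.11, -73.36)},
    ],
}

def link(mention, context, kb=KB):
    """Pick the KB entry whose description shares the most words with
    the message context, returning its id and coordinates."""
    words = set(context.lower().split())
    best = max(kb.get(mention, []),
               key=lambda e: len(words & set(e["desc"].lower().split())),
               default=None)
    return (best["id"], best["coords"]) if best else None

msg = "Road closures across South Yorkshire after flooding in Sheffield"
print(link("Sheffield", msg))
# -> ('Q42448', (53.38, -1.47))
```

Real linkers score candidates with far richer features (popularity priors, embeddings, graph coherence), but the payoff is the same: an unambiguous identifier plus extra knowledge such as coordinates.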
D4.1 Enriched Semantic Models of Emergency Events – COMRADES project
COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises, www.comrades-project.eu) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
This deliverable analyses the COMRADES requirements from different project perspectives in order to design and implement a common semantic model that represents micro emergency events and related metadata. In particular we analyse: 1) the data structures used by the Ushahidi platform, since it is used as the underlying platform of the COMRADES system; 2) the requirements for the tools that need to be integrated into the COMRADES platform; 3) stakeholder interviews; and 4) the structure of crisis-related datasets.
Based on the NeOn methodology [1] and a qualitative and structural design approach [2], we created an Ontology Requirement Specification Document (ORSD) [3] that highlights the needs and specifies the competency questions that the model needs to address in order to comply with the COMRADES model requirements.
Following the development of the ORSD, we implement the COMRADES model as an ontology using RDF/OWL. In order to allow the usage of the ontology in multilingual scenarios we translate the classes, properties and relation names to different languages. Finally, for improving the interoperability of the model with existing ontological models we align some part of the COMRADES ontology with well-known ontologies such as SIOC and FOAF.
Although we cannot completely evaluate the ontological model since some data is not yet available for the model (i.e. the COMRADES platform is not yet fully developed), we show that the model can successfully represent 102 different competency questions.
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/44-d4-1-enriched-semantic-models-of-emergency-events.html
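The multilingual labelling and cross-ontology alignment described above can be sketched as plain subject-predicate-object triples. The namespace, class names, and alignments below are illustrative placeholders, not the actual COMRADES ontology terms:

```python
# Triples are (subject, predicate, object); language-tagged labels cover
# the multilingual requirement, and equivalence/subclass links sketch the
# alignment with well-known ontologies such as SIOC and FOAF.
COMRADES = "http://example.org/comrades#"  # hypothetical namespace

triples = [
    (COMRADES + "Report", "rdfs:label", ("Report", "en")),
    (COMRADES + "Report", "rdfs:label", ("Rapport", "fr")),
    (COMRADES + "Report", "rdfs:subClassOf", "sioc:Post"),
    (COMRADES + "Reporter", "owl:equivalentClass", "foaf:Agent"),
]

def labels(cls, lang):
    """Return all labels of a class in the requested language."""
    return [o[0] for s, p, o in triples
            if s == cls and p == "rdfs:label"
            and isinstance(o, tuple) and o[1] == lang]

print(labels(COMRADES + "Report", "fr"))
# -> ['Rapport']
```

In the deliverable this is done properly in RDF/OWL, where the same pattern appears as `rdfs:label "Rapport"@fr` and explicit alignment axioms.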
SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for ... – COMRADES project
Leon Derczynski and Kalina Bontcheva and Maria Liakata and Rob Procter and Geraldine Wong Sak Hoi and Arkaitz Zubiaga
Media is full of false claims. Even Oxford Dictionaries named "post-truth" as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges, as well as the results achieved by participants on these challenges.
http://www.derczynski.com/sheffield/papers/rumoureval-task.pdf
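For the reaction-classification side of the task, systems label each reply's stance towards the rumour. A deliberately crude keyword baseline in the spirit of the support/deny/query/comment scheme might look like this; the rules are illustrative, not a participant system:

```python
def stance(reply: str) -> str:
    """Very rough keyword baseline for classifying a reply's stance
    towards a rumour: support / deny / query / comment."""
    text = reply.lower()
    if "?" in text or text.startswith(("is it", "really")):
        return "query"
    # Denial cues are checked before support cues so "not true" wins.
    if any(w in text for w in ("false", "fake", "hoax", "not true")):
        return "deny"
    if any(w in text for w in ("confirmed", "true", "exactly", "agree")):
        return "support"
    return "comment"

for r in ["Is it confirmed?", "This is a hoax", "Confirmed by police", "wow"]:
    print(r, "->", stance(r))
```

Such a baseline mostly shows why the task is hard: stance usually depends on the conversation thread, not just the reply's surface words, which is what the shared-task systems had to model.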
Prospecting Socially-Aware Concepts and Artefacts for Designing for Community... – COMRADES project
Defining flexible and consistent methods and artefacts to design for social impact is a current challenge for HCI. The ephemeral and vulnerable conditions of people living as refugees add even more questions about the suitability of design methods to the complexity of real, and many times tough, life. In this position paper we briefly introduce two concepts embraced by the Socially-aware Design Approach, the Semiotic Onion and the Basic Block of Culture. We then reflect on the potential contributions of applying these concepts and artefacts to inform design for boosting community resilience of people living as refugees.
http://oro.open.ac.uk/49641/1/prospecting-socially-aware.pdf
Behind the Scenes of Scenario-Based Training: Understanding Scenario Design a... – COMRADES project
WiPe Paper – Prevention and Preparation
Proceedings of the 14th ISCRAM Conference – Albi, France, May 2017
Tina Comes, Frédérick Bénaben, Chihab Hanachi, Matthieu Lauras, Aurélie Montarnal, eds.
http://idl.iscram.org/files/nadiasaadnoori/2017/1525_NadiaSaadNoori_etal2017.pdf
Sustainable Performance Measurement for Humanitarian Supply Chain Operations – COMRADES project
WiPe Paper – Logistics and Supply-Chain. Proceedings of the 14th ISCRAM Conference – Albi, France, May 2017. Tina Comes, Frédérick Bénaben, Chihab Hanachi, Matthieu Lauras, Aurélie Montarnal, eds.
http://idl.iscram.org/files/lauralagunasalvado/2017/1510_LauraLagunaSalvado_etal2017.pd
Detecting Important Life Events on Twitter Using Frequent Semantic and Syntac... – COMRADES project
Dickinson, Thomas; Fernandez, Miriam; Thomas, Lisa; Mulholland, Paul; Briggs, Pam and Alani, Harith (2016). Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs. IADIS International Journal on WWW/Internet, 14(2) pp. 23–37.
http://oro.open.ac.uk/48678/
DoRES — A Three-tier Ontology for Modelling Crises in the Digital Age – COMRADES project
Burel, Gregoire; Piccolo, Lara S. G.; Meesters, Kenny and Alani, Harith (2017). DoRES — A Three-tier Ontology for Modelling Crises in the Digital Age. In: ISCRAM 2017 Conference Proceedings, (in press).
http://oro.open.ac.uk/49285/
D2.1 Requirements for boosting community resilience in crisis situation – COMRADES project
COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises, www.comrades-project.eu) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
This deliverable reviews theories, conceptual frameworks, standards (e.g., BS 11200 and ISO 22320-2011) and indicators of community resilience. It analyses relevant use cases and reports of resilience around various type of crises.
These assessments enable the project team to present a definition of community resilience that is tailored specifically to the needs and aims of the COMRADES project, which focuses on the role of information and technology as a driver of community resilience.
The project’s stance to improve resilience can therefore be summarized by:
Continuously enhancing community resilience through ICT, instead of focusing on specific shocks or disruptive events.
Enabling a broad range of actors to acquire a relevant, consistent and coherent understanding of a stressing situation.
Empowering decision makers and triggering community engagement in response and recovery efforts, including long-term mitigation and preparation.
The deliverable uses this definition and the project’s overall understanding of community resilience, as presented in the review sections, to specify initial design and functional requirements for adopting and integrating resilience procedures and methodologies into the COMRADES collective resilience platform. At the same time, we develop an evaluation framework that enables us to measure the contribution of the COMRADES resilience platform to building community resilience.
Finally, we outline three prototypical crisis scenarios that will enable the project development, test and evaluation. As such, the deliverable informs the design of the community workshops and interviews planned that will be conducted in T2.2 and T2.3, as well as the work of technology development, particularly in WP5.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
COMRADES is creating an open-source, community resilience platform, designed by communities, for communities, to help them reconnect, respond to, and recover from crisis situations. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 687847. This material reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.
Helping Crisis Responders Find the Informative Needle in the Tweet Haystack
1. Leon Derczynski et al. Finding Informative Tweets
Helping Crisis Responders Find the Informative Needle in the Tweet Haystack
Leon Derczynski∗
University of Sheffield
Kenny Meesters
TU Delft
Kalina Bontcheva
University of Sheffield
Diana Maynard
University of Sheffield
v1.1.0
2017/11/14
ABSTRACT
Crisis responders are increasingly using social media, data and other digital sources of information to build a
situational understanding of a crisis situation in order to design an effective response. However with the increased
availability of such data, the challenge of identifying relevant information from it also increases. This paper presents
a successful automatic approach to handling this problem. Messages are filtered for informativeness based on a
definition of the concept drawn from prior research and crisis response experts. Informative messages are tagged
for actionable data – for example, people in need, threats to rescue efforts, changes in environment, and so on.
In all, eight categories of actionability are identified. The two components – informativeness and actionability
classification – are packaged together as an openly-available tool called Emina (Emergent Informativeness and
Actionability).
Keywords
informativeness, twitter, social media, actionability, information filtering
INTRODUCTION
A key aspect of dealing with the aftermath of disasters and critical events is quickly building a situational understanding. This understanding enables responders to grasp not only the gravity of the situation and the needs of the affected community, but also the operational circumstances and the availability of resources. However, the chaotic nature of a crisis makes a coherent understanding hard to obtain. Crisis responders are therefore continuously looking for key pieces of information to build, implement and improve their response (Harrald 2006).
In recent years, the advent of information technology has provided crisis responders with new tools and systems to
collect information directly from the communities affected by crisis. At the same time, with the rise of social media
and the profusion of mobile technology the same communities are also increasingly empowered – and at times
incentivized – to share their data through an increased number of channels. These developments provide crisis
responders with access to more data than ever before, directly from the communities affected by the crisis (S. Vieweg
et al. 2010).
However, this data surge has given rise to new challenges. Where before the key challenges were to gather sufficient
data to build a situational understanding, the main challenges now are to identify the relevant messages (T. Comes,
Meesters, et al. 2017). While many responding agencies have confirmed the usefulness of social media and other
forms of electronic information gathering, it remains challenging to integrate an effective workflow in short-burning
∗corresponding author
WiPe Paper – Social Media Studies
Proceedings of the 15th ISCRAM Conference – Rochester, NY, USA May 2018
Kees Boersma and Brian Tomaszewski, eds.
arXiv:1801.09633v1 [cs.CL] 29 Jan 2018
crisis situations (Meesters et al. 2016). Currently, organizations often rely on external volunteers, such as the
Standby Task Force in the wake of the 2010 Haiti Earthquake (Patrick et al. 2011; Rencoret et al. 2010). However,
not all organizations have the capacities or knowledge to process the large amounts of data available to them in the
search for relevant messages.
Information filtering is critical to crisis workers dealing with streams of information related to emerging and ongoing
situations (S. E. Vieweg 2012; Temnikova et al. 2015). Today, messages in many forms arrive constantly in volumes
too large for humans to effectively prioritize in real-time. In this research, we aim to provide automated methods for
filtering this information so that crisis workers can effectively find potentially important messages and perform
information triage.
Crisis worker flow typically begins with defining selectors that determine what information is retrieved; these could
be keywords, particular sources, GPS coordinates, and so on. For example, to get content about hurricane Irma,
one could capture just messages that contain “irma" and originate within the monitored area. This is the first pass
at reducing the volume of information, but even with this, the volume of information balloons rapidly as events
progress; data for Typhoon Yolanda in 2013 goes from 182 tweets in the first three days to 1778 on the fifth day and
4385 a week later; the west Texas explosion generated 10K tweets on the first day, going from 128 tweets in hour 1
to 2456 in hour 4.1 And indeed these selectors do not guarantee that the incoming data is good; many messages can
be irrelevant (Homberg et al. 2014).
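As a minimal sketch, the first-pass selector filtering described above might look like the following; the message fields, the keyword, and the bounding box are illustrative assumptions, not part of the paper.

```python
def matches_selectors(message, keyword, bbox):
    """True if the text contains the keyword and the coordinates fall
    inside the (lat_min, lat_max, lon_min, lon_max) bounding box."""
    lat_min, lat_max, lon_min, lon_max = bbox
    in_area = lat_min <= message["lat"] <= lat_max and lon_min <= message["lon"] <= lon_max
    return keyword in message["text"].lower() and in_area

florida = (24.0, 31.0, -88.0, -79.0)  # rough illustrative bounding box
stream = [
    {"text": "Hurricane Irma is approaching Miami", "lat": 25.8, "lon": -80.2},
    {"text": "Great weather in Oslo today", "lat": 59.9, "lon": 10.7},
]
# First pass: keep only messages mentioning "irma" inside the monitored area.
captured = [m for m in stream if matches_selectors(m, "irma", florida)]
```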
These challenges are however not only related to natural disasters, but also extend to other disruptions and incidents
faced by communities around the world. For example, in a recent iHub project exercise monitoring elections in
Kenya, workers found roughly 90% of the messages captured in the message monitoring Ushahidi platform were
irrelevant2 – not only due to their content, but also due to missing content, repetitive messages, and so on. In this
case the messages were deliberately solicited: people were encouraged to post their own observations (via various
channels). Despite this ‘top-down’ approach, the collected data still contained a high percentage of irrelevant
messages. In sudden onset disasters, the number of messages and noise is potentially much higher. Methods to ease
the impact of high data volume are therefore still pressing.
These developments and trends show that a tool to reduce the amount of data seen by humans and filter to just
the relevant messages is not only necessary but would improve the usefulness of the data available to responders.
The development of auto-assessment of informativeness is the first challenge that needs to be addressed towards a
system of automated identification of relevant messages for crisis responders.
In this paper, we describe the openly-available Emina (Emergent Informativeness and Actionability) tool, which we
have developed to address these issues as part of the EU COMRADES project.3
DEVELOPMENT APPROACH
Our approach began by developing a definition for informativeness. With this, we can gather and annotate data for
its informativeness, using a crowdsourcing task, where readily-available workers are drawn from a pool over the
internet to quickly process the kinds of tasks that require human intelligence. This produces our training set. This
training set is used to train a machine learning system, which learns mathematical rules about the text in the training
set that help it predict the informativeness of future, previously-unseen messages, mimicking a deployed situation.
The same general approach is taken to learning actionability.
The developed method for assessing the informativeness and filtering the messages is divided into two stages.
Firstly, incoming messages are labelled for informativeness by a machine learning system. Messages deemed to be
uninformative are discarded. The sensitivity of this system can be tuned. For example, in situations where it is
critical to process every potentially important message, this can be achieved, at the cost of some irrelevant messages
coming in. In other situations, where it’s most important to keep up to date, and only summary information is
required, then all irrelevant messages can be discarded at the risk of some potentially relevant ones being missed.
The second stage is to determine what kind of action can be taken based on the message. This is termed actionability
classification. This paper describes a novel schema for actionability classification. The schema is prototyped,
updated, and then used to construct a dataset. A machine learning system is then trained to classify incoming
messages according to their actionability, based on this dataset.
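The two-stage design above can be sketched as follows; the scoring and tagging functions are simple stand-ins for the trained models described later, and the threshold illustrates the tunable sensitivity of the first stage.

```python
def informativeness_score(text):
    """Stand-in for the trained classifier's probability that a message is informative."""
    return 0.9 if "trapped" in text.lower() else 0.1

def tag_actionability(text):
    """Stand-in for the second stage: returns actionability labels for a message."""
    return {"missing_trapped"} if "trapped" in text.lower() else set()

def triage(messages, threshold=0.5):
    """A low threshold keeps every potentially important message (more noise);
    a high threshold keeps only confident ones (some relevant messages lost)."""
    kept = [m for m in messages if informativeness_score(m) >= threshold]
    return [(m, tag_actionability(m)) for m in kept]

results = triage(["People trapped on Lincoln Ave", "lol what a day"])
```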
1From CrisisLex data (Olteanu, Castillo, et al. 2014)
2See https://www.ushahidi.com/blog/2017/08/10/ushahidis-global-uchaguzi-partnership-crowdsourcing-kenyan-election-incidents-shows-
a-mostly-orderly-election
3See http://comrades-project.eu
Figure 1. How the informativeness task is presented to crowd workers
PREDICTING INFORMATIVENESS
Discarding uninformative data is critical when one aims to process crisis-relevant messages in real time. Typically,
when capturing messages related to a crisis, any potential selector (e.g. geographical area, hashtag) yields a high
volume of available content, via e.g. Tweets or SMS. However, only a limited part of this will in fact be relevant to
the crisis and also informative.
Therefore, the first step in dealing with any information stream related to a crisis is to differentiate between informative
and non-informative content. The two major technical challenges here are first in developing a computationally
operationalised definition of informativeness in crisis situations, and secondly in creating a technical solution that
can discriminate based on informativeness in the extraordinarily broad range of not only crisis situations but also
the social and other message media used around them.
Defining Informativeness
A key challenge in developing an automated approach for identifying relevant messages is to describe ‘relevant’ in a
manner that enables it to be transformed into an automated process. However, identifying messages which are
relevant, or informative (having an information value), is often both heavily context-dependent and also subjective.
Furthermore, informativeness patterns (and associated relevancy of information) may differ from one type of crisis
to another (Olteanu, S. Vieweg, et al. 2015). Because we rely partially on human-annotated data to train our system,
the definition of informativeness must additionally both provide useful distinctions for the end users of the tool (the
crisis workers) and be as easy as possible for human annotators to understand and apply to the data they are given.
Previous work has shown that the quality of crowdsourced data is heavily dependent on both the simplicity and
clarity of the task and definition (Sabou et al. 2014), and this data is critical for the development of the tool.
These issues make it challenging both to capture good examples of informativeness and to develop a definition for
it that can be used for crisis analysis. For this scenario, the notion of informativeness ideally needs to reflect the
extent to which information captured during a crisis is useful to responders; or at least whether or not a message
is related to the crisis. Rather than build a definition from first principles, we instead took a bottom-up approach, considering what kinds of messages would be useful in practice, and generalising from that. To arrive at a definition,
we drew on a range of existing sources and efforts. These included third-party resources from established authorities
in the field of information management for crisis response, such as CrisisLex (Olteanu, Castillo, et al. 2014), the
OCHA Guiding Principles (Van de Walle et al. 2008), and the World Disaster report, as well as academic sources
such as the DERMIS principles (Turoff et al. 2004), and field research in Nepal (Baharmand et al. 2016). Finally,
we included research carried out within the COMRADES project, including monitoring exercises from iHub and
input from Delft University of Technology (T. Comes, Toerjesen, et al. 2016).
From these resources, and after some trial and error testing different definitions on a number of volunteer annotators
and getting feedback, we created a three-level definition of informativeness. It became clear that informativeness
often hinges on how urgent information is, so this concept is reflected in the levels:
• Informative: the information in the message would be helpful with saving lives or assisting the authori-
ties/other response teams in dealing better with the incident.
• Somewhat Informative: the message contains useful information that helps to better understand the situation, but no emergency or urgent information can be found in the tweet.
• Not Informative: the message does not contain useful information regarding the incident, but instead expresses an opinion or contains unrelated information.
Because we crowdsource the work to human annotators, we were mindful of how task design and framing could
affect results: taking care of cognitive biases is paramount to achieving good results (Sabou et al. 2014; Derczynski,
Bontcheva, et al. 2016; Eickhoff 2018). Specifically, we wanted annotators to think carefully about mid-cases. Our
initial tests showed that a binary decision (informative or not) was too rigid; on the other hand, giving annotators
too many categories made it hard to find the boundaries between them. A sample task is shown in Figure 1.
Data Collection
The text data used in this stage of the project was part of the dataset released in Schulz et al. 2017. This dataset includes over 10K tweets that were selected and labelled for incident detection across 10 different cities. The tweets had already been labelled and classified by crowdsourced annotators. The first dataset has two classes (related and not related to the incident); the second has four (crash, fire, shooting, and not related). Note
that there is a distinction between related and informative: related messages are not necessarily informative, but
unrelated messages are definitely not informative. We thus retrieved 600 tweets from the first dataset taking only
those labelled as related. Following this, we filtered our dataset to remove all retweets, as well as tweets with
redundant content (i.e. subjectively carrying the same information or describing the same fine-grained event).
Annotators on CrowdFlower4 were then instructed to label these documents according to the three informativeness
categories listed above. Five crowd worker judgments were collected for each tweet. Crowd responses were then
adjudicated taking the most popular category. In general, total agreement was higher than 70% in all three groups,
and over 80% for the “not informative" category. The resulting annotated data has been made publicly available as
the COMRADES Crowdsourced Informativeness Dataset (CCSID) (Qarout et al. 2018).
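The adjudication step, taking the most popular of the five crowd judgments per tweet, can be sketched as follows; the label strings are illustrative.

```python
from collections import Counter

def adjudicate(judgments):
    """Return the most popular label among the crowd judgments for a tweet."""
    return Counter(judgments).most_common(1)[0][0]

# Five crowd judgments for one tweet; three of five say "informative".
label = adjudicate(["informative", "somewhat_informative", "informative",
                    "informative", "not_informative"])
```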
Supplemental Data
At least a few thousand examples are required to train most machine learning systems. For extra data, we
used CrisisLex (Olteanu, Castillo, et al. 2014), which contains around 60,000 tweets. These are labelled for
informativeness, though not as stringently as CCSID; messages are coded as either “Informative", corresponding to
the union of CCSID’s first two options, or “Non-informative", corresponding to the third. In any event, it comprises
32K positive and 28K negative examples. While this dataset is not as reliable (for example, there are anecdotal
cases of useless “informative" tweets in it), the effect of this is diminished by balancing data and exploiting loss
functions during training. This helps us to get high quality from the CCSID while leveraging the bulk of the
CrisisLex annotations, as described below.
Machine Learning Approach
Having the data, the next step is to learn what informative messages generally look like. A machine learning
algorithm will pick up patterns in the data we present it with, and try to generalise from that what signals an
informative message. Then, when applied to new messages, it should be able to make some decision about what
is and is not informative. In this instance, because we are dealing with highly diverse text and have a reasonable
amount of data, we choose a convolutional neural network, which ought to be able to infer directly from text without
additional feature extraction.
Because we need to leverage the data from CrisisLex as well as CCSID, we can label a maximum of two classes, as
CrisisLex does not have the middle “informative but not urgent" label. Therefore, for this system, we reduce the
CCSID categories from three to two by combining “informative and urgent" with “informative but not urgent".
Convolutional networks work by repeatedly applying a “window" to a numerical representation, and combining all
the values under that window. For text, those numbers come from a word embedding. We followed the convolutional
text classification network of Zhang et al. 2015 to determine informativeness (Figure 2).
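The core convolutional idea, sliding a window over the word-embedding matrix and pooling the combined values, can be sketched in miniature as follows. This illustrates the mechanism only, not the actual Emina network: real systems learn many such filters, and the embeddings and filter here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"people": 0, "trapped": 1, "need": 2, "water": 3}
embeddings = rng.normal(size=(len(vocab), 8))  # 8-dim word vectors, random stand-ins

def conv_feature(tokens, filt, width=2):
    """Apply one convolutional filter over sliding word windows, then max-pool."""
    mat = np.stack([embeddings[vocab[t]] for t in tokens])
    windows = [mat[i:i + width].ravel() for i in range(len(tokens) - width + 1)]
    return max(float(np.dot(w, filt)) for w in windows)

filt = rng.normal(size=2 * 8)  # a trained network learns this; random for illustration
feature = conv_feature(["people", "trapped", "need", "water"], filt)
```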
It is important to know how good a machine learning model is, so that we know when to stop training. Sometimes,
training for more time does not give a better result. This is because some systems match the training examples too
4http://www.crowdflower.com/
Figure 2. The convolutional neural network architecture used (from Zhang et al. 2015)
closely, and so will not operate well in a new environment. For example, if our training set only took examples from
shootings, and the term “AK47" occurred often in this data, then a system might end up thinking that highly specific
mentions of “AK47" indicate informativeness, instead of phrases like “wounded". This is called overfitting.
The quality of a machine learning model is judged in two ways. First, we can measure how well the trained model
works on the data that is used to train it (the “training set"). This is like measuring how well the system has learned
the data to which it has been exposed. The metric used for this is “training loss", and lower is better. The second
way is to measure the model’s performance on some other, held-out data (a “validation set"). A good score on
this indicates that the model is learning to predict things beyond the data it has already seen, i.e. the model is
generalising. This is called “validation loss", and again, a lower value is better.
Typically, one takes a dataset, and splits off a part which becomes the validation set. During training, training loss
decreases, as does validation loss. When training loss becomes lower than the validation loss, the model has started
to predict the training data better than other data, meaning it has started overfitting. So, the model selected is the
one just before this crossover of validation and training loss, with a minimum validation loss.
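Model selection at the minimum validation loss can be sketched as follows; the loss curves are invented numbers for illustration.

```python
# Invented loss curves: validation loss bottoms out at epoch 4, then rises
# while training loss keeps falling (the overfitting point described above).
train_loss = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33]
val_loss = [0.92, 0.75, 0.60, 0.52, 0.50, 0.53]

def select_epoch(val_loss):
    """Pick the epoch with minimum validation loss."""
    return min(range(len(val_loss)), key=lambda e: val_loss[e])

best = select_epoch(val_loss)  # epoch 4 here
```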
In the instance of our informativeness classifier, we have a somewhat limited dataset (large scale machine learning
text datasets typically comprise millions of examples). In addition, some of the training data (i.e. from CrisisLex) is not a precise match for the kinds of scenarios we envisage COMRADES outputs being deployed in. This means that measuring loss against CrisisLex alone might give a system that is very good for the kinds of scenarios contained in
this dataset, but does not generalise well across crises.
To remedy this, we use part of the CCSID in the validation set, giving it a much larger proportion of the validation set than of the training set. Specifically, the validation set is comprised of 300 CCSID instances and 300 drawn from
CrisisLex (150 each of informative and non-informative). The remainder of the data, some 60,000 examples, forms
the training data. This means that scoring well on the more varied CCSID, which matches COMRADES goals well, is very important, despite it comprising only a small proportion of the training data. In this way, we can
select a model for informativeness classification that uses all the data we have available and is optimally likely to
score well in real-world deployment.
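The split described above can be sketched as follows, with synthetic placeholder data standing in for CCSID and CrisisLex.

```python
import random

random.seed(0)
# Synthetic placeholders: (text, label) pairs standing in for the two corpora.
ccsid = [("ccsid tweet %d" % i, i % 2) for i in range(600)]
crisislex = [("crisislex tweet %d" % i, i % 2) for i in range(60000)]

val = random.sample(ccsid, 300)                  # 300 CCSID instances
informative = [x for x in crisislex if x[1] == 1]
non_informative = [x for x in crisislex if x[1] == 0]
val += random.sample(informative, 150) + random.sample(non_informative, 150)

val_set = set(val)
train = [x for x in ccsid + crisislex if x not in val_set]  # remaining ~60,000 examples
```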
Evaluation
This training and validation partition was used with the configuration specified above. The final scores in this
case are an accuracy of 92% and an F1 of 91%. This is over the validation dataset, which represented both the
definition of informativeness we wanted to capture and also the diversity present in social media. The F1 indicates
that we achieve an excellent balance in errors between false positives (i.e. uninformative messages slipping through)
Figure 3. A schema from the MIT Humanitarian Response Lab for prioritising hurricane response
and false negatives (informative messages being missed). We attribute the high performance of this system to the
careful annotation process, as well as the volume and range of documents available for training. Additionally, we
purposefully created a system aimed at high performance on this data type.
ACTIONABILITY
Having filtered out uninformative content, the next stage is to assist crisis workers in information processing by
categorizing the remaining messages. This could be used to select individual messages and hand them over to a
specialist team, as well as to get a good overview of the crisis situation or detect emerging events.
Some prior work has addressed this. Recently, the CREES system (Burel et al. 2017) used the CrisisLex categories and the large amount of data there to filter information into different categories. However, not all of these categories were actionable, and while there is some overlap, they are either not directly useful when determining actionability or do not provide sufficient granularity. Nevertheless, with its large amount of data,
CREES achieves good performance. Earlier work on determining actionability in crises was effective though studied
over a single crisis (Munro 2011). We go further and train on multiple crises, allowing a more general impression
of how actionability can look over a broad range of contexts.
Defining Actionability
Our requirements were that a relatively small number of categories should be selected, and all of these at a single,
top level. A hierarchical system is complex and hard to build; the added complexity here incurs cost and quality
risk, and for a real-time prototype system, it is better to provide something that works. Further subcategorization
could always be efficiently provided later if necessary.
Additionally, the categories should correspond to issues relevant to the type of crisis tracked by COMRADES and
envisaged for use by project stakeholders. This means some categories in other datasets feature less in our work.
For example, media opinion of the crisis response is less important in our use cases than trapped and injured people.
The source categories are given here; firstly, for the MIT Humanitarian Response Lab’s schema, as in Figure 3.5
Secondly, from CrisisNLP (Imran et al. 2016):
1. Injured or dead people: Reports of casualties and/or injured people due to the crisis
2. Missing, trapped, or found people: Reports and/or questions about missing or found people
5See https://humanitarian.mit.edu/ (MIT HR is also the source of the schema diagram)
3. Displaced people and evacuations: People who have relocated due to the crisis, even for a short time (includes
evacuations)
4. Infrastructure and utilities damage: Reports of damaged buildings, roads, bridges, or utilities/services
interrupted or restored
5. Donation needs or offers or volunteering services: Reports of urgent needs or donations of shelter and/or
supplies such as food, water, clothing, money, medical supplies or blood; and volunteering services
6. Caution and advice: Reports of warnings issued or lifted, guidance and tips
7. Sympathy and emotional support: Prayers, thoughts, and emotional support
8. Other useful information: Other useful information that helps understand the situation
9. Not related or irrelevant: Unrelated to the situation or irrelevant
We settled on a blended prototype set of top-level actionable information types. Unlike CrisisNLP, this set imposes
no specific ordering or prioritization on the data. This ordering is not applicable in all crises, nor indeed over time
in the same crisis. For example, dead bodies are not a priority compared with the trapped and injured, on day zero;
but on day 3-4, they may well be.
To avoid colouring user judgments, we took a conscious decision to make all tools as agnostic as possible, cutting out any biases that we noticed. The goal is to place all decisions about response and action with the analyst, instead of providing nudges about what ranks higher or lower (as one implicitly does by ordering actionability types). The final set of actionability types is:
• A specific resource, where the kind of need is given explicitly.
• Mentions of a group that is responding or aiding (e.g. volunteers, military, government, businesses, aid agencies).
• Threats to the general crisis response (weather warnings, fires, military action, etc.).
• Changes in accessibility: limits to access, or other changes in how much transport can get through.
• Damage to infrastructure, livelihoods, etc.
• Mention by name of the geographic areas affected.
• Changes in the local environment (weather, hazards, etc.); e.g., a storm is intensifying.
• General reporting about the rescue effort (from the media or the public).
• Personal opinion or individual message.
Note that some categories, such as those around needs, are collapsed into one. Following this example, we first identify content about needs; analysis of the exact type of information – for example, the resource required, or the location – can be postponed (Munro 2011).
Gathering Actionability Data
Labelling qualitative data with quantitative, hard categories is a difficult process, with some margin of error. The standard of this process directly determines the quality of the resulting dataset and the performance of machine learning systems trained on it.
Humans typically perform rather poorly in data labelling when faced with more than seven labels, and best when
faced with two (Sabou et al. 2014). As we have nine categories, the set must be decomposed somehow. As with the
informativeness task, we trialled various different combinations and setups using in-house volunteer annotators. We
had best results from splitting types into two groups, and providing the “personal opinion" option alongside both.
To maintain quality in crowdsourcing, a first portion of the data was annotated by experts who had experience in the
goals of the data, crisis response, and linguistic annotation. After settling discrepancies, this pilot part of the data
was converted to 118 “gold" messages which would be used as test examples for crowd workers; for each gold
Category Count
Needs 22
Response groups 99
Threats to response 195
Changes in accessibility 19
Damage to livelihoods 62
Mention of affected areas 400
Changes in environment 40
General reporting 83
Personal opinion 423
Table 1. Actionability data
example, the correct answers are pre-filled, and then used to give feedback on the job as crowd workers proceed.
The test data gives crowd workers a safe environment in which to learn how to annotate, while seeing real examples
of correctly-labelled data. This follows best practices in crowdsourced annotation of corpora (Sabou et al. 2014).
We combined tweets from all days of Hurricane Irma6 with data from CrisisLex, taking from the latter only crises
of natural origin, in order to filter out short-term anthropogenic events like shootings. These were then filtered for
informativeness based on the system introduced above, so that only informative tweets were scanned for actionable
information. In total, 1350 tweets were labelled according to the actionability criteria indicated above, distributed
as shown in Table 1.
Note that an individual tweet can contain more than one kind of information (or no relevant information at all).
Therefore, the totals above do not sum to the dataset size. This comprises our dataset of actionable messages, annotated by a mixture of experts and crowd workers who have been trained with expert data.
Extracting Actionable Content
Having now a set of examples of actionable and non-actionable data, it is possible to train a system to triage
messages according to actionability.
Deep learning requires large amounts of data; its advantage lies in a greater ability to leverage such data. In this situation, however, we are faced with a paucity of examples from each actionability class, which is tough for deep learning. Indeed, applying a convolutional text network as with the earlier informativeness analysis led to very poor results, with an F-score under 0.1 in almost every case. We attribute this to the extreme data sparsity.
Further, data where there is a strong imbalance is harder for most machine learning algorithms to learn to predict.
In this case, our dataset of over 1000 instances often contains only a minority of positive examples, complemented
by an overwhelming majority of negative examples. This is a difficult learning environment: for effective learning
the number of positive and negative examples should be roughly equal.
We cast actionability extraction as a multi-label classification problem. That is, a message can have more than one kind of actionable information at a time. For example,
There are people camped out between Lincoln and fifth who need water
contains both a need (type A) and a geographic location mentioned by name (type J) – so it is labeled [A,J]; hence, multi-label. We construct a solution based on many binary classifiers, one for each actionability type. Each classifier
will attempt to predict whether or not the message contains a single type of actionability data, and a single classifier
is dedicated to each kind of actionable information. Finally, the results are pooled to form an aggregate labeling of
the message.
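The pooling of per-type binary decisions into an aggregate label set can be sketched as follows; the keyword tests stand in for the trained classifiers, and the label names are illustrative.

```python
# One binary classifier per actionability type; simple keyword tests
# stand in here for the trained models.
CLASSIFIERS = {
    "A_need": lambda t: "need" in t,
    "J_affected_area": lambda t: "lincoln" in t,
}

def label_message(text):
    """Pool the per-type binary decisions into an aggregate label set."""
    t = text.lower()
    return {label for label, clf in CLASSIFIERS.items() if clf(t)}

labels = label_message("There are people camped out between Lincoln and fifth who need water")
# both the need (A) and the named location (J) classifiers fire
```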
For feature extraction, following Aker et al. 2017, we extracted from the data a set of keywords for each actionable
information type, and also induced word embeddings over recent social media data as a continuous representation of
individual word types. These were taken from 20 million tweets from 2016, filtered for English with langid.py (Lui and Baldwin 2012), randomly selected for variation in time (across seasons and across time of day), processed
with GloVe (Pennington et al. 2014), and having 25 dimensions (a low dimensionality is less prone to errors with
this relatively sparse training data). Documents are converted to data by splitting them into words using a social
media tokenizer (O’Connor et al. 2010) and then, for each word in the keyword list, measuring the proportion
of words in the document that are similar to the keyword. Similarity is defined by cosine vector proximity, with
6Kindly provided by PushShift – http://pushshift.io
Figure 4. Tuning parameters for the RBF SVM
a cut-off of 0.45. That is to say, document words whose vectors have a cosine with the keyword under 0.45 are
disregarded. Additionally, each document word is scaled by its score; so, if a ten-word document has one similar
keyword, and the similarity is 0.6, the overall value for this keyword is 0.06 (10% × 0.6). This feature extraction
converts documents into a one-dimensional vector with a value for each keyword.
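The feature computation above can be sketched as follows. The tiny two-dimensional embedding table is invented for illustration; the paper uses 25-dimensional GloVe vectors induced from 20 million tweets:

```python
# Keyword-similarity feature extraction: each keyword yields one feature,
# the similarity-weighted proportion of document words whose embedding has
# cosine similarity >= 0.45 with the keyword's embedding.
import math

EMB = {  # hypothetical toy "embeddings", for illustration only
    "water":   (1.0, 0.0),
    "thirsty": (0.9, 0.3),
    "camped":  (0.0, 1.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def keyword_feature(doc_tokens, keyword, cutoff=0.45):
    kv = EMB[keyword]
    score = 0.0
    for w in doc_tokens:
        if w in EMB:
            sim = cosine(EMB[w], kv)
            if sim >= cutoff:       # words below the cut-off are disregarded
                score += sim        # each similar word scaled by its score
    return score / len(doc_tokens)  # proportion of the document

doc = ["people", "camped", "here", "are", "thirsty",
       "a", "b", "c", "d", "e"]     # a ten-word document
print(round(keyword_feature(doc, "water"), 3))
```

Only "thirsty" clears the cut-off against "water", so the feature is its similarity divided by the document length of ten, mirroring the 10% × 0.6 example in the text.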
Sample keywords for two of the actionability types are:7
Personal opinion: [i people us all me please good will trump your pray praying toll concerned
omg worried god]
Changes in accessibility: [accessibility street bridge blocked derailment collapse close
flooded closed careful cancel cancelled canceled avoid alert affect advised]
These terms are those likely to be indicative of a particular class. They are extracted by finding the most
discriminative words for each class through a simple Bayesian analysis. The full set of keywords and actionability
types are downloadable from the Emina distribution and can be edited to suit new situations.
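The paper describes the keyword extraction only as "a simple Bayesian analysis"; the smoothed log-odds computation below is one plausible instantiation of that idea, and is an assumption rather than the authors' exact formulation:

```python
# Rank words by how discriminative they are for one class versus the rest,
# using add-one-smoothed log odds ratios (an assumed formulation).
import math
from collections import Counter

def discriminative_words(class_docs, other_docs, top_n=3):
    c_in, c_out = Counter(), Counter()
    for d in class_docs:
        c_in.update(d.lower().split())
    for d in other_docs:
        c_out.update(d.lower().split())
    vocab = set(c_in) | set(c_out)
    v, n_in, n_out = len(vocab), sum(c_in.values()), sum(c_out.values())

    def log_odds(w):  # log ratio of smoothed in-class vs out-of-class probability
        return math.log(((c_in[w] + 1) / (n_in + v)) /
                        ((c_out[w] + 1) / (n_out + v)))

    return sorted(vocab, key=log_odds, reverse=True)[:top_n]

need_docs = ["we need water", "need food and water now"]   # invented examples
other_docs = ["the bridge is closed", "roads flooded avoid downtown"]
print(discriminative_words(need_docs, other_docs))
```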
We needed a classifier that was rapid, could learn from a small amount of data, and did not rely on a linear decision boundary. To this end, the RBF SVM fit our needs. In addition, we reduced the “negative” data (i.e. the non-actionable cases) down to match the number of positive examples for each class, randomly removing examples.
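The class-balancing step is a straightforward random undersampling of the negatives, sketched here on invented example data:

```python
# Undersample the "negative" (non-actionable) examples down to the number
# of positives for one class, removing negatives at random.
import random

def balance(positives, negatives, seed=0):
    rng = random.Random(seed)               # seeded for reproducibility
    kept = rng.sample(negatives, k=len(positives))
    return positives + kept

pos = [("msg%d" % i, 1) for i in range(3)]          # 3 positive examples
neg = [("msg%d" % i, 0) for i in range(3, 100)]     # 97 negative examples
balanced = balance(pos, neg)
print(len(balanced))  # -> 6, equal numbers of positives and negatives
```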
Evaluation
We measured both the accuracy of the system (i.e. the proportion of examples it got right) and its F1 score and recall (i.e. the proportion of known actionable messages that were retrieved by the classifier). Recall was particularly important here: precise systems (i.e. those that return only actionable information) were not considered as useful as those with good recall. For us, correctly picking up two instances of people in need while ignoring four hundred would still give high precision – provided no false need messages (i.e. those where no need is expressed) were returned – but would have poor impact on crisis analysis and response.
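The point about precision versus recall can be made concrete with the two-of-four-hundred example from the text:

```python
# Precision, recall and F1 from true positive / false positive / false
# negative counts. A system that finds only 2 of 402 genuine need messages,
# with no false positives, has perfect precision but near-zero recall.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = prf(tp=2, fp=0, fn=400)
print(p, round(r, 4), round(f1, 4))  # perfect precision, near-zero recall
```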
The Radial Basis Function (RBF) SVM had excellent performance in some categories and good performance in others. Other classifiers did best in some areas but much worse in others. The RBF SVM’s overall performance indicates it is stable for this task, especially given that our datasets are relatively small and that the way messages are written varies wildly across crises, even more than is usual for social media (Derczynski, Maynard, et al. 2013).
The RBF SVM works by exploring the space out from each example, to see where boundaries lie. This effectively
tells us the impact or sensitivity of each keyword, in the context of other keywords. It seems that such an approach
is more effective than finding linear separators in such a data-imbalanced, high-variance environment.
RBF is tuned with two parameters, C and gamma. The gamma parameter defines the influence that a single data point has. C trades off error against simplicity – a very complex system might make few errors, but may be fragile, with very poor performance on new data. The behavior of an RBF SVM can be highly sensitive to gamma; large values give individual examples the power to distort the system’s behavior. To best tune our system, we searched over these parameters (Figure 4); the results shown are for category G, changes in accessibility, simplified for visualization to the first two keyword dimensions. The plots show that performance is best with a high C and a low gamma – that is, we should not fit the data too tightly around individual points. In the end, C=20 and gamma=3 worked for the general case. Overall results for the RBF SVM are given in Table 2, compared with a keyword-match baseline which matches a text if any of the keywords for the given actionability type are present.
7 The full set is provided with the Emina toolkit

Class                                  Accuracy  F1      Recall  Baseline F1
Needs                                  66.83%    57.50%  63.01%  45.21%
Response groups                        63.13%    62.94%  62.63%  40.18%
Threats to response                    57.18%    57.72%  58.46%  33.70%
Change in accessibility                57.89%    57.89%  57.89%  18.18%
Damage to infrastructure, livelihoods  56.45%    54.24%  51.61%  20.16%
Geographic names                       56.63%    57.84%  59.50%  45.23%
Changes in environment                 50.00%    47.37%  45.00%  23.58%
Reporting about the rescue             57.83%    58.82%  60.24%  13.95%
Personal opinions                      66.59%    63.43%  57.94%  42.27%
Table 2. Performance of actionability detectors

Figure 5. Types of actionable data during the 2013 Alberta floods
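The C/gamma search can be sketched with scikit-learn's grid search. The synthetic data and the particular grid values below are illustrative assumptions, not the paper's exact setup:

```python
# Grid search over C and gamma for an RBF-kernel SVM, scored by recall
# (the metric that matters most in this setting), on synthetic data.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [1, 5, 10, 20], "gamma": [0.1, 1, 3]},  # assumed grid
    cv=3,
    scoring="recall",
)
grid.fit(X, y)
print(grid.best_params_)   # the best C/gamma combination found
```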
This actionability extraction, coupled with the informativeness filtering, is released as an open-source tool, Emina, at https://github.com/GateNLP/emina.
PROFILE OF A CRISIS
Using both informativeness filtering and actionability classification, we are able to see how the kind of actionable
information in a crisis situation develops over time. To this end, we applied informativeness filtering and then
actionability extraction to messages around individual crises in prior datasets.
Visualization
To measure the balance of actionable information over time, we bucketed messages according to their timestamps
and then calculated the proportion of each actionability type. The results are shown in a proportional stacked bar
chart. That is to say, for each time slot, the chart shows the proportion of each kind of actionable information during that period. If the dominant information type is “needs”, then that segment will be larger than the others, regardless of absolute volume. An example for the 2013 Alberta floods is shown in Figure 5. This kind of visualization is an accessible
summary of the profile of the crisis over time.
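The per-bucket proportion computation behind the stacked bar chart can be sketched as follows, on invented example data with a hypothetical daily bucketing:

```python
# Bucket messages by time slot and compute each actionability type's share
# of that bucket, independent of the bucket's absolute message volume.
from collections import Counter, defaultdict

messages = [  # (day bucket, actionability type) -- invented example data
    (1, "needs"), (1, "needs"), (1, "opinion"),
    (2, "opinion"), (2, "geo"), (2, "opinion"), (2, "needs"),
]

buckets = defaultdict(Counter)
for day, a_type in messages:
    buckets[day][a_type] += 1

proportions = {
    day: {t: n / sum(counts.values()) for t, n in counts.items()}
    for day, counts in buckets.items()
}
print(proportions[1]["needs"])  # "needs" dominates day 1 with 2 of 3 messages
```

Each bucket's proportions sum to one, which is exactly what a proportional stacked bar chart displays per time slot.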
CONCLUSION
Filtering out uninformative messages is crucial to making effective use of human monitors of electronic messages during a crisis. The majority of inputs in recent crises are irrelevant, so systems that preprocess the stream and reduce the volume of irrelevant messages are invaluable. We developed a pair of techniques that first remove the bulk of irrelevant messages, and then select particular kinds of actionable messages according to a set of top-level actionability categories.
We were able to extract informative messages at over 90% accuracy, both accurately selecting informative content and avoiding the extraction of non-informative content. Furthermore, we were able to recall between 50% and 70% of actionable information over eight individual information classes. The results presented in this paper are a first step in examining the feasibility of automatically identifying informative and actionable messages, such as those found in social media sources, during a crisis event. While the results in this paper are
promising, further research is needed to improve this approach. More importantly, given the importance of data and
responsibilities of crisis responders, additional research is needed into the risks and ethics associated with automatic
data processing. With automated services and tools, like Emina presented in this paper, we gain the potential to
support crisis responders, digital volunteers and communities themselves in processing large volumes of data.
ACKNOWLEDGMENTS
This work was supported by funding from the European Commission through grant agreement 687847, Comrades.
The authors are grateful to CrowdFlower and Rob Munro for their continued support and the resources generously
donated which enabled creation of these datasets, as well as to Pushshift.io and Jason Baumgartner, who helped
provide the Irma data rapidly during and after the crisis.
REFERENCES
Aker, A., Derczynski, L., and Bontcheva, K. (2017). “Simple Open Stance Classification for Rumour Analysis”. In:
Baharmand, H., Boersma, K., Meesters, K., Mulder, F., and Wolbers, J. (2016). “A multidisciplinary perspective on
supporting community disaster resilience in Nepal”. In: Proc. ISCRAM.
Burel, G., Saif, H., and Alani, H. (2017). “Semantic Wide and Deep Learning for Detecting Crisis-Information
Categories on Social Media”. In: International Semantic Web Conference. Springer, pp. 138–155.
Comes, T., Meesters, K., and Torjesen, S. (2017). “Making sense of crises: the implications of information
asymmetries for resilience and social justice in disaster-ridden communities”. In: Sustainable and Resilient
Infrastructure, pp. 1–13.
Comes, T., Toerjesen, S., and Meesters, K. (2016). D2.1: Requirements for boosting community resilience in crisis
situation.
Derczynski, L., Bontcheva, K., and Roberts, I. (2016). “Broad Twitter Corpus: A Diverse Named Entity Recognition
Resource.” In: Proc. COLING, pp. 1169–1179.
Derczynski, L., Maynard, D., Aswani, N., and Bontcheva, K. (2013). “Microblog-genre noise and impact on
semantic annotation accuracy”. In: Proc. Conference on Hypertext and Social Media. ACM, pp. 21–30.
Eickhoff, C. (2018). “Cognitive Biases in Crowdsourcing”. In: Proc. WSDM.
Harrald, J. R. (2006). “Agility and discipline: Critical success factors for disaster response”. In: The annals of the
American Academy of political and Social Science 604.1, pp. 256–272.
Homberg, M. van den, Meesters, K., and Van de Walle, B. (2014). “Coordination and Information Management in
the Haiyan Response: observations from the field”. In: Procedia Engineering 78, pp. 49–51.
Imran, M., Mitra, P., and Castillo, C. (2016). “Twitter as a lifeline: Human-annotated twitter corpora for NLP of
crisis-related messages”. In: Proc. LREC.
Lui, M. and Baldwin, T. (2012). “langid.py: An off-the-shelf language identification tool”. In: Proceedings of the
ACL 2012 system demonstrations. Association for Computational Linguistics, pp. 25–30.
Meesters, K., Beek, L. van, and Van de Walle, B. (2016). “#Help. The Reality of Social Media Use in Crisis
Response: Lessons from a Realistic Crisis Exercise”. In: Proc. HICSS. IEEE, pp. 116–125.
Munro, R. (2011). “Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol”.
In: Proc. CoNLL. ACL, pp. 68–77.
O’Connor, B., Krieger, M., and Ahn, D. (2010). “TweetMotif: Exploratory Search and Topic Summarization for
Twitter”. In: Proc. ICWSM, pp. 384–385.
Olteanu, A., Castillo, C., Diaz, F., and Vieweg, S. (2014). “CrisisLex: A Lexicon for Collecting and Filtering
Microblogged Communications in Crises”. In: Proc. ICWSM.
Olteanu, A., Vieweg, S., and Castillo, C. (2015). “What to expect when the unexpected happens: Social media
communications across crises”. In: Proc. CSCW. ACM, pp. 994–1009.
Patrick, J. et al. (2011). “Evaluation insights Haiti earthquake response emerging evaluation lessons”. In: Evaluation
Insights.
Pennington, J., Socher, R., and Manning, C. (2014). “GloVe: Global vectors for word representation”. In: Proc.
EMNLP, pp. 1532–1543.
Qarout, R., Maynard, D., Derczynski, L., and Bontcheva, K. (2018). COMRADES Crowdsourced Informativeness
Dataset (CCSID).
Rencoret, N., Stoddard, A., Haver, K., Taylor, G., and Harvey, P. (2010). “Haiti Earthquake response context
analysis”. In:
Sabou, M., Bontcheva, K., Derczynski, L., and Scharl, A. (2014). “Corpus Annotation through Crowdsourcing:
Towards Best Practice Guidelines.” In: Proc. LREC, pp. 859–866.
Schulz, A., Guckelsberger, C., and Janssen, F. (2017). “Semantic Abstraction for generalization of tweet classification:
An evaluation of incident-related tweets”. In: Semantic Web 8.3, pp. 353–372.
Sebastian, A., Lendering, K., Kothuis, B., Brand, A., Jonkman, S., Gelder, P. van, Kolen, B., Comes, M., Lhermitte,
S., Meesters, K., et al. (2017). “Hurricane Harvey Report: A fact-finding effort in the direct aftermath of Hurricane
Harvey in the Greater Houston Region”. In:
Temnikova, I., Vieweg, S., and Castillo, C. (2015). “The case for readability of crisis communications in social
media”. In: Proc. WWW. ACM, pp. 1245–1250.
Turoff, M., Chumer, M., Van de Walle, B., and Yao, X. (2004). “The design of a dynamic emergency response
management information system (DERMIS)”. In: JITTA: Journal of Information Technology Theory and
Application 5.4, p. 1.
Van de Walle, B., Van Den Eede, G., and Muhren, W. (2008). “Humanitarian information management and systems”.
In: International Workshop on Mobile Information Technology for Emergency Response. Springer, pp. 12–21.
Vieweg, S. E. (2012). “Situational awareness in mass emergency: A behavioral and linguistic analysis of microblogged
communications”. PhD thesis. University of Colorado at Boulder.
Vieweg, S., Hughes, A. L., Starbird, K., and Palen, L. (2010). “Microblogging during two natural hazards events:
what twitter may contribute to situational awareness”. In: Proc. CHI. ACM, pp. 1079–1088.
Zhang, X., Zhao, J., and LeCun, Y. (2015). “Character-level convolutional networks for text classification”. In: Proc.
NIPS, pp. 649–657.