Classifying Crisis Information Relevancy with Semantics (ESWC 2018)

Social media platforms have become key portals for sharing and consuming information during crisis situations. However, humanitarian organisations and affected communities often struggle to sift through the large volumes of data typically shared on such platforms during crises to determine which posts are truly relevant to the crisis and which are not. Previous work on automatically classifying crisis information has mostly focused on statistical features. However, such approaches tend to be inappropriate when processing data on a type of crisis that the model was not trained on, for example classifying information about a train crash with a classifier trained on floods, earthquakes, and typhoons. In such cases, the model needs to be retrained, which is costly and time-consuming.
In this paper, we explore the impact of semantics in classifying Twitter posts across the same, and different, types of crises. We experiment with 26 crisis events, using a hybrid system that combines statistical features with various semantic features extracted from external knowledge bases. We show that adding semantic features has no noticeable benefit over statistical features when classifying same-type crises, whereas it enhances classifier performance by up to 7.2% when classifying information about a new type of crisis.

  1. 1. www.comrades-project.eu Classifying Crisis-information Relevancy with Semantics Prashant Khare, Gregoire Burel, Harith Alani 1 {prashant.khare, g.burel, h.alani} @open.ac.uk Knowledge Media Institute, The Open University, UK ESWC2018 – 5 June 2018 Heraklion, Crete
  2. 2. www.comrades-project.eu Motivation 2 People of NSW, be careful because there's fires spreading! Stay safe everyone! Hundreds of volunteers in Mexico tried to unearth children they hoped were still alive beneath a school's ruins Two trucks and one car in the water after a road collapse at Hwy 287 and Dillon. #cowx #boulderflood CRISIS Wildfire Floods Earthquake
  3. 3. www.comrades-project.eu Motivation 3 Challenges  A flood of data gets generated. For example:  Over a million tweets were posted during the 2017 Hurricane Harvey.  A 500% increase in tweet volume during the 2011 Japan earthquake.  It is almost impossible to manually absorb and process such sheer volumes.  In addition, characteristics of social media posts such as short length, colloquialisms, and syntactic issues pose additional challenges when processing the data.
  4. 4. www.comrades-project.eu Motivation 4 FEMA launched an initiative to use public social media data for situational awareness purposes1. 1: https://www.dhs.gov/sites/default/files/publications/privacy-pia-FEMA-OUSM-April2016.pdf Image source – fema.gov
  5. 5. www.comrades-project.eu Motivation 5 Relevant and Non-Relevant
  6. 6. www.comrades-project.eu Key Problem- Diverse forms of Crisis 6 Floods Fire Quake Human Disaster Food & Supplies Crisis
  7. 7. www.comrades-project.eu Key Problem- Broad Spectrum Data 7 The diverse range of situations results in a broad spectrum of content People of NSW, be careful because there's fires spreading! Stay safe everyone! BREAKING: Reports of shots fired at LAX Airport, says senior government official. Two trucks and one car in the water after a road collapse at Hwy 287 and Dillon. #cowx #boulderflood Report: Between 3 and 5 firefighters missing following massive blast at West, Texas, fertilizer plant, police say Hundreds of volunteers in Mexico tried to unearth children they hoped were still alive beneath a school's ruins during earthquake Casualties from 7.2 #earthquake in the #Philippines is now 20+ according to authorities.
  8. 8. www.comrades-project.eu Access Relevant Information Across Crisis Situations 8 • How do we handle information overload? • How do we identify relevant and irrelevant information across diverse crisis situations? • Can we learn from one type of crisis situation, and identify relevant information in another type?
  9. 9. www.comrades-project.eu Previous Efforts - Identifying Crisis Related Information • ML Classification Methods:  Supervised Approaches: Often making use of n-grams, linguistic features, and/or statistical features of tweets.  Unsupervised Approaches: Keyword processing and clustering. • Semantic Models:  Representation of the information emerging from Crisis Events, providing faceted search of crisis related information. 9
  10. 10. www.comrades-project.eu Hypothesis and Aim • Hypothesis:  Semantics establish consistency across various types of crisis situations, thereby enabling identification of relevant information, and can enhance the discriminative power of classification systems. • Go beyond statistical features and n-grams, and incorporate contextual semantics alongside the statistical features. 10
  11. 11. www.comrades-project.eu Statistical Features Example of statistical features: - Text length. - Number of words. - Presence and count of various Parts of Speech (PoS). - Data specific features such as hashtags (in tweets). - E.g., #neworleans #nola #algiers #nolafood #hurricanekatrina. - Readability score (Gunning Fog Index using average sentence length (ASL) and percentage of complex words (PCW): 0.4*(ASL + PCW)). 11
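To make the readability feature concrete, below is a minimal Python sketch of a few of the statistical features listed above, including the Gunning Fog score 0.4*(ASL + PCW). This is not the authors' implementation: the vowel-group syllable heuristic used to spot "complex" words and the simple token handling are assumptions, and the PoS counts (which would require a tagger) are omitted.

```python
import re

def count_syllables(word):
    """Rough syllable count via vowel groups (a simplifying heuristic)."""
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

def gunning_fog(text):
    """Gunning Fog readability score: 0.4 * (ASL + PCW), where ASL is the
    average sentence length in words and PCW is the percentage of complex
    words (approximated here as words with 3+ syllables)."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()] or [text]
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    asl = len(words) / float(len(sentences))
    pcw = 100.0 * sum(1 for w in words if count_syllables(w) >= 3) / len(words)
    return 0.4 * (asl + pcw)

def statistical_features(tweet):
    """A few of the statistical features listed above (PoS counts omitted)."""
    tokens = tweet.split()
    return {
        'length': len(tweet),
        'num_tokens': len(tokens),
        'num_hashtags': sum(1 for t in tokens if t.startswith('#')),
        'readability': gunning_fog(tweet),
    }
```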
  12. 12. www.comrades-project.eu Semantic Features Example of semantic features: Additional information about terms found in the tweets can be extracted using NER tools, entity linking tools, and semantic databases: - Entity linking to a knowledge base. - Co-occurring words (from a data corpus). - Synset sense – WordNet. - Hierarchical context: hypernyms, synonyms. - DBpedia properties. 12
  13. 13. www.comrades-project.eu Extracting Semantics Available tools for entity extraction and knowledge expansion: NER  DBpedia Spotlight  Alchemy (IBM)  Babelfy (BabelNet)  TextRazor NLP API  Aylien Text Analysis API Knowledge Bases  DBpedia  YAGO  BabelNet  WordNet  Google Knowledge Graph  Wikidata 13
  14. 14. www.comrades-project.eu Babelfy and BabelNet BabelNet – a multilingual lexicalised semantic network formed by combining various knowledge resources – WordNet, Wikipedia, Wiktionary, OmegaWiki, etc. It can enable multilingual NLP applications and can be used for word sense disambiguation and entity linking with Babelfy. Babelfy – a word sense disambiguation and entity linking API built on top of BabelNet. 14
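As a rough illustration of the enrichment step, the sketch below calls Babelfy's public HTTP disambiguation endpoint. The endpoint URL, parameters (text, lang, key) and response fields (charFragment, babelSynsetID, DBpediaURL) are taken from the Babelfy documentation as I recall it and should be verified against the current API; this is not the authors' code.

```python
import requests

BABELFY_URL = 'https://babelfy.io/v1/disambiguate'  # assumed endpoint, check docs

def annotate(text, api_key, lang='EN'):
    """Disambiguate entities and word senses in a tweet via the Babelfy
    HTTP API, returning (surface form, BabelNet synset id, DBpedia URL) tuples."""
    params = {'text': text, 'lang': lang, 'key': api_key}
    resp = requests.get(BABELFY_URL, params=params)
    resp.raise_for_status()
    results = []
    for ann in resp.json():
        frag = ann.get('charFragment', {})
        surface = text[frag.get('start', 0):frag.get('end', 0) + 1]
        results.append((surface, ann.get('babelSynsetID'), ann.get('DBpediaURL')))
    return results

# Example (hypothetical key):
# annotate("Two trucks in the water after a road collapse #boulderflood", api_key="YOUR_KEY")
```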
  15. 15. www.comrades-project.eu Features extracted Statistical Features  Number of Nouns, Verbs, Pronouns  Tweet Length  Number of words/tokens  Number of Hashtags Semantic Features  BabelNet Semantics  BabelNet Sense: English labels of entities identified via Babelfy.  BabelNet Hypernym: direct English hypernyms of each entity (at distance 1).  DBpedia Semantics: list of properties associated with the DBpedia URI returned by Babelfy.  subject, label, type, city, state, country 16
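The speaker notes for the experiment design mention that these semantic labels are simply concatenated with the original tweet text before tokenisation and n-gram extraction. A minimal sketch of that document-construction step, with hypothetical argument names:

```python
def build_document(tweet, senses, hypernyms, dbpedia_props):
    """Concatenate a tweet with its semantic expansions (BabelNet senses,
    hypernyms, DBpedia property values) to form the text that is then
    tokenised into n-grams for the SF + semantic feature models."""
    return ' '.join([tweet] + list(senses) + list(hypernyms) + list(dbpedia_props))

# build_document("Italy quake victims given shelter",
#                senses=['Italy', 'earthquake', 'victim', 'shelter'],
#                hypernyms=['natural disaster', 'geological phenomenon'],
#                dbpedia_props=['dbr:Earthquake'])
```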
  16. 16. www.comrades-project.eu Semantic Enrichment- Broader Perspective 17
  Post A: 'No confirmed casualties yet from landslide reported in Compostela Valley. #PabloPH'
  Post B: 'News: Italy quake victims given shelter http://t.co/cXQEusVm via @BBC'
  Babelfy Entities Sense (English) – Post A: confirm, casualty, report, landslide | Post B: Italy, earthquake, victim, shelter, news
  Hypernyms (English) – Post A: victim, affirm, flood, seismology, geology, soil slide, announce, disaster, natural disaster, geological phenomenon | Post B: natural disaster, geological phenomenon, broadcasting, communication, nation, country, unfortunate
  DBpedia – Post A: dbc:landslide, dbr:landslide, dbo:place, dbc:Geological hazards, dbc:Seismology | Post B: dbc:Geological hazards, dbc:Seismology, dbr:Earthquake, dbc:Communication, dbr:News
  17. 17. www.comrades-project.eu Method • Collect data from CrisisLex.org – a collection of crisis-oriented tweets. • Extract statistical features. • Semantically enrich tweets via annotation using the Babelfy API. • Expand the semantics by incorporating hypernyms through BabelNet. • Retrieve DBpedia features through the SPARQL endpoint. • Classify using an SVM classifier. 19
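The slide mentions retrieving DBpedia features through the SPARQL endpoint. A minimal sketch using the SPARQLWrapper library is shown below; the choice of properties (dct:subject, rdf:type, rdfs:label) mirrors the ones listed on the features slide, but the exact query the authors used is not given in the deck.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def dbpedia_properties(resource_uri):
    """Fetch a few DBpedia properties (subject, type, label) for an entity URI
    returned by Babelfy, via the public SPARQL endpoint."""
    sparql = SPARQLWrapper('http://dbpedia.org/sparql')
    sparql.setQuery("""
        PREFIX dct:  <http://purl.org/dc/terms/>
        PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?p ?o WHERE {
            <%s> ?p ?o .
            FILTER (?p IN (dct:subject, rdf:type, rdfs:label))
        }
    """ % resource_uri)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()['results']['bindings']
    return [(b['p']['value'], b['o']['value']) for b in bindings]

# e.g. dbpedia_properties('http://dbpedia.org/resource/Earthquake')
```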
  18. 18. www.comrades-project.eu Data • CrisisLexT26 • 26 crisis events with 1000 labelled tweets in each event. • 4 labels: Related & Informative, Related & Not Informative, Not Related, and Not Applicable. • Merged Related & Informative and Related & Not Informative into Related. • Merged Not Related and Not Applicable into Not Related. 20
  19. 19. www.comrades-project.eu Data • After removing duplicates: 21378 Related and 2965 Not Related. • To prevent bias, we used a balanced dataset: • Selected the same number of Related tweets as Not Related in each event. • Final figures: 2966 Related and 2965 Not Related. 21
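A small pandas sketch of the label merging and per-event balancing described on the last two slides. The label strings in LABEL_MAP are hypothetical placeholders, not the exact annotation values; check them against the actual CrisisLexT26 files.

```python
import pandas as pd

# Hypothetical label strings; verify against the actual CrisisLexT26 annotations.
LABEL_MAP = {
    'Related and informative': 'Related',
    'Related - but not informative': 'Related',
    'Not related': 'Not Related',
    'Not applicable': 'Not Related',
}

def balance_event(df, label_col='label'):
    """Merge the four CrisisLexT26 labels into two classes, then downsample
    so each event contributes the same number of Related and Not Related tweets."""
    df = df.copy()
    df[label_col] = df[label_col].map(LABEL_MAP)
    related = df[df[label_col] == 'Related']
    not_related = df[df[label_col] == 'Not Related']
    n = min(len(related), len(not_related))
    return pd.concat([related.sample(n, random_state=0),
                      not_related.sample(n, random_state=0)])
```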
  20. 20. www.comrades-project.eu Data 22
  Code | Event | Related | Not Related | Total
  CWF | Col. Wildfire | 242 | 242 | 484
  COS | Costa Rica E'qke | 470 | 470 | 940
  GAU | Guatemala E'quake | 103 | 103 | 206
  ITL | Italy E'quake | 56 | 56 | 112
  PHF | Philippines Flood | 70 | 70 | 140
  TYP | Typhoon P | 88 | 88 | 176
  VNZ | Venezuela Fire | 60 | 60 | 120
  ALB | Alberta Flood | 16 | 16 | 32
  ABF | Australia Bushfire | 183 | 183 | 366
  BOL | Bohol E'quake | 31 | 31 | 62
  BOB | Boston Bomb | 69 | 69 | 138
  BRZ | Brazil Fire | 44 | 44 | 88
  CFL | Col. Fire | 61 | 61 | 122
  GLW | Glasg Crash | 110 | 110 | 220
  LAX | LA Shootout | 112 | 112 | 224
  LAM | Train Crash | 34 | 34 | 68
  MNL | Manila Flood | 74 | 74 | 148
  NYT | NY Train Crash | 2 | 1 | 3
  QFL | Queensland Flood | 278 | 278 | 556
  RUS | Russia Meteor | 241 | 241 | 482
  SAR | Sardinia Flood | 67 | 67 | 134
  SVR | Savar Building | 305 | 305 | 610
  SGR | Singapore Haze | 54 | 54 | 108
  SPT | Spain Train Crash | 8 | 8 | 16
  TPY | Typhoon Y | 107 | 107 | 214
  WTX | West Texas Ex. | 81 | 81 | 162
  21. 21. www.comrades-project.eu Data- Event Type Distribution 23
  Event Type | Events
  Wildfire/Bushfire (2) | CWF, ABF
  E'quakes (4) | COS, ITL, BOL, GAU
  Flood/Typhoons (8) | TPY, TYP, CFL, QFL, ALB, PHF, SAR, MNL
  Terror Shooting/Bombing (2) | LAX, BOB
  Train Crash (2) | SPT, LAM
  Meteor (1) | RUS
  Haze (1) | SGR
  Helicopter Crash (1) | GLW
  Building Collapse (1) | SVR
  Location Fire (2) | BRZ, VNZ
  Explosion (1) | WTX
  [Pie chart: Crisis Type Distribution across the event types listed above]
  22. 22. www.comrades-project.eu Experiment Design Feature Models:  Statistical Features (SF- baseline)  Statistical Features + BabelNet Semantics (SF + SemEF_BN)  Statistical Features + DBpedia Semantics (SF + SemEF_DB)  Statistical Features + BabelNet Semantics + DBpedia Semantics (SF + SemEF_BNDB) Crisis Classification Model  Merge the entire data and perform 20 iterations of 5-fold cross-validation across all the models to evaluate the performance. 24 Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
  23. 23. www.comrades-project.eu Experiment Design Cross Crisis Classification  Criteria 1- Content relatedness classification of an already seen crisis event type.  When the type of the test data already exists in the training data.  e.g. a classifier trained on data containing tweets/documents from flood event types (along with other event types) is used to classify data from a new flood-type crisis event. 25 Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
  24. 24. www.comrades-project.eu Experiment Design Cross Crisis Classification  Criteria 2- Content relatedness classification of an unseen crisis event type.  When the type of the test data does not exist in the training data.  e.g. a classifier trained on data containing tweets/documents from all crisis event types except building collapse events is used to classify data from such a crisis event. To classify - "With death toll at 300, Bangladesh factory collapse becomes worst tragedy in garment industry history" 26
  25. 25. www.comrades-project.eu Experiment • Classifier Selection  Support Vector Machine with Linear Kernel  Chosen after determining its statistically significant performance advantage over an RBF kernel, a polynomial kernel, and logistic regression via 20 iterations of 5-fold CV over the entire data • Tools & Library  Scikit-learn Library  Python 2.7 27
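A compact scikit-learn sketch of the evaluation protocol (linear-kernel SVM, 20 iterations of 5-fold cross-validation). The TF-IDF weighting, n-gram range, and F1 averaging are assumptions for illustration; the slides do not specify these details.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def evaluate(texts, labels):
    """20 iterations of 5-fold CV with a linear-kernel SVM over the
    (optionally semantically expanded) tweet texts."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          SVC(kernel='linear'))
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=42)
    scores = cross_val_score(model, texts, labels, scoring='f1_macro', cv=cv)
    return scores.mean(), scores.std()
```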
  26. 26. www.comrades-project.eu Results Crisis Classification Model (20 iterations of 5-fold cross-validation) 28
  Features | Pmean | Rmean | Fmean | Std. Dev. σ (20 iterations) | ∆F/F (%) | Sig. (p-value)
  SF (Baseline) | 0.8145 | 0.8093 | 0.8118 | 0.0101 | - | -
  SF + SemEF_BN | 0.8233 | 0.8231 | 0.8231 | 0.0111 | 1.3919 | <0.00001
  SF + SemEF_DB | 0.8148 | 0.8146 | 0.8145 | 0.0113 | 0.3326 | 0.01878
  SF + SemEF_BNDB | 0.8169 | 0.8167 | 0.8167 | 0.0106 | 0.6036 | 0.00001
  Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
  27. 27. www.comrades-project.eu Results Cross-Crisis Classification- Criteria 1 29
  Test | SF F (baseline) | SemEF_BN F | ∆F/F (%) | SemEF_DB F | ∆F/F (%) | SemEF_BNDB F | ∆F/F (%)
  Flood/Typhoon
  TPY | 0.803 | 0.776 | -3.44 | 0.771 | -4.01 | 0.780 | -2.83
  TYP | 0.863 | 0.840 | -2.66 | 0.829 | -3.84 | 0.851 | -1.29
  ALB | 0.718 | 0.749 | 4.25 | 0.844 | 17.41 | 0.844 | 17.41
  QFL | 0.783 | 0.792 | 1.18 | 0.77 | -1.66 | 0.781 | -0.22
  CFL | 0.801 | 0.827 | 3.28 | 0.754 | -5.88 | 0.765 | -4.41
  PHF | 0.764 | 0.763 | -0.13 | 0.771 | 0.93 | 0.743 | -2.83
  SAR | 0.570 | 0.677 | 18.79 | 0.648 | 13.70 | 0.650 | -14.10
  Earthquake
  GAU | 0.780 | 0.725 | -7.1 | 0.784 | 0.51 | 0.770 | -1.30
  ITL | 0.583 | 0.562 | -3.58 | 0.615 | 5.49 | 0.588 | 0.98
  BOL | 0.742 | 0.724 | -2.38 | 0.758 | 2.2 | 0.674 | -9.07
  COS | 0.790 | 0.770 | -2.56 | 0.739 | -6.42 | 0.750 | -5.08
  Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
  28. 28. www.comrades-project.eu Results Cross-Crisis Classification- Criteria 2 30
  Test | SF F (baseline) | SemEF_BN F | ∆F/F (%) | SemEF_DB F | ∆F/F (%) | SemEF_BNDB F | ∆F/F (%)
  Terror/Bomb/Train
  LAX | 0.652 | 0.677 | 3.9 | 0.665 | 1.95 | 0.656 | 0.58
  LAM | 0.618 | 0.626 | 1.2 | 0.616 | -0.34 | 0.628 | 1.62
  BOB | 0.608 | 0.635 | 4.4 | 0.605 | -0.56 | 0.607 | -0.19
  SPT | 0.547 | 0.686 | 25.56 | 0.746 | 36.5 | 0.686 | 25.56
  Flood/Typhoon
  TPY | 0.642 | 0.606 | -5.67 | 0.651 | 1.39 | 0.582 | -9.45
  TYP | 0.678 | 0.679 | -0.12 | 0.661 | -2.54 | 0.603 | -10.99
  ALB | 0.716 | 0.705 | -1.63 | 0.81 | 13.02 | 0.712 | -0.63
  QFL | 0.681 | 0.657 | -3.51 | 0.698 | 2.58 | 0.696 | 2.23
  CFL | 0.776 | 0.706 | -9.04 | 0.704 | -9.27 | 0.754 | -2.87
  PHF | 0.532 | 0.566 | 6.52 | 0.632 | 18.9 | 0.556 | 4.67
  SAR | 0.537 | 0.553 | 2.93 | 0.595 | 10.69 | 0.617 | 14.84
  Earthquake
  GAU | 0.487 | 0.495 | 1.62 | 0.630 | 29.39 | 0.593 | 21.79
  ITL | 0.509 | 0.516 | 1.26 | 0.553 | 8.54 | 0.555 | 8.93
  BOL | 0.724 | 0.639 | -11.73 | 0.674 | -6.86 | 0.588 | -18.77
  COS | 0.515 | 0.480 | -6.71 | 0.538 | 4.56 | 0.527 | 2.33
  Statistical Features (SF), BabelNet Semantics (SemEF_BN), DBpedia Semantics (SemEF_DB), BabelNet and DBpedia Semantics (SemEF_BNDB)
  29. 29. www.comrades-project.eu Results and Observations • Based on Information Gain (IG) scores across each feature model (on the overall data), we observed very event-specific features in the SF model, such as collapse, terremoto, fire, earthquake, among the top-ranked features. • Observed 7 different hashtags in the top 50 features (indicating event-specific vocabulary). • In the SF+SemEF_BN and SF+SemEF_DB models, we observed concepts such as natural_hazard, structural_integrity_and_failure, conflagration, perception, geological_phenomenon, dbo:location, dbc:building_defect etc. in the top 50 features. • structural_integrity_and_failure – annotated entity for terms like collapse and building collapse – frequently occurring terms in earthquake and flood type events. • natural_disaster – hypernym of event terms such as flood, landslide, earthquake. 31
  30. 30. www.comrades-project.eu Results and Observations • On average, SF+SemEF_DB is the best-performing model (in Criteria 2). • An average percentage gain in F1 score (∆F/F) of +7.2%, with a standard deviation of 12.83%. • Improvement over the baseline SF model in 10 out of 15 events • 5 of 7 flood/typhoon, 3 of 4 earthquake, 2 of 4 crash/terrorist. • The results show that when the type of the test event is NOT seen in the training data, semantics enhance classifier performance. 32
  31. 31. www.comrades-project.eu Results and Observations • Semantics generalise event-specific terms and consequently adapt to new event types (e.g., dbc:flood and dbc:natural_hazard). • Semantic concepts can also be too general and thus not help the classification of a document (e.g., the desire and virtue hypernyms). – Virtue is a hypernym of a broad range of concepts such as loyalty, courage, cooperation, charity. • Automatic semantic extraction tools could extract many non-relevant entities and therefore might confuse the classifier. – e.g. "Super Typhoon in Philliphines is 236 mph It's roughly the top speed of Formula 1 cars http://t.co/vcRE…" – here the annotation and semantic extraction result in many concepts that are unrelated to the crisis. 33
  32. 32. www.comrades-project.eu Further explorations • A more in-depth error analysis of misclassified documents is required. • Event type is based on the nature of the crisis. However, events of different types could produce overlapping content. Hence, content similarity could also be taken into account, along with event types. • Data about the same crisis event can emerge in multiple languages. Hence we need to expand the analysis to multilingual content. • Khare, P., Burel, G., Maynard, D., and Alani, H., Cross-Lingual Classification of Crisis Data, Int. Semantic Web Conference (ISWC), Monterey, 2018 (to be presented) 34
  33. 33. www.comrades-project.eu 35 Thank you! Questions?

Editor's Notes

  • ‘Classifying Crisis-information Relevancy with Semantics’
  • As the topic of the paper suggests, crisis situations are the principal motivation behind the work. People around the world are impacted by crises and disasters in various forms, and in this era, with the ability to share and access information in real time, they turn to different online social media forums. Twitter is certainly among the most prominent platforms for sharing and accessing real-time information.
  • To support this idea, I would like to highlight this case from Hurricane Harvey in 2017. What you see highlighted are two very crucial pieces of information shared in the course of a crisis situation: a volunteer-driven handle that collates information on a portal about who needs help and rescue, and who could assist in that geographical area. This is an interaction between two parties that resulted in rescuing three elderly ladies. But such critical information is not always easy to find and access, as we are also well aware of the challenges that social media presents alongside the opportunities it offers.
  • Despite those challenges, the opportunities these platforms offer have been widely acknowledged by humanitarian and government agencies.
  • The need for tools and systems that correctly determine what is valuable on social media, with respect to its relatedness to crisis situations, is highlighted by this small exercise of performing a keyword-based search on Twitter during Hurricane Harvey. It is evident that not everything containing crisis-specific terminology is actually related content.

    That is precisely where the problem of 'what is crisis related' is defined.
  • But the problem explored in this work isn't just about what is crisis related and what is not. The key problem is the diversity of crisis events. Different crisis situations have different types and levels of impact on human life, in the form of well-being, civic facilities, and so on.
  • This results in a very broad spectrum of data. For instance, here we see social media data from situations such as earthquakes, wildfires, explosions, road incidents, and terrorism, to name a few. You can see how diverse the information is. We therefore need ways to filter in as much of this diverse related content as we can.
  • So, as a requirement of disaster management, we come down to the following questions. Since we understand that manually sifting through the social stream is nearly impossible, can we have automated ways to identify relevant information, and can we learn from crisis situations to identify relevancy in new crisis situations?
  • Previous approaches have made attempts at tackling this problem. Some have adopted ML approaches, either supervised or unsupervised classification.

    Supervised Approaches:
    Often making use of n-grams, linguistic features, and/or statistical features of tweets.

    Unsupervised Approaches:
    Keyword processing and clustering.

    Some of the approaches perform semantic enrichment of the data to create a faceted search on top of the semantic data. But that does not strictly tell you whether something is crisis related or not, and it may require a new search strategy each time a new type of crisis event occurs.
  • Our hypothesis is that crisis relatedness is exhibited by a combination of various concepts that occur in the user-generated content. It might not always be just one key term that establishes the crisis relatedness of a tweet. So, we go beyond statistical features and incorporate contextual semantics alongside them. We hypothesize that different crisis situations, while they may be very distinct in their vocabulary and sense, can relate to each other at a broader contextual level.
  • A number of statistical features can be extracted: length, number of words, various parts of speech such as nouns, verbs, and pronouns, and hashtags; we can also calculate a readability score that rates a tweet on how easy or complicated it is to read. The Gunning Fog Index is one method to calculate that.
  • Various additional semantics can be considered. Co-occurring words (words that commonly occur together across large-scale corpora); word embeddings are a good example of this. Extracted entities, along with the original text, can sharpen the context further. Next, we can refer to a knowledge base (such as DBpedia) to retrieve extra information/properties about an entity. We can also use hierarchical context (from WordNet) to retrieve hypernyms and synonyms for each concept to generalise the context further.
  • We require NER tools and Knowledge Bases to extract the semantics that we want. Here are some of the well known tools available.
  • In this work we have relied on the BabelNet knowledge base and the NER API built on top of it. BabelNet is a multilingual semantic network resource that incorporates multiple knowledge bases, such as Wikipedia and WordNet, which really caters to the requirements here.
  • Here is an example of using BabelNet to semantically enrich a tweet by extracting entities and their hypernyms. We annotate the key entities in the text and then look for their corresponding hypernyms and augment them to the overall context.
  • The features that we have extracted are as follows. The statistical features are… The semantic features are… We extract hypernyms assuming that diverse concepts can relate when the context is expanded to parent levels of concepts. We refer to the DBpedia properties to retrieve extra information/properties about each entity, which can link us to broader knowledge about its nature.
  • To gain a perspective of what we imply by semantic enrichment across the crisis types let us look at the following real scenario tweets.
  • Another example elaborating the same.
  • As an end to end process in a nutshell, this is what we do.
  • CrisisLex is a very popular data repository which has time and again been referred to for various related research studies. We have used one particular data corpus from CrisisLex called CrisisLexT26. This dataset comprises manually labelled tweets from 26 crisis events that occurred between 2012 and 2013. Each of these 26 events has close to 1000 labelled tweets, with 4 labels. We merge them pairwise to create a binary class system.
  • To have balanced learning of both classes, we make sure that unskewed data is passed to the classifier. In each of the 26 events we select the same number of related tweets as unrelated ones.
  • Here you can see the related and unrelated distribution across all 26 crisis events. Obviously, the distribution is not equal across events.

    I would like to highlight that each event is basically a broad crisis situation during which tweets have been collected; that is what we call an event. For instance, CWF is the Colorado Wildfire, and it contains various tweets collected during that event.
  • Here we have categorised events based on their types. This infographic shows that the most common events in this dataset are floods/typhoons, followed by earthquakes. Next, we have the same number of crash and shooting/bombing events.
  • Now we describe our experiment design. We create 4 feature models to evaluate: the statistical model, which is our baseline, and 3 semantic models where we enhance the SF feature model with semantic features: SF + BabelNet semantics, SF + DBpedia semantics, and a last model combining both the BabelNet and DBpedia features. When we add the semantics, we concatenate them with the original text of the tweet and then do the tokenisation and create n-grams.

    So, these are the feature models. Now we design the classification methods. First, as a broad run, we simply merge the data from all the events and perform 20 iterations of 5-fold CV to just see how the 4 feature models perform.
  • Now we begin designing the cross-crisis classification methods, where we aim to evaluate how the classifier performs while classifying data from a new crisis event type. We set up 2 criteria. In Criteria 1, we test a crisis event whose type has already been seen by the classifier in the training data.
  • In Criteria 2, we test a crisis event whose type has not been seen by the classifier in the training data. For instance, we have a classifier trained on data collected during, say, flood and earthquake events, and we have to classify something coming from a factory collapse situation like this one. These 2 criteria are a critical part of our analysis.
  • We chose an SVM with a linear kernel. SVMs are known for their suitability for text classification problems. On top of that, we compared the linear kernel's performance against other kernels and logistic regression using 20 iterations of 5-fold CV over the entire data. The linear-kernel SVM was found to be significantly better, with a higher mean F1 of 0.8118 and a p-value of < 0.00001.
  • We now look at the results; this is when we merge the entire data and do cross-validation. While this is not strictly cross-crisis classification, it still shows that, broadly, the semantics perform slightly better than the baseline, with improvements ranging from 0.6 to 1.4%.
  • Now, in Criteria 1, the event is completely new but the classifier has seen the type of the test event in training. In this case, we take the test event out of the data and use the rest of the dataset as training data.

    As seen in the earlier infographic, flood/typhoon and earthquake had a good number of events in the overall dataset, so we chose to perform our analysis on these crisis types.
    So, when the classifier has already seen a type of crisis event, the semantics may not always be superior to the statistical features, as we can observe from the results. The DBpedia semantics seem to be the most consistent of the semantic feature models, performing better than the baseline in 6 out of 11 test events.
  • Now we look at Criteria 2, where the type of the event is not seen in the training data. In this case, while we test an event of a given type, we ensure that no other events of a similar type are in the training data. Here we see, firstly, that the performance of the baseline drops significantly compared to the baseline in Criteria 1. Secondly, the semantic models, particularly the DBpedia feature model, outperform the baseline in 10 out of 15 test events. As additional categories of types we also included the train crash and bombing/terror attack events.
  • To analyse how the semantics were affecting the nature of the data and the classifier, we computed Information Gain across all the feature models. In the statistical features we observed very crisis-specific terms among the top-ranked features. Also, we saw 7 different hashtags in the top 50 features, which is indicative of how vocabulary-specific the important features are. As we analysed the semantic models, we saw more generic, yet crisis-related, concepts showing up in the top-ranked features. Some of the semantic concepts existed across different crisis types.
  • Overall, the DBpedia semantics was the best-performing model when the classifier was tested on an unseen type of crisis. The average gain was 7.2% over the baseline, and it performed uniformly well across all 3 tested types. DBpedia likely performed well due to its better coverage and semantic depth, something we need to explore further.
  • A few more takeaways from the analysis: generalising the data semantically helps in adapting to new crisis events.

    Sometimes very broad and general concepts can result in underperformance of the classifier. For instance, virtue is the hypernym of diverse concepts which can often be used in different contexts.

    Also, semantic extraction can sometimes yield many unwanted concepts, and in large volumes, as in this example.
  • As a progression to this work, we need to perform a more in-depth error analysis.

    Currently we only take the type of event into account, which is broadly the nature of a given crisis. However, different events can have overlapping content based on similar situations, so it would make sense to take that into account as well.

    Crisis data can also originate in different languages, so the analysis should be expanded to how the classifier can be tuned to handle the multilingual aspect of crisis data. We made an attempt at doing this, which can be found in the following research paper that we will present at the upcoming International Semantic Web Conference.
