
Identifying Prominent Life Events on Twitter - K-Cap 2015



Social media is a common place for people to post and share
digital reflections of their life events, including major events
such as getting married, having children, graduating, etc.
Although the creation of such posts is straightforward, the
identification of events on online media remains a challenge.
Much research in recent years focused on extracting major
events from Twitter, such as earthquakes, storms, and
floods. This paper, however, targets the automatic detection
of personal life events, focusing on five events that psychologists
found to be the most prominent in people’s lives. We
define a variety of features (user, content, semantic and interaction)
to capture the characteristics of those life events
and present the results of several classification methods to
automatically identify these events in Twitter. Our proposed
classification methods obtain results between 0.84 and
0.92 F1-measure for the different types of life events. A novel
contribution of this work also lies in a new corpus of tweets,
which has been annotated by using crowdsourcing and that
constitutes, to the best of our knowledge, the first publicly
available dataset for the automatic identification of personal
life events from Twitter.

Published in: Social Media


  2. Quick Overview: Some Background; What We Did; Discussion
  3. Some Background
  4. Why are we doing this?
     As content creators, we post a lot of material on social media.
     This content ranges from silly cats to important life events that have happened to us.
     However, as users, we effectively lose access to this information about ourselves and forget what is there.
     By mining this data and presenting it back to users, we can give them a tool to aid self-reflection on their own online presence.
  5. So what are we doing?
     As part of the Reellives project, we are looking at making short “Reels” from a user’s social media content.
     These reels are intended as mini documentaries about a user’s life on social media.
     This presents two main problems for us to solve:
     ◦ R.Q. 1) How can we extract meaningful events about ourselves from our social media data?
     ◦ R.Q. 2) How can we present these events in a cohesive narrative?
     R.Q. 1 is being tackled by KMI, where we are looking at event extraction.
     R.Q. 2 is being tackled by Edinburgh, who take our output as their input to construct narratives.
  6. So what are we doing?
     [Pipeline diagram. Labels: Social Media, Storage, Extraction, Events, Story Generation, “Reel”, Life Event Detection, Story, Fabula, Narrative]
  7. What is a life event?
     There is already a large body of research on event detection on social media.
     However, little of it has focused on life or personal events.
     Semantically, they are no different:
     ◦ Both types have a time and a location
     ◦ An action occurs
     ◦ The event is experienced by one or more agents
     However:
     ◦ With general events we care more about the broader social and political significance
     ◦ With life events we care more about the personal significance
  8. What is a Life Event?
     We can also get some intuition for life events from Autobiographical Memory.
     Autobiographical memory is a type of memory system that deals with specific events that happened to us
     ◦ This is opposed to semantic memory, which is our knowledge of things
     It can be modelled with three separate layers
     ◦ Lifetime periods
       ◦ When I was at school I had my first kiss
     ◦ General Events
       ◦ I got married
     ◦ Event-Specific Knowledge
       ◦ My tie was red at the wedding
     In our work, we can consider the event-specific knowledge to be reflected in social media posts.
  9. What We Did
  10. Types of life events
      To start our research, we looked at identifying a finite number of life events.
      The types of life events we chose are inspired by work on Autobiographical Memory:
      ◦ S. M. Janssen and D. C. Rubin. Age effects in cultural life scripts. Applied Cognitive Psychology
      Their research showed a common consensus, across different age groups, on 48 life events that would happen to a fictional child over the course of their life.
      From this study, we selected 5 of the top events mentioned in the paper:
      ◦ Getting Married
      ◦ Having Children
      ◦ Starting School
      ◦ A Parent’s Death
      ◦ First Love
      We also combine all positive “about an event” examples into one training set to create a more general “Is this about an event?” classifier.
  11. What we did – Data Collection
      We chose Twitter because large datasets are easy to extract from it.
      Our selection methodology was based on a simple keyword search: we took the root concepts for each of our events and expanded them with synonyms from WordNet.
      We extracted Tweets from Twitter’s front-end search, as opposed to their API
      ◦ This is due to their API having a 7-day limit
      ◦ Twitter now indexes every tweet, making it available to scrape from their front-end search application
      Additional details were extracted for each Tweet, using their Lookup API with the extracted Tweet ID.
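The keyword selection step on slide 11 can be sketched roughly as follows. This is only an illustration: the `SEED_TERMS` table is a hand-written, hypothetical stand-in for the WordNet-expanded keyword lists, and `match_events` is a name invented here, not something from the slides.

```python
import re

# Hypothetical root concepts with a few synonyms each. The slides took
# synonyms from WordNet; this hand-written table is only a stand-in.
SEED_TERMS = {
    "getting_married": {"married", "wedding", "marry"},
    "having_children": {"baby", "newborn", "pregnant"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def match_events(tweet_text, seed_terms=SEED_TERMS):
    """Return the sorted event types whose keywords appear in the tweet."""
    tokens = set(TOKEN_RE.findall(tweet_text.lower()))
    return sorted(event for event, words in seed_terms.items() if tokens & words)


if __name__ == "__main__":
    print(match_events("We finally got married today!"))
    print(match_events("Silly cat pictures again"))
```

A filter like this also makes the bias discussed later easy to see: every collected tweet contains at least one seed word by construction.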
  12. What we did - Annotations
      To annotate our dataset, we turned to CrowdFlower.
      To start with, we ran several small trial annotation exercises on CrowdFlower to make sure our questions were satisfactory.
      We initially had 7 questions:
      ◦ Is this tweet about Getting Married?
      ◦ Is this tweet about an event?
      ◦ Was the tweet before, during, or after the event?
      ◦ Is the author of the tweet experiencing the event?
      ◦ Is anyone else experiencing the event with the author?
      ◦ Is anyone else named in the tweet experiencing the event?
      ◦ Did the event happen where it was tweeted?
      This did not prove too popular, as we had a large number of quiz failures.
  13. What we did - Annotations
      The obvious failures of this initial test run were too many questions and possible subjectivity in our given definition of an event.
      After another trial, we finally settled on asking only two questions:
      ◦ Q1 - Is this tweet related to a particular topic theme? (Topic theme is the cluster we extracted from)
      ◦ Q2 - Is this tweet about an important life event?
      We also gave users a list of the 46 life events that Janssen and Rubin identified, to help them understand what we were after.
      This ran much better, and our final agreement ratings were 89.5% and 87.17% for Q1 and Q2 respectively.
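The slides do not say how the agreement ratings were computed. As a rough illustration only, one simple measure is mean pairwise agreement between annotators, sketched below; `percent_agreement` and the input layout are assumptions made here, and CrowdFlower's own measure may differ.

```python
from itertools import combinations


def percent_agreement(judgements):
    """Mean pairwise agreement across all items.

    `judgements` maps an item id to the list of labels that different
    annotators gave to that item (a hypothetical layout).
    """
    agree = 0
    total = 0
    for labels in judgements.values():
        for a, b in combinations(labels, 2):
            agree += a == b  # True counts as 1
            total += 1
    return agree / total if total else 0.0


if __name__ == "__main__":
    sample = {"t1": ["yes", "yes", "yes"], "t2": ["yes", "no", "yes"]}
    print(round(percent_agreement(sample), 4))
```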
  14. What we did – Feature Sets
      Our feature sets were divided into several groups:
      ◦ User Features
        ◦ H1) Certain types of users may be more prone to share life events on Twitter
      ◦ Content Features
        ◦ H2) Posts written in a certain way may be related to life events
      ◦ Semantic Features
        ◦ H3) Posts about life events might be semantically associated with certain entities or concepts
      ◦ Interaction Features
        ◦ H4) Users who do not normally talk with the poster might start interacting for certain types of life events
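To make the feature groups concrete, here is a minimal sketch of flattening one tweet into a single feature dictionary. The record fields (`text`, `followers`, `replies`) and the feature names are hypothetical, chosen only for illustration; semantic features are left out because they would need an entity or concept linker that the slides do not describe.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def extract_features(tweet):
    """Flatten one tweet record into a {feature_name: value} dict."""
    text = tweet.get("text", "")
    features = {}
    # Content features: unigram counts plus a simple length measure.
    for token, count in Counter(TOKEN_RE.findall(text.lower())).items():
        features["unigram=" + token] = count
    features["content:length"] = len(text)
    # User feature (hypothetical field name).
    features["user:followers"] = tweet.get("followers", 0)
    # Interaction feature (hypothetical field name).
    features["interaction:replies"] = tweet.get("replies", 0)
    return features


if __name__ == "__main__":
    print(extract_features({"text": "Just married!", "followers": 10}))
```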
  15. What we did - Classifiers
      We ended up testing just two classifiers, as other work had already tested a number of different classifiers on similar datasets:
      ◦ J48
      ◦ Naïve Bayes
      We did try SVMs as well but, due to poor performance, omitted them from our results.
      To evaluate, we used 10-fold cross-validation, reporting the standard classification performance measures of Precision, Recall, and F1.
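The evaluation procedure on slide 15 can be sketched in plain Python as below. This is not the setup used in the work: it is a from-scratch multinomial Naive Bayes with add-one smoothing, and folds are assigned by index modulo `k` with no shuffling or stratification, which is a simplification. All function names are made up for this sketch.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenised documents."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(gold, predicted, positive=True):
    """Precision, recall and F1 for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(docs, labels, k=10):
    """Pool held-out predictions over k folds, then score them once."""
    predicted = [None] * len(docs)
    for fold in range(k):
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            predicted[i] = predict_nb(model, docs[i])
    return precision_recall_f1(labels, predicted)
```

With unigram tokens as input, `cross_validate(docs, labels)` returns one (precision, recall, F1) triple computed over the pooled held-out predictions.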
  16. What we did - Results https://
  17. Discussion
  18. Why the dominance of content features?
      Unigrams outperformed the other feature sets.
      This is consistent with similar papers, and slightly disappointing.
      While the classifiers were biased towards the keywords chosen, it is disappointing that the other feature sets did not perform well.
      In the case of interaction features this might be because:
      ◦ We were limited in what types of interaction features we could obtain, due to the limits of Twitter’s API
      ◦ The dataset might have been annotated incorrectly
        ◦ For example, stories of other people are annotated, rather than people declaring an event about themselves
      ◦ Due to the nature of Twitter and its followers, interaction features might just not be a good discriminator
        ◦ Sites like Facebook, though, which tend to be private, might have better performance in this area
  19. Choice of targeting specific life events
      Targeting only five specific life events limits what we can actually extract from social media.
      Our binary classifier worked alright, but:
      ◦ Due to its dependency on unigrams, it will probably not perform very well outside of these 5 events
      This is no silver bullet for solving our research question.
  20. Collecting the dataset
      The collected dataset was biased towards certain words because it was built from a keyword search.
      A better way to collect these datasets would be to randomly sample Twitter profiles and annotate their timelines.
      However, it is likely that only a small number of tweets in a user’s timeline are actually about these types of events.
      To achieve a decent training set, we would need to annotate lots of tweets, which is very costly.
  21. Twitter and the annotation process
      Using CrowdFlower is a great way to gain lots of annotations fast.
      However, with Twitter data we think the annotation is flawed for these types of questions.
      Lack of context
      ◦ Is a 140-character maximum text string enough context to annotate these types of events?
      ◦ Example: Is “MadJacks Forever Memories” about getting married?
        ◦ Madjacks is a wedding venue in Las Vegas, so it might be
      First vs third party annotation
      ◦ While lack of context for a third party is an issue, would we get better results if the owner of the tweet annotated it?
      Extracting useful interaction features is difficult
      ◦ There is no API to get conversations for tweets. Mining this manually is possible, but annoying.
      ◦ You can’t get access to which users have favourited a tweet
  22. Facebook would be better…
      …but it has heavy privacy controls on access to user data.
      While this is great for users, it’s annoying for researchers.
      Retrieving content from Facebook all needs to be done within an application
      ◦ These days, a User ID is hashed with your application ID
      ◦ If you have a standard user ID, you can’t access the Facebook Graph API to retrieve information about it
      Asking people to just give us their Facebook data with a single sign-on approach isn’t the best approach either
      ◦ Users are reluctant to just give researchers their private data
      ◦ What do they get out of it? (besides the results of the research)
  23. Is Instagram the middle ground?
      Like Twitter, there are a lot of open Instagram accounts
      ◦ Sites like […] index large numbers of users and offer tag-based search
      Like Twitter, it is (currently) easy to extract Instagram data
      ◦ While the API, like Twitter’s, is limited, it is possible to extract full user profiles
      ◦ Instagram works with a REST-based architecture, returning user posts in JSON feeds that can be paginated, allowing full extraction of posts
      ◦ Using the API, each post can be augmented with additional information not available in the media stream
      While we think of Instagram as only photos, most photos have short captions similar in length to tweets
      ◦ Comments can also provide semantic context
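The paginated extraction described on slide 23 follows a common cursor pattern, sketched below. No real Instagram endpoint is used: `fetch_page`, the `items` and `next_cursor` keys, and the in-memory `fake_feed` are all assumptions made purely for illustration.

```python
def fetch_all_posts(fetch_page, max_pages=100):
    """Follow cursors until the feed is exhausted or max_pages is reached.

    `fetch_page(cursor)` is a hypothetical callable returning a dict with
    "items" (a list of posts) and "next_cursor" (None on the last page).
    """
    posts = []
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        posts.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            break
    return posts


def fake_feed(cursor):
    """In-memory stand-in for a paginated JSON feed, three posts per page."""
    data = [{"id": i, "caption": "post %d" % i} for i in range(7)]
    start = cursor or 0
    end = start + 3
    return {"items": data[start:end],
            "next_cursor": end if end < len(data) else None}


if __name__ == "__main__":
    print(len(fetch_all_posts(fake_feed)))
```

The `max_pages` cap is a design choice to keep a misbehaving feed from looping forever; a real collector would also need rate limiting and error handling.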
  24. Future Work
      We are currently looking at collecting Instagram and Facebook data for future experiments
      ◦ Facebook data is being collected with a trivial app that users can use
      Unsupervised life event detection
      ◦ As opposed to targeting specific events, being able to extract any type would be of more value
      ◦ Currently we are looking at knowledge-based approaches using ConceptNet to achieve this
      Graph Classification of Posts
      ◦ So far we have employed fairly flat vectors when considering feature sets
      ◦ An alternative is to treat posts as graphs, looking at relationships within semantics (ConceptNet, DBpedia, etc.), interactions, and dependency parses
      ◦ Graph frequent pattern mining might identify new feature sets that we can look at using