Presented at the 4th DEOS workshop, http://diadem.cs.ox.ac.uk/deos13/
Social media presents itself as a context-rich source of big data, readily exhibiting volume, velocity and variety. Mining information from microblogs and other social media is a challenging, emerging research area. Unlike carefully authored news text and other longer content, social media text poses a number of new challenges, due to the short, noisy, context-dependent, and dynamic nature.
This talk will discuss firstly how Linked Open Data (LOD) vocabularies (namely DBpedia and YAGO) have been used to help entity recognition and disambiguation in such content. We will introduce LODIE, the LOD-based extension of the widely used ANNIE open-source entity recognition system. LODIE includes also entity disambiguation (including products, as well as names of persons, locations, and organisations) and has been developed as part of the TrendMiner and uComp projects. Quantitative evaluation results will be shown, including a comparison against other state-of-the-art methods and an analysis of how errors in upstream linguistic pre-processing (i.e. tokenisation and POS tagging) can affect disambiguation performance. Our results demonstrate the importance of adjusting approaches for this genre.
The second half of the talk will focus on fine-grained events in tweets. Awareness of temporal context in social media enables many interesting applications. We identify events using the TimeML schema, focusing on occurrences and actions. Challenges of event annotation will be discussed, as well as the development of a supervised event extractor specifically for social media. We evaluate this against traditional event annotation approaches (e.g. Evita, TIPSem).