Be the first to like this
Over the past few years large-scale textual resources, such as news articles, social media, search queries or even digitised books, have been the centre of various research efforts. In particular, the open nature of the microblogging platform of Twitter provided the unique opportunity for various appealing ideas to be evaluated. Based on the hypothesis that this online stream of content should represent at least a biased fraction of real-world situations or opinions, we have proposed core algorithms for estimating the current rate of an infectious disease, such as influenza, or even of a natural, less stable phenomenon like rainfall rates. A simplified emotion analysis on a longitudinal set of tweets revealed interesting patterns, including signs of rising anger and fear before the UK riots in August, 2011. A similar analysis on the Google N-gram book database uncovered collective patterns of affect over the course of the 20th century. Through the extension of linear text regression approaches by the addition of a user dimension, we proposed a family of bilinear regularised regression models, which found application in the approximation of voting intention trends. Finally, we reversed the previous modelling principle in an attempt to qualitatively analyse user attributes or behaviours.