Over the past few years large-scale, user-generated, online resources, such as social media or search engine queries, have been driving various interesting research efforts. This talk will showcase a series of ideas that we have developed based on the hypothesis that this online stream of content should represent a biased -at least- fraction of real-world situations, opinions or behaviour. Firstly, core algorithms have been proposed for mapping Twitter or Google search data to estimations about the current rate of an infectious disease, such as influenza. Then, through the addition of a user dimension in linear regularised text regression models, a family of bilinear models has been introduced, capable of improving approximations of voting intention trends. Finally, by redirecting the previous modelling principle, social media content has been used to learn as well as
qualitatively interpret user attributes, such as impact, occupation,
income and socioeconomic status.