Genre is one of the textual dimensions that can be used to reconstruct the communicative context needed to assess the value of information with respect to a purpose (business, learning, finding, monitoring, predicting, etc.). When we know the genre of a text, we can surmise the CONTEXT in which the text was created and the purpose it was meant to serve. We can therefore decide more confidently whether a text contains the information we are looking for. For example, factual texts might have more credibility than opinionated texts. In this respect, genres such as press conferences, declarations or announcements by a White House spokesman might be more reliable than subjective genres, e.g. newspaper editorials or op-ed articles. On the other hand, if we want to take the pulse of public feeling about a product or a politician, we might give more weight to more emotional genres like blogs, forums or microposts on social networks.
In recent years, important steps forward have been taken in Automatic Genre Identification (AGI). AGI can be defined as a meta-discipline that draws on and spans Computational Linguistics, NLP, Corpus Linguistics, Information Retrieval, Information Extraction, Text Mining, Text Analytics, Sentiment Analysis and Library and Information Science (LIS), among others. Promising computational models have been proposed to automatically identify the genre(s) of a text, although no agreement has been reached on the definition of the concept of genre itself. AGI research has shown that genre classes such as blogs, online newspaper front pages, FAQs and DIYs can be automatically identified using a wide range of genre-revealing features, from linguistic cues to character n-grams, combined with a variety of classification algorithms.
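To make the idea of character n-gram features concrete, the following is a minimal sketch, not a method taken from the AGI literature. It builds one character-trigram profile per genre and assigns a new text to the genre whose profile is most similar by cosine similarity. The function names, the toy training texts and the choice of cosine similarity are all assumptions made for illustration; real systems use far larger corpora and richer feature sets.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_texts, n=3):
    """Aggregate n-gram counts into one profile per genre label."""
    profiles = {}
    for genre, text in labelled_texts:
        profiles.setdefault(genre, Counter()).update(char_ngrams(text, n))
    return profiles


def predict(profiles, text, n=3):
    """Return the genre whose profile is closest to the text."""
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda genre: cosine(profiles[genre], grams))


# Hypothetical toy data, invented purely for this illustration.
TOY_DATA = [
    ("faq", "Q: How do I reset my password? A: Click the reset link."),
    ("faq", "Q: Where can I download the manual? A: See the support page."),
    ("blog", "Today I went hiking and I felt so happy, I loved it!"),
    ("blog", "I think my weekend was great, I baked bread and relaxed."),
]

if __name__ == "__main__":
    genre_profiles = train(TOY_DATA)
    print(predict(genre_profiles, "Q: How do I change my email? A: Open settings."))
```

Character n-grams are attractive in this setting because they capture punctuation, layout cues and function-word fragments without any linguistic preprocessing; the nearest-profile rule used here is only one of many classifiers that could be substituted.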
In a world where information overload is still pervasive and where technology encourages massive text production through emailing, blogging, tweeting and social network communication, the concept of genre and AGI are likely to be useful for converting unclassified and unstructured textual data into more structured and contextualized information.
This talk presents a summary of the state of the art in AGI and discusses how genre-aware applications could help extract actionable information from raw textual data.