Think of them as the “buzz” from the document. They work really well when rolled up across many documents, so you can get a feel for what, exactly, people are saying.
They are completely automatic.
We can also tell you the themes that are lexically associated with an Entity, and not just the themes that are important inside a document.
We extract themes by identifying candidate themes via part-of-speech patterns. If you are a Salience customer, you can tweak these, but most people don’t.
Once we have extracted them, we score them using a combination of Lexical Chaining and some of our own proprietary scoring algorithms.
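To make the extraction step concrete, here is a minimal sketch of candidate-theme extraction from part-of-speech patterns, followed by a roll-up across documents. Everything in it is an assumption for illustration: the pattern (optional adjectives followed by nouns, at least two words), the Penn Treebank-style tags, the hand-tagged sample documents, and the function names `candidate_themes` and `roll_up`. It is not the Salience pattern set, and the document-frequency count is only a stand-in for scoring; it does not implement Lexical Chaining or the proprietary scoring.

```python
from collections import Counter


def candidate_themes(tagged):
    """Return phrases matching the assumed pattern (JJ)* (NN)+, two words minimum.

    `tagged` is a list of (word, tag) pairs that has already been POS-tagged.
    """
    themes = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("JJ"):  # leading adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j and k - i >= 2:
            themes.append(" ".join(word.lower() for word, _ in tagged[i:k]))
            i = k
        else:
            i = max(k, i + 1)  # no match here; move past what was scanned
    return themes


def roll_up(tagged_docs):
    """Count how many documents each candidate theme appears in."""
    counts = Counter()
    for doc in tagged_docs:
        counts.update(set(candidate_themes(doc)))
    return counts


# Hand-tagged sample documents (hypothetical data).
doc1 = [("The", "DT"), ("battery", "NN"), ("life", "NN"),
        ("is", "VBZ"), ("great", "JJ")]
doc2 = [("Battery", "NN"), ("life", "NN"), ("beats", "VBZ"),
        ("the", "DT"), ("old", "JJ"), ("model", "NN")]

print(roll_up([doc1, doc2]).most_common())
```

Counting documents rather than raw occurrences keeps one long, repetitive document from dominating the roll-up, which fits the idea of seeing what many people are saying.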
It is an iterative process: cluster and see, then allow for exploration into each of those ideas by building clusters that are associated with the various things.
So, a tool that allows you to see the big ideas and who they are connected with, then what the spread is across the different users.
Then iterate based on the topics, entities, etc. to dive deeper into the issue.
Computers as partners, not as replacements: if you know the buckets already, then categorization is a great thing.
But this process allows you to go back and forth and dive in and out.
A tool with which to do research.
H2O World - Clustering & Feature Extraction on Text - Seth Redmore