http://www.cc.gatech.edu/~agray/6240spr11/
http://datamining.typepad.com/data_mining/2006/04/visualizing_tex_1.html
Text mining covers many different things, from social network dynamics to searching the web.
http://www.nactem.ac.uk/assist/
Here is an example in the social sciences, but these techniques are also used to evaluate the capacities of industrial companies from what they put on their webpages.
Chance discovery means discovering chances: the breaking points in systems, the marketing windows in business, etc. It involves determining the significance of some piece of information about an event and then using this new knowledge in decision making. The techniques developed combine data mining methods for finding rare but important events with knowledge management, groupware, and social psychology. (Theoretical Computer Science, Springer.com)

ser·en·dip·i·ty
1. The faculty of making fortunate discoveries by accident.
2. The fact or occurrence of such discoveries.
3. An instance of making such a discovery.

Fortuitous accidents
Accidents in medicine: The idea sends chills down your spine as you conjure up thoughts of misdiagnoses, mistakenly prescribed drugs, and wrongly amputated limbs. Yet while accidents in the examining room or on the operating table can be regrettable, even tragic, those that occur in the laboratory can sometimes lead to spectacular advances, life-saving treatments, and Nobel Prizes. (PBS NOVA)
“It takes years of study to create a chance discovery,” writes Ashley Hay, author of The Science of Serendipity. Can we use text mining techniques to speed up this process?
The importance of the corpus <ul><li>Using a corpus</li></ul>Here on the left is an example in the social sciences, but these techniques are also used to evaluate the capacities of industrial companies from what they put on their webpages, and so on.
What will you discover? Should you make separate corpora for before and after the source of the poison was correctly identified as bacterial? Once you have begun to manually identify key phrases, add synonyms, and see the patterns that result, you can try to automate this process. How can this system be optimized? When you have some results, there is also the challenge of putting them into perspective. If the corpus for the king cobra has many phrases similar to the corpus for the blue-ringed octopus, what does this mean?
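One way to start automating this is sketched below in Python. It is only an illustration under stated assumptions: the synonym table, the two tiny corpora, and the function names are invented placeholders, not real data or an established method. It maps synonym variants to a canonical key phrase, collects the key phrases found in each corpus, and scores their overlap with a Jaccard index.

```python
import re

# Hypothetical synonym table: canonical key phrase -> variants to search for.
SYNONYMS = {
    "paralysis": ["paralysis", "paralyzed", "loss of muscle control"],
    "respiratory failure": ["respiratory failure", "respiratory arrest"],
    "numbness": ["numbness", "numb"],
    "swelling": ["swelling", "swollen"],
}


def key_phrases_found(texts, synonyms):
    """Return the canonical key phrases that occur in at least one text."""
    found = set()
    for text in texts:
        lowered = text.lower()
        for canonical, variants in synonyms.items():
            # Whole-word match so that "numb" does not fire inside other words.
            if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
                found.add(canonical)
    return found


def jaccard(a, b):
    """Share of key phrases common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented placeholder sentences standing in for two real corpora.
cobra_corpus = [
    "The limb was swollen and the patient was numb and later paralyzed.",
    "Respiratory arrest followed within hours.",
]
octopus_corpus = [
    "Victims report numbness around the mouth.",
    "Loss of muscle control and respiratory failure may follow.",
]

cobra_phrases = key_phrases_found(cobra_corpus, SYNONYMS)
octopus_phrases = key_phrases_found(octopus_corpus, SYNONYMS)
similarity = jaccard(cobra_phrases, octopus_phrases)
print(sorted(cobra_phrases & octopus_phrases), similarity)
```

A high score only says the two corpora share vocabulary; putting that into perspective still needs a human reader.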
The Curious Case of Dental & Arterial Plaques
For many years medicine has known that dental plaque is caused by oral bacteria. However, in 2008 University of Florida researchers cornered the bacterial ringleaders of gum disease inside human artery-clogging plaque; see "Human Atherosclerotic Plaque Contains Viable Invasive Actinobacillus actinomycetemcomitans and Porphyromonas gingivalis" by Emil V. Kozarov, Brian R. Dorn, Charles E. Shelburne, William A. Dunn Jr, and Ann Progulske-Fox.
What does this mean? If these two have the same cause, perhaps other conditions where the keyword is plaque are also bacterial in origin, or microbes are otherwise implicated.
Again, more or less the same protocol: <ul><li>Manually search texts to identify key phrases
Include synonyms; search ScienceDirect, Google Scholar, etc.
Find other medical conditions that refer to plaque or use similar phrases
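The protocol above can be sketched as a small screening script. This is a rough illustration, not a tested pipeline: the term lists, the condition labels, and the abstract texts are hypothetical placeholders, and the function names are made up for this example. For each condition it keeps the texts that mention plaque and reports whether any of them also mention a microbial term.

```python
import re

# Assumed term lists; in practice they would come from the manual search step.
PLAQUE_TERMS = ["plaque", "plaques"]
MICROBE_TERMS = ["bacteria", "bacterial", "microbe", "microbes", "microbial"]


def mentions(text, terms):
    """True if any term appears in the text as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)


def screen_conditions(abstracts, key_terms, context_terms):
    """Map each condition whose texts mention a key term to whether
    a context term appears in one of those same texts."""
    results = {}
    for condition, texts in abstracts.items():
        hits = [t for t in texts if mentions(t, key_terms)]
        if hits:
            results[condition] = any(mentions(t, context_terms) for t in hits)
    return results


# Placeholder abstracts; real ones would be collected from literature searches.
abstracts = {
    "condition A": ["Plaque samples were cultured and bacterial DNA was detected."],
    "condition B": ["Plaques were described, with no cause discussed."],
    "condition C": ["No deposits were reported."],
}

screened = screen_conditions(abstracts, PLAQUE_TERMS, MICROBE_TERMS)
print(screened)
```

Conditions flagged True are only candidates for closer reading; co-occurrence of words in a text is not evidence of a shared cause.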