Math is getting better at machine
learning (extracting knowledge from data)
1980s: Pairwise document similarity (document clustering)
1990s: Latent Semantic Analysis (what does the word mean?)
really just matrix multiplication:
term vector (query) × strength matrix = doc vector
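A minimal sketch of that multiplication, in plain Python. The vocabulary, document names, and strength values below are hypothetical toy data, not taken from the talk; in a real Latent Semantic Analysis setup the strength matrix would typically be derived from a truncated singular value decomposition of a term-document matrix rather than written by hand.

```python
# Hypothetical toy vocabulary and documents (illustrative only).
TERMS = ["matrix", "stock", "blog"]
DOCS = ["doc_math", "doc_finance", "doc_web"]

# STRENGTH[i][j]: assumed association strength between term i and document j.
STRENGTH = [
    [2.0, 0.0, 1.0],  # matrix
    [0.0, 3.0, 1.0],  # stock
    [0.0, 1.0, 2.0],  # blog
]


def term_vector(query_words):
    """Count how often each vocabulary term occurs in the query."""
    return [float(query_words.count(term)) for term in TERMS]


def doc_vector(query_vec, strength):
    """Multiply a row vector by the strength matrix: one score per document."""
    n_docs = len(strength[0])
    return [
        sum(query_vec[i] * strength[i][j] for i in range(len(query_vec)))
        for j in range(n_docs)
    ]


query = term_vector(["stock", "blog"])
scores = doc_vector(query, STRENGTH)

# Rank documents by score, highest first.
ranked = sorted(zip(DOCS, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(name, score)
```

With this toy data the finance document ranks first, because both query terms carry weight in its column of the strength matrix.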
The full record that supports that claim
should be available for detailed
examination and critique
We were taught to share these discoveries by
publishing a paper or filing a patent after all
the work was done.
Corporate email communications
Green = Internal
Yellow = External
… not the objects
What if we,
Is the openness of the next generation going to change the
scientific process by allowing computers to mine the “human
Proving a hypothesis.
Finding the unknown correlations.
Sentiment Analysis for stock
Brokerage houses are using computers to
“micro-trade” stocks based on sentiment
analysis of the blogosphere.
Figure from Glance, Hurst, Nigam, Siegler, Stockton, & Tomokiyo, KDD’05
My talk from 2
The more we share
the smarter it gets.
Google Docs determined the
commonality of what I was
entering and automatically
completed the rest of the list.
Google Docs searched the web.
Next generation versions will be
able to tell you what correlations
exist between two seemingly