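How ambiguous? Look it up!
The companies have agreed to a brief delay in implementing their agreement.
Number of definitions per word: The (37), companies (14), have (39), agreed (17), to (54), a (62), brief (20), delay (8), in (84), implementing (8), their (7), agreement (9).
7,788,584,618,680,320 possible interpretations.
Each word disambiguates the others.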
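The number of possible interpretations is the product of the number of definitions for each word. This short Python sketch reproduces that arithmetic from the per-word counts on the slide. `DEFINITION_COUNTS` only copies those numbers. Treating every combination of senses as a distinct reading is the slide's own simplification.

```python
import math

# Number of dictionary definitions per word, copied from the slide.
DEFINITION_COUNTS = [
    ("The", 37), ("companies", 14), ("have", 39), ("agreed", 17),
    ("to", 54), ("a", 62), ("brief", 20), ("delay", 8),
    ("in", 84), ("implementing", 8), ("their", 7), ("agreement", 9),
]

def possible_interpretations(counts):
    """Treat every combination of word senses as a distinct reading."""
    return math.prod(n for _, n in counts)

total = possible_interpretations(DEFINITION_COUNTS)
print(f"{total:,}")  # 7,788,584,618,680,320
```

A reader resolves nearly all of these combinations at once, because each word's context rules out most senses of its neighbors.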
Isn’t the Semantic Web supposed to fix these problems? The Semantic Web was intended to support machine-to-machine communication that would manage the day-to-day mechanisms of trade, bureaucracy, and daily life (Berners-Lee, 1999).
Semantic Web: line up the information in web pages with predefined categories, using the Web Ontology Language (OWL).
[Figure: a sample ontology relating Sports, Recreation, Baseball, Basketball, Cricket, Player, Batter, Gloves, Basketballs, Baseballs, and Wicket through “is a” and “uses” links.]
Ontology: a set of concepts, categories, and relations. Ontologies cast meaning into categories.
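Here is a minimal Python sketch of an ontology like the one in the figure: concepts joined by “is a” and “uses” relations, with a query that follows “is a” links transitively. The figure lists the concepts, but its exact links are not recoverable. The `IS_A` and `USES` edges below are therefore an illustrative assumption, not a copy of the slide.

```python
from collections import defaultdict

# Hypothetical edges: the concept names come from the slide's figure, but
# this particular wiring is an illustrative assumption.
IS_A = [
    ("Baseball", "Sports"), ("Basketball", "Sports"), ("Cricket", "Sports"),
    ("Sports", "Recreation"), ("Batter", "Player"),
]
USES = [
    ("Baseball", "Baseballs"), ("Baseball", "Gloves"),
    ("Basketball", "Basketballs"), ("Cricket", "Wicket"),
]

parents = defaultdict(set)
for child, parent in IS_A:
    parents[child].add(parent)

def is_a(concept, category):
    """Follow 'is a' links transitively (Baseball is a Sports is a Recreation)."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == category:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return False

def equipment_for(sport):
    """List the objects a concept 'uses', per the USES table."""
    return sorted(obj for subj, obj in USES if subj == sport)

print(is_a("Baseball", "Recreation"))  # True
print(is_a("Baseballs", "Sports"))     # False
print(equipment_for("Baseball"))       # ['Baseballs', 'Gloves']
```

The fixed edge tables are the key limitation. A query the ontology's designer did not anticipate (say, “things that float”) has no link to follow.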
People are creative. For example, 20 to 25% of the searches on Google on any given day have never been seen before.
What categories matter to you for “basketball”? Bouncy things, round things, things to dribble, things that my brother hates, things with a pebbly surface, things that Barack Obama likes, things that float. There is an infinite number of ways to categorize.
“The meaning of a word is its use in the language.” (Ludwig Wittgenstein, Philosophical Investigations, § 43)
Truevert learns the meaning of words the same way people do: from the context in which they are used. Truevert works in any language.
Gabbro is a dark, coarse-grained, igneous rock formed underground. It is chemically equivalent to basalt. Gabbro is rarely used as a building stone. Do you know the meaning of the word “gabbro”?
Legal vertical: “Blah blah blah court blah blah blah lawyer blah blah blah blah bailiff blah blah blah blah blah.”
Sport vertical: “Blah blah court blah blah blah basketball blah blah blah blah blah blah freethrow blah blah blah blah.”
The computer creates a model of word-use patterns from the documents in each vertical.
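Here is a minimal Python sketch of building “a model of word-use patterns” per vertical. It assumes the simplest possible model: for each vertical, count the words that appear in the same document as a target word such as “court”. This material does not describe how Truevert actually builds its model, so document-level co-occurrence counting is a stand-in. The toy corpora and the `FILLER` stop list are hypothetical.

```python
import re
from collections import Counter

# Toy corpora mirroring the slide's example documents (hypothetical data).
VERTICALS = {
    "legal": ["Blah blah blah court blah blah blah lawyer blah blah blah "
              "blah bailiff blah blah blah blah blah."],
    "sports": ["Blah blah court blah blah blah basketball blah blah blah "
               "blah blah blah freethrow blah blah blah blah."],
}
FILLER = {"blah"}  # stand-in for a stop-word list in this toy example

def tokens(text):
    """Lowercase, split on letters, and drop filler words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in FILLER]

def context_model(documents, target):
    """Count words that share a document with `target`."""
    counts = Counter()
    for doc in documents:
        words = tokens(doc)
        if target in words:
            counts.update(w for w in words if w != target)
    return counts

models = {name: context_model(docs, "court") for name, docs in VERTICALS.items()}
print(sorted(models["legal"]))   # ['bailiff', 'lawyer']
print(sorted(models["sports"]))  # ['basketball', 'freethrow']
```

With real document collections, the counts become informative. Words that co-occur with “court” far more often in one vertical than in the other characterize that vertical's sense of the word.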
The model identifies the characteristic word patterns for each vertical:
Court & (lawyer or bailiff or jury or attorney or …) = legal
Court & (basketball or hoops or freethrow or …) = sports
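The patterns above read as boolean rules: the anchor word “court” plus any one of a set of cue words. This Python sketch applies them to new text. In Truevert the patterns are learned from each vertical's documents. Here they are hand-coded from the slide for illustration. The cue sets contain only the words the slide names, so they are deliberately incomplete where the slide shows “…”.

```python
import re

# Hand-written versions of the slide's two patterns. The slide's "…" marks
# lists that continue; only the terms it names are included here.
RULES = {
    "legal": {"lawyer", "bailiff", "jury", "attorney"},
    "sports": {"basketball", "hoops", "freethrow"},
}

def classify(text, anchor="court"):
    """Return the verticals whose pattern 'anchor & (any cue word)' matches."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if anchor not in words:
        return []
    return sorted(v for v, cues in RULES.items() if words & cues)

print(classify("The jury left the court before the attorney spoke."))  # ['legal']
print(classify("He shot hoops on the court all afternoon."))           # ['sports']
print(classify("The court was empty."))                                 # []
```

The last example shows the weakness of fixed rules. With no cue word present, the sense of “court” stays unresolved.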
Truevert is a project of OrcaTec LLC, headquartered in Ojai, CA. OrcaTec is a leading provider of information discovery software, including intelligent semantic search, near-duplicate clustering, language identification, email threading, and interesting-phrase finding. OrcaTec-developed software was nominated by the Jet Propulsion Laboratory for NASA Software of the Year in 2008. OrcaTec software has been used in electronic discovery and advertising applications, as well as in knowledge management. Core OrcaTec software is patent pending.