Why Semantic Search Is Hard

Describes problems with semantic search and how Truevert technology overcomes them

  1. Why Is Semantic Search So Hard? And What Truevert Does About It. Powered by www.truevert.com and www.orcatec.com
  2. Semantic search harnesses the meaning of words to improve the quality of search results.
  3. Using meaning is difficult.
  4. Language is dynamic. Jabberwocky Effect: making up new words. Humpty Dumpty Syndrome: using old words in new ways. Examples: blog, Twitter.
  5. Words are ambiguous. Examples: strike, bank.
  6. How ambiguous? Look it up! "The companies have agreed to a brief delay in implementing their agreement." Number of definitions for each word, in word order: 37, 14, 39, 17, 54, 62, 20, 8, 84, 8, 7, 9. That gives 7,788,584,618,680,320 possible interpretations. Each word disambiguates the others.
  7. Isn't the Semantic Web supposed to fix these problems? The Semantic Web was intended to support machine-to-machine communication to manage the day-to-day mechanisms of trade, bureaucracy, and daily life (Berners-Lee, 1999).
  8. Semantic Web and the Web Ontology Language (OWL): line up the information in web pages with predefined categories.
  9. Ontology: a set of concepts, categories, and relations. Ontologies cast meaning into categories. [Diagram: an ontology connecting Recreation, Sports, Baseball, Basketball, Cricket, Gloves, Basketballs, Baseballs, Wicket, Batter, and Player through "is a" and "uses" relations.]
  10. Ontologies limit thinking to known tracks.
  11. People are creative. For example, 20-25% of the searches on Google on any day have never been seen before.
  12. What categories matter to you? A "basketball" could be filed under: bouncy things; round things; things to dribble; things that my brother hates; things with a pebbly surface; things that Barack Obama likes; things that float. There is an infinite number of ways to categorize.
  13. What's Truevert's solution?
  14. "The meaning of a word is its use in the language." Ludwig Wittgenstein, Philosophical Investigations, § 43.
  15. Truevert learns the meaning of words in the same way that people do: from the context in which they are used. Truevert works in any language.
  16. Gabbro is a dark, coarse-grained, igneous rock formed underground. It is chemically equivalent to basalt. Gabbro is rarely used as a building stone. Do you know the meaning of the word "gabbro"?
  17. The computer creates a model of word use patterns from the documents in its vertical. Legal vertical: "Blah blah blah court blah blah blah lawyer blah blah blah blah bailiff blah blah blah blah blah." Sport vertical: "Blah blah court blah blah blah basketball blah blah blah blah blah blah freethrow blah blah blah blah."
  18. The model identifies characteristic word patterns for each vertical. Court & (lawyer or bailiff or jury or attorney or …) = legal. Court & (basketball or hoops or freethrow or …) = sports.
  19. Word use patterns are meaning.
  20. Follow your own path. Truevert delivers results tuned to your interests.
  21. Truevert's patterns let YOU find the results that YOU are looking for.
  22. Green Vertical Semantic Search Results
  23. Truevert is a project of OrcaTec LLC, headquartered in Ojai, CA. OrcaTec is a leading provider of information discovery software, including intelligent semantic search, near-duplicate clustering, language identification, email threading, and interesting phrase finding. OrcaTec-developed software was nominated by the Jet Propulsion Laboratory as NASA software of the year 2008. OrcaTec software has been used in electronic discovery and advertising applications as well as knowledge management. Core OrcaTec software is patent pending.
  24. Contact Truevert: www.truevert.com, [email_address], 805-918-4612
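
The interpretation count on slide 6 is the product of the per-word definition counts, since each word's sense can in principle be chosen independently of the others. The following minimal sketch reproduces that arithmetic. The counts are the twelve numbers shown on the slide; pairing them with the words in sentence order is an assumption, though the product does not depend on the order.

```python
from math import prod

sentence = "The companies have agreed to a brief delay in implementing their agreement"

# Definition counts as listed on slide 6 (pairing with words in order is assumed).
definition_counts = [37, 14, 39, 17, 54, 62, 20, 8, 84, 8, 7, 9]

words = sentence.split()
assert len(words) == len(definition_counts)  # one count per word

# With no context, every combination of senses is a candidate reading.
interpretations = prod(definition_counts)
print(f"{interpretations:,}")  # 7,788,584,618,680,320
```

The size of this number is why context matters: a reader never enumerates the combinations, because each word narrows the plausible senses of its neighbors.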
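
Slides 9 through 12 argue that an ontology can only answer questions about categories someone defined in advance. A rough sketch of that limitation follows, storing an ontology as (subject, relation, object) triples. The concept names come from slide 9, but the specific edges are illustrative guesses, because the diagram's layout did not survive in the transcript; `members_of` is a hypothetical helper, not part of any ontology library.

```python
# Illustrative triples only: concept names are from slide 9, the edges are assumed.
TRIPLES = [
    ("Sports", "is_a", "Recreation"),
    ("Baseball", "is_a", "Sports"),
    ("Basketball", "is_a", "Sports"),
    ("Cricket", "is_a", "Sports"),
    ("Baseball", "uses", "Gloves"),
    ("Baseball", "uses", "Baseballs"),
    ("Basketball", "uses", "Basketballs"),
    ("Cricket", "uses", "Wicket"),
]

def members_of(category, triples=TRIPLES):
    """Return every concept that reaches `category` through is_a links."""
    found = set()
    frontier = [category]
    while frontier:
        parent = frontier.pop()
        for subject, relation, obj in triples:
            if relation == "is_a" and obj == parent and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found

# A predefined category works; a category nobody anticipated returns nothing.
print(sorted(members_of("Recreation")))
print(sorted(members_of("Things that float")))
```

The second query comes back empty not because nothing floats, but because the category was never written into the ontology, which is the point of slide 12.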
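
The patterns on slide 18 can be read as a co-occurrence rule: an ambiguous word takes the sense of whichever vertical's cue words appear alongside it. The sketch below hard-codes only the cue words the slide lists (the slide's lists are elided with "…"), whereas the slides describe a model that learns such patterns from documents. The names `VERTICAL_CUES` and `sense_of` are hypothetical, and this is not Truevert's actual implementation.

```python
import re

# Cue words copied from slide 18; the slide's lists are longer but elided.
VERTICAL_CUES = {
    "legal": {"lawyer", "bailiff", "jury", "attorney"},
    "sports": {"basketball", "hoops", "freethrow"},
}

def sense_of(word, text, cues=VERTICAL_CUES):
    """Guess the vertical for `word` from the cue words that co-occur in `text`."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if word not in tokens:
        return None
    # Score each vertical by how many of its cue words appear in the text.
    scores = {vertical: len(tokens & cue_words) for vertical, cue_words in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

legal_text = "Blah blah blah court blah blah blah lawyer blah blah bailiff blah."
sport_text = "Blah blah court blah blah basketball blah blah freethrow blah."
print(sense_of("court", legal_text))
print(sense_of("court", sport_text))
```

With no cue words nearby the function returns `None` rather than guessing, which mirrors the earlier slides: a word on its own stays ambiguous.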
