Why Semantic Search Is Hard


Describes problems with semantic search and how Truevert technology overcomes them


Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Presentation Transcript

  • Why Is Semantic Search So Hard? And What Truevert Does About It. Powered by www.truevert.com, www.orcatec.com
  • Semantic search harnesses the meaning of words to improve the quality of search results
  • Using meaning is difficult
  • Language is dynamic. Jabberwocky Effect: making up new words. Humpty Dumpty Syndrome: using old words in new ways. Examples: blog, Twitter.
  • Words are ambiguous. Examples: strike, bank.
  • How ambiguous? Look it up! Number of definitions for each word: The (37) companies (14) have (39) agreed (17) to (54) a (62) brief (20) delay (8) in (84) implementing (8) their (7) agreement (9). That gives 7,788,584,618,680,320 possible interpretations. Each word disambiguates the others.
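The figure on the slide above can be checked by multiplying the per-word definition counts, since in the worst case each word's sense is chosen independently of the others. This is a minimal sketch, assuming the twelve numbers in the transcript line up with the twelve words of the sentence in order; that pairing is inferred from the transcript and is not stated on the slide.

```python
from math import prod

# Definition counts as they appear in the transcript. Pairing each count with
# a word in sentence order is an assumption made for illustration.
words = "The companies have agreed to a brief delay in implementing their agreement".split()
definition_counts = [37, 14, 39, 17, 54, 62, 20, 8, 84, 8, 7, 9]
senses = dict(zip(words, definition_counts))

# If every word's sense were picked independently, the number of candidate
# readings of the sentence is the product of the per-word counts.
interpretations = prod(definition_counts)
print(f"{interpretations:,} possible interpretations")
# prints "7,788,584,618,680,320 possible interpretations"
```

The product reproduces the slide's total, which is why context matters so much: each neighboring word rules out most of the readings that a word has in isolation.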
  • Isn’t the Semantic Web supposed to fix these problems? The Semantic Web was intended to support machine-to-machine communication to manage the day-to-day mechanisms of trade, bureaucracy, and daily life (Berners-Lee, 1999).
  • The Semantic Web and the Web Ontology Language (OWL): line up the information in web pages with predefined categories
  • Ontology: a set of concepts, categories, and relations. Ontologies cast meaning into categories. [Diagram: example ontology with the nodes Recreation, Sports, Baseball, Basketball, Cricket, Player, Batter, Gloves, Basketballs, Baseballs, and Wicket, connected by “is a” and “uses” relations.]
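One common way to hold an ontology like the one on this slide is as a list of (subject, relation, object) triples. The sketch below is illustrative only: the transcript preserves the node names and the two relation names, but not which nodes each relation connects, so every edge in `TRIPLES` is an assumption, and the helper names `ancestors` and `related` are hypothetical.

```python
# Hypothetical edges: the transcript lists the nodes and the relation names
# ("is a", "uses") but not the actual links, so these triples are assumed.
TRIPLES = [
    ("Sports", "is_a", "Recreation"),
    ("Baseball", "is_a", "Sports"),
    ("Basketball", "is_a", "Sports"),
    ("Cricket", "is_a", "Sports"),
    ("Batter", "is_a", "Player"),
    ("Baseball", "uses", "Gloves"),
    ("Baseball", "uses", "Baseballs"),
    ("Basketball", "uses", "Basketballs"),
    ("Cricket", "uses", "Wicket"),
]


def ancestors(concept, triples=TRIPLES):
    """Follow is_a links upward and return every broader category."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def related(concept, relation, triples=TRIPLES):
    """Return the objects linked to a concept by one named relation."""
    return [obj for subj, rel, obj in triples if subj == concept and rel == relation]


print(ancestors("Basketball"))          # ['Sports', 'Recreation']
print(related("Basketball", "uses"))    # ['Basketballs']
print(related("Basketball", "floats"))  # [] (no such relation was predefined)
```

The last query returns nothing because a fixed ontology can only answer questions about the categories and relations someone defined in advance.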
  • Ontologies limit thinking to known tracks
  • People are creative. For example, 20-25% of the searches on Google on any day have never been seen before.
  • What categories matter to you? Take “basketball”: bouncy things, round things, things to dribble, things that my brother hates, things with a pebbly surface, things that Barack Obama likes, things that float. There is an infinite number of ways to categorize.
  • What’s Truevert’s solution?
  • “The meaning of a word is its use in the language.” (Ludwig Wittgenstein, Philosophical Investigations, § 43)
  • Truevert learns the meaning of words in the same way that people do, from the context in which they are used. Truevert works in any language.
  • Gabbro is a dark, coarse-grained, igneous rock formed underground. It is chemically equivalent to basalt. Gabbro is rarely used as a building stone. Do you know the meaning of the word “gabbro”?
  • The computer creates a model of word use patterns from the documents in its vertical. Legal vertical: “Blah blah blah court blah blah blah lawyer blah blah blah blah bailiff blah blah blah blah blah.” Sport vertical: “Blah blah court blah blah blah basketball blah blah blah blah blah blah freethrow blah blah blah blah.”
  • The model identifies characteristic word patterns for each vertical. Court & (lawyer or bailiff or jury or attorney or …) = legal. Court & (basketball or hoops or freethrow or …) = sports.
  • Word use patterns are meaning
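The idea in the last three slides can be sketched with plain co-occurrence counting: record which words appear alongside an ambiguous word in each vertical's documents, then assign new text to the vertical whose context words it shares. This is not Truevert's actual algorithm, which the slides do not describe; the corpora, the stopword list, and the functions `learn_patterns` and `classify` are all invented for illustration.

```python
from collections import Counter

# Toy corpora standing in for the documents of each vertical (invented here;
# a real vertical would hold many documents).
VERTICALS = {
    "legal": [
        "the court heard the lawyer",
        "the bailiff cleared the court",
        "the jury left the court",
    ],
    "sports": [
        "a basketball bounced across the court",
        "the freethrow line on the court",
        "hoops fans filled the court",
    ],
}

STOPWORDS = {"the", "a", "on", "from"}


def learn_patterns(verticals, target):
    """Count the words that share a document with `target` in each vertical."""
    patterns = {}
    for name, docs in verticals.items():
        counts = Counter()
        for doc in docs:
            tokens = doc.lower().split()
            if target in tokens:
                counts.update(t for t in tokens if t != target and t not in STOPWORDS)
        patterns[name] = counts
    return patterns


def classify(text, patterns, target):
    """Pick the vertical whose learned context words overlap the text most."""
    tokens = set(text.lower().split())
    if target not in tokens:
        return None
    scores = {name: sum(counts[t] for t in tokens) for name, counts in patterns.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


patterns = learn_patterns(VERTICALS, "court")
print(classify("the lawyer addressed the court", patterns, "court"))  # legal
print(classify("a freethrow from mid court", patterns, "court"))      # sports
```

When the text carries no context word seen during learning, `classify` returns `None` instead of guessing, which mirrors the slide's point that the surrounding words are what carry the meaning.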
  • Follow your own path. Truevert delivers results tuned to your interests.
  • Truevert’s patterns let YOU find the results that YOU are looking for
  • Green Vertical Semantic Search Results
  • Truevert is a project of OrcaTec LLC. Headquartered in Ojai, CA, OrcaTec is a leading provider of information discovery software, including intelligent semantic search, near-duplicate clustering, language identification, email threading, and interesting-phrase finding. OrcaTec-developed software was nominated by the Jet Propulsion Laboratory as NASA Software of the Year 2008. OrcaTec software has been used in electronic discovery and advertising applications as well as knowledge management. Core OrcaTec software is patent pending.
  • Contact Truevert: www.truevert.com, [email_address], 805-918-4612