Brave new search world



Keynote by Ran Hock.
3 March 2016, VOGIN-IP-lezing (VOGIN-IP lecture), Amsterdam

Published in: Internet

  1. Brave New Search World
     Ran Hock, Online Strategies
  2. Brave New Search World
     • The nature of “search” is changing radically.
     • Structure is being created from (relatively) unstructured data.
     • The “Semantic Web” is becoming a reality.
     • Natural Language Processing (NLP) and other technologies are being applied extensively to search and search-related activities.
  3. Brave New Search World
     • These technologies are making the following kinds of things happen:
       – “Knowledge graphs”
       – “Entity” identification in numerous applications
       – Natural language search statements
       – Actual searching of images (not just of image metadata)
     • These advances are coming not just from Google but from numerous services, especially for “news” search.
  4. Some Themes/Perspectives
     • What is happening is more evolutionary than revolutionary. Many, but not all, of the “pieces” of the technology have been around for a while.
     • Structure is being derived out of (not total) chaos. We are going from words to meaning.
     • Google isn’t the only player here.
     • We can take real advantage of these developments.
     • Using what you already know about “search” is important.
  5. Unstructuredness of Data
     • Part of the “organization of knowledge” problem
     • Particularly acute for textual material
     • To a computer, a “word” is a string of characters bounded by spaces or punctuation; it has no “meaning”.
     • When we search, we are looking for meaningful things, not character strings.
     • Meaning can be derived from context by using NLP.
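To make the point about “words” concrete, here is a minimal Python sketch of how a program typically sees text: it splits it into character strings with a regular expression. The `tokenize` helper and the example sentences are illustrative assumptions, not part of the talk.

```python
import re

def tokenize(text):
    """Split text into lowercase "words": runs of letters, digits, or underscores."""
    return re.findall(r"\w+", text.lower())

animal = tokenize("The jaguar ran across the road.")
car = tokenize("The Jaguar dealership opened downtown.")

print(animal)  # ['the', 'jaguar', 'ran', 'across', 'the', 'road']
# The token is the same character string in both sentences; nothing in it
# tells the program that one refers to an animal and the other to a car brand.
print(animal[1] == car[1])  # True
```

Telling the two apart requires context, which is what the NLP techniques discussed later try to supply.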
  6. Where We Were Recently
     • Boolean logic
       – Actually a precursor/example of Artificial Intelligence (AI) applied to “search”
       – Still a part of search AI
     • Boolean logic is (from our infancy) a central aspect of how we think, a part of our “consciousness”.
     • Old approach: searching by concepts
  7. Where We Were Recently
     “Old” (circa 1975 – 2???) search strategy (searching by “concepts”)
     [Diagram: concepts combined with OR]
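The concept-based strategy can be sketched as follows: synonyms for one concept are ORed together, and the concept groups are ANDed. This is a minimal Python illustration; the concept lists, documents, and helper names (`build_query`, `matches`) are hypothetical.

```python
# Hypothetical concept blocks: each inner list holds alternative terms for one concept.
concepts = [
    ["automobile", "car", "vehicle"],
    ["pollution", "emissions"],
]

def build_query(concepts):
    """OR the terms within each concept, then AND the concepts together."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts]
    return " AND ".join(groups)

def matches(doc, concepts):
    """A document matches if it contains at least one term from every concept."""
    words = set(doc.lower().split())
    return all(any(term in words for term in terms) for terms in concepts)

docs = [
    "Car emissions rose last year",
    "Vehicle sales rose last year",
    "Automobile pollution in cities",
]

print(build_query(concepts))
# (automobile OR car OR vehicle) AND (pollution OR emissions)
print([d for d in docs if matches(d, concepts)])
# ['Car emissions rose last year', 'Automobile pollution in cities']
```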
  8. Where We Were Recently (cont.)
     • Ranking of web search results was/is based on a wide range of factors (ca. 200), or “signals”
     • User-controlled field searching (intitle:, etc.)
     • Etc.
  9. The “Newer” Technologies
     • Semantic Web technologies
     • Artificial Intelligence (AI) used at a broad level and utilizing various AI subfields
     • AI - Expert systems approaches
     • AI - Natural Language Processing (NLP)
     • AI - NLP - Entity identification (extraction, disambiguation, classification, etc.)
     • AI - Machine learning
     • Big Data processing
  10. Technologies: The Semantic Web
      • W3C “informal” definition
        – “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” (from Tim Berners-Lee et al., “The Semantic Web”, Scientific American, May 2001)
  11. Technologies: The Semantic Web
      • Essence:
        – “strings to things”
        – “words to meaning”
      • Technologically accomplished on webpages by means of a specialized XML markup language, etc.
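As an illustration of “strings to things”, the sketch below emits a machine-readable description of an entity. The slide mentions XML markup; this example swaps in JSON-LD with the schema.org vocabulary, one widely used present-day format. The chosen entity and properties are assumptions for illustration only.

```python
import json

# A minimal schema.org description of one "thing" (an entity), not just a string.
entity = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Tim Berners-Lee",
    "sameAs": "https://en.wikipedia.org/wiki/Tim_Berners-Lee",
}

# Pages typically embed JSON-LD inside a <script type="application/ld+json"> element.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(entity, indent=2)
    + "\n</script>"
)
print(snippet)
```

A consumer that understands schema.org can read `@type` and the other properties as statements about a person, rather than as an undifferentiated run of characters.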
  12. Technologies: The Semantic Web
      • Idea born before 1999
      • In practice, also requires other technologies such as Natural Language Processing, etc.
      • 2006: Berners-Lee and colleagues stated that “This simple idea…remains largely unrealized”.
      • 2013: more than four million Web domains contained Semantic Web markup.
  13. Technologies: AI - Expert Systems
      • Search results ranking has long used an “expert systems” approach, mimicking what an experienced researcher looks for:
        – Words appearing in the title
        – Number of times cited (linked to)
        – Proximity of words
        – Words in the abstract
        – Words in headings
        – Etc.
      • This will continue, more and more automatically.
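The ranking factors listed above can be combined in a toy weighted score. This is a minimal sketch, assuming invented weights and a simplified document model (`title`, `headings`, `abstract`, `body`, `inlinks`); it is not how any particular engine actually weights these signals.

```python
# Hypothetical weights: the factors are the ones listed on the slide,
# but the numbers are invented purely for illustration.
WEIGHTS = {"title": 3.0, "headings": 2.0, "abstract": 1.5, "inlinks": 0.5, "proximity": 2.0}

def hits(text, terms):
    """Number of distinct query terms that occur in a field."""
    return len(terms & set(text.lower().split()))

def proximity_bonus(words, terms, window=5):
    """1.0 if all query terms fall within `window` words starting at some query term."""
    for i, word in enumerate(words):
        if word in terms and terms <= set(words[i:i + window]):
            return 1.0
    return 0.0

def score(doc, query):
    terms = set(query.lower().split())
    return (WEIGHTS["title"] * hits(doc["title"], terms)
            + WEIGHTS["headings"] * hits(doc["headings"], terms)
            + WEIGHTS["abstract"] * hits(doc["abstract"], terms)
            + WEIGHTS["inlinks"] * doc["inlinks"]
            + WEIGHTS["proximity"] * proximity_bonus(doc["body"].lower().split(), terms))

docs = {
    "A": {"title": "Semantic search explained", "headings": "", "abstract": "",
          "body": "semantic search uses meaning", "inlinks": 2},
    "B": {"title": "Search tips", "headings": "semantic web", "abstract": "",
          "body": "tips for search and more on the semantic web", "inlinks": 10},
}

query = "semantic search"
print({name: score(d, query) for name, d in docs.items()})  # {'A': 9.0, 'B': 10.0}
print(sorted(docs, key=lambda name: score(docs[name], query), reverse=True))  # ['B', 'A']
```

Here the heavily linked page B outranks A even though A has both terms in its title and close together; balancing such trade-offs is the kind of expert judgment that engines increasingly tune automatically.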
  14. Technologies: Natural Language Processing
      • A part of artificial intelligence and computational linguistics
      • Helps computers “understand” written and spoken language
      • Plays a key role in voice input for search, natural language search statements, translation, and more.
  15. Technologies: Natural Language Processing
      Google’s syntactic systems:
      • predict part-of-speech tags for each word in a given sentence,
      • identify morphological features such as gender and number,
      • label relationships between words, such as subject, object, modification, etc.,
      • leverage large amounts of unlabeled data, and
      • incorporate neural net technology.
  16. Technologies: Natural Language Processing
      Google’s semantic systems:
      • identify entities in free text,
      • label them with types (such as person, location, or organization),
      • cluster mentions of those entities within and across documents (co-reference resolution), and
      • incorporate multiple sources of knowledge and information to aid with analysis of text.
  17. Technologies: Entity Extraction
      • A.k.a. named-entity recognition, entity identification
      • Complementary to other natural language processing
      • Identifies things, people, places, etc. within text (and speech)
      • Relates to the idea of concepts referred to earlier
      • Because “text” is based on language, “structure” is there, but the structure is not readily evident to a computer.
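A very simple form of entity extraction looks names up in a gazetteer (a list of known names with their types), then groups mentions that refer to the same entity within and across documents. The sketch below assumes a tiny hand-made gazetteer; real systems typically rely on statistical or neural models rather than a fixed list.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: lowercase surface form -> (canonical entity, type).
GAZETTEER = {
    "tim berners-lee": ("Tim Berners-Lee", "PERSON"),
    "berners-lee": ("Tim Berners-Lee", "PERSON"),
    "amsterdam": ("Amsterdam", "LOCATION"),
    "european commission": ("European Commission", "ORGANIZATION"),
}

# Longest surface forms first, so "Tim Berners-Lee" wins over "Berners-Lee".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(s) for s in sorted(GAZETTEER, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def extract(text):
    """Return (surface form, canonical entity, type) for each gazetteer match."""
    return [(m.group(0), *GAZETTEER[m.group(0).lower()]) for m in PATTERN.finditer(text)]

docs = {
    "d1": "Tim Berners-Lee spoke in Amsterdam.",
    "d2": "Berners-Lee later met the European Commission.",
}

# Group mentions of the same entity, within and across documents.
mentions = defaultdict(list)
for doc_id, text in docs.items():
    for surface, entity, etype in extract(text):
        mentions[(entity, etype)].append((doc_id, surface))

for key, found in mentions.items():
    print(key, found)
```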
  18. Technologies: Entity Extraction
      • Context-based connections make it possible to distinguish different meanings of a word.
      • Entity extraction draws inferences based on the logical content of the data.
      • Entity extraction may be the single most important tool for bringing structure to unstructured data, specifically text.
      • Also used for search query “suggestions”
      • An excellent example is found in Silobreaker.
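Using context to tell meanings apart can be illustrated with a simplified version of the Lesk overlap method, swapped in here as a classic textbook technique: choose the sense whose typical context words overlap most with the sentence. The sense inventory below is a hypothetical example.

```python
import re

# Hypothetical sense inventory: each sense has a small "signature" of context words.
SENSES = {
    "jaguar (animal)": {"cat", "jungle", "prey", "species", "wild"},
    "Jaguar (car maker)": {"car", "engine", "dealer", "model", "drive"},
}

def disambiguate(sentence):
    """Pick the sense whose signature shares the most words with the sentence.
    Ties go to the first sense listed, because max() keeps the first maximum."""
    context = set(re.findall(r"\w+", sentence.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate("The jaguar stalked its prey in the jungle."))         # jaguar (animal)
print(disambiguate("He test-drove the new Jaguar model at the dealer."))  # Jaguar (car maker)
```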
  22. Technologies: Machine Learning
      Computers teaching themselves: Google RankBrain
      • Used in processing search results; part of Google’s Hummingbird search algorithm
      • A way of interpreting a search statement in order to find web pages that may not contain the specific words in the search statement
      • Uses patterns from other, seemingly unconnected “complex” searches to find similarities with the current search, then applies that information to identify the most likely useful content
      • Google regards this as the third most important ranking signal.
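RankBrain’s internals are not public, so the following is only an analogy for the general idea of relating a new query to similar earlier queries: it compares bag-of-words vectors with cosine similarity. Real systems learn dense vector representations from data; the query log and function names here are hypothetical.

```python
import math
from collections import Counter

def vector(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical log of earlier queries.
past_queries = [
    "cheap flights to amsterdam",
    "amsterdam museum opening hours",
    "weather in amsterdam tomorrow",
]

def most_similar(query):
    """Return the earlier query whose vector is closest to the new query's vector."""
    q = vector(query)
    return max(past_queries, key=lambda p: cosine(q, vector(p)))

print(most_similar("what time does the amsterdam museum open"))
# amsterdam museum opening hours
```

Note that the match is found even though the new query never uses the words “opening” or “hours”, which loosely mirrors the slide’s point about finding pages that lack the exact search words.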
  23. Technologies: Big Data
      • The existence of “big data” collections provides unprecedented opportunities for computational approaches that help computers “understand” text.
      • In neural-network image entity identification experiments, the accuracy of machine learning algorithms improves vastly when they are used with large pools of data.
      • “...Google’s search engine queries a 100 petabyte index that incorporates over 200 indicators and whose algorithms change more than 500 times per year.”
  24. Specific Applications of These (and Other) Technologies
      • Continued gradual incorporation of “expert” techniques
      • Natural language search statements
      • Search by voice
      • Image recognition and search: search of images, search by image, and facial recognition
      • Knowledge graphs
      • Entities in news search
  25. Gradual Incorporation of “Expert” Techniques
      • An “ordinary” search isn’t what it used to be.
      • Google has quietly taken over more of the “old” “professional searcher” techniques and now automatically adds not just word variants but synonyms.
  26. Gradual Incorporation of “Expert” Techniques
      • Suggested searches (based on known connections, not just on your character string)
        – A “data-driven” approach (trillions of words) vs. “rules”
        – Not just word variants
      • The old “synonyms” operator (~diet) didn’t just go away; it is now applied automatically. (Few people use OR.)
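Automatic expansion with word variants and synonyms can be sketched as rewriting each query word into an OR group, much as an expert searcher would have done by hand. The synonym table and the naive plural rule are assumptions for illustration; real engines derive such links from large amounts of data.

```python
# Hypothetical synonym table.
SYNONYMS = {"diet": ["nutrition", "eating"]}

def variants(word):
    """Naive word variants: add or strip a final 's' (real stemming is far richer)."""
    return {word, word[:-1]} if word.endswith("s") else {word, word + "s"}

def expand(query):
    """Rewrite each query word as an OR group of its variants and synonyms, ANDed together."""
    groups = []
    for word in query.lower().split():
        terms = variants(word) | set(SYNONYMS.get(word, []))
        groups.append("(" + " OR ".join(sorted(terms)) + ")")
    return " AND ".join(groups)

print(expand("diet plans"))
# (diet OR diets OR eating OR nutrition) AND (plan OR plans)
```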
  27. Gradual Incorporation of “Expert” Techniques
      • “Did you mean” is now more often “Showing results for”
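One common way to produce a “Showing results for” correction is to pick the known term with the smallest edit distance, preferring frequent terms on ties. This sketch uses Levenshtein distance over a hypothetical vocabulary with made-up counts; it is not Google’s actual method, which the talk does not describe.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

# Hypothetical vocabulary with made-up query counts.
VOCAB = {"search": 900, "research": 400, "semantic": 300, "searcher": 50}

def showing_results_for(word, max_dist=2):
    """Return the closest known word (most frequent on ties), or the word unchanged."""
    if word in VOCAB:
        return word
    dist, _, best = min((edit_distance(word, w), -count, w) for w, count in VOCAB.items())
    return best if dist <= max_dist else word

print(showing_results_for("serach"))  # search
```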
  28. Gradual Incorporation of “Expert” Techniques
      • “Fuzzy logic”
        – As well as searching for words that are “close”, Google may drop some of your “concepts” for some records.
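Dropping some concepts for some records resembles a “minimum should match” rule: a document may qualify even if it lacks one of the query terms, ranked below full matches. A minimal sketch, with a hypothetical `max_dropped` parameter and made-up documents:

```python
def relaxed_search(docs, query, max_dropped=1):
    """Return docs matching all but at most `max_dropped` query terms, best matches first."""
    terms = query.lower().split()
    needed = len(terms) - max_dropped
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        matched = sum(term in words for term in terms)
        if matched >= needed:
            scored.append((matched, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [doc for _, doc in scored]

docs = ["semantic web markup explained", "web markup basics", "semantic networks", "cooking tips"]
print(relaxed_search(docs, "semantic web markup"))
# ['semantic web markup explained', 'web markup basics']
```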
  29. Gradual Incorporation of “Expert” Techniques
      • If Google “thinks” you want specific facts and “sees” a matching answer, you may get that answer immediately.
  30. Specific Applications: Natural Language Search Statements
      • Don’t hesitate to use them!
      • The two searches shown on this slide give different (and relevant) answers.
      • This is especially important for Google Now and Siri!
  31. Specific Applications: Voice Search
      • Apple (iOS) - Siri
      • Google - Google Now
      • Bing - Cortana (recently deceased?)
      • These “expect” natural language, so natural language will yield the best results.
  32. Specific Applications: Image Recognition and Search: Search of Images
      Not much obvious recent change in Bing’s or Google’s regular image search, but:
      • “Categorization” (an aspect of entity extraction) is now shown on image search results pages.
      • Google, Microsoft (Bing), and Apple are heavily involved in research on image identification and classification.
      • What’s happening/coming can be anticipated by looking at Google Photos.
  33. Specific Applications: Image Recognition and Search: Search of Images
      Bing Image Search
  34. Specific Applications: Image Recognition and Search: Search of Images
  35. Specific Applications: Image Recognition and Search: Search of Images
      • In December 2015, Microsoft beat 5 competitors (including Google) in the ImageNet contest for machine recognition of images.
      • Machines were trained to recognize images using a “deep neural network” method.
      • Competitors must locate and identify objects in 100,000 photographs collected from Flickr and search engines, and then place them in 1,000 object categories.
      • Microsoft, the winner, had an error rate of 3.5 percent for classification and 9 percent for localization.
      • Machine learning using neural networks is also used very successfully for translation, such as in Skype’s new translation offering.
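The ImageNet classification figure is, to our understanding, a top-5 error rate: an image counts as correctly classified if the true label appears among the model’s five highest-ranked guesses. The sketch below computes top-k error on made-up predictions to show how the metric works.

```python
def top_k_error(predictions, labels, k=5):
    """Fraction of images whose true label is not among the top-k ranked guesses."""
    misses = sum(label not in guesses[:k] for guesses, label in zip(predictions, labels))
    return misses / len(labels)

# Made-up ranked guesses for four images (best guess first).
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx"],
    ["car", "truck", "bus", "van", "tram"],
    ["apple", "pear", "plum", "fig", "kiwi"],
    ["ship", "boat", "yacht", "canoe", "raft"],
]
labels = ["lynx", "bicycle", "apple", "canoe"]

print(top_k_error(predictions, labels, k=5))  # 0.25
print(top_k_error(predictions, labels, k=1))  # 0.75
```

The same predictions look much better under top-5 scoring than under strict top-1 scoring, which is worth keeping in mind when comparing reported error rates.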
  36. Specific Applications: Image Recognition and Search: Search by Image
  37. Specific Applications: Image Recognition and Search: Entity and Facial Recognition in Google Photos
  38. Specific Applications: Knowledge Graphs
      • Knowledge graphs did not originate with Google (but Google has made the term widely known).
      • “Knowledge graph theory was initiated by C. Hoede, a discrete mathematician at the University of Twente and F.N. Stokman, a mathematical sociologist at the University of Groningen, both in the Netherlands.” (ca. 1982)
  39. Specific Applications: Google Knowledge Graph
      • The Google Knowledge Graph, overall, is a database about “things” and the connections between those things.
      • Delivers and summarizes key facts about people, places, and things.
      • The selection of those facts is based on connections involving that entity and related entities, and on what other users have asked about that entity.
  40. Specific Applications: Google Knowledge Graph
      • Launched May 2012
      • At its heart, Google Knowledge Graph is a database of facts.
      • By late 2012 it contained 18 billion facts about 570 million objects.
      • The kinds of things included vary with the kind of entity.
      • Content comes primarily from Wikipedia, the World Factbook, Freebase/Wikidata, plus other sources.
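At its simplest, a database of facts about things and their connections can be stored as subject-predicate-object triples. This minimal Python sketch indexes a few hypothetical triples so that an entity’s facts and its connected entities can be looked up; it is an illustration of the data model, not Google’s implementation.

```python
from collections import defaultdict

# Hypothetical facts as (subject, predicate, object) triples.
TRIPLES = [
    ("Tim Berners-Lee", "type", "Person"),
    ("Tim Berners-Lee", "invented", "World Wide Web"),
    ("Tim Berners-Lee", "co-authored", "The Semantic Web (2001)"),
    ("The Semantic Web (2001)", "published in", "Scientific American"),
    ("World Wide Web", "type", "Information system"),
]

# Index edges in both directions so connections can be followed either way.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for s, p, o in TRIPLES:
    outgoing[s].append((p, o))
    incoming[o].append((p, s))

def facts(entity):
    """Facts stated about an entity, as (predicate, object) pairs."""
    return outgoing.get(entity, [])

def related(entity):
    """Entities directly connected to this one, in either direction."""
    return sorted({o for _, o in outgoing.get(entity, [])} | {s for _, s in incoming.get(entity, [])})

print(facts("Tim Berners-Lee"))
print(related("The Semantic Web (2001)"))  # ['Scientific American', 'Tim Berners-Lee']
```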
  43. Specific Applications: Google Knowledge Graph
      • The key power of Google Knowledge Graph lies in its use of connections between entities as searched for by other users.
      • At present, its main weakness is its heavy, unvetted reliance on Wikipedia, which is not always right, e.g., the Wikipedia article on Knowledge Graph.
  44. WRONG!
  46. Bing’s Knowledge Graph
      • Named “Snapshot”, it uses Bing’s Satori technology
      • Launched in June 2012
      • Uses Wikipedia, Freebase, Qwiki, LinkedIn, Britannica, etc.
      • Builds interactive features such as audio and video into results
  49. Specific Applications: News Applications
      Examples of news sites effectively using these technologies:
      • Silobreaker (example shown earlier)
      • EMM
  50. Specific Applications: News Applications
      EMM – European Media Monitor
      • From the European Commission
      • Computerized analysis of news trends and story content
      • Makes extensive use of NLP techniques for entity extraction and clustering
      • “Organizes” a vast quantity of knowledge very efficiently
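Clustering news stories by the entities they share can be sketched with a greedy, single-pass rule based on the Jaccard similarity of entity sets. This is an illustrative assumption, not EMM’s documented algorithm; the articles and their entity sets are made up and taken as already extracted, and the 0.5 threshold is arbitrary.

```python
def jaccard(a, b):
    """Overlap of two sets: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(articles, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed article shares
    enough entities with this one; otherwise start a new cluster."""
    clusters = []  # list of (seed entity set, member titles)
    for title, entities in articles:
        for seed, members in clusters:
            if jaccard(seed, entities) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((entities, [title]))
    return [members for _, members in clusters]

articles = [
    ("Commission proposes new data rules", {"European Commission", "GDPR", "Brussels"}),
    ("Brussels debates data rules", {"European Commission", "Brussels", "GDPR", "European Parliament"}),
    ("Storm hits Amsterdam", {"Amsterdam", "KNMI"}),
]

print(cluster(articles))
# [['Commission proposes new data rules', 'Brussels debates data rules'], ['Storm hits Amsterdam']]
```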
  54. So, How Do We as Researchers Take Advantage of This?
      • Get in the habit of using what’s new (Siri, Google Now, natural language). Join the Evolution!
      • Actually pay attention to Google Instant (suggestions).
      • Don’t forsake the old. There are times when you need to turn the autopilot off and take charge.
      • Ask questions you didn’t bother asking before [because you didn’t think the search engine could handle them].
  55. So, How Do We as Researchers Take Best Advantage of This?
      • Increase awareness of information quality criteria.
      • Worry a bit about:
        – the general public’s further reliance on quick, single, local, Twitter-length answers
        – localization
        – “echo chambers”
        – “machines making decisions on our behalf”
      • Enjoy the new.
  56. Questions?
      Ran Hock, Online Strategies