Optimizing Unstructured Data

Semantic search isn't just about structured data. The way we write can help Google to understand content and, if you know what you're doing, allow you to land your content in an AnswerBox. Find out how to turn unstructured content into structured data and why SEO is all about reducing friction.

Published in: Marketing

  1. 1. Optimizing Unstructured Data
  2. 2. @ajkohn @SEMpdx #SearchFest
  3. 3. My name is AJ Kohn
  4. 4. Blind Five Year Old Since 2007
  5. 5. Making the complex simple
  6. 6. Semantic Search
  7. 7. We have a problem
  8. 8. Ugh, as if!
  9. 9. WHAT?!
  10. 10. WHAT?!
  11. 11. Semantic search is about understanding meaning
  12. 12. OKAY!
  13. 13. OKAY!
  14. 14. Context
  15. 15. Context matters
  16. 16. Context matters
  17. 17. Natural Language Processing
  18. 18. Coreference Resolution: finding all expressions that refer to the same entity in a text. Part of Speech (POS) Tagging: assigning a part of speech to each word in a text.
  19. 19. The word “quiet” isn’t spelled wrong, but Google knew that I probably meant to write “quite awesome” instead
  20. 20. Machine learning
  21. 21. Making predictions based on patterns and rules from prior data
  22. 22. Google is better at getting meaning from text because of access to more data
  23. 23. Entities
  24. 24. Letters and Words
  25. 25. Things
  26. 26. “New York” hasPopulation: 8.046 Million; hasPointsofInterest: Empire State Building (hasAddress: 350 5th Avenue; hasHeight: 1,250 feet)
  27. 27. The Knowledge Graph
  28. 28. Connections and relationships between entities and documents
  29. 29. Named Entity Recognition (NER)
  30. 30. One size doesn’t fit all
  31. 31. Context-Dependent Fine-Grained Entity Type Tagging
  32. 32. Not just any entities but salient entities
  33. 33. 66 entities on a page and fewer than 5% are salient http://bit.ly/bigdealentities
  34. 34. How do you train a machine learning model to identify salient entities?
  35. 35. Hello McFly!
  36. 36. Word up
  37. 37. Word to your mother
  38. 38. Words
  39. 39. “Keywords don’t matter anymore”
  40. 40. Ice Bear cried, but just inside
  41. 41. I love structured data but optimizing unstructured data is far more powerful
  42. 42. Text on the page is more important now
  43. 43. Words = Entities ^ Context ^ Meaning
  44. 44. We can turn unstructured content into structured data
  45. 45. How much do you trust Google?
  46. 46. Stop writing for people and start writing for search engines http://bit.ly/focusedwriting
  47. 47. 28%
  48. 48. Most users don’t read but skim and scan instead http://bit.ly/usersdontread
  49. 49. First you looked here. Then here.
  50. 50. A penny for a paragraph return
  51. 51. Mirroring
  52. 52. Not only do we mirror body language we seek it out when searching
  53. 53. Keyword rich text and subheads allow users to resume reading at any time
  54. 54. Keyword is not a four letter word
  55. 55. Better to you query syntax call it
  56. 56. But what about user delight?
  57. 57. Could you not
  58. 58. Task Completion > Aesthetics
  59. 59. Our job is to reduce friction
  60. 60. After writing your content, go back and find where you can replace pronouns with nouns. Remember that readers won’t often ‘see’ these nouns but will use them as visual signposts.
  61. 61. “It’s such a gorgeous work of art” vs. “Lobster and Cat is a beautiful painting”. ArtworkType: painting; ArtworkTitle: Lobster and Cat; hasArtist: Pablo Picasso
  62. 62. Intent
  63. 63. Google may better understand the meaning of my query but do they know why I’m searching?
  64. 64. Why are they really searching?
  65. 65. Why are they really searching? Common Problems with the Eureka 4870; Eureka 4870 Troubleshooting Tips; Local Vacuum Cleaner Repair Shops; Eureka 4870 Replacement Parts; Guide to Buying a New Vacuum Cleaner
  66. 66. Why are they really searching? Common Problems with the Eureka 4870; Eureka 4870 Troubleshooting Tips; Local Vacuum Cleaner Repair Shops; Eureka 4870 Replacement Parts; Guide to Buying a New Vacuum Cleaner
  67. 67. Our job is to decode the intent from the query syntax http://bit.ly/aggregatingintent
  68. 68. Target the keyword. Optimize the intent.
  69. 69. What are we really talking about?
  70. 70. This is a factbox triggered by entities and the Knowledge Graph
  71. 71. This answerbox is triggered by semi-structured data
  72. 72. This answerbox is triggered by specific patterns of text
  73. 73. Answerbox triggered by patterns of text and specific understanding
  74. 74. Answerbox triggered by patterns of text and specific understanding
  75. 75. Answerbox triggered by patterns of text and semi-structured data
  76. 76. Answerbox triggered by patterns of text and specific understanding
  77. 77. Game's the same, just got more fierce
  78. 78. Skate to where the puck is going to be, not to where it has been
  79. 79. The Link Graph
  80. 80. The Link Graph + Scored Entities <entity A> <entity B> <entity C> <entity B> <entity C> <entity A> <entity A> <entity D> <entity B> <entity D>
  81. 81. Entity authority could flow through links similar to anchor text
  82. 82. TL;DL
  83. 83. We can help Google to find structure, entities and meaning in our content. The easier we make it, the more likely we are to satisfy robots and humans.
  84. 84. AJ Kohn, Owner, Blind Five Year Old. www.blindfiveyearold.com, aj@blindfiveyearold.com, @ajkohn
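
Slides 60 and 61 recommend replacing pronouns with nouns so that a sentence names its own entity and type. The sketch below is a toy illustration of why that helps a machine, not a description of how Google extracts facts. The function name `extract_artwork`, the regular expression and the short list of artwork types are all assumptions invented for this example.

```python
import re

# Hypothetical vocabulary of artwork types (an assumption for this sketch).
ARTWORK_TYPES = ("painting", "sculpture", "drawing")

# Hypothetical pattern: "<Title> is a/an [adjective] <type>".
PATTERN = re.compile(
    r"^(?P<title>[A-Z][\w ]*?) is an? (?:\w+ )?(?P<type>"
    + "|".join(ARTWORK_TYPES)
    + r")\b"
)

# Subjects that carry no entity information on their own.
PRONOUNS = {"it", "this", "that", "they"}


def extract_artwork(sentence):
    """Return structured fields if the sentence names its subject, else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    title = match.group("title")
    if title.lower() in PRONOUNS:
        # A pronoun subject gives the pattern nothing to anchor an entity on.
        return None
    return {"ArtworkTitle": title, "ArtworkType": match.group("type")}


if __name__ == "__main__":
    for text in (
        "It’s such a gorgeous work of art",
        "Lobster and Cat is a beautiful painting",
    ):
        print(text, "->", extract_artwork(text))
```

Only the second sentence produces a title and a type. The pronoun version gives the pattern nothing to hold on to, which is the friction the slides ask writers to remove. The `hasArtist` field shown on slide 61 would have to come from another source, such as the Knowledge Graph, because the sentence itself never names the artist.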
