
SearchLove Boston 2017 | Will Critchlow | Building Robot Allegiances

Under Sundar Pichai, Google is doubling down on machine learning and artificial intelligence. Computer capabilities are improving at a frightening rate, and there are already parts of our jobs that would be better off done by robots. In this talk, Will is going to highlight the areas where humans are falling behind and give you some tips on what to do about it.

Published in: Marketing

  1. Knowing ranking factors won’t be enough: how to avoid losing your job to a robot. @willcritchlow
  2. I’m going to tell you about a robot that understands ranking factors better than any of you... but before I get to that, let’s look at a bit of history.
  3. The other day I searched:
  4. Unsurprisingly, I got an answer.
  5. But it got me thinking about how the results would have looked in 2009.
  6. In 2009, it would have looked more like this, with every title containing the keyphrase.
  7. In 2009, it would have looked more like this, with every title containing the keyphrase, most of them at the beginning.
  8. OK. Maybe Wikipedia would have been #1.
  9. My mental model for ~2009 ranking factors had three different modes:
  10. My mental model for ~2009 ranking factors had three different modes: one in the hyper-competitive head, one in the competitive mid-tail... and one in the long-tail.
  11. One in the hyper-competitive head.
  12. Hyper-competitive head: tons of perfectly on-topic pages to choose from.
  13. Hyper-competitive head: so pick only perfectly on-topic pages...
  14. Hyper-competitive head: ...and rank them by authority (*). (*) Page authority, but the domain inevitably factors into that calculation. This is why so many homepages ranked.
  15. Hyper-competitive head: this resulted in a mix of homepages of mid-size sites and inner pages on huge sites.
  16. Hyper-competitive head: but the general way to move up was through increased authority.
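The head-term mode of this mental model amounts to a two-step procedure: filter to perfectly targeted pages, then sort by authority. Here is a minimal Python sketch of that idea, assuming hypothetical per-page `on_topic` (0 to 1, where 1.0 means perfectly targeted) and `authority` scores. These are illustrative stand-ins, not real Google signals.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    on_topic: float   # hypothetical relevance score, 1.0 = perfectly targeted
    authority: float  # hypothetical page authority (domain strength feeds into it)


def rank_head(pages: list[Page]) -> list[str]:
    """~2009 head-term model: keep only perfectly on-topic pages, then sort by authority."""
    candidates = [p for p in pages if p.on_topic >= 1.0]
    candidates.sort(key=lambda p: p.authority, reverse=True)
    return [p.url for p in candidates]


pages = [
    Page("mid-size-site.example/", 1.0, 0.6),
    Page("huge-site.example/widgets", 1.0, 0.9),
    Page("huge-site.example/about", 0.4, 0.99),
]
print(rank_head(pages))  # ['huge-site.example/widgets', 'mid-size-site.example/']
```

The strong but loosely targeted page never makes the cut. Once a page is in the filtered set, authority is the only lever that moves it.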
  17. Kind of search result, pages ranking, and how to move up:
      - Head: homepages of mid-size sites and inner pages of massive sites, all perfectly targeted. To move up: improve authority.
      - Mid-tail: (not yet filled in)
      - Long-tail: (not yet filled in)
  18. One in the hyper-competitive head, one in the competitive mid-tail...
  19. Competitive mid-tail: a wealth of ROUGHLY on-topic pages to choose from.
  20. Competitive mid-tail: a PERFECTLY on-topic page could do well even on a relatively weak site.
  21. Competitive mid-tail: rank the roughly on-topic pages by authority × “on-topicness”.
  22. Competitive mid-tail: move up with better targeting or more authority.
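The mid-tail rule "authority × on-topicness" can be sketched directly. This is a minimal illustration with hypothetical 0-to-1 scores for each page. It shows why a perfectly targeted page on a weak site can compete with a loosely targeted page on a strong one.

```python
def rank_mid_tail(pages):
    """~2009 mid-tail model: score roughly on-topic pages by authority x on-topicness.

    pages: list of (url, on_topic, authority) tuples with hypothetical 0..1 scores.
    """
    candidates = [p for p in pages if p[1] > 0]  # any roughly on-topic page qualifies
    scored = sorted(candidates, key=lambda p: p[2] * p[1], reverse=True)
    return [url for url, _, _ in scored]


pages = [
    ("weak-site.example/custom-cigar-humidors", 1.0, 0.3),  # perfect targeting, weak site
    ("big-site.example/cigars", 0.5, 0.8),                  # rough targeting, strong site
    ("big-site.example/smoking-accessories", 0.25, 0.8),
]
print(rank_mid_tail(pages))
# ['big-site.example/cigars', 'weak-site.example/custom-cigar-humidors',
#  'big-site.example/smoking-accessories']
```

Because the score is a product, the weak site has two ways up. In this example, raising its authority from 0.3 to 0.45 lifts its score to 0.45, which puts it above the 0.4 of `big-site.example/cigars`.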
  23. Kind of search result, pages ranking, and how to move up:
      - Head: homepages of mid-size sites and inner pages of massive sites, all perfectly targeted. To move up: improve authority.
      - Mid-tail: perfectly on-topic pages on relatively weak sites, plus roughly on-topic pages on bigger sites. To move up: improve targeting or authority.
      - Long-tail: (not yet filled in)
  24. One in the hyper-competitive head, one in the competitive mid-tail... and one in the long-tail.
  25. Long-tail: a site of arbitrary weakness could rank if it had the most relevant page.
  26. Long-tail: otherwise, massive sites rank with off-topic pages that mention something similar.
  27. Long-tail: generally, move up with better targeting.
  28. Kind of search result, pages ranking, and how to move up:
      - Head: homepages of mid-size sites and inner pages of massive sites, all perfectly targeted. To move up: improve authority.
      - Mid-tail: perfectly on-topic pages on relatively weak sites, plus roughly on-topic pages on bigger sites. To move up: improve targeting or authority.
      - Long-tail: arbitrarily weak on-topic pages and roughly targeted deep pages on massive sites. To move up: improve targeting.
  29. So that was ~2009.
  30. It’s not so simple any more. Google is harder to understand these days.
  31. We know how we got to ~2009: PageRank (one of the first algorithms to use the link structure of the web)...
  32. Information retrieval + PageRank
  33. Information retrieval + PageRank + original research
  34. Information retrieval + PageRank + original research + TWEAKS, with growing complexity in subsequent years.
  35. In particular, this comment from a user called Kevin Lacker (@lacker):
  36. “I was thinking about it like it was a math puzzle and if I just thought really hard it would all make sense.” -- Kevin Lacker (@lacker)
  37. “Hey, why don't you take the square root?” -- Amit Singhal, according to Kevin Lacker (@lacker)
  38. “Oh... am I allowed to write code that doesn't make any sense?” -- Kevin Lacker (@lacker)
  39. “Multiply by 2 if it helps, add 5, whatever, just make things work and we can make it make sense later.” -- Amit Singhal, according to Kevin Lacker (@lacker)
  40. Three big reasons: high-dimensional, non-linear, discontinuous.
  41-43. High-dimensional
  44. This is what a high-dimensional function looks like. You might know what any one of the levers does, but they can interact with each other in complex ways.
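A toy example makes "levers that interact" concrete. The function below is entirely hypothetical, with made-up coefficients and nothing to do with Google's actual formula. It includes one interaction term, so the effect of adding links depends on how well the title matches.

```python
def toy_score(title_match: float, links: float, freshness: float) -> float:
    """Hypothetical 3-lever ranking function with an interaction term:
    links only help to the extent the title already matches the query."""
    return 0.2 * title_match + 0.5 * links * title_match + 0.3 * freshness


def effect_of_links(title_match: float, freshness: float = 0.0) -> float:
    """Change in score from adding one unit of links, holding the other levers fixed."""
    return toy_score(title_match, 1.0, freshness) - toy_score(title_match, 0.0, freshness)


print(round(effect_of_links(title_match=0.0), 3))  # 0.0: links do nothing on an untargeted page
print(round(effect_of_links(title_match=1.0), 3))  # 0.5: the same links now matter a lot
```

Measuring the links lever in isolation gives two different answers depending on where the other levers sit. With hundreds of levers, that effect compounds.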
  45. Non-linear
  46. “We sell custom cigar humidors. Our custom cigar humidors are handmade. If you’re thinking of buying a custom cigar humidor, please contact our custom cigar humidor specialists at custom.cigar.humidors@example.com.” What this needs is another mention of [cigar humidors].
  47. With no mentions of [cigar] or [humidor], this page would be unlikely to rank. And yet you can clearly go too far and have the effect turn negative. This is called non-linearity. The cigar example is taken directly from Google’s quality guidelines.
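The humidor example describes an inverted-U curve, where more mentions help less and less and eventually hurt. Here is a hypothetical quadratic with made-up coefficients. It is not a real ranking formula, only an illustration of the shape.

```python
def keyword_effect(mentions: int) -> float:
    """Hypothetical inverted-U: each extra mention helps less, then starts to hurt.
    Peak is at 4 mentions for these made-up coefficients."""
    return 2.0 * mentions - 0.25 * mentions ** 2


for m in (0, 1, 4, 8, 10):
    print(m, keyword_effect(m))
# 0 0.0
# 1 1.75
# 4 4.0
# 8 0.0
# 10 -5.0
```

A linear rule of thumb such as "more mentions is better" is right near zero and badly wrong at the right-hand end of the curve.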
  48. Discontinuous
  49. Discontinuities are steps in the function. Think about so-called “over-optimization” tipping points.
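A tipping point is a step, not a slope. In the hypothetical function below, the score rises smoothly with keyword density until it reaches a made-up "over-optimization" threshold. At that point a fixed penalty applies, so a tiny change in input produces a large change in output.

```python
def score_with_tipping_point(keyword_density: float, threshold: float = 0.05) -> float:
    """Hypothetical step function: score rises smoothly with keyword density until a
    made-up "over-optimization" threshold, then drops by a fixed penalty."""
    base = 10.0 * min(keyword_density, threshold)
    penalty = 1.0 if keyword_density > threshold else 0.0
    return base - penalty


print(score_with_tipping_point(0.049))  # just under the threshold: close to 0.5
print(score_with_tipping_point(0.051))  # just over: the score falls off a cliff
```

Small local experiments on either side of the step would suggest very different conclusions about whether more keyword usage helps.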
  50. Think about category pages: do you recommend removing “SEO text”? We’ve tested it, so we know the answer.
  51. If you said “yes”, congratulations (+3.1% organic sessions in a split-test)...
  52. ...unless you’re responsible for this site (no effect / possible negative effect).
  53. “No, but I’m still pretty good at this.” You’re thinking this to yourself right now.
  54. I promised to tell you about a robot that is better than even experienced SEOs... Well, it turns out all we needed was a coin to flip. You’re all fired.
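The coin-flip joke rests on a simple benchmark. For a set of result pairs, how often does a prediction of which page ranks higher match reality, and how does that compare with guessing at random? This is a minimal sketch of that comparison. The pairs and expert picks are hypothetical, and it does not reproduce the method of the study mentioned later in the talk.

```python
import random


def pairwise_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Share of result pairs where the predicted winner ("A" or "B") matches reality."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def coin_flip_accuracy(actual: list[str], trials: int = 10_000, seed: int = 0) -> float:
    """Accuracy of guessing each pair at random, averaged over many trials (tends to 0.5)."""
    rng = random.Random(seed)
    hits = sum(rng.choice("AB") == a for _ in range(trials) for a in actual)
    return hits / (trials * len(actual))


actual = ["A", "A", "B", "A", "B", "B"]   # hypothetical true winners for six pairs
expert = ["A", "B", "B", "B", "A", "B"]   # hypothetical expert picks
print(pairwise_accuracy(expert, actual))  # 0.5
print(coin_flip_accuracy(actual))         # close to 0.5
```

If the expert score does not clearly beat the random baseline over enough pairs, the expertise is not adding predictive value.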
  55. It’s only going to get worse under Sundar Pichai.
  56. Who knows who this is? (This is the only CC-licensed photo of him on the internet.)
  57. ENHANCE. What about now?
  58. John Giannandrea: Google’s head of search. Sundar’s choice to lead search after Amit Singhal, having previously run machine learning.
  59. ...and of course Jeff Dean is doing Jeff Dean things (cf. Chuck Norris).
  60. “Jeff Dean puts his pants on one leg at a time, but if he had more legs, you would see that his approach is O(log n).” Source: Jeff Dean facts
  61. “Once, in early 2002, when the search back-ends went down, Jeff Dean answered user queries manually for two hours. Result quality improved markedly during this time.”
  62. “When Jeff Dean goes on vacation, production services across Google mysteriously stop working within a few days.” This was reportedly actually true.
  63. The original Google Translate was the result of the work of hundreds of engineers over 10 years.
  64. The Director of Translate, Macduff Hughes, said that it sounded to him as if maybe they could pull off a neural-network-based replacement in three years.
  65. Jeff Dean said, “We can do it by the end of the year, if we put our minds to it.”
  66. Hughes: “I’m not going to be the one to say Jeff Dean can’t deliver speed.”
  67. A month later, the work of a team of 3 engineers was tested against the existing system. The improvement was roughly equivalent to the improvement of the old system over the previous 10 years.
  68. Hughes sent his team an email: all projects on the old system were to be suspended immediately. (Read the whole story.)
  69. Background reading: Backchannel, Bloomberg.
  70. “How to avoid losing your job to a robot.” This is what you promised, Will.
  71. Let’s start by understanding some robot weaknesses.
  72. What’s this?
  73. “Ooh. Ooh. I know this one.” -- robot
  74. “It’s a leopard. I’m like 99% sure.”
  75. Computers are better than humans at classification, but struggle with adversaries. (Read more about this here.) Cheetah, leopard, jaguar.
  76. We don’t fully understand all ML mistakes. See: adversarial AI.
  77-78. And when you’re trying to fool the machine... See: adversarial AI.
  79. You get some really wild examples. See: adversarial AI.
  80. Lesson: we expect robustness against adversaries to take a step backwards. Machine-learned systems will remain good at classifying bad links, but are likely to fall prey to weird outcomes in adversarial situations.
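Adversarial examples work by nudging inputs in the direction the model is most sensitive to. For a linear (logistic-regression) classifier, that direction is just the sign of the weights, which is the idea behind the fast gradient sign method. This toy sketch uses a hypothetical three-feature "spammy link" classifier and says nothing about how Google's classifiers are actually built.

```python
import numpy as np


def predict_proba(w: np.ndarray, b: float, x: np.ndarray) -> float:
    """Logistic-regression probability that x belongs to the positive class."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))


def fgsm_linear(w: np.ndarray, x: np.ndarray, eps: float) -> np.ndarray:
    """Fast-gradient-sign perturbation for a linear model. The gradient of the logit
    w.x + b with respect to x is w, so moving every feature by eps against sign(w)
    lowers the score as much as possible when no feature may change by more than eps."""
    return x - eps * np.sign(w)


w = np.array([1.0, -2.0, 0.5])   # hypothetical "is a spammy link" classifier weights
b = 0.0
x = np.array([0.5, -0.25, 0.5])  # hypothetical feature vector for one link
x_adv = fgsm_linear(w, x, eps=0.5)
print(predict_proba(w, b, x))      # ~0.78: flagged
print(predict_proba(w, b, x_adv))  # ~0.38: same link, nudged features, now passes
```

Deep networks are not linear, but the same gradient-following idea is what produces the "wild examples" above.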
  81. We’re going to see new kinds of bugs.
  82. Rules of ML [PDF] outlines engineering lessons from getting ML into production at Google.
  83. That document also has a section on trying to understand what the machines are doing.
  84. But human explainability may not even be possible. Not every concept a neural network uses maps neatly onto a concept for which we have a word. It’s not clear this is a weakness per se, but...
  85. ...it means that engineers won’t always know more than we do about why a page does or doesn’t rank. The big knowledge gap of the future is data: clickthrough rates, bounce rates, etc.
  86. Check out Tom Capper’s presentation on how engineers’ statements can be misleading.
  87. ...and remember the confounding split-tests. It’s already not always as simple as “feature X is good”, which means we may need to be more independent-minded and do more of our own research.
  88. So how do we fight back?
  89. Michael Lewis’ latest book, The Undoing Project, is about Kahneman and Tversky. It recounts a story about a piece of medical software that existed in the 1960s.
  90. It was designed to encapsulate how a range of doctors diagnosed stomach cancer from X-rays.
  91. It proceeded to outperform those same doctors, despite containing only their expertise. Real people have biases and fool themselves. Encapsulate your own expert knowledge.
  92. At Distilled, we use a methodology we call the balanced digital scorecard. It encapsulates our beliefs about how to build a high-performing business, and applying it helps us avoid our own biases.
  93. Also, while we are talking about books: The Checklist Manifesto is an important tool for avoiding the same cognitive biases.
  94. Focus on consulting skills: getting things done, convincing organizations, applying general knowledge, and learning new things. Computers are better at diagnosis than cure. Use case studies and creativity. I’ve written a few things about this (a DistilledU module, writing better business documents, using split-tests to consult better).
  95. We are going to need to be better than ever at debugging things (I wrote about debugging skills for non-developers here). A lot of enterprise consulting is going to be about figuring out why things have gone wrong in the face of sparse or incorrect information from Google.
  96. Disregard expert surveys. Firstly, there are all the problems outlined in the search result pairs study, both in experts’ ability to understand ranking factors and in your ability to use the information even if they do. Secondly, they suffer from another bias described in Lewis’ book: the “law of small numbers”. PS: I say this as a participant in many of them.
  97. Equally, building your digital strategy on what Google tells you to do will become an even worse idea than it already is.
  98. This is why we have been investing so much in split-testing. Check out odn.distilled.net if you haven’t already; the team will be happy to demo it for you. We served ~5 billion requests last quarter and recently published everything from response times to our +£100k/month split test.
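An SEO split test has two core ingredients. The first is a stable assignment of pages to control and variant groups. The second is a comparison of how each group's organic sessions change over the test period. Below is a minimal sketch of both. It assumes hash-based bucketing and a simple ratio-of-ratios uplift, which is not necessarily how ODN works. The `test_id` and session numbers are hypothetical. A real test also needs enough pages, a model of expected traffic, and a significance check, all of which this sketch omits.

```python
import hashlib


def assign_bucket(url: str, test_id: str = "category-text-test") -> str:
    """Deterministically split pages into control and variant by hashing the URL,
    so each page stays in the same group for the life of the test."""
    digest = hashlib.sha256(f"{test_id}:{url}".encode("utf-8")).digest()
    return "variant" if digest[0] % 2 else "control"


def relative_uplift(control_before: float, control_after: float,
                    variant_before: float, variant_after: float) -> float:
    """Variant's change in organic sessions relative to control's change over the
    same period, so site-wide seasonality cancels out."""
    return (variant_after / variant_before) / (control_after / control_before) - 1.0


urls = [f"https://shop.example/category/{i}" for i in range(6)]
print({u: assign_bucket(u) for u in urls})
print(round(relative_uplift(1000, 1100, 1000, 1155), 3))  # 0.05
```

In the example, both groups grew because of seasonality (control +10%). Only the extra growth in the variant (+15.5% vs +10%) counts as a 5% uplift.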
  99-106. Let’s recap:
      1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm.
      2. Machine learning will make this worse, and is accelerating under Sundar.
      3. There are things computers remain bad at, and rankings will become more opaque even to Google engineers.
      4. We remain relevant by:
         a. Using methodologies and checklists to capture human capabilities and avoid our biases
         b. Becoming great consultants and change agents
         c. Debugging the heck out of everything
         d. Avoiding being misled by experts or Google
      Testing!
  107. What about that robot I promised you? The coin flip wasn’t really it.
  108. keras.io
  109-110. The specifics of DeepRank. Gather and process training data: we started with a broad range of unbranded keywords from our STAT rank tracking. For each of the URLs ranking in the top 10, we gathered key metrics about the domain and page, both from direct crawling and from various APIs. We turned this into a set of pairs of URLs {A, B}, each with the associated keyword, metrics, and rank ordering.
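The pair-building step can be sketched in Python. This is a minimal illustration, not Distilled's pipeline. It assumes each keyword's ranking arrives as a best-first list of (URL, metrics) tuples, and the metric names in the example are hypothetical. Each unordered pair is emitted in both orders so that the labels are balanced.

```python
from itertools import combinations


def make_pairs(keyword: str, ranked: list[tuple[str, list[float]]]):
    """Turn one keyword's top-10 (best first) into labelled URL pairs.

    Each unordered pair {A, B} is emitted both ways so the labels are balanced:
    label 1.0 means the first URL outranks the second.
    """
    pairs = []
    for (url_a, m_a), (url_b, m_b) in combinations(ranked, 2):  # url_a ranks above url_b
        pairs.append((keyword, url_a, url_b, m_a + m_b, 1.0))
        pairs.append((keyword, url_b, url_a, m_b + m_a, 0.0))
    return pairs


serp = [  # hypothetical metrics per URL, e.g. [domain_strength, page_links]
    ("https://a.example/", [0.9, 12.0]),
    ("https://b.example/page", [0.7, 30.0]),
    ("https://c.example/post", [0.4, 3.0]),
]
for row in make_pairs("custom cigar humidors", serp)[:2]:
    print(row)
```

A top 10 yields 45 unordered pairs per keyword, or 90 labelled rows, so even a few hundred keywords give a usable amount of training data.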
  111. The specifics of DeepRank. Train the model: we have so far trained on just 10 metrics for a relatively small sample (hundreds) of keywords. Our current version is only a few layers deep, with only 10 hidden dimensions. The current training run samples 30 pairs at a time and trains against them for 500 epochs.
  112. The specifics of DeepRank. Model: the next task is to get far more metrics for thousands of keywords. This will let us train a much deeper model for much longer without overfitting. We also have more hyperparameter tuning to do.
  113. To run the model, we input a new pair of pages with their associated metrics.
  114. New input → model
  115. We get back a probability of page A outranking page B (probability-weighted predictions).
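The train-then-predict loop can be sketched end to end. According to the slides, DeepRank is a Keras network a few layers deep with 10 hidden dimensions. To keep this sketch dependency-light, it swaps in a plain NumPy linear pairwise model, RankNet-style, where the probability that A outranks B is a sigmoid of the score difference. It is a stand-in, not DeepRank itself. The 10 metrics, 500 epochs and batches of 30 pairs mirror the slides, but reading "30 pairs at a time" as the batch size is an assumption. All data here is synthetic.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_pairwise(x_a, x_b, y, epochs=500, batch_size=30, lr=0.1, seed=0):
    """Fit P(A outranks B) = sigmoid(w . (x_a - x_b)) by minibatch gradient descent
    on log loss (a linear, RankNet-style scorer)."""
    rng = np.random.default_rng(seed)
    diff = x_a - x_b
    w = np.zeros(diff.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(diff[idx] @ w)
            w -= lr * diff[idx].T @ (p - y[idx]) / len(idx)  # log-loss gradient step
    return w


def prob_a_outranks_b(w, metrics_a, metrics_b) -> float:
    return float(sigmoid((metrics_a - metrics_b) @ w))


# Synthetic stand-in data: 300 pairs, 10 metrics each; only metric 0 decides the order.
rng = np.random.default_rng(1)
x_a, x_b = rng.normal(size=(300, 10)), rng.normal(size=(300, 10))
y = ((x_a - x_b)[:, 0] > 0).astype(float)

w = train_pairwise(x_a, x_b, y)
page_a, page_b = np.zeros(10), np.zeros(10)
page_a[0] = 1.0  # page A is better on the one metric that matters
print(prob_a_outranks_b(w, page_a, page_b))  # well above 0.5
```

Swapping the linear scorer for a small multi-layer network changes the model but not the pairwise setup. Pairs go in, and a probability that A outranks B comes out.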
  116. The goal is a winning combination of human and machine: human + computer beats computer (for now).
  117. Questions: @willcritchlow
  118. Image credits: Mobius strip, Confusion, Signal box, Cigar, Discontinuity, Confidence, Burt Totaro, Sundar Pichai, John Giannandrea, Chuck Norris, Jeff Dean, Fencing, Keyboard, Go, Robot, Leopard print sofa, Leopard, Bug, Lego robots, Iron Man, Boston

×