
BrightonSEO April 2019 - Tom Pool - Chrome Puppeteer, Fake Googlebot & Monitor Your Site!


Presentation delivered on 12.04.19, at April's edition of Brighton SEO. Contains an introduction, basic & more advanced usages of Chrome Puppeteer, Headless Chrome & how you can use it to monitor your site!


  1. 1. slideshare.net/Tom-Pool How To Use Chrome Puppeteer To Fake Googlebot And Monitor Your Site Tom Pool // BlueArray // @cptntommy
  2. 2. Who Am I? @cptntommy #BrightonSEO
  3. 3. @cptntommy #BrightonSEO
  4. 4. Look After Technical Output Of The Agency @cptntommy #BrightonSEO
  5. 5. Always Trying To Find Ways To Make My Team’s Job Easier @cptntommy #BrightonSEO
  6. 6. So I Was Watching Google I/O 18 (Which Is Awesome BTW) @cptntommy #BrightonSEO
  7. 7. And I Saw A Really Really Really Cool Talk @cptntommy #BrightonSEO
  8. 8. Eric Bidelman @cptntommy #BrightonSEO
  9. 9. This Got Me Thinking @cptntommy #BrightonSEO
  10. 10. I Can Use This To Help Me With My Job! @cptntommy #BrightonSEO
  11. 11. So I Went Away & Did A Shit Ton Of Research @cptntommy #BrightonSEO
  12. 12. That Included @cptntommy #BrightonSEO
  13. 13. Headless Chrome @cptntommy #BrightonSEO
  14. 14. Chrome @cptntommy #BrightonSEO
  15. 15. And A Little Bit Of Coding @cptntommy #BrightonSEO
  16. 16. (Not Much!) @cptntommy #BrightonSEO
  17. 17. I Want All Of You To At Least Take @cptntommy #BrightonSEO
  18. 18. A Small Piece Of Knowledge From This @cptntommy #BrightonSEO
  19. 19. I’ll Also Tweet Out This Deck @cptntommy #BrightonSEO
  20. 20. So... @cptntommy #BrightonSEO
  21. 21. What Is Headless Chrome? @cptntommy #BrightonSEO
  22. 22. @cptntommy #BrightonSEO
  23. 23. @cptntommy #BrightonSEO
  24. 24. @cptntommy #BrightonSEO
  25. 25. @cptntommy #BrightonSEO
  26. 26. Headless Chrome = None Of That Shit @cptntommy #BrightonSEO
  27. 27. @cptntommy #BrightonSEO
  28. 28. @cptntommy #BrightonSEO
  29. 29. Google Chrome Is Running, But With No User Interface @cptntommy #BrightonSEO
  30. 30. So It Is ‘Headless’ @cptntommy #BrightonSEO
  31. 31. Why Should You Even Care? @cptntommy #BrightonSEO
  32. 32. You Can: @cptntommy #BrightonSEO
  33. 33. Scrape The Shit Out Of (JS) Websites @cptntommy #BrightonSEO
  34. 34. Copy The DOM, & Paste To A Text File @cptntommy #BrightonSEO
  35. 35. Compare Source Code With DOM & Export Differences @cptntommy #BrightonSEO
  36. 36. Generate Screenshots of Pages @cptntommy #BrightonSEO
  37. 37. Crawl Single Page Applications @cptntommy #BrightonSEO
  38. 38. I Know, JS Is Evil, But It Ain’t Going Away! @cptntommy #BrightonSEO
  39. 39. Screaming Frog Does Have JS Rendering Features. Utilises (Something Like) Headless Chrome @cptntommy #BrightonSEO
  40. 40. Google Can Render JS, But It Is In No Way Perfect, Or Even That Effective @cptntommy #BrightonSEO
  41. 41. Countless Case Studies @cptntommy #BrightonSEO
  42. 42. Crawl Single Page Applications @cptntommy #BrightonSEO
  43. 43. Automate WebPage Checks @cptntommy #BrightonSEO
  44. 44. Used For Webpage Testing (Clicking On Buttons, Filling In Forms, General Fuckery) @cptntommy #BrightonSEO
  45. 45. Great For Emulating User Behaviour! @cptntommy #BrightonSEO
  46. 46. Great For Seeing How Much Shit A Website Can Take Before It Breaks! @cptntommy #BrightonSEO
  47. 47. The Problem Is... @cptntommy #BrightonSEO
  48. 48. You Have To Run Basic Headless Chrome From Command Line @cptntommy #BrightonSEO
  49. 49. @cptntommy #BrightonSEO
  50. 50. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome @cptntommy #BrightonSEO
  51. 51. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless @cptntommy #BrightonSEO
  52. 52. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 @cptntommy #BrightonSEO
  53. 53. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu @cptntommy #BrightonSEO
  54. 54. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu https://www.bluearray.co.uk @cptntommy #BrightonSEO
  55. 55. Now @cptntommy #BrightonSEO
  56. 56. I Really Really Love Using Command Line @cptntommy #BrightonSEO
  57. 57. @cptntommy #BrightonSEO
  58. 58. But This Really Really Made Me Cry @cptntommy #BrightonSEO
  59. 59. So How Do I Make It Easy? @cptntommy #BrightonSEO
  60. 60. Like I Said - I’m Always Trying To Make My Job Easier @cptntommy #BrightonSEO
  61. 61. And This Was Not Easy! @cptntommy #BrightonSEO
  62. 62. So I Went Away & Did A Bigger Shit Ton Of Research @cptntommy #BrightonSEO
  63. 63. Eric Bidelman @cptntommy #BrightonSEO
  64. 64. What Is Chrome Puppeteer? @cptntommy #BrightonSEO
  65. 65. @cptntommy #BrightonSEO
  66. 66. BlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlah @cptntommy #BrightonSEO
  67. 67. OOOOOOOO API @cptntommy #BrightonSEO
  68. 68. Node Can Be Used For Making Applications @cptntommy #BrightonSEO
  69. 69. And It Can Also Be Used To Help Control Headless Chrome @cptntommy #BrightonSEO
  70. 70. And Trust Me It’s Easy! @cptntommy #BrightonSEO
  71. 71. So How Can I Get Chrome Puppeteer? @cptntommy #BrightonSEO
  72. 72. If You Want To Run Tests On Your Local Machine @cptntommy #BrightonSEO
  73. 73. You Have To Install NPM & Node.js @cptntommy #BrightonSEO
  74. 74. @cptntommy #BrightonSEO
  75. 75. Someone’s Made This Easy! @cptntommy #BrightonSEO
  76. 76. So If You Are On PC @cptntommy #BrightonSEO
  77. 77. It’s Pretty Straightforward @cptntommy #BrightonSEO
  78. 78. Just Install From The Node.js Website @cptntommy #BrightonSEO
  79. 79. bit.ly/pc-pup-brighton19 @cptntommy #BrightonSEO
  80. 80. If You Are On Mac @cptntommy #BrightonSEO
  81. 81. (Like Me) @cptntommy #BrightonSEO
  82. 82. It’s Not That Easy @cptntommy #BrightonSEO
  83. 83. bit.ly/pupbrighton19 @cptntommy #BrightonSEO
  84. 84. You Wanna Open Up Terminal @cptntommy #BrightonSEO
  85. 85. @cptntommy #BrightonSEO
  86. 86. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" @cptntommy #BrightonSEO
  87. 87. This Installs Homebrew, That Makes Everything E-Z @cptntommy #BrightonSEO
  88. 88. @cptntommy #BrightonSEO
  89. 89. @cptntommy #BrightonSEO
  90. 90. When This Has Done Its Thing @cptntommy #BrightonSEO
  91. 91. You Have To Install 2 More Things, And We’ll Be Ready To Rock @cptntommy #BrightonSEO
  92. 92. brew install node @cptntommy #BrightonSEO
  93. 93. @cptntommy #BrightonSEO
  94. 94. And Then @cptntommy #BrightonSEO
  95. 95. npm i puppeteer @cptntommy #BrightonSEO
  96. 96. Now You Are All Good! @cptntommy #BrightonSEO
  97. 97. You Can Now Run Chrome Puppeteer On Your Machine! @cptntommy #BrightonSEO
  98. 98. For Example @cptntommy #BrightonSEO
  99. 99. If I Wanted To Take A Screenshot Of A Single Webpage @cptntommy #BrightonSEO
  100. 100. There Is A Bunch Of Code Coming Up @cptntommy #BrightonSEO
  101. 101. That Can All Be Seen In The Following Link (I’ll Also Tweet It) @cptntommy #BrightonSEO
  102. 102. https://bit.ly/BrightonSEO19 @cptntommy #BrightonSEO
  103. 103. @cptntommy #BrightonSEO
  104. 104. let browser = await puppeteer.launch({headless: true}); @cptntommy #BrightonSEO
  105. 105. let page = await browser.newPage(); @cptntommy #BrightonSEO
  106. 106. await page.goto('https://www.bluearray.co.uk/'); @cptntommy #BrightonSEO
  107. 107. await page.screenshot({ @cptntommy #BrightonSEO
  108. 108. await page.screenshot({ path: './testimg.jpg', @cptntommy #BrightonSEO
  109. 109. await page.screenshot({ path: './testimg.jpg', type: 'jpeg'}); @cptntommy #BrightonSEO
  110. 110. await page.close(); await browser.close(); @cptntommy #BrightonSEO
  111. 111. File Is Saved As Screenshot.js @cptntommy #BrightonSEO
  112. 112. So To Run This Small Piece Of Code @cptntommy #BrightonSEO
  113. 113. Go To Terminal (In Same Folder As Code), And Type In @cptntommy #BrightonSEO
  114. 114. node Screenshot.js @cptntommy #BrightonSEO
  115. 115. And Then, 5 Seconds Later, @cptntommy #BrightonSEO
  116. 116. @cptntommy #BrightonSEO
  117. 117. If You Wanted To See The Browser Do These Steps @cptntommy #BrightonSEO
  118. 118. let browser = await puppeteer.launch({headless: true}); @cptntommy #BrightonSEO
  119. 119. let browser = await puppeteer.launch({headless: false}); @cptntommy #BrightonSEO
  120. 120. You Can Also Provide A List Of URLs @cptntommy #BrightonSEO
  121. 121. And Get A Shit Ton Of Screenshots! @cptntommy #BrightonSEO
  122. 122. Now I’m Sure You Can See Where This Is Headed @cptntommy #BrightonSEO
  123. 123. Faking Googlebot! @cptntommy #BrightonSEO
  124. 124. With A Few Tweaks To The Code @cptntommy #BrightonSEO
  125. 125. await page.setUserAgent('Googlebot'); @cptntommy #BrightonSEO
  126. 126. Googlebot’s User Agent Is Not Just ‘Googlebot’ @cptntommy #BrightonSEO
  127. 127. It’s Fuck*** Huge @cptntommy #BrightonSEO
  128. 128. Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.ht @cptntommy #BrightonSEO
  129. 129. And Then You Gotta Set Googlebot’s Viewport @cptntommy #BrightonSEO
  130. 130. await page.setViewport @cptntommy #BrightonSEO
  131. 131. await page.setViewport({width: 1024, height: 1024}); @cptntommy #BrightonSEO
  132. 132. FYI This Is Not Really Googlebot @cptntommy #BrightonSEO
  133. 133. As Unfortunately @cptntommy #BrightonSEO
  134. 134. Can’t Change Chrome Version That Puppeteer Uses To 41 :( @cptntommy #BrightonSEO
  135. 135. As Chrome Puppeteer Was Released After Chrome 41 (*Not Backwards Compatible) @cptntommy #BrightonSEO
  136. 136. However! @cptntommy #BrightonSEO
  137. 137. Can Be Persuasive In Getting A Client To Ensure Their Content Is SSR’d (If Needed) @cptntommy #BrightonSEO
  138. 138. Chrome Puppeteer Can Be Installed On The Server @cptntommy #BrightonSEO
  139. 139. We Can Then Provide Puppeteer With A List Of URLs, And It Can Work Through Them All @cptntommy #BrightonSEO
  140. 140. And Show How They Would Appear To Google, Instead Of @cptntommy #BrightonSEO
  141. 141. In The Case Of Some JS Sites @cptntommy #BrightonSEO
  142. 142. @cptntommy #BrightonSEO
  143. 143. A Blank Page @cptntommy #BrightonSEO
  144. 144. Which Is Cool & A Nice Trick @cptntommy #BrightonSEO
  145. 145. But The Really Cool Stuff Is Yet To Come @cptntommy #BrightonSEO
  146. 146. So Who Here Has Heard Of (Or Used) ContentKing? @cptntommy #BrightonSEO
  147. 147. It’s Fairly Awesome @cptntommy #BrightonSEO
  148. 148. Allows You To Monitor A Site In Real-Time @cptntommy #BrightonSEO
  149. 149. With It Letting You Know Of Any Issues @cptntommy #BrightonSEO
  150. 150. Meta Changes, New 404 Errors, Updated Links… @cptntommy #BrightonSEO
  151. 151. BUT @cptntommy #BrightonSEO
  152. 152. Like Most Good Tools, It Costs Money @cptntommy #BrightonSEO
  153. 153. Maybe You Don’t Wanna Eat Into Your Budget @cptntommy #BrightonSEO
  154. 154. This Next Example Shows How We Can Use Puppeteer @cptntommy #BrightonSEO
  155. 155. To Monitor Your Site When You Want & Report Any Changes To Key Areas @cptntommy #BrightonSEO
  156. 156. Including @cptntommy #BrightonSEO
  157. 157. Title Changes @cptntommy #BrightonSEO
  158. 158. Description Changes @cptntommy #BrightonSEO
  159. 159. Word Count Increases/Decreases @cptntommy #BrightonSEO
  160. 160. Robots Directives @cptntommy #BrightonSEO
  161. 161. Canonicals @cptntommy #BrightonSEO
  162. 162. So Basically The REALLY Important Shit In The HTML @cptntommy #BrightonSEO
  163. 163. So I Wrote Some Code @cptntommy #BrightonSEO
  164. 164. As With All Code, Required A Bit Of Research @cptntommy #BrightonSEO
  165. 165. @cptntommy #BrightonSEO
  166. 166. And With A Bit Of Luck, @cptntommy #BrightonSEO
  167. 167. We Now Have A Way To Monitor Basic Areas Of Sites! @cptntommy #BrightonSEO
  168. 168. So. @cptntommy #BrightonSEO
  169. 169. There Are About 200 Lines Of Code @cptntommy #BrightonSEO
  170. 170. @cptntommy #BrightonSEO
  171. 171. And I Don’t Have Time To Go Through The Full Thing @cptntommy #BrightonSEO
  172. 172. But @cptntommy #BrightonSEO
  173. 173. There Are A Few Interesting Snippets I’d Like To Share @cptntommy #BrightonSEO
  174. 174. We Launch Headless Chrome & Puppeteer As Highlighted A Minute Ago @cptntommy #BrightonSEO
  175. 175. const browser = await puppeteer.launch(); const page = await browser.newPage(); @cptntommy #BrightonSEO
  176. 176. Provide A List Of URLs For Puppeteer To Go And Play With @cptntommy #BrightonSEO
  177. 177. try {data = fs.readFileSync('/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/urls.txt','utf8');} @cptntommy #BrightonSEO
  178. 178. And Then Pull Relevant Meta Data @cptntommy #BrightonSEO
  179. 179. For Example @cptntommy #BrightonSEO
  180. 180. Meta Title @cptntommy #BrightonSEO
  181. 181. try {title = await page.title();} catch (e1) {title = 'n/a';} @cptntommy #BrightonSEO
  182. 182. Then Create An Array Of All The Meta Data @cptntommy #BrightonSEO
  183. 183. let retArray = [date,url,title,description,canonical,robots,wordCount]; @cptntommy #BrightonSEO
  184. 184. And Push This To A txt File @cptntommy #BrightonSEO
  185. 185. The Script Then Loops Through All Provided URLs @cptntommy #BrightonSEO
  186. 186. And Checks For Differences In The Returned Data @cptntommy #BrightonSEO
  187. 187. If There Are Any Differences, These Get Saved In Another txt File @cptntommy #BrightonSEO
  188. 188. That I Can Check Whenever @cptntommy #BrightonSEO
  189. 189. So I Can See What Has Changed From Yesterday/When I Last Ran The Code. @cptntommy #BrightonSEO
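
The full script is not shown in the deck, so the following is only a rough sketch of the collect-and-compare idea, not the author's 200 lines. The CSS selectors, the word-count method, and the names `attr`, `collect`, `diffSnapshots` and `monitor` are all assumptions. The field order follows the `retArray` snippet.

```javascript
// Field order matches the retArray snippet on the slides.
const FIELDS = ['date', 'url', 'title', 'description', 'canonical', 'robots', 'wordCount'];

// Reads one attribute from the first element matching `selector`, or 'n/a' if missing.
async function attr(page, selector, name) {
  try {
    return (await page.$eval(selector, (el, n) => el.getAttribute(n), name)) || 'n/a';
  } catch (e) {
    return 'n/a'; // $eval throws when nothing matches the selector
  }
}

// Visits `url` and returns one snapshot of the monitored fields.
async function collect(page, url) {
  await page.goto(url);
  let title;
  try { title = await page.title(); } catch (e1) { title = 'n/a'; }
  const description = await attr(page, 'meta[name="description"]', 'content');
  const canonical = await attr(page, 'link[rel="canonical"]', 'href');
  const robots = await attr(page, 'meta[name="robots"]', 'content');
  // Rough word count of the rendered, visible text.
  const wordCount = await page.evaluate(
    () => document.body.innerText.split(/\s+/).filter(Boolean).length
  );
  const date = new Date().toISOString().slice(0, 10);
  return { date, url, title, description, canonical, robots, wordCount };
}

// Lists the fields that differ between two snapshots of the same URL.
// The date is skipped because it changes on every run.
function diffSnapshots(previous, current) {
  return FIELDS
    .filter((f) => f !== 'date' && previous[f] !== current[f])
    .map((f) => ({ url: current.url, field: f, before: previous[f], after: current[f] }));
}

// Collects every URL and returns the changes against `previousByUrl`.
async function monitor(urls, previousByUrl = {}) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const changes = [];
  try {
    const page = await browser.newPage();
    for (const url of urls) {
      const current = await collect(page, url);
      if (previousByUrl[url]) changes.push(...diffSnapshots(previousByUrl[url], current));
    }
  } finally {
    await browser.close();
  }
  return changes;
}
```

Saving each run's snapshots to a file and loading them as `previousByUrl` on the next run gives the "what changed since yesterday" report described above.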
  190. 190. This Required Me To Run The Code Each Day @cptntommy #BrightonSEO
  191. 191. (That I Forgot To Do) @cptntommy #BrightonSEO
  192. 192. So I Went One Step Further @cptntommy #BrightonSEO
  193. 193. Chucked It On A Raspberry Pi @cptntommy #BrightonSEO
  194. 194. And Set Up A CronJob To Automatically Run The Script At The Same Time @cptntommy #BrightonSEO
  195. 195. Every Day @cptntommy #BrightonSEO
  196. 196. And Then @cptntommy #BrightonSEO
  197. 197. (This Was The Longest Bit) @cptntommy #BrightonSEO
  198. 198. Email Me If Anything Changed @cptntommy #BrightonSEO
  199. 199. This Is By No Means A Finished Product, And Is Still An Ongoing Project @cptntommy #BrightonSEO
  200. 200. These Usages Of Chrome Puppeteer @cptntommy #BrightonSEO
  201. 201. Barely Scratch The Surface Of What Is Possible @cptntommy #BrightonSEO
  202. 202. So, To Recap @cptntommy #BrightonSEO
  203. 203. Today We Have Covered @cptntommy #BrightonSEO
  204. 204. Headless Chrome @cptntommy #BrightonSEO
  205. 205. Puppeteer @cptntommy #BrightonSEO
  206. 206. Basic Scripts Using Node.js @cptntommy #BrightonSEO
  207. 207. And Automation Of All Of These To Save You Valuable Time @cptntommy #BrightonSEO
  208. 208. And Hopefully, Allow You To @cptntommy #BrightonSEO
  209. 209. And Hopefully, Allow You To @cptntommy #BrightonSEO
  210. 210. THANKS! @cptntommy #BrightonSEO
