BrightonSEO April 2019 - Tom Pool - Chrome Puppeteer, Fake Googlebot & Monitor Your Site!

Presentation delivered on 12.04.19, at April's edition of Brighton SEO. Contains an introduction, basic & more advanced usages of Chrome Puppeteer, Headless Chrome & how you can use it to monitor your site!

  1. slideshare.net/Tom-Pool How To Use Chrome Puppeteer To Fake Googlebot And Monitor Your Site Tom Pool // BlueArray // @cptntommy
  2. Who Am I? @cptntommy #BrightonSEO
  3. @cptntommy #BrightonSEO
  4. Look After Technical Output Of The Agency @cptntommy #BrightonSEO
  5. Always Trying To Find Ways To Make My Team’s Job Easier @cptntommy #BrightonSEO
  6. So I Was Watching Google I/O 18 (Which Is Awesome BTW) @cptntommy #BrightonSEO
  7. And I Saw A Really Really Really Cool Talk @cptntommy #BrightonSEO
  8. Eric Bidelman @cptntommy #BrightonSEO
  9. This Got Me Thinking @cptntommy #BrightonSEO
  10. I Can Use This To Help Me With My Job! @cptntommy #BrightonSEO
  11. So I Went Away & Did A Shit Ton Of Research @cptntommy #BrightonSEO
  12. That Included @cptntommy #BrightonSEO
  13. Headless Chrome @cptntommy #BrightonSEO
  14. Chrome @cptntommy #BrightonSEO
  15. And A Little Bit Of Coding @cptntommy #BrightonSEO
  16. (Not Much!) @cptntommy #BrightonSEO
  17. I Want All Of You To At Least Take @cptntommy #BrightonSEO
  18. A Small Piece Of Knowledge From This @cptntommy #BrightonSEO
  19. I’ll Also Tweet Out This Deck @cptntommy #BrightonSEO
  20. So... @cptntommy #BrightonSEO
  21. What Is Headless Chrome? @cptntommy #BrightonSEO
  22. @cptntommy #BrightonSEO
  23. @cptntommy #BrightonSEO
  24. @cptntommy #BrightonSEO
  25. @cptntommy #BrightonSEO
  26. Headless Chrome = None Of That Shit @cptntommy #BrightonSEO
  27. @cptntommy #BrightonSEO
  28. @cptntommy #BrightonSEO
  29. Google Chrome Is Running, But With No User Interface @cptntommy #BrightonSEO
  30. So It Is ‘Headless’ @cptntommy #BrightonSEO
  31. Why Should You Even Care? @cptntommy #BrightonSEO
  32. You Can: @cptntommy #BrightonSEO
  33. Scrape The Shit Out Of (JS) Websites @cptntommy #BrightonSEO
  34. Copy The DOM, & Paste To A Text File @cptntommy #BrightonSEO
  35. Compare Source Code With DOM & Export Differences @cptntommy #BrightonSEO
  36. Generate Screenshots Of Pages @cptntommy #BrightonSEO
  37. Crawl Single Page Applications @cptntommy #BrightonSEO
  38. I Know, JS Is Evil, But It Ain’t Going Away! @cptntommy #BrightonSEO
  39. Screaming Frog Does Have JS Rendering Features. Utilises (Something Like) Headless Chrome @cptntommy #BrightonSEO
  40. Google Can Render JS, But It Is In No Way Perfect, Or Even That Effective @cptntommy #BrightonSEO
  41. Countless Case Studies @cptntommy #BrightonSEO
  42. Crawl Single Page Applications @cptntommy #BrightonSEO
  43. Automate Webpage Checks @cptntommy #BrightonSEO
  44. Used For Webpage Testing (Clicking On Buttons, Filling In Forms, General Fuckery) @cptntommy #BrightonSEO
  45. Great For Emulating User Behaviour! @cptntommy #BrightonSEO
  46. Great For Seeing How Much Shit A Website Can Take Before It Breaks! @cptntommy #BrightonSEO
  47. The Problem Is... @cptntommy #BrightonSEO
  48. You Have To Run Basic Headless Chrome From The Command Line @cptntommy #BrightonSEO
  49. @cptntommy #BrightonSEO
  50. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome @cptntommy #BrightonSEO
  51. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --headless @cptntommy #BrightonSEO
  52. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --headless --remote-debugging-port=9222 @cptntommy #BrightonSEO
  53. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --headless --remote-debugging-port=9222 --disable-gpu @cptntommy #BrightonSEO
  54. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --headless --remote-debugging-port=9222 --disable-gpu https://www.bluearray.co.uk @cptntommy #BrightonSEO
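Built up flag by flag on the slides above, the full command looks like this (macOS path as shown on the slides; on Linux the binary is usually `google-chrome`). Note the quotes around the path, which are needed because of the spaces in it:

```shell
# Run Chrome with no UI, expose the DevTools protocol on port 9222,
# disable the GPU, and open the given URL.
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --headless \
  --remote-debugging-port=9222 \
  --disable-gpu \
  https://www.bluearray.co.uk
```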
  55. Now @cptntommy #BrightonSEO
  56. I Really Really Love Using Command Line @cptntommy #BrightonSEO
  57. @cptntommy #BrightonSEO
  58. But This Really Really Made Me Cry @cptntommy #BrightonSEO
  59. So How Do I Make It Easy? @cptntommy #BrightonSEO
  60. Like I Said - I’m Always Trying To Make My Job Easier @cptntommy #BrightonSEO
  61. And This Was Not Easy! @cptntommy #BrightonSEO
  62. So I Went Away & Did A Bigger Shit Ton Of Research @cptntommy #BrightonSEO
  63. Eric Bidelman @cptntommy #BrightonSEO
  64. What Is Chrome Puppeteer? @cptntommy #BrightonSEO
  65. @cptntommy #BrightonSEO
  66. BlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlahBlahBlahBlah BlahBlahBlahBlahBlahBlahBlah @cptntommy #BrightonSEO
  67. OOOOOOOO API @cptntommy #BrightonSEO
  68. Node Can Be Used For Making Applications @cptntommy #BrightonSEO
  69. And It Can Also Be Used To Help Control Headless Chrome @cptntommy #BrightonSEO
  70. And Trust Me It’s Easy! @cptntommy #BrightonSEO
  71. So How Can I Get Chrome Puppeteer? @cptntommy #BrightonSEO
  72. If You Want To Run Tests On Your Local Machine @cptntommy #BrightonSEO
  73. You Have To Install NPM & Node.js @cptntommy #BrightonSEO
  74. @cptntommy #BrightonSEO
  75. Someone’s Made This Easy! @cptntommy #BrightonSEO
  76. So If You Are On PC @cptntommy #BrightonSEO
  77. It’s Pretty Straightforward @cptntommy #BrightonSEO
  78. Just Install From The Node.js Website @cptntommy #BrightonSEO
  79. bit.ly/pc-pup-brighton19 @cptntommy #BrightonSEO
  80. If You Are On Mac @cptntommy #BrightonSEO
  81. (Like Me) @cptntommy #BrightonSEO
  82. It’s Not That Easy @cptntommy #BrightonSEO
  83. bit.ly/pupbrighton19 @cptntommy #BrightonSEO
  84. You Wanna Open Up Terminal @cptntommy #BrightonSEO
  85. @cptntommy #BrightonSEO
  86. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" @cptntommy #BrightonSEO
  87. This Installs Homebrew, That Makes Everything E-Z @cptntommy #BrightonSEO
  88. @cptntommy #BrightonSEO
  89. @cptntommy #BrightonSEO
  90. When This Has Done Its Thing @cptntommy #BrightonSEO
  91. You Have To Install 2 More Things, And We’ll Be Ready To Rock @cptntommy #BrightonSEO
  92. brew install node @cptntommy #BrightonSEO
  93. @cptntommy #BrightonSEO
  94. And Then @cptntommy #BrightonSEO
  95. npm i puppeteer @cptntommy #BrightonSEO
  96. Now You Are All Good! @cptntommy #BrightonSEO
  97. You Can Now Run Chrome Puppeteer On Your Machine! @cptntommy #BrightonSEO
  98. For Example @cptntommy #BrightonSEO
  99. If I Wanted To Take A Screenshot Of A Single Webpage @cptntommy #BrightonSEO
  100. There Is A Bunch Of Code Coming Up @cptntommy #BrightonSEO
  101. That Can All Be Seen In The Following Link (I’ll Also Tweet It) @cptntommy #BrightonSEO
  102. https://bit.ly/BrightonSEO19 @cptntommy #BrightonSEO
  103. @cptntommy #BrightonSEO
  104. let browser = await puppeteer.launch({headless: true}); @cptntommy #BrightonSEO
  105. let page = await browser.newPage(); @cptntommy #BrightonSEO
  106. await page.goto('https://www.bluearray.co.uk/'); @cptntommy #BrightonSEO
  107. await page.screenshot({ @cptntommy #BrightonSEO
  108. await page.screenshot({ path: './testimg.jpg', @cptntommy #BrightonSEO
  109. await page.screenshot({ path: './testimg.jpg', type: 'jpeg'}); @cptntommy #BrightonSEO
  110. await page.close(); await browser.close(); @cptntommy #BrightonSEO
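Assembled into one file, the snippets on the slides above form the whole script - a minimal sketch, assuming puppeteer has already been installed with `npm i puppeteer`:

```javascript
// Screenshot.js - the slides' snippets assembled into one runnable file.
const puppeteer = require('puppeteer');

(async () => {
  // Launch a browser with no visible UI (set headless: false to watch it work).
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();

  // Navigate to the page we want to capture.
  await page.goto('https://www.bluearray.co.uk/');

  // Save a JPEG screenshot next to the script.
  await page.screenshot({path: './testimg.jpg', type: 'jpeg'});

  await page.close();
  await browser.close();
})();
```

Run it from the folder it lives in with `node Screenshot.js`.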
  111. File Is Saved As Screenshot.js @cptntommy #BrightonSEO
  112. So To Run This Small Piece Of Code @cptntommy #BrightonSEO
  113. Go To Terminal (In Same Folder As Code), And Type In @cptntommy #BrightonSEO
  114. node Screenshot.js @cptntommy #BrightonSEO
  115. And Then, 5 Seconds Later, @cptntommy #BrightonSEO
  116. @cptntommy #BrightonSEO
  117. If You Wanted To See The Browser Do These Steps @cptntommy #BrightonSEO
  118. let browser = await puppeteer.launch({headless: true}); @cptntommy #BrightonSEO
  119. let browser = await puppeteer.launch({headless: false}); @cptntommy #BrightonSEO
  120. You Can Also Provide A List Of URLs @cptntommy #BrightonSEO
  121. And Get A Shit Ton Of Screenshots! @cptntommy #BrightonSEO
  122. Now I’m Sure You Can See Where This Is Headed @cptntommy #BrightonSEO
  123. Faking Googlebot! @cptntommy #BrightonSEO
  124. With A Few Tweaks To The Code @cptntommy #BrightonSEO
  125. await page.setUserAgent('Googlebot'); @cptntommy #BrightonSEO
  126. Googlebot’s User Agent Is Not Just ‘Googlebot’ @cptntommy #BrightonSEO
  127. It’s Fuck*** Huge @cptntommy #BrightonSEO
  128. Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) @cptntommy #BrightonSEO
  129. And Then You Gotta Set Googlebot’s Viewport @cptntommy #BrightonSEO
  130. await page.setViewport @cptntommy #BrightonSEO
  131. await page.setViewport({width: 1024, height: 1024}); @cptntommy #BrightonSEO
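The two tweaks slot into the screenshot script as a fragment, right after `browser.newPage()` and before `page.goto()`. The UA string is Googlebot's smartphone user agent from the slide; the 1024x1024 viewport is the slide's choice, not an official Googlebot value:

```javascript
// Pretend to be Googlebot smartphone: set the UA string before navigating.
await page.setUserAgent(
  'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 ' +
  'Mobile Safari/537.36 (compatible; Googlebot/2.1; ' +
  '+http://www.google.com/bot.html)');

// And give the page a viewport to render into.
await page.setViewport({width: 1024, height: 1024});
```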
  132. FYI This Is Not Really Googlebot @cptntommy #BrightonSEO
  133. As Unfortunately @cptntommy #BrightonSEO
  134. Can’t Change The Chrome Version That Puppeteer Uses To 41 :( @cptntommy #BrightonSEO
  135. As Chrome Puppeteer Was Released After Chrome 41 (*Not Backwards Compatible) @cptntommy #BrightonSEO
  136. However! @cptntommy #BrightonSEO
  137. Can Be Persuasive In Getting A Client To Ensure Their Content Is SSR’d (If Needed) @cptntommy #BrightonSEO
  138. Chrome Puppeteer Can Be Installed On The Server @cptntommy #BrightonSEO
  139. We Can Then Provide Puppeteer With A List Of URLs, And It Can Work Through Them All @cptntommy #BrightonSEO
  140. And Show How They Would Appear To Google, Instead Of @cptntommy #BrightonSEO
  141. In The Case Of Some JS Sites @cptntommy #BrightonSEO
  142. @cptntommy #BrightonSEO
  143. A Blank Page @cptntommy #BrightonSEO
  144. Which Is Cool & A Nice Trick @cptntommy #BrightonSEO
  145. But The Really Cool Stuff Is Yet To Come @cptntommy #BrightonSEO
  146. So Who Here Has Heard Of (Or Used) ContentKing? @cptntommy #BrightonSEO
  147. It’s Fairly Awesome @cptntommy #BrightonSEO
  148. Allows You To Monitor A Site In Real-Time @cptntommy #BrightonSEO
  149. With It Letting You Know Of Any Issues @cptntommy #BrightonSEO
  150. Meta Changes, New 404 Errors, Updated Links…. @cptntommy #BrightonSEO
  151. BUT @cptntommy #BrightonSEO
  152. Like Most Good Tools, It Costs Money @cptntommy #BrightonSEO
  153. Maybe You Don’t Wanna Eat Into Your Budget @cptntommy #BrightonSEO
  154. This Next Example Shows How We Can Use Puppeteer @cptntommy #BrightonSEO
  155. To Monitor Your Site When You Want & Report On Any Changes To Key Areas @cptntommy #BrightonSEO
  156. Including @cptntommy #BrightonSEO
  157. Title Changes @cptntommy #BrightonSEO
  158. Description Changes @cptntommy #BrightonSEO
  159. Word Count Increases/Decreases @cptntommy #BrightonSEO
  160. Robots Directives @cptntommy #BrightonSEO
  161. Canonicals @cptntommy #BrightonSEO
  162. So Basically The REALLY Important Shit In The HTML @cptntommy #BrightonSEO
  163. So I Wrote Some Code @cptntommy #BrightonSEO
  164. As With All Code, It Required A Bit Of Research @cptntommy #BrightonSEO
  165. @cptntommy #BrightonSEO
  166. And With A Bit Of Luck, @cptntommy #BrightonSEO
  167. We Now Have A Way To Monitor Basic Areas Of Sites! @cptntommy #BrightonSEO
  168. So. @cptntommy #BrightonSEO
  169. There Are About 200 Lines Of Code @cptntommy #BrightonSEO
  170. @cptntommy #BrightonSEO
  171. And I Don’t Have Time To Go Through The Full Thing @cptntommy #BrightonSEO
  172. But @cptntommy #BrightonSEO
  173. There Are A Few Interesting Snippets I’d Like To Share @cptntommy #BrightonSEO
  174. We Launch Headless Chrome & Puppeteer As Highlighted A Minute Ago @cptntommy #BrightonSEO
  175. const browser = await puppeteer.launch(); const page = await browser.newPage(); @cptntommy #BrightonSEO
  176. Provide A List Of URLs For Puppeteer To Go And Play With @cptntommy #BrightonSEO
  177. try {data = fs.readFileSync('/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/urls.txt','utf8');} @cptntommy #BrightonSEO
  178. And Then Pull Relevant Meta Data @cptntommy #BrightonSEO
  179. For Example @cptntommy #BrightonSEO
  180. Meta Title @cptntommy #BrightonSEO
  181. try {title = await page.title();} catch (e1) {title = 'n/a';} @cptntommy #BrightonSEO
  182. Then Create An Array Of All The Meta Data @cptntommy #BrightonSEO
  183. let retArray = [date,url,title,description,canonical,robots,wordCount]; @cptntommy #BrightonSEO
  184. And Pushed This To A .txt File @cptntommy #BrightonSEO
  185. The Script Then Loops Through All Provided URLs @cptntommy #BrightonSEO
  186. And Checks For Differences In The Returned Data @cptntommy #BrightonSEO
  187. If There Are Any Differences, These Get Saved In Another .txt File @cptntommy #BrightonSEO
  188. That I Can Check Whenever @cptntommy #BrightonSEO
  189. So I Can See What Has Changed From Yesterday/When I Last Ran The Code. @cptntommy #BrightonSEO
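The per-URL comparison can be a plain field-by-field diff of this run's retArray against the previous run's. A minimal sketch (the `diffRuns` helper and its field labels are mine, following the retArray order on the slide):

```javascript
// Compare two runs of metadata for one URL and describe what changed.
// Each run is an array in the retArray order from the slides:
// [date, url, title, description, canonical, robots, wordCount]
const FIELDS = ['date', 'url', 'title', 'description', 'canonical', 'robots', 'wordCount'];

function diffRuns(prev, curr) {
  const changes = [];
  // Skip index 0 (date) - it changes on every run by definition.
  for (let i = 1; i < FIELDS.length; i++) {
    if (String(prev[i]) !== String(curr[i])) {
      changes.push(`${FIELDS[i]}: "${prev[i]}" -> "${curr[i]}"`);
    }
  }
  return changes; // an empty array means nothing to report
}
```

Anything `diffRuns` returns can then be appended to the changes .txt file.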
  190. This Required Me To Run The Code Each Day @cptntommy #BrightonSEO
  191. (That I Forgot To Do) @cptntommy #BrightonSEO
  192. So I Went One Step Further @cptntommy #BrightonSEO
  193. Chucked It On A Raspberry Pi @cptntommy #BrightonSEO
  194. And Set Up A Cron Job To Automatically Run The Script At The Same Time @cptntommy #BrightonSEO
  195. Every Day @cptntommy #BrightonSEO
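On the Pi that's a one-line crontab entry (the 07:00 time and the script path here are made up for illustration; edit your own with `crontab -e`):

```shell
# m h dom mon dow  command - run the monitor script at 07:00 every day,
# appending output to a log file.
0 7 * * * node /home/pi/PageMonitor/monitor.js >> /home/pi/PageMonitor/monitor.log 2>&1
```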
  196. And Then @cptntommy #BrightonSEO
  197. (This Was The Longest Bit) @cptntommy #BrightonSEO
  198. Email Me If Anything Changed @cptntommy #BrightonSEO
  199. This Is By No Means A Finished Product, And Is Still An Ongoing Project @cptntommy #BrightonSEO
  200. These Usages Of Chrome Puppeteer @cptntommy #BrightonSEO
  201. Barely Scratch The Surface Of What Is Possible @cptntommy #BrightonSEO
  202. So, To Recap @cptntommy #BrightonSEO
  203. Today We Have Covered @cptntommy #BrightonSEO
  204. Headless Chrome @cptntommy #BrightonSEO
  205. Puppeteer @cptntommy #BrightonSEO
  206. Basic Scripts Using Node.js @cptntommy #BrightonSEO
  207. And Automation Of All Of These To Save You Valuable Time @cptntommy #BrightonSEO
  208. And Hopefully, Allow You To @cptntommy #BrightonSEO
  209. And Hopefully, Allow You To @cptntommy #BrightonSEO
  210. THANKS! @cptntommy #BrightonSEO

Editor's Notes

  • Like many of us, I’m constantly trying to find any new ways to make my (and my teams) jobs easier
  • So this awesome guy - Eric Bidelman - is a software engineer at Google, and works on headless chrome, lighthouse & dev tools.
  • I can use chrome puppeteer to help me with my job
  • So I went away and did a literal shit ton of research, that is worth sharing.
  • So, the first thing i was looking for was a basic definition.
  • Contrary to what i wanted to believe, it did not involve any decapitation
  • So when you open up Google Chrome normally, you get a wonderful User Interface with bookmarks
  • And a search bar, plugins, buttons, tabs
  • And usable functionality.
  • With headless chrome, you get none of that shit.
  • So here I am running headless chrome
  • And we can see that it is in the background, but I have no Chrome windows open.
  • So Google Chrome is Running, but with NO User Interface.
  • SO it is running without the UX/UI head
  • Why should you even care about this sort of stuff though?
  • Through this research journey, I found out that you can do a bunch of stuff with it!
  • Scrape the literal shit out of Javascript websites (as well as basic HTML scraping)
  • You can copy the DOM, and then paste it into a text file, with which you can
  • Compare the source code of the site with the DOM, and then export differences. This can allow you to identify any potential rendering issues.
  • Can use it to generate screenshots of pages
  • And effectively crawl single page applications
  • JS Can be a bit of a pain to work with, but unfortunately, it is not going away!
  • So Screaming Frog (and a majority of crawling tools) utilise something like headless chrome to emulate a browser, and provide JS rendering features.
  • And we all know about issues that Google can have with crawling JS, ranging from having slight issues with rendering, to completely drawing a blank.
  • So there have been a bunch of JS indexing and rendering case studies over the past couple of years.
  • So it can help you crawl these guys.
  • We can also use Headless Chrome to automate web page checks, and I provide an in depth investigation to this later on in this deck.
  • AND it can be used for general webpage testing. Including clicking on stuff, filling in forms, general fuckery with the mouse and keyboard.
  • It is really good for emulating user behaviour. So great for pretending to be a user, and browsing around a site.
  • SO it is basically really great for seeing exactly how much shit a website can take before it breaks!
  • However, the problem with running all of these tasks is
  • You have to run basic headless chrome through the command line interface
  • So first you gotta install some dependencies, and have a shit ton of errors hit you in the face, and you gotta know where chrome is stored on your local machine...
  • Then you gotta run directly from that location
  • Then specify headless chrome to launch
  • Then open a port to use
  • Then you gotta disable GPU
  • Then you can add a single URL, or a URL list into the command line
  • Now then
  • I really really really love using command line
  • In fact so much so that I spoke about it at Brighton last year
  • But doing all of this shit really really really really made me wanna cry
  • So how do I make utilising headless chrome, which is freaking awesome - easy?
  • Like I said a few minutes ago, I’m always trying to find ways to make my job easier
  • And doing all of these boring ass steps was really really not easy. At All.
  • So I went away and did a bigger shit ton of research.
  • So, in this talk at Google IO, Eric mentions something called Chrome Puppeteer (shoutout Eric)
  • So what is Chrome Puppeteer?
  • Doing a simple Google Search for Chrome Puppeteer reveals all.
  • But the stuff I’m interested in is this. A Node Library, and
  • Oooooooooo an API
  • So Node - for those that do not have dev experience, can be used for making some pretty kick-ass applications
  • It can also be used to help control headless chrome in an easy to digest and utilise package
  • So how can you actually get chrome puppeteer?
  • If you want to run tests on your local machine, you have to install a few things first.
  • Node.js - which is a runtime environment, and NPM which is a package manager for node.
  • Chill out though, it’s fairly straightforward
  • Someone a while ago has made this easy
  • So If you are on PC it’s fairly simple to get and install,
  • You’ve just gotta install these things from the Node JS website
  • I’ve linked to a guide here - that takes you through step by step.
  • If, like me, you are on a Mac
  • Its not that easy.
  • There’s a wicked awesome guide here that takes you through step by step what you need to do.
  • So you wanna start off by opening up terminal
  • And then typing in a few lines of shit
  • This installs homebrew, that makes everything even ez-er
  • So when homebrew is downloaded - it shouldn’t take too long - a max of 5 mins
  • So You Have To Install 2 More Things, And We’ll Be Ready To Rock. These are npm and node.
  • So just type in this. It installs node through homebrew, directly onto your machine with no fuckery.
  • So this installs node and npm, you’ll get a nice progress bar telling you how far along it is
  • Then you wanna use npm to install the latest version of puppeteer.
  • Now that’s it, you are all good and groovy!
  • You can now run Chrome Puppeteer on your machine!
  • So for example.
  • If I wanted to take a screenshot of a single page
  • So just type in this, and you should be good to go.
  • You’ll need to code some stuff up - but I’ve put everything together into a single google doc, that makes it simple & easy to understand what each bit does. Explain that you are going to go through it.
  • So we are starting up a headless browser, in true headless mode, so you won’t see what goes on (running in the background)
  • And then we are opening up a new tab/page
  • And then we specify exactly what URL we want to go to. So in this instance, we are testing the BlueArray homepage
  • Then we are taking a screenshot. We have to specify 2 things to allow the code to work correctly
  • So the path, so where and what we want the file to be saved as
  • And then saving as a specific filetype. Can fuck around with this, and get the ideal filetype that is good for you.
  • And then we close the page, and then close the broswer.
  • Go to terminal, make sure you are in the same folder as your code, and type in
  • Node screenshot.js.
  • And then a couple of seconds later, you’ll see
  • A nice screenshot get added to your folder with your code in
  • If you wanted to see the browser test this exactly for you,
  • Just change the headless mode to false. This is great for seeing exactly what the browser sees, and looks pretty cool, having a chrome window doing all sorts of shit in front of you!
  • You can also modify the script slightly to run through a list of provided URLs
  • And then get a bunch of screenshots!
  • Now I’m sure that you guys can see where this is headed
  • Faking Googlebot and seeing what they would see
  • So with a few little tweaks to the code that we have for the first example
  • Adding in a user agent string, and setting it to what Googlebot use
  • FYI Googlebot user agent string is not ‘Googlebot’ it is fucking massive
  • And wouldn’t fit on the slide
  • Node screenshot.js. Screenshot.js is the name of the file.
  • Using the await page set viewport option
  • So we have to specify the width and the height of the viewport that we want to use
  • This isn’t really Googlebot, just a decent attempt at emulation
  • AS unfortunately
  • As puppeteer was launched way after Chrome 41, we cannot specify it to use this version of Chrome :*(
  • However
  • This can be persuasive in getting a client to ensure that their content is Rendered Server Side, as opposed to client side, if needed
  • We can then provide a list of URLs that we want to get screenshotted
  • And show how they would appear to Google through puppeteer rendering, instead of
  • In the case of some rather shit JS sites
  • Absolutely fuck all
  • Nothing - a blank page
  • Which is pretty cool, and allows for bulk page testing
  • But the really cool stuff is yet to come!
  • So who here has heard of, or even used Content King?
  • It’s a fairly awesome piece of software
  • That allows you to monitor a site in -real time ish,
  • With it alerting you of any issues such as
  • Meta data changes, New pages that 404, Updated links, redirects, indexable and non-indexable pages….
  • However!
  • Like most really good tools, it costs money
  • Maybe You Don’t Wanna Eat Into Your Budget For Content King for a personal project site, or you don’t need the level of detail that those guys provide for a smaller, shitter site?
  • This Next Example Shows How We Can Use puppeteer to
  • Monitor a chosen site when you want, and report of any changes to key areas
  • Including some key areas, such as
  • Meta title changes
  • Meta description updates
  • Any increase or decrease in the word count of the page.
  • Pull out any robots directives, and highlights any differences between them
  • Any differences in canonical elements
  • So basically the really important shit from a HTML webpage
  • So I wrote some code. So I’ll be tweeting this out after for those who are interested.
  • As with all coding, this required a bit of research
  • Ahem stackoverflow ahem
  • And with a little bit of luck
  • We now have a way to monitor these basic areas for web pages
  • This is how it works
  • There is about 200 lines of code in total
  • Heres a small snapshot
  • And I don’t have time to go through the full thing today,
  • but
  • There are a few really interesting snippets that I’d really like to share, that can come in handy
  • So we launch headless chrome as highlighted a few minutes ago
  • Like so. So we launch the browser, and then create a new page within the browser, awaiting for further instruction...
  • And then we provide a list of URLs for Puppeteer to go and fuck around with
  • So here we are quoting the file that we will use for this program, we parse (or read it) using a couple more lines, that don’t really look that exciting!
  • And then we pull in the relevant meta data that I mentioned
  • SO, for example
  • Gonna show you guys how we pull in meta titles
  • So we are just pulling the title from the page. If there isn’t one - we get an error, so add in this - n/a
  • And then create an array of all the meta data - so a nice, formatted list of data that we can use later on within the script
  • So this just tells the script to treat all this data as one line, that we can then refer back to later
  • And we then pushed all this data to a text file
  • The script then loops through every URL that is provided, pulling out all data for each
  • It then checks for differences in the data - so compares this run with the previous one.
  • If there are any differences between the two sets of data, these get saved within a changes.txt file
  • That i can then check whenever
  • So I can see what has changed from yesterday, or whenever I last ran the code
  • This required me to run the code each day manually
  • That I completely forgot to do
  • So, I went one step further, to make my life even easier
  • Chucked the code on a Raspberry Pi
  • And set up a cron job within my local machine to automatically run the script at the same time
  • Every day
  • And then
  • This was the bit that took the most amount of time by faarrr
  • Send an email to me if there were any changes.
