The SEO's Guide to Scraping Everything

32,360 views

Published in: Technology, Design

  1. THE SEO’S GUIDE TO: SCRAPING EVERYTHING. @eppievojt, digital marketing consultant, JPL
  2. NEXT LEVEL XPATH-ING. Use Case 1: Does site X link to any page on eppie.net?
  3. NEXT LEVEL XPATH-ING. Scrape partial matches using XPath’s “contains” function to find inexact data. What we know: 1) The link will contain http://www.eppie.net in the href attribute. 2) Some people like to hurt the internet by capitalizing URLs, so we’ll need to account for that. 3) People who link to you don’t care about your desire for canonicalization.
  4. DO YOU LINK TO ME? //a[contains(@href,'http://www.eppie.net')] Problem: fails to account for case sensitivity.
  5. DO YOU LINK TO ME? Add translate() to normalize case: //a[contains(translate(@href,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'http://www.eppie.net')]
  6. DO YOU LINK TO ME? How you can use this: Get notified when a link is removed (make contact to potentially save a dropping link: friendly reminder, buy the expiring domain, recreate the dead resource). Integrate into your link outreach process (get a notification when a link goes live).
  7. NEXT LEVEL XPATH-ING. Use Case 2: Find every external link from cnn.com
  8. NEXT LEVEL XPATH-ING. Combine attribute selectors to more accurately target useful information. What we know: 1) External links all contain http://. 2) Internal links can also use http://. 3) So we need to exclude http:// links to the current domain.
  9. SCRAPE ALL EXTERNAL LINKS. //a[contains(@href,'http://') and not(contains(@href,'cnn.com'))]
  10. SCRAPE ALL EXTERNAL LINKS. How you can use this: Identify whether a page is too spammed out to bother with by pulling external link counts. Find expired or expiring domains being linked to from authority sites, then purchase and rebuild or redirect those sites. Automate broken link building.
  11. LINK TYPE IDENTIFICATION. Use Case 3: How are they ranking? What kind of links do they have?
  12. LINK TYPE IDENTIFICATION. XPath’s ancestor axis lets us leverage semantic markup to ID link types. What we know: A link inside a containing element with an id or class name including the word “comment,” “footer,” or “blogroll” is highly suggestive of type.
  13. LINK TYPE IDENTIFICATION. //a[@href='http://randfishkin.com/blog']/ancestor::*[contains(@id|@class,'comment')] Was Rand comment-spamming his way to the top? This ... tells the story...
  14. SCRAPE ALL EXTERNAL LINKS. Why you might use this: Analyze competitors’ strategies for acquiring links. Find what types of links are being used to get good anchor text. Improve workflow: ignore placed links (comments, directory submissions, article submissions, blog networks, etc.) and work on a smaller subset of EARNED links for manual analysis.
  15. REGEX TO THE RESCUE. Use Case 4: I’ve scraped some data, now I need to extract some small portion of it that XPath can’t do on its own (easily).
  16. REGEX TO THE RESCUE. Use regular expressions to pattern match structured text. Example: Extract all @mentions of a specific user from a tweet or page.
  17. REGEX TO THE RESCUE
  18. REGEX TO THE RESCUE
  19. REGEX TO THE RESCUE
  20. REGEX TO THE RESCUE
  21. EXTRACT @MENTIONS. /(?:^|\s)@([A-Za-z0-9_]+)/gi
  22. REGEX TO THE RESCUE. Why you might use this: Pull contact information from a web site (Twitter username, email address) to improve outreach efforts. Extract code fragments (like Analytics IDs and AdSense IDs) for improved competitive research.
  23. BEYOND THE SPREADSHEET. Use Case 5: I want to chain processes together, process lots of data, or allow multiple users to leverage what I build.
  24. BEYOND THE SPREADSHEET. Scraping outside the spreadsheet allows for more complex systems to be built. PHP Scraping Overview: 1) cURL the target page. 2) Convert to a DOM object. 3) Run XPath queries. 4) Store the data or hit an API.
  25. BEYOND THE SPREADSHEET. Simple PHP Scraper Class: http://www.scrapeeverything.com
  26. SHOW SOME LOVE. I’m @eppievojt and I work for @jplcreative. eppie.net, linkdetective.com, jplcreative.com
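
Use cases 1, 2, and 4 from the slides can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the PHP scraper class the deck points to: it swaps the XPath queries for the standard library's html.parser plus plain string tests, and it lowercases hrefs in both link checks. The names LinkCollector, extract_hrefs, links_to, external_links, and extract_mentions, as well as the sample HTML and sample tweet, are hypothetical.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Parse an HTML string and return all link targets found in it."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.hrefs


def links_to(hrefs, target):
    """Use case 1: case-insensitive substring test, playing the role of
    translate() + contains() in the slide's XPath."""
    target = target.lower()
    return [h for h in hrefs if target in h.lower()]


def external_links(hrefs, own_domain):
    """Use case 2: keep http:// links that do not mention the page's own
    domain. Lowercasing here is an addition borrowed from use case 1."""
    own_domain = own_domain.lower()
    return [h for h in hrefs
            if "http://" in h.lower() and own_domain not in h.lower()]


# Use case 4: the @mention pattern, with the whitespace class written as \s.
MENTION_RE = re.compile(r"(?:^|\s)@([A-Za-z0-9_]+)")


def extract_mentions(text):
    """Return the usernames (without the @) mentioned in text."""
    return MENTION_RE.findall(text)


# Hypothetical sample data, for illustration only.
SAMPLE_HTML = """
<p><a href="HTTP://WWW.EPPIE.NET/Blog">Eppie</a>
<a href="http://www.cnn.com/world">World</a>
<a href="http://example.org/story">Story</a>
<a href="/local/page">Local</a></p>
"""

if __name__ == "__main__":
    hrefs = extract_hrefs(SAMPLE_HTML)
    print(links_to(hrefs, "http://www.eppie.net"))
    print(external_links(hrefs, "cnn.com"))
    print(extract_mentions("Thanks @eppievojt and @jplcreative, mail me@example.com"))
```

Because the mention pattern requires the start of the string or whitespace before the @, an email address such as me@example.com is not picked up as a mention. Like the XPath on the slides, the external-link filter only looks for http:// and would need extending to cover https:// links.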
