
How to Leverage APIs for SEO #TTTLive2019


Learn the basics of APIs and how they can be leveraged for SEO and marketing. Chock-full of Python code examples.

The GitHub gist URL on slide 54 has changed to the following:
https://gist.github.com/pshapiro/a86dc340f57c38fc22d0545ddec1fc9e



  1. 1. How Anyone Can Leverage APIs for SEO Paul Shapiro
  2. 2. #TTTLIVE19 Paul Shapiro Partner, Director of Strategy & Innovation / SEO Practice Lead @ Catalyst
  3. 3. #TTTLIVE19
  4. 4. #TTTLIVE19
  5. 5. #TTTLIVE19 WTF is an API!? (RESTful Web API)
  6. 6. #TTTLIVE19 Application Programming Interface
  7. 7. #TTTLIVE19 Basically, APIs provide a way to interface with an external web service. This enables automation, lets you incorporate 3rd-party systems into your own application, and lets you extend both systems by combining their services and features.
  8. 8. #TTTLIVE19 So, how does this work exactly?
  9. 9. #TTTLIVE19 HTTP is the protocol that facilitates communication between the client and the server via requests and responses
  10. 10. #TTTLIVE19 CRUD Operations:
      Operation        | SQL    | HTTP               | RESTful Web Services
      Create           | INSERT | PUT / POST         | POST
      Read (Retrieve)  | SELECT | GET                | GET
      Update (Modify)  | UPDATE | PUT / POST / PATCH | PUT
      Delete (Destroy) | DELETE | DELETE             | DELETE
      https://en.wikipedia.org/wiki/Create,_read,_update_and_delete
  11. 11. #TTTLIVE19 The interaction between client and server can be facilitated via several structured methods (sometimes referred to as verbs).
  12. 12. #TTTLIVE19 GET and POST are the most common methods and the ones most often used with web APIs.
  13. 13. #TTTLIVE19
  14. 14. #TTTLIVE19 • “GET is used to request data from a specified resource.” • “POST is used to send data to a server to create/update a resource.” https://www.w3schools.com/tags/ref_httpmethods.asp
  15. 15. #TTTLIVE19 • “PUT is used to send data to a server to create/update a resource.” • “DELETE method deletes the specified resource.” https://www.w3schools.com/tags/ref_httpmethods.asp
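These definitions can be made concrete with a small Python sketch using the `requests` library. The URLs below are placeholders, and the requests are prepared but never sent, so no server is needed:

```python
import requests

# GET: data is requested via the URL; parameters travel in the query string
get_req = requests.Request("GET", "https://example.com/api/search",
                           params={"q": "board games"}).prepare()
print(get_req.method, get_req.url)

# POST: data is sent to the server; the payload travels in the request body
post_req = requests.Request("POST", "https://example.com/api/orders",
                            json={"itemNumber": 86, "quantity": 5}).prepare()
print(post_req.method, post_req.body)
```

Preparing the requests without sending them makes the difference visible: the GET carries its data in the URL, while the POST carries a JSON body.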
  16. 16. #TTTLIVE19
  17. 17. #TTTLIVE19 APIs are a little bit like this antiquated ordering system… 1. You need to look at available inventory. You look at Spice Company’s catalogue via the GET method. This gives you a list of products you can order. 2. Once you know what you would like to purchase, your internal system marks it down according to some pre-defined business logic (in the form of item numbers and corresponding quantities). 3. Your program places an order, sending this payload to the corresponding API endpoint using the POST method, and you receive the product at your physical address sometime after.
  18. 18. #TTTLIVE19
      {
        "accountId": "8675309",
        "shipAddress": {
          "name": "Bob SpiceyMcSpiceFace",
          "address": "237 South Broad Street",
          "city": "Philadelphia",
          "state": "PA"
        },
        "order": {
          "itemNumber": 86,
          "quantity": 5
        }
      }
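Sending a payload like the one on this slide from Python might look roughly like this. The endpoint URL and the `place_order` function are invented for illustration; only the payload itself comes from the slide:

```python
import json
import requests

# Order payload matching the slide example
payload = {
    "accountId": "8675309",
    "shipAddress": {
        "name": "Bob SpiceyMcSpiceFace",
        "address": "237 South Broad Street",
        "city": "Philadelphia",
        "state": "PA",
    },
    "order": {"itemNumber": 86, "quantity": 5},
}

def place_order(endpoint, payload):
    """POST the order payload as JSON and return the parsed response."""
    r = requests.post(endpoint, json=payload)
    r.raise_for_status()
    return r.json()

# place_order("https://example.com/api/orders", payload)  # hypothetical endpoint
print(json.dumps(payload, indent=2))
```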
  19. 19. #TTTLIVE19 API Endpoint: http://suggestqueries.google.com/complete/search?output=toolbar&hl=en& q=board%20games Variable, encoded Simple API example via GET request
  20. 20. #TTTLIVE19 Response (XML):
  21. 21. #TTTLIVE19 Parse the XML board games board games for kids board games for adults board games near me board games online board games list board games walmart board games boston board games 2018 board games for toddlers
  22. 22. #TTTLIVE19 Answer The Public? Ubersuggest? Keywordtool.io? http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=board%20games • q=board%20games%20can • q=board%20games%20vs • q=how%20board%20games
  23. 23. #TTTLIVE19 API Endpoint: http://api.grepwords.com/lookup?apikey=api_key_string&q=keyword The string is unique to you, like a password (authentication). The keyword variable changes and is often looped over. Simple API example via GET request (with authentication)
  24. 24. #TTTLIVE19 Simple API example via GET request
      http://api.grepwords.com/lookup?apikey=secret&q=board+games
      Response (JSON):
      [{"keyword":"board games","updated_cpc":"2018-04-30","updated_cmp":"2018-04-30","updated_lms":"2018-04-30","updated_history":"2018-04-30","lms":246000,"ams":246000,"gms":246000,"competition":0.86204091185173,"competetion":0.86204091185173,"cmp":0.86204091185173,"cpc":0.5,"m1":201000,"m1_month":"2018-02","m2":246000,"m2_month":"2018-01","m3":450000,"m3_month":"2017-12","m4":368000,"m4_month":"2017-11","m5":201000,"m5_month":"2017-10","m6":201000,"m6_month":"2017-09","m7":201000,"m7_month":"2017-08","m8":201000,"m8_month":"2017-07","m9":201000,"m9_month":"2017-06","m10":201000,"m10_month":"2017-05","m11":201000,"m11_month":"2017-04","m12":201000,"m12_month":"2017-03"}]
  25. 25. #TTTLIVE19 Parse the JSON keyword gms board games 246,000
  26. 26. #TTTLIVE19 https://www.catalystdigital.com /techseoboost/
  27. 27. #TTTLIVE19 Full Python Script (GrepWords/JSON):

      import requests
      import json

      boardgames = ["Gaia Project", "Great Western Trail", "Spirit Island"]
      for x in boardgames:
          apiurl = "http://api.grepwords.com/lookup?apikey=key&q=" + x
          r = requests.get(apiurl)
          parsed_json = json.loads(r.text)
          print(parsed_json[0]['gms'])
  28. 28. #TTTLIVE19
  29. 29. #TTTLIVE19 Full Python Script (Google Autosuggest/XML):

      import requests
      import xml.etree.ElementTree as ET

      boardgames = ["Gaia Project", "Great Western Trail", "Spirit Island"]
      for x in boardgames:
          apiurl = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=" + x
          r = requests.get(apiurl)
          tree = ET.fromstring(r.content)
          for child in tree.iter('suggestion'):
              print(child.attrib['data'])
  30. 30. #TTTLIVE19
  31. 31. #TTTLIVE19 Combine Them Together?

      import requests
      import xml.etree.ElementTree as ET
      import json

      boardgames = ["board game", "bgg", "board game geek"]
      for x in boardgames:
          suggest_url = "http://suggestqueries.google.com/complete/search?output=toolbar&hl=en&q=" + x
          r = requests.get(suggest_url)
          tree = ET.fromstring(r.content)
          for child in tree.iter('suggestion'):
              print(child.attrib['data'])
              grep_url = "http://api.grepwords.com/lookup?apikey=key&q=" + child.attrib['data']
              r = requests.get(grep_url)
              parsed_json = json.loads(r.text)
              try:
                  print(parsed_json[0]['gms'])
              except KeyError:
                  print("No data available in GrepWords.")
  32. 32. #TTTLIVE19
  33. 33. #TTTLIVE19 Google Autocomplete ✓
  34. 34. #TTTLIVE19 GrepWords ✓
  35. 35. #TTTLIVE19 Other API Examples
  36. 36. #TTTLIVE19 WebPageTest.org
  37. 37. #TTTLIVE19

      import requests
      import json
      import xml.etree.ElementTree as ET
      import time

      testurls = ["https://searchwilderness.com/", "https://trafficthinktank.com/", "https://searchengineland.com/"]
      for x in testurls:
          apiurl = "http://www.webpagetest.org/runtest.php?fvonly=1&k=KEY&lighthouse=1&f=xml&url=" + x
          r = requests.get(apiurl)
          tree = ET.fromstring(r.content)
          for child in tree.findall('data'):
              wpturl = child.find('jsonUrl').text
              print(wpturl)
          ready = True
          while ready:
              r = requests.get(wpturl)
              parsed_json = json.loads(r.text)
              try:
                  if parsed_json['data']['statusCode'] == 100:
                      print("Not yet ready. Trying again in 20 seconds.")
                      time.sleep(20)
                  else:
                      ready = False  # test finished with a non-pending status
              except KeyError:
                  ready = False
          print(x + "\n")
          print("Lighthouse Average First Contentful Paint: " + str(parsed_json['data']['average']['firstView']['chromeUserTiming.firstContentfulPaint']))
  38. 38. #TTTLIVE19 SEMRush
  39. 39. #TTTLIVE19

      import csv
      import requests

      domain = "trafficthinktank.com"
      key = "YOUR API KEY"
      api_url = "https://api.semrush.com/?type=domain_organic&key=" + key + "&display_filter=%2B%7CPh%7CCo%7Cseo&display_limit=10&export_columns=Ph,Po,Pp,Pd,Nq,Cp,Ur,Tr,Tc,Co,Nr,Td&domain=" + domain + "&display_sort=tr_desc&database=us"
      with requests.Session() as s:
          download = s.get(api_url)
          decoded_content = download.content.decode('utf-8')
          cr = csv.reader(decoded_content.splitlines(), delimiter=';')
          my_list = list(cr)
          for column in my_list:
              print(column[0])  # Keyword
              print(column[1])  # Position
              print(column[4])  # Search Volume
              print(column[6])  # URL
  40. 40. #TTTLIVE19 Google Analytics?
  41. 41. #TTTLIVE19 Sample Code: https://developers.google.com/analyti cs/devguides/reporting/core/v4/quicks tart/installed-py JSON Payload Help: https://ga-dev- tools.appspot.com/query-explorer/
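As a sketch of what such a query looks like, here is the shape of the JSON body the Reporting API v4 `batchGet` method expects (the view ID is a placeholder, and the date range, metric, and filter choices here are illustrative; the quickstart linked above covers authentication and sending the request):

```python
import json

def build_report_request(view_id, start="30daysAgo", end="today"):
    """Build a Reporting API v4 batchGet body: organic sessions by landing page."""
    return {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": start, "endDate": end}],
            "metrics": [{"expression": "ga:sessions"}],
            "dimensions": [{"name": "ga:landingPagePath"}],
            "filtersExpression": "ga:medium==organic",
        }]
    }

body = build_report_request("XXXXXX")  # placeholder view ID
print(json.dumps(body, indent=2))
```

The Query Explorer linked above is handy for discovering valid `ga:` metric and dimension names to plug into this body.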
  42. 42. #TTTLIVE19 Moz (Linkscape)
  43. 43. #TTTLIVE19

      from mozscape import Mozscape
      import pandas as pd
      import numpy as np
      import requests
      import time

      def divide_chunks(l, n):
          for i in range(0, len(l), n):
              yield l[i:i + n]

      client = Mozscape('access_id', 'secret_key')
      csv = pd.read_csv('./all_outlinks.csv', skiprows=1)
      links = csv[csv['Type'] == 'AHREF']
      # filter out CDNs, self-references, and other known cruft
      links = links[~links['Destination'].str.match('https?://boardgamegeek.com.*')]
      # reduce each URL to its hostname
      Domains = links['Destination'].replace(to_replace="(.*://)?([^/?]+).*", value=r"\2", regex=True)
      x = list(divide_chunks(Domains.unique().tolist(), 5))
      df = pd.DataFrame(columns=['pda', 'upa', 'url'])
      for vals in x:
          da_pa = client.urlMetrics(vals, Mozscape.UMCols.domainAuthority | Mozscape.UMCols.pageAuthority)
          i = 0
          for y in da_pa:
              y['url'] = vals[i]
              i = i + 1
              df = df.append(y, ignore_index=True)
          print("Processing a batch of 5 URLs. Total URLs: " + str(len(Domains.unique())))
          time.sleep(5)
      print(df)

      https://github.com/seomoz/SEOmozAPISamples/tree/master/python
  44. 44. #TTTLIVE19 Search Console
  45. 45. #TTTLIVE19 Schedule to run monthly with Cron and backup to SQL database: https://searchwilderness.com/gwmt- data-python/ JR Oakes’ BigQuery vision: http://bit.ly/2vmjDe8
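A monthly cron schedule like the one described could look roughly like this (the script path and log location are placeholders for illustration):

```shell
# crontab entry: run the Search Console export at 02:00 on the 1st of each month
0 2 1 * * /usr/bin/python3 /home/user/gsc_backup.py >> /var/log/gsc_backup.log 2>&1
```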
  46. 46. #TTTLIVE19 Webhose.io
  47. 47. #TTTLIVE19

      import requests
      import json
      import datetime as dt
      import urllib.parse

      apikey = "KEY"
      search = 'title:"board games" -shipping -sale site_type:news language:english'
      query = urllib.parse.quote(search)
      time_diff = -30
      time = int((dt.datetime.now(dt.timezone.utc) + dt.timedelta(time_diff)).timestamp())
      apiurl = "http://webhose.io/filterWebContent?token=" + apikey + "&format=json&ts=" + str(time) + "&sort=crawled&q=" + query
      r = requests.get(apiurl)
      parsed_json = json.loads(r.text)
      for i in range(int(parsed_json['totalResults'])):
          try:
              print(parsed_json['posts'][i]['title'])
              print(parsed_json['posts'][i]['thread']['social']['facebook'])
          except IndexError:
              print("error occurred")
  48. 48. #TTTLIVE19 Reddit
  49. 49. #TTTLIVE19 https://searchwilderness.com/ reddit-python-code/
  50. 50. #TTTLIVE19 Wayback Machine
  51. 51. #TTTLIVE19

      import requests
      import json

      domain = "trafficthinktank.com"
      apiurl = "https://web.archive.org/cdx/search/cdx?url=" + domain + "&matchType=domain&fl=original,timestamp&collapse=urlkey&filter=mimetype:text/html&filter=!original:.*%3A80.*&filter=!original:.*.(png%7Cjs%7Ccss%7Cjpg%7Csvg%7Cjpeg%7Cgif%7Cxml%7Crss%7CPNG%7CJS%7CCSS%7CJPG%7CSVG%7CJPEG%7CGIF%7CXML%7CRSS%7Ctxt%7CTXT%7Cico%7CICO%7Cpdf%7CPDF).*&output=json"
      r = requests.get(apiurl)
      parsed_json = json.loads(r.text)
      for x in range(len(parsed_json)):
          print(parsed_json[x][0])
  52. 52. #TTTLIVE19 Other APIs • STAT / Rank Tracking • Google Natural Language Processing • Various Machine Learning Services • DeepCrawl / Botify / Cloud Crawlers • Stripe (for payment) • Map / Geolocation Data (Google Maps/Foursquare) • Slack • Whois data
  53. 53. #TTTLIVE19 Putting things together and making magic
  54. 54. #TTTLIVE19 1. Take the outlink report from Screaming Frog 2. Distill the URLs to domains 3. Run the Moz Linkscape API against the list for PA & DA 4. Check the HTTP status code 5. Run a WHOIS API to see if the domain is available https://gist.github.com/pshapiro/819cd172ff8fe576f2a4e1f74395ec47
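Step 2 (distilling URLs to domains) can be sketched with the standard library alone; the function name and sample URLs here are mine, not from the gist:

```python
from urllib.parse import urlparse

def distill_domains(urls):
    """Reduce a list of URLs to the unique set of hostnames, order-preserving."""
    seen = []
    for url in urls:
        host = urlparse(url).netloc
        if host and host not in seen:
            seen.append(host)
    return seen

outlinks = [
    "https://boardgamegeek.com/boardgame/224517",
    "https://boardgamegeek.com/user/pshapiro",
    "https://searchwilderness.com/about/",
]
print(distill_domains(outlinks))
# → ['boardgamegeek.com', 'searchwilderness.com']
```

Deduplicating first keeps the later Moz and WHOIS API calls (which are rate-limited and often metered) down to one lookup per domain.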
  55. 55. #TTTLIVE19 https://github.com/MLTSEO/MLTS
  56. 56. #TTTLIVE19
  57. 57. Thanks! TTT: @Paul Shapiro Twitter: @fighto Blog: SearchWilderness.com
