
YQL:: Select * from Internet

Published in: Technology


  1. 1. YQL SELECT * FROM Internet Developer Talk Token Unicorn
  2. 2. Derek Gathright
  3. 3. Derek Gathright • Engineer @ Yahoo
  4. 4. Derek Gathright • Engineer @ Yahoo • Tweet from @derek
  5. 5. Derek Gathright • Engineer @ Yahoo • Tweet from @derek • Blog @ http://derekville.net
  6. 6. Derek Gathright • Engineer @ Yahoo • Tweet from @derek • Blog @ http://derekville.net • Everything else @ http://derek.io
  7. 7. On December 4th, 1995
  8. 8. On December 4th, 1995 “Netscape and Sun announce JavaScript, a cross-platform object scripting language” http://web.archive.org/web/20070916144913/http://wp.netscape.com/newsref/pr/newsrelease67.html
  9. 9. How big is the Web?
  10. 10. GINORMOUS!
  11. 11. The Internet: 1995 http://www.jevans.com/pubnetmap.html
  12. 12. The Internet: 2003 http://www.opte.org/maps/
  13. 13. The Internet: 2007 http://xkcd.com/256/
  14. 14. The Internet: 2010 http://xkcd.com/802/
  15. 15. It’s impossible to measure the size of the web because it is constantly changing, growing, and morphing.
  16. 16. Every second: Twitter gets 600 new tweets
  17. 17. Every minute: YouTube +35 hours of video
  18. 18. Every month: Facebook gets 2.5 billion new photos
  19. 19. Yeah, lots of it is garbage
  20. 20. But there’s still a ton of interesting stuff out there.
  21. 21. So how do you access it, programmatically?
  22. 22. It is easy enough to ‘scrape’ the web (using cURL, wget, etc...), but how do you parse it?
  23. 23. It is easy enough to ‘scrape’ the web (using cURL, wget, etc...), but how do you parse it? XPath + DOM Traversal = Yay!
  24. 24. It is easy enough to ‘scrape’ the web (using cURL, wget, etc...), but how do you parse it? XPath + DOM Traversal = Yay! Regular Expressions = Double Yay!
  25. 25. It is easy enough to ‘scrape’ the web (using cURL, wget, etc...), but how do you parse it?
  26. 26. “If a regular expression is longer than 2 inches, find another method” - Douglas Crockford
  27. 27. 4.4 lbs & 1,368 pages. No thanks!
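As a rough illustration of parsing scraped HTML by walking its structure instead of writing a long regular expression, here is a minimal Python sketch. It uses the standard library's event-based `html.parser` as a stand-in for the XPath/DOM traversal the slides mention. The `LinkCollector` class, the `extract_links` helper, and the sample markup are all made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "a" and self.in_body:
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def extract_links(html):
    """Return the link targets found in the body of an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for a scraped page.
SAMPLE_HTML = """
<html><head><title>Example</title></head>
<body>
  <p>See <a href="http://example.com/one">one</a> and
  <a href="http://example.com/two" class="x">two</a>.</p>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

Even for this small case the parser needs a class and some state tracking, which is roughly the friction the next slides complain about.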
  28. 28. The Point? The Web has a ton of data, but no easy, hackday-able way to access it.
  29. 29. THE WEB NEEDS AN API!
  30. 30. APIs are awesome. You get (mostly) whatever data you want in (mostly) whatever format you want and in a (mostly) easy-to-parse structure.
  31. 31. Example... http://api.twitter.com/1/users/show.json?screen_name=derek
  32. 32. Companies discovered that if you build APIs, developers will come in droves
  33. 33. And it saves you from having to dance around on stage like a monkey (if you are confused, Google “monkey dance”)
  34. 34. Yahoo, Google, Facebook, Twitter, Microsoft, NY Times, ... Most web companies offer APIs nowadays.
  35. 35. As neat as they are, they are imperfect. Why?
  36. 36. As neat as they are, they are imperfect. Why? You have to read documentation to use them.
  37. 37. As neat as they are, they are imperfect. Why? You have to read documentation to use them. BOOOOOOO!!!!
  38. 38. We’re developers, we’re lazy, and we want to build stuff, NOW!
  39. 39. Yahoo invented a solution to this problem...
  40. 40. YQL!
  41. 41. "Yahoo! Query Language is an expressive SQL-like language that lets you query, filter, and join data across Web services. With YQL, apps run faster with fewer lines of code and a smaller network footprint." http://developer.yahoo.com/yql/
  42. 42. YQL is... • RESTful • Scalable • Customizable • ... and lots of other “ables”
  43. 43. How do you use it?
  44. 44. How do you use it?
      $format = "json"; // or "xml"
  45. 45. How do you use it?
      $format = "json"; // or "xml"
      $base = "http://query.yahooapis.com/v1/public/yql";
  46. 46. How do you use it?
      $format = "json"; // or "xml"
      $base = "http://query.yahooapis.com/v1/public/yql";
      $url = "{$base}?q={$yql_query}&format={$format}";
  47. 47. How do you use it?
      $format = "json"; // or "xml"
      $base = "http://query.yahooapis.com/v1/public/yql";
      $url = "{$base}?q={$yql_query}&format={$format}";
      $json_string = goGetIt($url); // likely a curl() call
  48. 48. How do you use it?
      $format = "json"; // or "xml"
      $base = "http://query.yahooapis.com/v1/public/yql";
      $url = "{$base}?q={$yql_query}&format={$format}";
      $json_string = goGetIt($url); // likely a curl() call
      $data = json_decode($json_string);
  49. 49. How do you use it?
      $format = "json"; // or "xml"
      $base = "http://query.yahooapis.com/v1/public/yql";
      $url = "{$base}?q={$yql_query}&format={$format}";
      $json_string = goGetIt($url); // likely a curl() call
      $data = json_decode($json_string);
      Or use any of the libraries written for your favorite language/framework
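The same steps can be sketched in Python, as one of the "favorite language" options the slide alludes to. The endpoint and the `q` and `format` parameters come from the slides. The helper names are invented here, and the `{"query": {"results": ...}}` response shape is an assumption, not something the deck shows. The sketch URL-encodes the query, which the slide code leaves to the caller, and it decodes a made-up response body instead of making a live HTTP call.

```python
import json
from urllib.parse import urlencode

YQL_BASE = "http://query.yahooapis.com/v1/public/yql"  # endpoint from the slides


def build_yql_url(yql_query, fmt="json"):
    """Return the request URL for a YQL query, with the query URL-encoded."""
    return YQL_BASE + "?" + urlencode({"q": yql_query, "format": fmt})


def extract_results(json_string):
    """Decode a response body and return its results, or None if absent.

    Assumes the body looks like {"query": {"results": ...}}; that shape is
    an assumption made for this sketch.
    """
    data = json.loads(json_string)
    return data.get("query", {}).get("results")


url = build_yql_url("SELECT * FROM weather.forecast WHERE location=90210")
print(url)

# A made-up response body, used here in place of a real HTTP fetch.
fake_body = '{"query": {"count": 1, "results": {"channel": {"title": "demo"}}}}'
print(extract_results(fake_body))
```

Fetching `url` (the `goGetIt` step on the slide) could be done with `urllib.request.urlopen` or any HTTP client; it is left out so the sketch runs offline.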
  50. 50. YQL Queries SELECT {fields} FROM {table} WHERE {conditions}
  51. 51. SELECT * FROM weather.forecast WHERE location=90210
  52. 52. SELECT * FROM data.html.cssselect WHERE url="http://yahoo.com" AND css="body a";
  53. 53. SELECT height,width,url FROM search.images WHERE query="kitteh" AND mimetype LIKE "%jpeg%"
  54. 54. SELECT * FROM google.search WHERE q="pizza"
  55. 55. SELECT status.text FROM twitter.user.timeline WHERE screen_name="derek"
  56. 56. SELECT * FROM foursquare.history WHERE username="foo" AND password="bar"
  57. 57. SELECT * FROM rss WHERE url IN (SELECT title FROM atom WHERE url="http://spreadsheets.google.com/feeds/list/pg_T0Mv3iBwIJoc82J1G8aQ/od6/public/basic") LIMIT 10 | unique(field="title")
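The simpler queries above all fill in the same SELECT {fields} FROM {table} WHERE {conditions} template, so a small helper can assemble them. The Python sketch below is illustrative only: `build_query` is a hypothetical name, it handles only `key=value` conditions joined with AND (no LIKE, IN, or sub-selects), and it rejects string values containing a double quote because the slides do not show how YQL escapes them.

```python
def build_query(table, fields=("*",), conditions=None):
    """Fill the SELECT {fields} FROM {table} WHERE {conditions} template.

    Only equality conditions joined with AND are supported in this sketch.
    """
    query = "SELECT {} FROM {}".format(",".join(fields), table)
    if conditions:
        clauses = []
        for key, value in conditions.items():
            if isinstance(value, str):
                if '"' in value:
                    # Escaping rules are not covered by the slides, so refuse.
                    raise ValueError("double quotes in values are not supported")
                value = '"{}"'.format(value)
            clauses.append("{}={}".format(key, value))
        query += " WHERE " + " AND ".join(clauses)
    return query


print(build_query("weather.forecast", conditions={"location": 90210}))
print(build_query("twitter.user.timeline",
                  fields=["status.text"],
                  conditions={"screen_name": "derek"}))
```

The two calls reproduce the weather and Twitter timeline queries from the slides.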
  58. 58. Where’s the magic?
  59. 59. Data Tables: 13 categories & counting, including... • Geo • Social • Flickr • Upcoming • Local • Weather • Maps • Yahoo (Search) • Meme • YMail • Music • YQL (Storage)
  60. 60. Open Data Tables: 900+ community-contributed tables in hundreds of categories, including... • Amazon • Netflix • Craigslist • NY Times • Facebook • SimpleGeo • Foursquare • SPARQL • Google • Twitter • HackerNews • Wordpress • LastFM • YouTube
  61. 61. google.search
  62. 62. google.search http://www.datatables.org/google/google.search.xml
  63. 63. google.search http://www.datatables.org/google/google.search.xml twitter.user.timeline
  64. 64. google.search http://www.datatables.org/google/google.search.xml twitter.user.timeline http://www.datatables.org/twitter/twitter.user.timeline.xml
  65. 65. google.search http://www.datatables.org/google/google.search.xml twitter.user.timeline http://www.datatables.org/twitter/twitter.user.timeline.xml foursquare.history
  66. 66. google.search http://www.datatables.org/google/google.search.xml twitter.user.timeline http://www.datatables.org/twitter/twitter.user.timeline.xml foursquare.history http://www.datatables.org/foursquare/history.xml
  67. 67. http://www.datatables.org/craigslist/craigslist.search.xml
  68. 68. http://www.datatables.org/craigslist/craigslist.search.xml
  69. 69. http://www.datatables.org/craigslist/craigslist.search.xml
  70. 70. http://www.datatables.org/craigslist/craigslist.search.xml
  71. 71. &query={query} http://www.datatables.org/craigslist/craigslist.search.xml
  72. 72. YQL != Voodoo Magic. It is just rewriting a YQL query into one (or many) HTTP calls for you.
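To make the "rewriting" idea concrete, here is a minimal Python sketch of filling a URL template such as the `&query={query}` fragment shown for the Craigslist table with values taken from a query's WHERE clause. It only illustrates the idea and is not how YQL is implemented; the `expand_template` helper and the `example.com` template URL are made up for this example.

```python
import re
from urllib.parse import quote


def expand_template(template, keys):
    """Replace each {name} placeholder with the URL-encoded value of keys[name].

    Raises KeyError if the template names a key that was not supplied.
    """
    def substitute(match):
        name = match.group(1)
        return quote(str(keys[name]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical template, loosely modeled on the "&query={query}" fragment.
TEMPLATE = "http://example.com/search?format=rss&query={query}"

# Values as they might come from: ... WHERE query="mountain bike"
print(expand_template(TEMPLATE, {"query": "mountain bike"}))
```

A query that needs several HTTP calls would repeat this expansion once per call and merge the responses.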
  73. 73. USE "http://www.datatables.org/nyt/nyt.bestsellers.xml" AS nyt.bestsellers;
      USE "https://github.com/gcb/yql.opentable/raw/master/text.concat.xml" AS text.concat;
      SELECT text FROM text.concat WHERE text.key1 = "http://www.amazon.com/dp/" AND (text.key2) IN (SELECT isbns.isbn.isbn10 FROM nyt.bestsellers WHERE apikey=yourAPIKey);
      // Generates strings like "http://www.amazon.com/dp/031603617X"
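The effect of that query can be sketched in plain Python: pull one nested field out of each row of an inner result set and prepend a fixed prefix. The row structure below is a guess based on the `isbns.isbn.isbn10` path in the query. The titles and the second ISBN are placeholders, and `get_path` and `concat_join` are names invented for this sketch.

```python
def get_path(row, dotted_path):
    """Follow a dotted path such as "isbns.isbn.isbn10" through nested dicts."""
    value = row
    for part in dotted_path.split("."):
        value = value[part]
    return value


def concat_join(prefix, rows, dotted_path):
    """Return prefix + field for every row, like text.key1 + text.key2."""
    return [prefix + get_path(row, dotted_path) for row in rows]


# Placeholder rows standing in for the inner nyt.bestsellers result.
rows = [
    {"title": "Hypothetical Book A", "isbns": {"isbn": {"isbn10": "031603617X"}}},
    {"title": "Hypothetical Book B", "isbns": {"isbn": {"isbn10": "0000000000"}}},
]

for link in concat_join("http://www.amazon.com/dp/", rows, "isbns.isbn.isbn10"):
    print(link)
```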
  74. 74. USE "https://github.com/gcb/yql.opentable/raw/master/text.concat.xml" AS text.concat;
  75. 75. https://github.com/gcb/yql.opentable/raw/master/text.concat.xml
  76. 76. https://github.com/gcb/yql.opentable/raw/master/text.concat.xml
  77. 77. <execute> • Execute arbitrary JavaScript in Rhino (a JS engine) • E4X support (XML literals in JS) • Speak protocols and handle authentication: Basic auth, OAuth, XAuth, XML-RPC, ... • Best feature? View-source!
  78. 78. Summary • YQL is very useful for... • Scraping • Creating an API where one doesn’t exist • Converting XML -> JSON, and vice versa • JSONP for JS-only apps • Many HTTP requests -> single HTTP request • Server-side JS processing
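To give a feel for the XML -> JSON item in the summary, here is a naive conversion written with Python's standard library. It does not claim to match the mapping YQL itself applies. The rules used here are assumptions chosen for brevity: attributes become keys, repeated child tags become lists, and mixed text is stored under a `content` key. The sample feed is made up.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a string (leaf) or a dict (anything else)."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text
    node = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag: collect its values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if text:
        node["content"] = text
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)})


# Made-up feed used only to exercise the conversion.
SAMPLE_XML = (
    "<rss><channel><title>Demo</title>"
    '<item id="1"><title>First</title></item>'
    '<item id="2"><title>Second</title></item>'
    "</channel></rss>"
)

print(xml_to_json(SAMPLE_XML))
```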
  79. 79. NY Times Data Tables • nyt.article.search • nyt.newswire • nyt.bestsellers • nyt.people.activities • nyt.bestsellers.history • nyt.people.followers • nyt.bestsellers.search • nyt.people.following • nyt.movies.critics • nyt.people.newsfeed • nyt.movies.picks • nyt.people.profiles • nyt.movies.reviews • nyt.people.users
  80. 80. Get started @ http://developer.yahoo.com/yql Thanks! Questions? Find, Tweet, or Email me. @derek or drg@yahoo-inc.com
