Querying the Web

A discussion of the various ways that data on the web can be published and queried, and of why SQL is not the right tool for the job.

  1. Querying the Web
       SlipstreamUSA :: April 2, 2008
  2. Querying the Web
       • “Information wants to be free”
           • Stewart Brand, Whole Earth Catalogue
           • May 1985
       • “Data is the Next Intel Inside”
           • Tim O’Reilly
           • September 2005
       • “The internet is my hard drive”
           • Bruce Schneier
           • February 2008
  3. Freebase
  4. Freebase
  5. Freebase
  6. Freebase
  7. Freebase
       • Metaweb Query Language
       • Request:
           { "type": "/medicine/physician",
             "name": "Michael Maher" }
       • Response:
           { "code": "/api/status/ok",
             "result": { "type": "/medicine/physician",
                         "name": "Michael Maher",
                         "gender": "Male",
                         "education": "Leeds University" }
           }
       • JSON
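
The request/response pair on the Freebase slide can be sketched in Python as follows. This is a minimal illustration that only builds and parses the JSON shown on the slide; it makes no network call, and the helper names `build_query` and `extract_result` are hypothetical, not part of any Metaweb library.

```python
import json


def build_query(type_id, name):
    """Serialise a query shaped like the slide's request (hypothetical helper)."""
    return json.dumps({"type": type_id, "name": name})


def extract_result(response_text):
    """Return the "result" object from a response envelope like the slide's.

    Raises ValueError when the status code is not "/api/status/ok".
    """
    envelope = json.loads(response_text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError(f"query failed: {envelope.get('code')}")
    return envelope["result"]


# Response text copied from the slide, with quotes normalised.
SAMPLE_RESPONSE = """
{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician",
              "name": "Michael Maher",
              "gender": "Male",
              "education": "Leeds University" } }
"""

query = build_query("/medicine/physician", "Michael Maher")
result = extract_result(SAMPLE_RESPONSE)
print(query)
print(result["education"])
```

Both the query and the answer are plain JSON objects, so the client needs nothing beyond a JSON parser.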
  8. REST
       • REpresentational State Transfer
       • A less rigorous alternative to SOAP
       • Data are considered to be resources
       • Every resource has a unique address
       • Layered over HTTP:
           • Client/server separation
           • Stateless
           • Cacheable
       • Request:
           • GET http://rest.georgejames.com/product/Serenji/
       • Response:
           • Name=Serenji
           • Price=195.00
           • OrderCode=H1001
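
As a rough sketch of how a client might handle the REST example above, the Python below builds the resource address and parses a `Name=value` response body into a dictionary. The function names are hypothetical, the one-pair-per-line response format is assumed from the slide, and no request is actually sent.

```python
from urllib.parse import quote


def product_url(base, name):
    """Build the address of a product resource, as in the slide's GET request."""
    return f"{base.rstrip('/')}/product/{quote(name)}/"


def parse_properties(body):
    """Parse 'Key=Value' lines (format assumed from the slide) into a dict."""
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a Key=Value line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields


# Response body copied from the slide.
SAMPLE_BODY = "Name=Serenji\nPrice=195.00\nOrderCode=H1001\n"

print(product_url("http://rest.georgejames.com", "Serenji"))
print(parse_properties(SAMPLE_BODY))
```

All values come back as strings; converting `Price` to a number is left to the caller.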
  9. Amazon S3
       • S3 :: Simple Storage Service
       • Online storage space
       • $0.15 per GB per month for storage
       • ~$0.20 per GB of data transfer
       • Storage request:
           • PUT http://s3.amazonaws.com/[bucket-name]/[key-name]
       • Retrieval request:
           • GET http://s3.amazonaws.com/[bucket-name]/[key-name]
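
To make the S3 addressing and pricing concrete, here is a small Python sketch. It uses only the URL pattern and the 2008 prices quoted on the slide; the function names and the sample bucket and key are hypothetical, and a real request would also need authentication, which the slide does not show.

```python
from decimal import Decimal
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"
# Prices as quoted on the slide (2008); the transfer price is approximate.
STORAGE_PER_GB_MONTH = Decimal("0.15")
TRANSFER_PER_GB = Decimal("0.20")


def object_url(bucket, key):
    """Address used for both PUT (store) and GET (retrieve) of one object."""
    return f"{S3_ENDPOINT}/{quote(bucket)}/{quote(key)}"


def estimate_monthly_cost(stored_gb, transferred_gb):
    """Estimate one month's bill in dollars; pass ints or Decimals, not floats."""
    return STORAGE_PER_GB_MONTH * stored_gb + TRANSFER_PER_GB * transferred_gb


print(object_url("my-bucket", "photos/cat.jpg"))
print(estimate_monthly_cost(100, 50))
```

The same address serves for storing and retrieving an object; only the HTTP verb differs.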
  10. Amazon SimpleDB
       • Storage request:
           https://sdb.amazonaws.com/?Action=PutAttributes
             &Attribute.0.Name=Color&Attribute.0.Value=Blue
             &Attribute.1.Name=Size&Attribute.1.Value=Med
             &Attribute.2.Name=Price&Attribute.2.Value=14.99
             &AWSAccessKeyId=[valid access key id]
             &DomainName=MyDomain
             &ItemName=Item123
       • Retrieval request:
           https://sdb.amazonaws.com/?Action=GetAttributes
             &AWSAccessKeyId=[valid access key id]
             &DomainName=MyDomain
             &ItemName=Item123
       • Retrieval response:
           <GetAttributesResult>
             <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
             <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
             <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
           </GetAttributesResult>
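
The SimpleDB calls above are ordinary query strings and XML, so a client can be sketched with the Python standard library alone. The helper names below are hypothetical, the code includes only the parameters shown on the slide (a real request would carry further parameters, such as a signature), and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SDB_ENDPOINT = "https://sdb.amazonaws.com/"


def put_attributes_url(access_key, domain, item, attributes):
    """Build a PutAttributes URL with the parameters shown on the slide."""
    params = [("Action", "PutAttributes")]
    for index, (name, value) in enumerate(attributes.items()):
        params.append((f"Attribute.{index}.Name", name))
        params.append((f"Attribute.{index}.Value", str(value)))
    params += [
        ("AWSAccessKeyId", access_key),
        ("DomainName", domain),
        ("ItemName", item),
    ]
    return SDB_ENDPOINT + "?" + urlencode(params)


def parse_get_attributes(xml_text):
    """Turn a GetAttributesResult document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {a.findtext("Name"): a.findtext("Value") for a in root.iter("Attribute")}


# Response copied from the slide.
SAMPLE_XML = """<GetAttributesResult>
<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
<Attribute><Name>Size</Name><Value>Med</Value></Attribute>
<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
</GetAttributesResult>"""

print(put_attributes_url("EXAMPLE-KEY", "MyDomain", "Item123", {"Color": "Blue"}))
print(parse_get_attributes(SAMPLE_XML))
```

Every value is stored and returned as text, which is why `Price` comes back as the string `14.99`.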
  11. Astoria
  12. Astoria in action
       • Request:
           • http://astoria.sandbox.live.com/northwind/northwind.rse/Categories
       • Response:
  13. Astoria in action
       • Request:
           • http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
       • Response:
  14. Astoria in action
       • Request:
           • /Customers[FRANK]
       • Response:
  15. Astoria in action
       • Request:
           • /Customers[FRANK]/Orders
       • Response:
  16. Astoria in action
       • A variety of response formats:
           • POX (plain old XML)
           • Web3S (Web, Structured, Schema’d and Searchable)
           • ATOM
           • JSON
       • JSON request:
           • /Customers[FRANK]?$format=json
       • Response:
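
The Astoria requests above follow a regular pattern: entity set, optional key in square brackets, optional related collection, optional `$format`. A minimal Python sketch of that pattern follows, assuming the URL syntax exactly as shown on the slides; `astoria_url` is a hypothetical helper, not part of any Microsoft library.

```python
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def astoria_url(entity_set, key=None, related=None, fmt=None):
    """Compose a request URL in the syntax shown on the slides.

    entity_set: collection name, e.g. "Customers"
    key:        optional key placed in square brackets, e.g. "FRANK"
    related:    optional related collection, e.g. "Orders"
    fmt:        optional response format for the $format option, e.g. "json"
    """
    url = f"{BASE}/{entity_set}"
    if key is not None:
        url += f"[{key}]"
    if related is not None:
        url += f"/{related}"
    if fmt is not None:
        url += f"?$format={fmt}"
    return url


print(astoria_url("Categories"))
print(astoria_url("Customers", key="FRANK", related="Orders"))
print(astoria_url("Customers", key="FRANK", fmt="json"))
```

Each step deeper into the data is one more path segment, which is what lets a browser address bar act as the query tool.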
  17. Where is all this information going to come from?
  18. Crowdsourcing
       • Jeff Howe, Wired Magazine, June 2006
       • Delegating an activity to a large number of unidentified individuals
       • Small finite tasks
       • Quantity more important than quality
       • The sum is greater than the parts
       • Examples:
           • Wikipedia
  19. Crowdsourcing
  20. Crowdsourcing
  21. Google Maps
  22. Google Maps
  23. Crowdsourcing
       • Jeff Howe, Wired Magazine, June 2006
       • Delegating an activity to a large number of unidentified individuals
       • Small finite tasks
       • Quantity more important than quality
       • The sum is greater than the parts
       • Examples:
           • Wikipedia
           • Galaxy Zoo
           • Amazon Mechanical Turk
           • Google route planner
       • Consequences:
           • Drives down the cost of data
           • The owners of the data may not be the traditional incumbents
           • Client / user needs to discriminate
  24. What does this mean for you?
       • Data Provider
           • Publish data via simple APIs
           • Your data may have unexpected value
           • Innovative usage
           • Usage can enhance the quality of your data
       • Data Consumer
           • Many potential data sources
           • Explosive growth in available data
           • Quality of the data is potentially lower
           • … but is outweighed by quantity and richness
       • Technical
           • Cache database is an ideal container
           • Dynamic / extensible data structure
           • Weak data typing
           • High performance and scalability
  25. The Internet is the Database
  26. Thank you
       • Questions?
