Querying the Web SlipstreamUSA :: April 2, 2008
Querying the Web
  “Information wants to be free” (Stewart Brand, Whole Earth Catalogue, May 1985)
  “Data is the Next Intel Inside” (Tim O’Reilly, September 2005)
  “The internet is my hard drive” (Bruce Schneier, February 2008)
Freebase
Freebase
  Metaweb Query Language (JSON)
  Request:
    { "type" : "/medicine/physician",
      "name" : "Michael Maher" }
  Response:
    { "code": "/api/status/ok",
      "result": {
        "type": "/medicine/physician",
        "name": "Michael Maher",
        "gender": "Male",
        "education": "Leeds University" } }
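The request and response above can be handled offline with Python's standard json module. This is only a minimal sketch: it builds the query object and reads the response text shown on the slide, without contacting any service. The helper names build_query and read_result are illustrative assumptions, not part of the Metaweb Query Language.

```python
import json

def build_query(type_id, name):
    """Serialize a query object shaped like the one on the slide."""
    return json.dumps({"type": type_id, "name": name})

def read_result(response_text):
    """Return the 'result' object if the status code reports success."""
    envelope = json.loads(response_text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError("query failed: %r" % envelope.get("code"))
    return envelope["result"]

# Response text copied from the slide (with plain double quotes).
sample_response = """
{"code": "/api/status/ok",
 "result": {"type": "/medicine/physician",
            "name": "Michael Maher",
            "gender": "Male",
            "education": "Leeds University"}}
"""

query = build_query("/medicine/physician", "Michael Maher")
result = read_result(sample_response)
print(query)
print(result["education"])
```

The point of the sketch is that both sides of the exchange are ordinary JSON objects, so no special client library is needed to build or read them.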
REST
  REpresentational State Transfer
  Less rigorous equivalent of SOAP
  Data are considered to be resources
  Every resource has a unique address
  Layered over HTTP:
    Client/server separation
    Stateless
    Cacheable
  Request:
    GET http://rest.georgejames.com/product/Serenji/
  Response:
    Name=Serenji
    Price=195.00
    OrderCode=H1001
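As a rough illustration of "every resource has a unique address", the sketch below builds (but does not send) the GET request from the slide and parses a Name=Value response body into a dictionary. The function names are hypothetical, and the parser assumes that field values contain no whitespace, which holds for the sample shown but may not hold in general.

```python
from urllib.request import Request

BASE_URL = "http://rest.georgejames.com/product/"

def product_request(name):
    """Build a GET request for one product resource; nothing is sent."""
    return Request(BASE_URL + name + "/", method="GET")

def parse_fields(body):
    """Split whitespace-separated Name=Value pairs into a dict.

    Assumption: values never contain whitespace.
    """
    fields = {}
    for pair in body.split():
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields

request = product_request("Serenji")
fields = parse_fields("Name=Serenji Price=195.00 OrderCode=H1001")
print(request.get_method(), request.full_url)
print(fields["Price"])
```

Because the request is stateless, everything the server needs is in the method and the address; the same URL can be fetched again or served from a cache.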
Amazon S3
  S3 :: Simple Storage Service
  Online storage space
  $0.15 per GB per month for storage
  ~$0.20 per GB of data transfer
  Storage request:
    PUT http://s3.amazonaws.com/[bucket-name]/[key-name]
  Retrieval request:
    GET http://s3.amazonaws.com/[bucket-name]/[key-name]
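The address scheme and the pricing above can be sketched in a few lines. This is an illustration under stated assumptions: object_url and monthly_cost_dollars are made-up helper names, the prices are the 2008 figures quoted on the slide (held in cents to avoid floating-point rounding), and a real S3 request would also need authentication headers that are not shown here.

```python
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"

# Prices quoted on the slide, in US cents.
STORAGE_CENTS_PER_GB_MONTH = 15
TRANSFER_CENTS_PER_GB = 20

def object_url(bucket, key):
    """Address used for both PUT (store) and GET (retrieve)."""
    # Slashes in the key are kept; other unsafe characters are escaped.
    return "%s/%s/%s" % (S3_ENDPOINT, quote(bucket, safe=""), quote(key))

def monthly_cost_dollars(stored_gb, transferred_gb):
    """Estimate one month's bill for whole-GB storage and transfer."""
    cents = (STORAGE_CENTS_PER_GB_MONTH * stored_gb
             + TRANSFER_CENTS_PER_GB * transferred_gb)
    return cents / 100

print(object_url("my-bucket", "reports/2008 q1.txt"))
print(monthly_cost_dollars(100, 50))  # 100 GB stored, 50 GB transferred
```

With these figures, storing 100 GB for a month and transferring 50 GB comes to $15 + $10 = $25.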
Amazon SimpleDB
  Storage request:
    https://sdb.amazonaws.com/?Action=PutAttributes
      &Attribute.0.Name=Color&Attribute.0.Value=Blue
      &Attribute.1.Name=Size&Attribute.1.Value=Med
      &Attribute.2.Name=Price&Attribute.2.Value=14.99
      &AWSAccessKeyId=[valid access key id]
      &DomainName=MyDomain
      &ItemName=Item123
  Retrieval request:
    https://sdb.amazonaws.com/?Action=GetAttributes
      &AWSAccessKeyId=[valid access key id]
      &DomainName=MyDomain
      &ItemName=Item123
  Retrieval response:
    <GetAttributesResult>
      <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
      <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
      <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
    </GetAttributesResult>
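The numbered Attribute.N.Name / Attribute.N.Value pairs are easy to generate from a dictionary, and the response is small enough to read with the standard XML parser. The sketch below mirrors only the simplified requests and response shown on the slide; the helper names are assumptions, and a real SimpleDB call would also carry signature and timestamp parameters and return namespaced XML, neither of which is handled here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ENDPOINT = "https://sdb.amazonaws.com/"

def put_attributes_url(access_key, domain, item, attributes):
    """Build the PutAttributes query string for one item."""
    params = [("Action", "PutAttributes")]
    for i, (name, value) in enumerate(attributes.items()):
        params.append(("Attribute.%d.Name" % i, name))
        params.append(("Attribute.%d.Value" % i, value))
    params += [("AWSAccessKeyId", access_key),
               ("DomainName", domain),
               ("ItemName", item)]
    return ENDPOINT + "?" + urlencode(params)

def parse_get_attributes(xml_text):
    """Turn a GetAttributesResult document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {a.findtext("Name"): a.findtext("Value")
            for a in root.iter("Attribute")}

# Response text copied from the slide.
sample_xml = """<GetAttributesResult>
<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
<Attribute><Name>Size</Name><Value>Med</Value></Attribute>
<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
</GetAttributesResult>"""

url = put_attributes_url("EXAMPLE-KEY", "MyDomain", "Item123",
                         {"Color": "Blue", "Size": "Med", "Price": "14.99"})
item = parse_get_attributes(sample_xml)
print(url)
print(item)
```

Note that every value travels as a string ("14.99" rather than a number), which is the weak data typing the later slides refer to.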
Astoria
Astoria in action
  Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Categories
  Response:
Astoria in action
  Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
  Response:
Astoria in action
  Request: /Customers[FRANK]
  Response:
Astoria in action
  Request: /Customers[FRANK]/Orders
  Response:
Astoria in action
  A variety of response formats:
    POX
    Web3S (Web, Structured, Schema’d and Searchable)
    ATOM
    JSON
  JSON request: /Customers[FRANK]?$format=json
  Response:
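The Astoria requests above all follow one pattern: an entity set, an optional key in square brackets, an optional related set, and an optional $format parameter. The sketch below composes such paths. It mirrors only the URL shapes shown on these slides; resource_path is a hypothetical helper, not part of Astoria.

```python
def resource_path(entity_set, key=None, navigation=None, fmt=None):
    """Compose a path such as /Customers[FRANK]/Orders?$format=json."""
    path = "/" + entity_set
    if key is not None:
        path += "[%s]" % key          # select one entity by its key
    if navigation is not None:
        path += "/" + navigation      # follow a relationship
    if fmt is not None:
        path += "?$format=" + fmt     # ask for a specific response format
    return path

print(resource_path("Customers"))
print(resource_path("Customers", key="FRANK"))
print(resource_path("Customers", key="FRANK", navigation="Orders"))
print(resource_path("Customers", key="FRANK", fmt="json"))
```

Each step down the path narrows the query, much as a WHERE clause or a join would in SQL, but the whole query remains a plain address.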
Where is all this information going to come from?
Crowdsourcing
  Jeff Howe, Wired Magazine, June 2006
  Delegating an activity to a large number of unidentified individuals
  Small finite tasks
  Quantity more important than quality
  The sum is greater than the parts
  Examples:
    Wikipedia
Google Maps
Crowdsourcing
  Jeff Howe, Wired Magazine, June 2006
  Delegating an activity to a large number of unidentified individuals
  Small finite tasks
  Quantity more important than quality
  The sum is greater than the parts
  Examples:
    Wikipedia
    Galaxy Zoo
    Amazon Mechanical Turk
    Google route planner
  Consequences:
    Drives down the cost of data
    Owners of the data may not be the traditional incumbents
    Client / user needs to discriminate
What does this mean for you?
  Data Provider
    Publish data via simple APIs
    Your data may have unexpected value
    Innovative usage
    Usage can enhance the quality of your data
  Data Consumer
    Many potential data sources
    Explosive growth in available data
    Quality of the data is potentially lower
    … but is outweighed by quantity and richness
  Technical
    Caché database is an ideal container
    Dynamic / extensible data structure
    Weak data typing
    High performance and scalability
The Internet is the Database
Thank you Questions?
