Querying the Web

A discussion of the various ways that data on the web can be published and queried. Why SQL is not the right tool for this.

Published in: Business, Technology

Transcript

  • 1. Querying the Web SlipstreamUSA :: April 2, 2008
  • 2. Querying the Web
      • “Information wants to be free”
        • Stewart Brand, Whole Earth Catalog
        • May 1985
      • “Data is the Next Intel Inside”
        • Tim O’Reilly
        • September 2005
      • “The internet is my hard drive”
        • Bruce Schneier
        • February 2008
  • 3. Freebase
  • 4. Freebase
  • 5. Freebase
  • 6. Freebase
  • 7. Freebase
    • Metaweb Query Language
    • Request:
      • { "type" : "/medicine/physician",
      • "name" : "Michael Maher" }
    • Response:
      • { "code": "/api/status/ok", "result": { "type": "/medicine/physician", "name": "Michael Maher", "gender": "Male",
      • "education": "Leeds University" }
      • }
    • JSON
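The request and response on slide 7 can be sketched in Python. This is a minimal illustration under stated assumptions: it only serializes the query and reads the sample response copied from the slide, it does not contact any service, and the helper names `build_query` and `extract_result` are hypothetical rather than part of any Freebase client library.

```python
import json


def build_query(type_id, name):
    """Serialize the query from the slide as a JSON string (hypothetical helper)."""
    return json.dumps({"type": type_id, "name": name})


def extract_result(response_text):
    """Return the "result" object, or raise if the status code is not OK."""
    response = json.loads(response_text)
    if response.get("code") != "/api/status/ok":
        raise ValueError(f"query failed: {response.get('code')}")
    return response["result"]


# Sample response copied from the slide, with straight quotes.
SAMPLE_RESPONSE = """
{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician",
              "name": "Michael Maher",
              "gender": "Male",
              "education": "Leeds University" } }
"""

if __name__ == "__main__":
    print(build_query("/medicine/physician", "Michael Maher"))
    print(extract_result(SAMPLE_RESPONSE)["education"])
```

The query is a template: the fields supplied act as constraints, and the response returns the matching object with further properties filled in.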
  • 8. REST
      • REpresentational State Transfer
      • Less rigorous equivalent of SOAP
      • Data are considered to be resources
      • Every resource has a unique address
      • Layered over HTTP:
        • Client/Server separation
        • Stateless
        • Cacheable
      • Request:
        • GET http://rest.georgejames.com/product/Serenji/
      • Response:
        • Name=Serenji
        • Price=195.00
        • OrderCode=H1001
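A rough Python sketch of the REST exchange on slide 8 follows. It assumes the response body is plain `key=value` lines exactly as shown on the slide; the functions `build_get` and `parse_key_values` are hypothetical helpers, and the request object is only constructed, never sent.

```python
from urllib.request import Request

PRODUCT_URL = "http://rest.georgejames.com/product/Serenji/"


def build_get(url):
    """Build (but do not send) a GET request for a resource address."""
    return Request(url, method="GET")


def parse_key_values(body):
    """Parse 'key=value' lines, as in the slide's response, into a dict."""
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


# Response body copied from the slide.
SAMPLE_BODY = "Name=Serenji\nPrice=195.00\nOrderCode=H1001\n"

if __name__ == "__main__":
    request = build_get(PRODUCT_URL)
    print(request.get_method(), request.full_url)
    print(parse_key_values(SAMPLE_BODY))
```

Because the address alone identifies the resource and the request carries no session state, the same GET can be repeated or cached by any intermediary.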
  • 9. Amazon S3
      • S3 :: Simple Storage Service
      • Online storage space
      • $0.15 per Gbyte per month for storage
      • ~ $0.20 per Gbyte data transfer
      • Storage request:
        • PUT http://s3.amazonaws.com/[bucket-name]/[key-name]
      • Retrieval request:
        • GET http://s3.amazonaws.com/[bucket-name]/[key-name]
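The storage and retrieval requests on slide 9 differ only in the HTTP verb. The sketch below builds both with Python's standard library. It is an illustration only: the helper names and the example bucket and key are hypothetical, nothing is sent over the network, and the authentication headers that a real S3 request needs are deliberately omitted because the slide does not show them.

```python
from urllib.parse import quote
from urllib.request import Request

S3_ENDPOINT = "http://s3.amazonaws.com"


def object_url(bucket, key):
    """Address of an object: endpoint, then bucket name, then key name."""
    return f"{S3_ENDPOINT}/{quote(bucket)}/{quote(key)}"


def storage_request(bucket, key, data):
    """Build (but do not send) the PUT request that stores `data`."""
    return Request(object_url(bucket, key), data=data, method="PUT")


def retrieval_request(bucket, key):
    """Build (but do not send) the GET request that fetches the object."""
    return Request(object_url(bucket, key), method="GET")


if __name__ == "__main__":
    # "example-bucket" and "notes/slides.txt" are made-up names.
    put = storage_request("example-bucket", "notes/slides.txt", b"hello")
    get = retrieval_request("example-bucket", "notes/slides.txt")
    print(put.get_method(), put.full_url)
    print(get.get_method(), get.full_url)
```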
  • 10. Amazon SimpleDB
      • Storage request:
      • https://sdb.amazonaws.com/
        ?Action=PutAttributes
        &Attribute.0.Name=Color&Attribute.0.Value=Blue
        &Attribute.1.Name=Size&Attribute.1.Value=Med
        &Attribute.2.Name=Price&Attribute.2.Value=14.99
        &AWSAccessKeyId=[valid access key id]
        &DomainName=MyDomain
        &ItemName=Item123
      • Retrieval request:
      • https://sdb.amazonaws.com/
        ?Action=GetAttributes
        &AWSAccessKeyId=[valid access key id]
        &DomainName=MyDomain
        &ItemName=Item123
      • Retrieval response:
      • <GetAttributesResult>
          <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
          <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
          <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
        </GetAttributesResult>
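The SimpleDB calls on slide 10 are ordinary query strings, and the reply is a small XML document. The Python sketch below assembles the same parameters and parses the sample response. It is a sketch under assumptions: the function names are hypothetical, only the parameters shown on the slide are included (a real request would also need signing parameters that the slide leaves out), and the response is assumed to have no XML namespace, as printed on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ENDPOINT = "https://sdb.amazonaws.com/"


def put_attributes_url(access_key, domain, item, attributes):
    """Build the PutAttributes URL with numbered Attribute.N.Name/Value pairs."""
    params = [("Action", "PutAttributes")]
    for index, (name, value) in enumerate(attributes.items()):
        params.append((f"Attribute.{index}.Name", name))
        params.append((f"Attribute.{index}.Value", value))
    params += [("AWSAccessKeyId", access_key),
               ("DomainName", domain),
               ("ItemName", item)]
    return ENDPOINT + "?" + urlencode(params)


def get_attributes_url(access_key, domain, item):
    """Build the GetAttributes URL for one item."""
    params = [("Action", "GetAttributes"),
              ("AWSAccessKeyId", access_key),
              ("DomainName", domain),
              ("ItemName", item)]
    return ENDPOINT + "?" + urlencode(params)


def parse_get_attributes(xml_text):
    """Turn the <Attribute> elements of the response into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {attr.findtext("Name"): attr.findtext("Value")
            for attr in root.iter("Attribute")}


# Response copied from the slide.
SAMPLE_XML = (
    "<GetAttributesResult>"
    "<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>"
    "<Attribute><Name>Size</Name><Value>Med</Value></Attribute>"
    "<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>"
    "</GetAttributesResult>"
)

if __name__ == "__main__":
    item = {"Color": "Blue", "Size": "Med", "Price": "14.99"}
    print(put_attributes_url("KEY", "MyDomain", "Item123", item))
    print(get_attributes_url("KEY", "MyDomain", "Item123"))
    print(parse_get_attributes(SAMPLE_XML))
```

Note that every value travels as a string and no schema is declared up front, so an item can gain a new attribute simply by naming it in a later PutAttributes call.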
  • 11. Astoria
  • 12. Astoria in action
    • Request:
      • http://astoria.sandbox.live.com/northwind/northwind.rse/Categories
    • Response:
  • 13. Astoria in action
    • Request:
      • http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
    • Response:
  • 14. Astoria in action
    • Request:
      • /Customers[FRANK]
    • Response:
  • 15. Astoria in action
    • Request:
      • /Customers[FRANK]/Orders
    • Response:
  • 16. Astoria in action
    • A variety of response formats:
      • POX (Plain Old XML)
      • Web3S (Web, Structured, Schema’d and Searchable)
      • Atom
      • JSON
    • JSON request:
      • /Customers[FRANK]?$format=json
    • Response:
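The Astoria requests on slides 12 to 16 follow one pattern: an entity set, an optional key in square brackets, an optional related set, and an optional `$format` option. The Python sketch below composes those addresses. The function `astoria_uri` and its parameter names are hypothetical, the syntax is taken only from the URLs shown on these slides, and nothing is fetched.

```python
# Base address copied from the slides.
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def astoria_uri(entity_set, key=None, navigation=None, fmt=None):
    """Compose a resource address in the form shown on the slides.

    entity_set  e.g. "Customers"
    key         e.g. "FRANK"  -> /Customers[FRANK]
    navigation  e.g. "Orders" -> /Customers[FRANK]/Orders
    fmt         e.g. "json"   -> ?$format=json
    """
    uri = f"{BASE}/{entity_set}"
    if key is not None:
        uri += f"[{key}]"
    if navigation:
        uri += f"/{navigation}"
    if fmt:
        uri += f"?$format={fmt}"
    return uri


if __name__ == "__main__":
    print(astoria_uri("Categories"))
    print(astoria_uri("Customers", key="FRANK"))
    print(astoria_uri("Customers", key="FRANK", navigation="Orders"))
    print(astoria_uri("Customers", key="FRANK", fmt="json"))
```

Each step down the path narrows the result: from the whole set, to one customer, to that customer's orders, which is what makes the URL itself act as the query.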
  • 17.
      • Where is all this information going to come from?
  • 18. Crowdsourcing
      • Jeff Howe, Wired Magazine, June 2006
      • Delegating an activity to a large number of unidentified individuals
      • Small finite tasks
      • Quantity more important than quality
      • The sum is greater than the parts
      • Examples:
        • Wikipedia
  • 19. Crowdsourcing
  • 20. Crowdsourcing
  • 21. Google Maps
  • 22. Google Maps
  • 23. Crowdsourcing
      • Jeff Howe, June 2006, Wired Magazine
      • Delegating an activity to a large number of unidentified individuals
      • Small finite tasks
      • Quantity more important than quality
      • The sum is greater than the parts
      • Examples:
        • Wikipedia
        • Galaxy Zoo
        • Amazon Mechanical Turk
        • Google route planner
      • Consequences:
        • Drives down the cost of data
        • Owners may not be the traditional incumbents
        • Client / user needs to discriminate
  • 24. What does this mean for you?
      • Data Provider
        • Publish data via simple APIs
        • Your data may have unexpected value
        • Innovative usage
        • Usage can enhance the quality of your data
      • Data Consumer
        • Many potential data sources
        • Explosive growth in available data
        • Quality of the data is potentially lower
        • … but is outweighed by quantity and richness
      • Technical
        • A Caché database is an ideal container
        • Dynamic / extensible data structure
        • Weak data typing
        • High performance and scalability
  • 25.
      • The Internet is the Database
  • 26.
      • Thank you
      • Questions?