
Python Web Interaction

Dev8D presentation showing my top 10 Python libraries for interacting with the web.

  1. Rob Sanderson - rsanderson@lanl.gov - azaroth42@gmail.com - @azaroth42
     Digital Library Prototyping Team, Los Alamos National Laboratory, USA
     http://www.flickr.com/photos/42311564@N00/2355590274/
     Python for Web Interaction, Rob Sanderson, Dev8D, Feb 24-27 2010, London
  2. Overview: Top 10 Libraries for Web Interaction
       •  urllib
       •  urllib2
       •  urlparse
       •  httplib
       •  lxml
       •  rdflib
       •  json/simplejson
       •  mod_python, mod_wsgi
       •  bpython
  3. urllib
     >>> import urllib
     >>> urllib.quote('~azaroth/s?q=http://foo.com/')
     '%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/'
     >>> urllib.unquote('%7Eazaroth/s%3Fq%3Dhttp%3A//foo.com/')
     '~azaroth/s?q=http://foo.com/'
     >>> fh = urllib.urlopen('http://www.google.com/')
     >>> html = fh.read()
     >>> fh.close()
     >>> fh.getcode()
     200
     >>> fh.headers.dict['content-type']
     'text/html; charset=ISO-8859-1'
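The slide uses the Python 2 `urllib` module. As a minimal sketch, assuming Python 3 (where `quote` and `unquote` live in `urllib.parse`), the same round trip might look like this. The helper name `round_trip` is hypothetical, and the exact encoded string can differ from the slide because newer Python versions leave `~` unescaped.

```python
from urllib.parse import quote, unquote


def round_trip(text):
    """Percent-encode text, then decode it back (hypothetical helper)."""
    encoded = quote(text)  # '/' is left alone by default (safe='/')
    return encoded, unquote(encoded)


encoded, decoded = round_trip('~azaroth/s?q=http://foo.com/')
print(encoded)  # '?', '=' and ':' come out as %3F, %3D and %3A
print(decoded)  # the original string again
```

Decoding always restores the input, so the round trip is safe even when the encoded form varies between Python versions.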
  4. urllib2
     >>> import urllib2
     >>> ph = urllib2.ProxyHandler(
     ...     {'http' : 'http://proxyout.lanl.gov:8080/'})
     >>> opener = urllib2.build_opener(ph)
     >>> urllib2.install_opener(opener)
     >>> # From now on, all requests will go through proxy
     >>> r = urllib2.Request('http://www.google.com/')
     >>> r.add_header('Referer', 'http://www.somewhere.net')
     >>> fh = urllib2.urlopen(r)
     >>> html = fh.read()
     >>> fh.close()
     >>> # fh is the same as urllib's for headers/status
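A rough Python 3 counterpart, assuming `urllib.request` in place of `urllib2`. Nothing is sent over the network here: the sketch only builds the opener and the request so the pieces can be inspected. The proxy and target URLs are placeholders, not real services, and the header uses the standard HTTP spelling `Referer`.

```python
import urllib.request

# Placeholder proxy address (an assumption for illustration only).
proxy = urllib.request.ProxyHandler(
    {'http': 'http://proxy.example.org:8080/'})
opener = urllib.request.build_opener(proxy)
# urllib.request.install_opener(opener) would make it the global default.

req = urllib.request.Request('http://www.example.org/')
req.add_header('Referer', 'http://www.somewhere.net')

print(req.get_method())           # request method chosen for this request
print(req.get_header('Referer'))  # header value stored on the request
# opener.open(req) would perform the actual fetch through the proxy.
```

Keeping the opener local instead of installing it globally limits the proxy to the calls that need it.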
  5. urlparse
     >>> import urlparse
     >>> pr = urlparse.urlparse(
     ...     'https://www.google.com/search?q=foo&bar=bz#frag')
     >>> pr.scheme
     'https'
     >>> pr.hostname
     'www.google.com'
     >>> pr.path
     '/search'
     >>> pr.query
     'q=foo&bar=bz'
     >>> pr.fragment
     'frag'
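The query string above comes back as raw text. A small sketch, assuming Python 3's `urllib.parse`, of splitting it into parameters and building one again:

```python
from urllib.parse import urlparse, parse_qs, urlencode

pr = urlparse('https://www.google.com/search?q=foo&bar=bz#frag')

# parse_qs maps each key to a list, since a key may repeat in a query.
params = parse_qs(pr.query)
print(params)   # {'q': ['foo'], 'bar': ['bz']}

# urlencode goes the other way, from a mapping to a query string.
rebuilt = urlencode({'q': 'foo', 'bar': 'bz'})
print(rebuilt)  # q=foo&bar=bz
```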
  6. httplib
     >>> import httplib
     >>> cxn = httplib.HTTPConnection('www.google.com')
     >>> hdrs = {'Accept' : 'application/rdf+xml'}
     >>> path = "/search?q=some+search+query"
     >>> cxn.request("HEAD", path, headers=hdrs)
     >>> resp = cxn.getresponse()
     >>> resp.status
     200
     >>> resp_hdrs = dict(resp.getheaders())
     >>> resp_hdrs['content-type']  # :(
     'text/html; charset=ISO-8859-1'
     >>> data = resp.read()
     >>> cxn.close()
  7. lxml
     $ easy_install lxml
     >>> import StringIO
     >>> from lxml import etree
     >>> et = etree.XML('<a b="B"> A <c>C</c> </a>')
     >>> et.text
     ' A '
     >>> et.attrib['b']
     'B'
     >>> for elem in et.iterchildren():
     ...     print elem
     ...
     <Element c at 16d1ed0>
     >>> html = etree.parse(StringIO.StringIO("<html><p>hi"),
     ...                    parser=etree.HTMLParser())
     >>> html.xpath('/html/body/p')
     [<Element p at 16e00f0>]
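Where lxml is not installed, the standard library's `xml.etree.ElementTree` covers the first half of this slide, since lxml's `etree` follows the same element API. This is a swapped-in stand-in, not lxml itself: it has no `HTMLParser` for broken HTML and only limited XPath support.

```python
import xml.etree.ElementTree as ET

# Same sample document as on the slide.
et = ET.fromstring('<a b="B"> A <c>C</c> </a>')

print(repr(et.text))   # text before the first child element
print(et.attrib['b'])  # attribute lookup works like a dict

# Iterating an element yields its direct children.
child_tags = [child.tag for child in et]
print(child_tags)
```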
  8. rdflib
     $ easy_install rdflib
     >>> import rdflib as rdf
     >>> inp = rdf.URLInputSource(
     ...     'http://xmlns.com/foaf/spec/20100101.rdf')
     >>> inp2 = rdf.StringInputSource("<a> <b> <c> .")
     >>> graph = rdf.ConjunctiveGraph()
     >>> graph.parse(inp)
     >>> sparql = "SELECT ?l WHERE {?w rdfs:label ?l . }"
     >>> res = graph.query(sparql,
     ...                   initNs={'rdfs':rdf.RDFS.RDFSNS})
     >>> res.selected[0]
     rdf.Literal(u'Given name')
     >>> nt = graph.serialize(format='nt')
  9. json / simplejson
     >>> try:
     ...     import simplejson as json
     ... except ImportError:
     ...     import json
     ...
     >>> data = {'o' : (True, None, 1.0), "ints" : [1,2,3]}
     >>> json.dumps(data)
     '{"o": [true, null, 1.0], "ints": [1, 2, 3]}'
     >>> json.dumps(data, separators=(',', ':'))  # compact
     '{"o":[true,null,1.0],"ints":[1,2,3]}'
     >>> json.loads('[1,2,"foo",null]')
     [1, 2, u'foo', None]
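Two details of the session above are easy to miss: key order in the output depends on the dict, and tuples do not survive a round trip. A short sketch, assuming the standard `json` module on Python 3, that makes both visible:

```python
import json

data = {'o': (True, None, 1.0), 'ints': [1, 2, 3]}

# sort_keys gives a stable key order, handy for diffs and caching.
text = json.dumps(data, sort_keys=True, separators=(',', ':'))
print(text)  # {"ints":[1,2,3],"o":[true,null,1.0]}

# JSON has only arrays, so the tuple comes back as a list.
back = json.loads(text)
print(back['o'])  # [True, None, 1.0]
```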
  10. mod_python, mod_wsgi
      import cgitb
      from mod_python import apache
      from mod_python.util import FieldStorage

      def handler(req):
          try:
              form = FieldStorage(req)  # dict-like object for query
              path = req.uri
              req.status = 200
              req.content_type = "text/plain"
              req.send_http_header()
              req.write(path)
          except:
              req.content_type = "text/html"
              cgitb.Hook(file=req).handle()
          return apache.OK
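The slide title names mod_wsgi but shows only a mod_python handler. A rough WSGI counterpart, assuming Python 3, could look like the sketch below. mod_wsgi looks for a callable named `application` by default; the `call_app` helper is hypothetical and exists only to exercise the app locally without Apache.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Echo the request path as plain text, like the handler above."""
    path = environ.get('PATH_INFO', '/')
    body = path.encode('utf-8')  # WSGI response bodies are bytes
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


def call_app(path):
    """Hypothetical local harness: call the app with a fake environ."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in the other required keys
    seen = {}

    def start_response(status, headers):
        seen['status'] = status

    body = b''.join(application(environ, start_response))
    return seen['status'], body


print(call_app('/hello'))
```

Because a WSGI app is an ordinary callable, it can be tested this way before it is deployed under any server.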
  11. bpython
      $ easy_install bpython
      $ bpython
