Writing a Crawler with Python and TDD
Presentation Transcript

  • Writing a Crawler with Python and TDD (Thursday, July 21, 2011)
  • The Problem
  • CAP
  • Labs Legacy
  • Gremlins, No Manuals
  • Data model? XML, RDF (Labs Legacy)
  • How do I verify it?
  • What to expect
    • The catalog and how it works
    • The process I used
    • The libraries I used
    • The tools I used
    • The result
  • The Catalog
  • The Program
  • The Program / Usage
      $ list-datasets http://catalog.org/description
      Description at: http://catalog.org/description
      Search url is: http://catalog.org/rdf/?count=&q=
      Total number of records: 10504
      Url for all records: http://catalog.org/rdf/?count=10234&q=
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_1.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_17.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_19.TIF
      ...
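The output above suggests that the tool takes the search URL, whose `count` parameter is empty, and fills in a record count to obtain one URL that returns every record. A minimal sketch of that single step, assuming Python 3's standard library instead of the `urllib2` named in the slides; the function name `url_for_all_records` is hypothetical and does not come from the talk:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_for_all_records(search_url, total):
    """Return search_url with its `count` query parameter set to `total`."""
    parts = urlsplit(search_url)
    # keep_blank_values=True preserves empty parameters such as `q=`.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(key, str(total) if key == "count" else value)
              for key, value in params]
    return urlunsplit(parts._replace(query=urlencode(params)))


print(url_for_all_records("http://catalog.org/rdf/?count=&q=", 10234))
# prints: http://catalog.org/rdf/?count=10234&q=
```

Rewriting the query through `parse_qsl` and `urlencode` rather than by string substitution keeps the other parameters, here the empty `q`, untouched.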
  • The Program / Installation
      $ python setup.py install
  • The Development
  • Tools
      import lxml
      import re
      import urllib2
      import setuptools
      virtualenv
      pip
      nose
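How an XML parser and `re` might work together in a crawler like this can be sketched as follows. The slides do not show the catalog's response format, so the RDF sample, the `rdf:about` attribute as the place where dataset URLs live, and the function name `dataset_urls` are all assumptions made for illustration. The sketch uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml`, whose `lxml.etree` module offers a largely compatible API:

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical response body; the real catalog format is not in the slides.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_1.TIF"/>
  <rdf:Description rdf:about="https://catalog.org/data/SRTM_CGIAR_JRC_v4/readme.txt"/>
</rdf:RDF>"""


def dataset_urls(rdf_text, pattern=r"\.TIF$"):
    """Collect rdf:about URLs that match `pattern` from an RDF/XML string."""
    root = ET.fromstring(rdf_text)
    about = "{%s}about" % RDF_NS
    # ElementTree spells namespaced names as "{namespace}localname".
    urls = [el.get(about) for el in root.iter("{%s}Description" % RDF_NS)]
    return [url for url in urls if url and re.search(pattern, url)]


for url in dataset_urls(SAMPLE):
    print(url)
```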
  • Test Driven Development mantras (5 min)
  • pip
  • $ pip install SomePackage
    $ pip uninstall SomeUnwantedPackage
  • virtualenv
  • $ virtualenv env
    $ ls -1 env/bin/
    env/bin/pip
    env/bin/python
    ...
  • $ env/bin/pip install nose
    $ env/bin/pip install lxml
    $ env/bin/pip freeze > stable-req.txt
    $ cat stable-req.txt
    $ env/bin/pip install -r stable-req.txt
  • nose
  • Standard xUnit Library (wordiness)
      import unittest

      # A test case must subclass unittest.TestCase.
      class TestSomething(unittest.TestCase):
          def test_math_should_work(self):
              self.assertEqual(3, 1+2)

      if __name__ == '__main__':
          unittest.main()

    Nose
      from nose.tools import assert_equals

      # A plain class is enough; nose finds it by name.
      class TestSomething:
          def test_math_should_work(_):
              assert_equals(3, 1+2)
  • Automatic Discovery
    • Nose: $ nosetests
    • Python <= 2.6: not supported
    • Python >= 2.7: $ python -m unittest discover
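To see what discovery does, the same mechanism can be driven from code. The sketch below assumes Python 3 and the standard `unittest` loader; the file name `test_discovery_demo.py` and the helper `discover_and_run` are made up for the example. It writes one test file into a temporary directory, then lets the loader find and run it:

```python
import io
import os
import tempfile
import unittest

TEST_SOURCE = '''
import unittest

class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)
'''


def discover_and_run(directory):
    """Find every test*.py file under `directory` and run the tests in it."""
    suite = unittest.defaultTestLoader.discover(directory, pattern="test*.py")
    # Send the runner's report to a buffer to keep the demo output short.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_discovery_demo.py"), "w") as handle:
        handle.write(TEST_SOURCE)
    result = discover_and_run(tmp)

print(result.testsRun, result.wasSuccessful())
```

No test is registered by hand: the loader picks the file up because its name matches the `test*.py` pattern.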
  • Nose - SkipTest
    • @SkipTest  # function decorator
    • raise SkipTest()  # within the code
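The two ways of skipping can be tried without nose installed. This is a sketch using the standard library's counterparts, `unittest.skip` as the decorator and `unittest.SkipTest` as the exception, on the assumption that they behave closely enough to nose's `SkipTest` for illustration; the test names and skip reasons are invented:

```python
import io
import unittest


class TestCrawler(unittest.TestCase):
    @unittest.skip("needs network access")  # skip via decorator
    def test_fetch_description(self):
        self.fail("never runs")

    def test_parse_when_ready(self):
        # Skip from within the test body.
        raise unittest.SkipTest("parser not written yet")

    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCrawler)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

# Each entry in result.skipped is a (test, reason) pair.
reasons = sorted(reason for _, reason in result.skipped)
print(reasons, result.wasSuccessful())
```

Skipped tests are reported separately from failures, so the run still counts as successful while the unfinished work stays visible in the report.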
  • xmllint and curl
      $ curl http://google.com/opensearch | xmllint --format -
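When `xmllint` is not at hand, a similar re-indentation can be done from Python. A minimal sketch, assuming the standard library's `xml.dom.minidom`; the helper name `format_xml` is hypothetical, and the output layout is not guaranteed to match `xmllint --format` exactly:

```python
from xml.dom import minidom


def format_xml(text, indent="  "):
    """Return `text` re-indented with one element per line."""
    return minidom.parseString(text).toprettyxml(indent=indent)


pretty = format_xml("<a><b>1</b><c/></a>")
print(pretty)
```

This is handy inside a test, where a failed comparison between two XML documents is much easier to read once both are indented.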
  • Vim 7.3
    • Customizable: :nnoremap ,t :wa<Bar>!nosetests<CR>
    • Omni-completion: C-x C-o
    • Colored Column: :set cc+=1
    • Two simple commands for indenting XMLs:
        :%s/></>\r</g
        gg=G