Writing a Crawler with Python and TDD

Published in: Technology

  1. Writing a Crawler with Python and TDD (Thursday, July 21, 2011)
  2. The Problem
  3. CAP
  4. [slide without text]
  5. Labs Legacy
  6. Gremlins No Manuals
  7. data model? xml rdf Labs Legacy
  8. How do I verify it?
  9. What to expect
      • The catalog, how it works
      • The process I used
      • The libraries I used
      • The tools I used
      • The result
  10. The Catalog
  11-17. [slides without text]
  18. The Program
  19. The Program / Usage
      $ list-datasets http://catalog.org/description
      Description at: http://catalog.org/description
      Search url is: http://catalog.org/rdf/?count=&q=
      Total number of records: 10504
      Url for all records: http://catalog.org/rdf/?count=10234&q=
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_1.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_17.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_19.TIF
      ...
  20. The Program / Installation
      $ python setup.py install
  21. The Development
  22. Tools
      import lxml
      import re
      import urllib2
      import setuptools
      virtualenv
      pip
      nose
  23. Test Driven Development mantras 5min
  24. pip
  25. $ pip install SomePackage
      $ pip uninstall SomeUnwantedPackage
  26. virtualenv
  27. $ virtualenv env
      $ ls -1 env/bin/
      env/bin/pip
      env/bin/python
      ...
  28. $ env/bin/pip install nose
      $ env/bin/pip install lxml
      $ env/bin/pip freeze > stable-req.txt
      $ cat stable-req.txt
      $ env/bin/pip install -r stable-req.txt
  29. nose
  30. Standard xUnit Library (wordiness)
      import unittest

      class TestSomething(unittest.TestCase):
          def test_math_should_work(self):
              self.assertEqual(3, 1+2)

      if __name__ == "__main__":
          unittest.main()

      Nose
      from nose.tools import assert_equal

      class TestSomething:
          def test_math_should_work(_):
              assert_equal(3, 1+2)
  31. Automatic Discovery
      • Nose: $ nosetests
      • Python <= 2.6: not supported
      • Python >= 2.7: $ python -m unittest discover
  32. Nose - SkipTest
      • @SkipTest  # function decorator
      • raise SkipTest()  # within the code
  33. xmllint and curl
      $ curl http://google.com/opensearch | xmllint --format -
  34. Vim 7.3
      • Customizable: :nnoremap ,t :wa<Bar>!nosetests<CR>
      • Omni-completion: C-x C-o
      • Colored Column: :set cc+=1
      • Two simple commands for indenting XML: :%s/></>\r</g and gg=G
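
The usage output on slide 19 suggests three steps: read the description document, derive a search URL from it, and request as many records as the catalog reports. What follows is only a minimal sketch of those steps, under several assumptions the slides do not confirm: that the description is an OpenSearch 1.1 document, that the record count is exposed as an `opensearch:totalResults` element, and that the sample XML resembles what the catalog really returns. The function names and sample documents are hypothetical, and the standard library's `xml.etree.ElementTree` stands in for `lxml` so the block runs on Python 3 without extra packages or network access.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the catalog follows the OpenSearch 1.1 specification.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical sample documents, invented for illustration only.
SAMPLE_DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Catalog</ShortName>
  <Url type="application/rdf+xml"
       template="http://catalog.org/rdf/?count={count?}&amp;q={searchTerms?}"/>
</OpenSearchDescription>"""

SAMPLE_RESPONSE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <os:totalResults>3</os:totalResults>
</rdf:RDF>"""


def search_url_from_description(description_xml):
    """Return the first Url template with its {placeholders} blanked out."""
    root = ET.fromstring(description_xml)
    url = root.find("{%s}Url" % OPENSEARCH_NS)
    if url is None:
        raise ValueError("description has no Url element")
    return re.sub(r"\{[^}]*\}", "", url.get("template"))


def total_results(response_xml):
    """Return the record count announced in a search response."""
    root = ET.fromstring(response_xml)
    node = root.find(".//{%s}totalResults" % OPENSEARCH_NS)
    if node is None:
        raise ValueError("response has no totalResults element")
    return int(node.text)


def url_for_all_records(search_url, total):
    """Fill the empty count parameter so one request covers every record."""
    return search_url.replace("count=", "count=%d" % total, 1)


if __name__ == "__main__":
    search_url = search_url_from_description(SAMPLE_DESCRIPTION)
    print("Search url is:", search_url)
    total = total_results(SAMPLE_RESPONSE)
    print("Total number of records:", total)
    print("Url for all records:", url_for_all_records(search_url, total))
```

Keeping each step a pure function over a string means every one of them can be tested first against a canned document, with the HTTP fetch added last.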

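
Slide 32 names two ways to skip a test with nose. The sketch below shows the same two forms using the standard library equivalents, `unittest.skip` and `unittest.SkipTest` (available from Python 2.7), rather than nose itself; the test class, the skip reasons, and the `run_suite` helper are invented for illustration.

```python
import unittest


class TestSkipping(unittest.TestCase):
    @unittest.skip("decorator form: the body never runs")
    def test_skipped_by_decorator(self):
        self.fail("unreachable")

    def test_skipped_at_runtime(self):
        # Runtime form: useful when the decision depends on the environment.
        raise unittest.SkipTest("raised from inside the test")

    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)


def run_suite():
    """Run the three tests above and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSkipping)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    for test, reason in outcome.skipped:
        print("skipped:", test.id(), "-", reason)
```

Skipped tests are reported separately from failures, so a test that depends on a live catalog can stay in the suite without turning the run red.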