Writing a Crawler with Python and TDD


  1. Writing a Crawler with Python and TDD (Thursday, July 21, 2011)
  2. The Problem
  3. CAP
  4. (no text)
  5. Labs Legacy
  6. Gremlins • No Manuals
  7. data model? • xml • rdf • Labs Legacy
  8. How do I verify it?
  9. What to expect
      • The catalog, how it works
      • The process I used
      • The libraries I used
      • The tools I used
      • The result
  10. The Catalog
  11. (no text)
  12. (no text)
  13. (no text)
  14. (no text)
  15. (no text)
  16. (no text)
  17. (no text)
  18. The program
  19. The Program / Usage
      $ list-datasets http://catalog.org/description
      Description at: http://catalog.org/description
      Search url is: http://catalog.org/rdf/?count=&q=
      Total number of records: 10504
      Url for all records: http://catalog.org/rdf/?count=10234&q=
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_1.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_17.TIF
      https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_19.TIF
      ...
  20. The Program / Installation
      $ python setup.py install
  21. The Development
  22. Tools
      import lxml
      import re
      import urllib2
      import setuptools
      virtualenv
      pip
      nose
  23. Test Driven Development mantras 5min
  24. pip
  25. $ pip install SomePackage
      $ pip uninstall SomeUnwantedPackage
  26. virtualenv
  27. $ virtualenv env
      $ ls -1 env/bin/
      env/bin/pip
      env/bin/python
      ...
  28. $ env/bin/pip install nose
      $ env/bin/pip install lxml
      $ env/bin/pip freeze > stable-req.txt
      $ cat stable-req.txt
      $ env/bin/pip install -r stable-req.txt
  29. nose
  30. Standard xUnit Library (wordiness)
      import unittest

      class TestSomething(unittest.TestCase):
          def test_math_should_work(self):
              self.assertEqual(3, 1+2)

      if __name__ == '__main__':
          unittest.main()

      Nose
      from nose.tools import assert_equals

      class TestSomething:
          def test_math_should_work(_):
              assert_equals(3, 1+2)
  31. Automatic Discovery
      • Nose: $ nosetests
      • Python <= 2.6: not supported
      • Python >= 2.7: $ python -m unittest discover
  32. Nose - SkipTest
      • @SkipTest #function decorator
      • raise SkipTest() #within the code
  33. xmllint and curl
      $ curl http://google.com/opensearch | xmllint --format -
  34. Vim 7.3
      • Customizable: :nnoremap ,t :wa<Bar>!nosetests<CR>
      • Omni-completion: C-x C-o
      • Colored Column: :set cc+=1
      • Two simple commands for indenting XMLs: :%s/></>\r</g then gg=G
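
The test-writing slides (30 to 32) can be tried out with a small sketch like the one below. It is not the talk's code: it assumes the standard-library `unittest.SkipTest` (Python 2.7 and later) as a stand-in for nose's `SkipTest`, so it runs without nose installed, and the `add` function and the `run_suite` helper are hypothetical names used only for illustration.

```python
import unittest


def add(a, b):
    """Hypothetical function under test; stands in for real crawler code."""
    return a + b


class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEqual(3, add(1, 2))

    def test_not_ready_yet(self):
        # Raising SkipTest inside the test body reports the test as
        # skipped rather than failed (stdlib analogue of nose's SkipTest).
        raise unittest.SkipTest("feature not implemented yet")


def run_suite():
    """Load and run the tests above, returning the TestResult object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSomething)
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    run_suite()
```

Saved as a file whose name matches `test*.py`, the same class should also be picked up by `python -m unittest discover`, with the skipped test listed separately from passes and failures.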

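The usage output on slide 19 shows a search URL with an empty `count` parameter and then a "Url for all records" with `count` filled in. How the program actually builds that URL is not shown in the slides; the following is only a guess at one way to do it, written for Python 3's `urllib.parse` rather than the `urllib2` of the talk. The function name `all_records_url` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def all_records_url(search_url, total):
    """Return search_url with its 'count' query parameter set to total.

    Hypothetical helper: assumes the search URL already carries an
    empty 'count' parameter, as in the slide's example output.
    """
    parts = urlsplit(search_url)
    # keep_blank_values=True preserves empty parameters such as 'q='.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [(key, str(total) if key == "count" else value)
              for key, value in params]
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(all_records_url("http://catalog.org/rdf/?count=&q=", 10234))
    # prints http://catalog.org/rdf/?count=10234&q=
```

Parsing and re-encoding the query string, instead of concatenating strings, keeps the other parameters (here the empty `q`) in place and in order.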