Lightning talk I gave at the London PyData Meetup in October 2015. In it I describe an approach to generate huge amounts of tests by automatically generating tests based on test data.
The code is available at: https://github.com/bolsote/ukpcre
3. A typical British postcode has a clearly defined
structure...
...but many possible variations
Writing a correct regular expression is feasible, but tricky
So, so many cases to test against
13. Since you ask nicely...
Let's use pytest to generate tests from the test dataset
Note for testing nerds: Yes, this is just a semi-magic form
of table-driven tests.
14. TESTING STRATEGY (I)
1. Check if the postcode matches
2. Check it matches the right thing
3. Check the first part is matched correctly
4. Check the second part is matched correctly
15. class ValidPostcodeTests:
def test_match(self, data, postcode, first, second):
assert pattern.match(data)
def test_groups(self, data, postcode, first, second):
assert pattern.match(data).group('postcode') == postcode
assert pattern.match(data).group('first') == first
assert pattern.match(data).group('second') == second
18. TESTING STRATEGY (III)
Test for variations as well:
Lowercase postcodes
Postcodes without space separator
Partial postcodes
Any combination of those
19. transforms = {
"lowercase": partial(
_transform_factory,
lambda k: k.lower() if k else k
),
"nospace": partial(
_transform_factory,
lambda k: k.replace(' ', '') if k else k
),
}
20. Once you have all the combinations, generate tests.
class DataGenerator:
def generate_data(self):
for item in self.dataset:
for transform in self._get_transform_combinations():
yield self._transform_data(item, *transform)
def generate_ids(self):
for item in self.dataset:
for transform in self._get_transform_combinations():
yield "{}-{}".format(
item[0].replace(" ", "_"),
"_".join(transform) or "original"
)
21. Finally, hook that into pytest
def pytest_generate_tests(metafunc):
if metafunc.cls:
gen = DataGenerator(metafunc.cls.postcodes)
metafunc.parametrize(
'data,postcode,first,second',
list(gen.generate_data()),
ids=list(gen.generate_ids()),
scope='class'
)
25. First, extract the useful bits in a useful format
def get_postcodes(datadir):
for fn in listdir(datadir):
with open(datadir + fn) as f:
for row in reader(f):
yield row[0]
if __name__ == '__main__':
datadir = 'tests/data/Data/CSV/'
postcodes = list(get_postcodes(datadir))
with open("tests/OSpostcodes.db", "wb") as f:
pickle.dump(postcodes, f, pickle.HIGHEST_PROTOCOL)
26. Then, generate millions of tests
def pytest_generate_tests(metafunc):
if pytest.config.getoption("osdb"):
with open("tests/OSpostcodes.db", "rb") as f:
metafunc.parametrize("postcode", pickle.load(f))
else:
metafunc.parametrize("postcode", pytest.skip([]))
def test_OS_postcodes(postcode):
assert pattern.match(postcode)