This presentation describes the arduous process of producing a digital text of Descartes' letters, including their mathematical formulas. The work was a subtask of the CKCC project at the Huygens Institute. It closes with the lessons learned.
With Erik-Jan Bos, Utrecht.
22. convert.pl
100 KB of program code text = 25 densely typed pages = 3427 lines,
of which 2175 are real code lines
Code/Input = 1/32
24. 1/3 of the tasks need 2/3 of the code
share of code (number of tasks in parentheses)
formulas (2): 37 %
headers, openers, closers (3): 16 %
meta and images (3): 11 %
share of run time for the same tasks
formulas (2): 29 %
headers, openers, closers (3): 6 %
meta and images (3): 10 %
total run time (25): 40 sec
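The headline on this slide can be checked from its own figures. The following is a minimal Python sketch, assuming that the numbers in parentheses are task counts and that the conversion consists of 25 tasks in total. The variable names are illustrative and do not come from convert.pl.

```python
# Figures copied from the slide: (number of tasks, % of code, % of run time).
# Assumption: the parenthesised numbers on the slide are task counts.
groups = {
    "formulas": (2, 37, 29),
    "headers, openers, closers": (3, 16, 6),
    "meta and images": (3, 11, 10),
}
TOTAL_TASKS = 25  # the "(25)" next to the total run time

tasks = sum(g[0] for g in groups.values())
code_share = sum(g[1] for g in groups.values())
time_share = sum(g[2] for g in groups.values())

print(f"{tasks} of {TOTAL_TASKS} tasks = {tasks / TOTAL_TASKS:.0%} of the tasks")
print(f"{code_share} % of the code, {time_share} % of the run time")
```

Under these assumptions, 8 of 25 tasks (about one third) account for 64 % of the code (about two thirds) and 45 % of the run time.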
25.
1. Unicode is your friend
2. Split into many subtasks
3. task = configuration + workflow
4. Count and check
5. Performance matters
6. Do not give up automation
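Lessons 3 and 4 can be illustrated together. The sketch below is in Python rather than the Perl of convert.pl, and it is not the project's actual code: the `TaskConfig` fields, the two example tasks and the markup they produce are all invented for illustration. Each subtask is only a small configuration, one shared workflow applies it, and every run counts its hits and checks them against an expected number.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaskConfig:
    """Configuration of one subtask: what to look for and what to write."""
    name: str
    pattern: str                    # regular expression to search for
    replacement: str                # replacement text (may use \1, \2, ...)
    expected: Optional[int] = None  # expected number of hits, if known


def run_task(cfg: TaskConfig, text: str) -> Tuple[str, int]:
    """Shared workflow: substitute, count the hits, check the count."""
    new_text, hits = re.subn(cfg.pattern, cfg.replacement, text,
                             flags=re.MULTILINE)
    if cfg.expected is not None and hits != cfg.expected:
        raise ValueError(f"{cfg.name}: expected {cfg.expected} hits, got {hits}")
    return new_text, hits


# Hypothetical configurations, invented for this illustration.
TASKS = [
    TaskConfig("opener", r"^Monsieur,$", "<opener>Monsieur,</opener>", expected=1),
    TaskConfig("page-break", r"\[p\. (\d+)\]", r'<pb n="\1"/>', expected=2),
]

sample = "Monsieur,\n[p. 1] Je vous remercie ...\n[p. 2] Votre ..."

text = sample
counts = {}
for cfg in TASKS:
    text, counts[cfg.name] = run_task(cfg, text)

print(counts)
print(text)
```

Adding a new subtask then means adding one configuration entry, not another copy of the workflow, and a wrong count stops the run at once instead of surfacing later in the output.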
27. (2a) subtasks that can be run separately
(2b) subtasks that can be reordered easily
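One way to get both properties is to register every subtask under a name and to drive the conversion from a list of names. This is a Python sketch under that assumption, not the structure of convert.pl. The three subtasks and the `convert` function are hypothetical.

```python
# Hypothetical subtasks; each takes the text and returns the changed text.
def strip_trailing_spaces(text):
    return "\n".join(line.rstrip() for line in text.split("\n"))


def collapse_blank_lines(text):
    while "\n\n\n" in text:
        text = text.replace("\n\n\n", "\n\n")
    return text


def mark_paragraphs(text):
    return "\n\n".join(f"<p>{block}</p>" for block in text.split("\n\n") if block)


SUBTASKS = {
    "strip": strip_trailing_spaces,
    "collapse": collapse_blank_lines,
    "paragraphs": mark_paragraphs,
}
DEFAULT_ORDER = ["strip", "collapse", "paragraphs"]


def convert(text, order=DEFAULT_ORDER):
    """Run the named subtasks in the given order."""
    for name in order:
        text = SUBTASKS[name](text)
    return text


raw = "First letter.  \n\n\n\nSecond letter. "
full = convert(raw)                         # the whole pipeline
only_strip = convert(raw, order=["strip"])  # (2a) one subtask on its own
```

Running one subtask in isolation (2a) is a matter of passing a one-element list, and reordering (2b) is a matter of changing the list. This only works when each subtask depends on nothing but the text it receives.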
30. was 30+ seconds, is now 2.07 seconds
many new subtasks based on the same template
(gain = 15 subtasks * 30 s = 7.5 min per run)
many, many runs before everything is OK
(gain = 100 runs * 7.5 min = 12.5 hours CPU-time)
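The gain estimate on this slide mixes seconds, minutes and hours. A short Python check with the units made explicit is given below. It takes the slide's rounded saving of 30 seconds per subtask, 15 subtasks and 100 runs at face value, and the constant names are illustrative.

```python
# Figures from the slide; only the unit conversions are added here.
SECONDS_SAVED_PER_SUBTASK = 30  # rounded: 30+ s before, 2.07 s after
SUBTASKS_ON_TEMPLATE = 15
RUNS = 100

gain_per_run_min = SUBTASKS_ON_TEMPLATE * SECONDS_SAVED_PER_SUBTASK / 60
total_gain_hours = RUNS * gain_per_run_min / 60

print(gain_per_run_min)   # 7.5
print(total_gain_hours)   # 12.5
```

Going from 30 seconds to 2.07 seconds is roughly a 14-fold speed-up per subtask. The estimate only becomes large because the same template is reused many times and the conversion is run many times.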
31. we used a lot of expert knowledge
which has all been transferred to
- the source
- consolidated extra inputs
so the conversion is still repeatable and modifiable
[Diagram: the conversion program takes the source with its corrections, plus hints for formulas, meta and closers, and produces the CKCC results]
Editor's Notes
closer detection
split into (many) tasks
enable isolated tasks
input consolidated corrections
output material for feedback
build in checks
show statistics
make performance optimisations
reduce duplicate code
maintain automation