3. This material is part of the country report prepared by
Societatea Academică din România (SAR, the Romanian Academic Society)
on the institutions responsible for public procurement in the
construction sector, produced with the support of the European Union's
Seventh Framework Programme (FP7) for research - Socio-economic
Sciences and Humanities
(project: ANTICORRP - Global Trends and European Responses to the
Challenge of Corruption, EU Grant Agreement number: 290529)
http://anticorrp.eu
11. Ingredients:
Thousands of BAD CSV lines
Some Legalese texts
5,745,405 CSV lines
44 CSV files
(4 more added in the meantime on the platform)
CAPTCHA codes @ SEAP
Bad data. Really bad.
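A minimal sketch of one way to separate BAD CSV lines from usable ones. It assumes that "bad" means a record whose field count differs from the header's, and that the delimiter is a comma; both are assumptions for illustration, not the checks the project actually ran. The function name `split_rows` and the sample data are hypothetical.

```python
import csv
import io


def split_rows(text, delimiter=","):
    """Split CSV text into (header, good, bad).

    A record counts as 'bad' when its number of fields differs from the
    header's (an assumed definition). Each entry is (line_number, row),
    where line_number is the last physical line the record occupied.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        target = good if len(row) == len(header) else bad
        target.append((reader.line_num, row))
    return header, good, bad


# Hypothetical sample: line 3 is missing a field, line 4 has one too many.
sample = "id,name,value\n1,Alpha,10\n2,Beta\n3,Gamma,30,extra\n4,Delta,40\n"
header, good, bad = split_rows(sample)
print(len(good), "good rows;", "bad rows at lines", [n for n, _ in bad])
```

Keeping the line numbers of rejected records makes it possible to go back to the original file and inspect them by hand.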
14. 4,632,901 XML dumps
(and counting)
e-licitatii.ro SOAP service
(built by UTI)
4x2
2x2 CPU cores @ 100% load
2x1
Huge disk I/O
>250 € / mo. fixed fee
16. The dark side: the errata story
42,436 errata documents
What we thought it would be like:
● 12,426 RON → 12,800 RON
● S.C. Open Data S.R.L. → S.C. Open Data S.A.
● CPV code changes
● Contract titles
17. The dark side: the errata story
42,436 errata documents
What it was like:
● 9,342,000 RON → 31,140,000 RON
● 9,342,000 RON → 14,531,650 RON (same contract)
● 'Realizare telescaun debraiabil' → 'Realizare telescaun nedebraiabil'
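To put the value changes in perspective, the sketch below computes the relative change for the three amounts quoted on the errata slides and flags the large ones. The 10% review threshold and the names `relative_change` and `flag_errata` are assumptions chosen for illustration; the slides do not describe how errata were actually screened.

```python
def relative_change(old, new):
    """Relative change from old to new, e.g. 0.5 means +50%."""
    return (new - old) / old


def flag_errata(errata, threshold=0.10):
    """Return (label, change, needs_review) for each (label, old, new).

    threshold is a hypothetical cut-off: changes larger than it in
    absolute value are marked for manual review.
    """
    results = []
    for label, old, new in errata:
        change = relative_change(old, new)
        results.append((label, change, abs(change) > threshold))
    return results


# Contract values in RON, taken from the slides.
errata = [
    ("expected kind of fix", 12_426, 12_800),
    ("actual erratum", 9_342_000, 31_140_000),
    ("actual erratum, same contract", 9_342_000, 14_531_650),
]

for label, change, needs_review in flag_errata(errata):
    status = "REVIEW" if needs_review else "ok"
    print(f"{label}: {change:+.1%} [{status}]")
```

The expected kind of correction moves the value by about 3%, while the first real erratum more than triples it, which is why such changes cannot be treated as simple typo fixes.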
22. Lessons learned
Part I: Where we failed
● We tried to use too many NEW tools at a time
● Logged too much data => increased disk I/O
● Didn't read the docs (laws)
23. Lessons learned
Part II: Where we did well
● Didn't use Windows
● Big data → SSD
● VPS: KVM > OpenVZ
● Learn to use basic tools:
● Coreutils
● Shell scripts
● GNU sed / awk
● Use a good text/code editor. Seriously.
● Know your datasets. Sometimes building > using.
● We automated some tasks with Pentaho Data Integration toolkit