Error detection for the statistics of external trade in goods
Garðar Páll Gíslason, Violeta Calian, Auður Ólína Svavarsdóttir, Bryndís Bjarnadóttir and Kolbrún Ýr Jóhannsdóttir, Statistics Iceland
2. The goal of our project
• to automate more of the detection and correction of data errors
• to provide up-to-date, flexible tools for analysts and subject-matter
experts
10/18/2019 Error detection for the statistics of external trade in goods 2
3. Production process of foreign trade statistics
• Data input and validation of the expected IT structural requirements
• Error detection and localization, by verifying content and statistical
consistency of data
• Error correction, which includes: data imputation and editing, both
automatic and interactive
• Advanced validation procedures
• Testing and assessment of validation rules and imputation methods
4. The repository of error detection rules
• GitHub: https://github.com/hagstofan/TranStoR/tree/master/examples
• SQL rules = starting point: written by experts, taken from older software, or supplied by Eurostat
• R rules = target: obtained with the new R tool TranStoR
5. Translation to R
• Our rules are not difficult:
id IN (
  SELECT id
  FROM DATABASE1 F
  JOIN DATABASE2 M
    ON M.EINK = F.EINK AND M.TLSK = F.TLSK AND M.LAND = F.LAND
  WHERE (M.EINK = 'M'
         AND ((CASE ÞYNGD WHEN 0 THEN CIF_VERÐ ELSE CIF_VERÐ/ÞYNGD END) < (VM_M - VM_S)
           OR (CASE ÞYNGD WHEN 0 THEN CIF_VERÐ ELSE CIF_VERÐ/ÞYNGD END) > (VM_S + VM_M)))
     OR (M.EINK = 'X'
         AND ((CASE ÞYNGD WHEN 0 THEN FOB_VERÐ ELSE FOB_VERÐ/ÞYNGD END) < (VM_M - VM_S)
           OR (CASE ÞYNGD WHEN 0 THEN FOB_VERÐ ELSE FOB_VERÐ/ÞYNGD END) > (VM_S + VM_M)))
)
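The logic of this rule can be sketched in base R as a join followed by a band check. This is an illustration under stated assumptions, not the output of TranStoR: the data frames `records` and `reference` and the function `flag_unit_price` are hypothetical, the ASCII names `THYNGD`, `CIF_VERD` and `FOB_VERD` stand in for ÞYNGD, CIF_VERÐ and FOB_VERÐ, and the sketch assumes that imports ('M') use the CIF value and exports ('X') use the FOB value throughout.

```r
# Hypothetical toy data; the production tables and column layout may differ.
records <- data.frame(
  id       = 1:4,
  EINK     = c("M", "M", "X", "X"),
  TLSK     = c("0101", "0101", "0302", "0302"),
  LAND     = c("DE", "DE", "GB", "GB"),
  THYNGD   = c(10, 0, 5, 2),      # weight
  CIF_VERD = c(100, 40, 0, 0),    # CIF value (imports)
  FOB_VERD = c(0, 0, 50, 400),    # FOB value (exports)
  stringsAsFactors = FALSE
)
reference <- data.frame(
  EINK = c("M", "X"),
  TLSK = c("0101", "0302"),
  LAND = c("DE", "GB"),
  VM_M = c(10, 10),               # centre of the accepted unit-price band
  VM_S = c(3, 3),                 # half-width of the band
  stringsAsFactors = FALSE
)

flag_unit_price <- function(records, reference) {
  # Inner join on the same three keys as the SQL JOIN.
  joined <- merge(records, reference, by = c("EINK", "TLSK", "LAND"))
  # Pick the value column by flow, then mirror the CASE expression:
  # when the weight is 0, the value itself is compared against the band.
  value <- ifelse(joined$EINK == "M", joined$CIF_VERD, joined$FOB_VERD)
  unit_price <- ifelse(joined$THYNGD == 0, value, value / joined$THYNGD)
  outside <- unit_price < (joined$VM_M - joined$VM_S) |
    unit_price > (joined$VM_M + joined$VM_S)
  # Only flows 'M' and 'X' are covered by the rule.
  sort(joined$id[joined$EINK %in% c("M", "X") & outside])
}

flagged_ids <- flag_unit_price(records, reference)
```

With this toy data, records 2 and 4 fall outside the band and are returned as flagged ids.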
6. Error detection tool
• The validate R package and companions + the rules + the data
• https://github.com/data-cleaning
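How rules and data come together in the validate package can be sketched as follows. The data frame `trade`, its columns and the three rules are hypothetical examples, not the production rule set; only `validator()`, `confront()`, `summary()` and `values()` are functions of the validate package.

```r
library(validate)

# Hypothetical toy data; column names are illustrative only.
trade <- data.frame(
  id     = 1:4,
  FLOW   = c("M", "X", NA, "Q"),
  WEIGHT = c(10, 0, 5, -1),
  stringsAsFactors = FALSE
)

# The rules: each named expression must hold for a record to pass.
rules <- validator(
  flow_present = !is.na(FLOW),
  flow_known   = FLOW %in% c("M", "X"),
  weight_ok    = WEIGHT >= 0
)

# Confront the data with the rules and inspect the outcome.
out <- confront(trade, rules)
report <- summary(out)   # one row per rule: passes, fails, missing results
per_record <- values(out) # logical matrix: one row per record, one column per rule
print(report[, c("name", "items", "passes", "fails", "nNA")])
```

Keeping the rules in a `validator` object separate from the data means the same rule set can be stored, reviewed and reapplied to each new delivery.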
7. Future work
• validatetools
• validatereport
• validate
• Error correction: introduce an imputation package and reduce manual
work
9. Appendix: examples of rules
• fields that have NULL values or values outside the pre-defined
admissible sets
• outliers in unit prices, found by comparing them with their values over
the past 12 months (19% of warnings in the old system)
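A 12-month unit-price check of this kind can be sketched in base R. The slides do not say how the reference band is computed, so this sketch assumes mean plus or minus `k` standard deviations of the past 12 monthly unit prices; the function names `price_band` and `is_outlier`, the parameter `k` and the toy history are all hypothetical.

```r
# Hypothetical history: unit prices of one commodity/country over 12 months.
history <- c(10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9)

# Assumed band: mean +/- k standard deviations of the history.
price_band <- function(prices, k = 1) {
  m <- mean(prices)
  s <- sd(prices)
  c(lower = m - k * s, upper = m + k * s)
}

# A new unit price is an outlier when it falls outside the band.
is_outlier <- function(new_price, prices, k = 1) {
  band <- price_band(prices, k)
  new_price < band[["lower"]] | new_price > band[["upper"]]
}

band <- price_band(history)
typical <- is_outlier(10.5, history)  # inside the band
extreme <- is_outlier(25, history)    # far above the band
```

A wider band (larger `k`) produces fewer warnings at the cost of missing smaller deviations; the right trade-off depends on how much manual review is affordable.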
10. Appendix: examples of rules
SQL
• FLOW=X
• FLOW is NULL
• FLOW IN (some vector)
R
• FLOW == X
• is.na(FLOW)
• FLOW %in% c(some vector)
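The three correspondences can be tried on a small data frame. This is a sketch with hypothetical data: the column values and the admissible set `c("M", "X")` are assumptions for illustration. One point worth noting is that a SQL NULL in a column arrives in R as `NA`, so the missing-value test on a column is `is.na()`; `is.null()` tests whether the whole object is absent and is FALSE for any existing column.

```r
# Hypothetical toy column with one missing and one inadmissible value.
trade <- data.frame(FLOW = c("X", "M", NA, "Q"), stringsAsFactors = FALSE)

is_export  <- trade$FLOW == "X"             # SQL: FLOW = 'X'
is_missing <- is.na(trade$FLOW)             # SQL: FLOW IS NULL
in_set     <- trade$FLOW %in% c("M", "X")   # SQL: FLOW IN ('M', 'X')

# is.null() looks at the object as a whole, not at its elements.
column_is_null <- is.null(trade$FLOW)

print(data.frame(FLOW = trade$FLOW, is_export, is_missing, in_set))
```

As in SQL, comparing a missing value with `==` gives neither TRUE nor FALSE: the third element of `is_export` is `NA`, whereas base R `%in%` returns FALSE for it.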