Rainbio
Workflow to build the database
Rainer Zaiss
IRD AMAP
Rainbio meeting
Aix-en-Provence, 18-22 May 2015
Rainbio database Child databases
Number of observations per child database (bar chart):
AO 5567
BK 1596
BS 4529
DH 3565
GD 12874
JW 449509
KW 55919
MS 94829
OH 14510
TS 147504
UB 62380
VD 2054
Total: 854 836 observations
ID | Database owner
AO | Anne Blach-Overgaard
BK | Barbara Mackinder
BS | Bonaventure Sonké
DH | David Harris
GD | Gilles Dauby
JW | Jan Wieringa
KW | KEW GBIF
MS | Marc Sosef
OH | Olivier Hardy
TS | Tariq Stevart
UB | UB
VD | Vincent Droissart
Child databases
Rainbio database: unique ID of an observation (line)
ID DATABASE + unique ID of the observation in the child database
Examples: BK1239, MSBR0000008978882BGB310957, JW64476
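A minimal Python sketch of this ID scheme. It assumes every database ID is one of the two-letter codes from the owner table, so the prefix can always be split off again; the function names are hypothetical and not taken from the Rainbio scripts.

```python
# Two-letter codes of the child databases (from the table of database owners).
CHILD_DB_IDS = {"AO", "BK", "BS", "DH", "GD", "JW", "KW", "MS", "OH", "TS", "UB", "VD"}


def make_rainbio_id(db_id: str, child_id: str) -> str:
    """Rainbio ID = ID of the child database + unique ID in the child database."""
    if db_id not in CHILD_DB_IDS:
        raise ValueError(f"unknown child database: {db_id!r}")
    return db_id + child_id


def split_rainbio_id(rainbio_id: str) -> tuple:
    """Split a Rainbio ID into (database ID, child ID); assumes a 2-letter prefix."""
    db_id, child_id = rainbio_id[:2], rainbio_id[2:]
    if db_id not in CHILD_DB_IDS:
        raise ValueError(f"unknown database prefix in {rainbio_id!r}")
    return db_id, child_id


print(make_rainbio_id("BK", "1239"))                   # BK1239
print(split_rainbio_id("MSBR0000008978882BGB310957"))  # ('MS', 'BR0000008978882BGB310957')
```

Because the prefix has a fixed length, IDs stay unique across databases even when two child databases use the same internal number.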
Rainbio database RAINBIO profile
COLUMN | TYPE | EXAMPLES | UPDATE | DESCRIPTION

UNIQUE IDS / TRACKING OF DUPLICATES
idrb | integer (serial) | | AUTO | Internal unique ID of the observation in the Rainbio database
idc | character | GD000001 | | Unique identifier of the record in the child dataset
idsc | character | JW1234545,BK12334 | SCRIPT | Duplicates

TAXON INFORMATION
tax | character | | AUTO | Taxon name (fam, gen+esp+rank01+nam01+rank02+nam02)
fam | character | | | Family to which the specimen belongs
gen | character | | | Genus to which the specimen belongs
esp | character | | | Species epithet of the specimen
rank01 | character | var., subsp., … | | Rank of the first infraspecific name
nam01 | character | | | First infraspecific name
rank02 | character | var., subsp., … | | Rank of the second infraspecific name
nam02 | character | | | Second infraspecific name

DETERMINATION INFORMATION
detok | character | D, OK | | Determination status of the specimen: D = doubtful; OK = no doubt indicated
detnam | character | | | Name of the identifier
dety | integer | | | Year of the identification
detm | integer | | | Month of the identification
detd | integer | | | Day of the identification

LOCATION INFORMATION
iso3 | character | | AUTO | ISO3 country code
country | character | | | Country where the specimen was collected
iso3lonlat | character | | AUTO | ISO3 country code from a geographical join with a country layer
maj_area | character | | | Major area of the country where the specimen was collected
loc_notes | character | | | Locality notes describing where the specimen was collected
ddlon | real | | | Longitude at which the specimen was collected, in decimal degrees (DD)
ddlat | real | | | Latitude at which the specimen was collected, in decimal degrees (DD)
accuracy | integer | 0, 1, …, 8 | | BRAHMS code: accuracy of the georeference of the specimen
alt | integer | | | Altitude of the specimen in meters

(Legend: Rainbio profile columns / other columns)
Rainbio database RAINBIO profile
COLUMN | TYPE | EXAMPLES | UPDATE | DESCRIPTION

COLLECTION INFORMATION
colnam | character | | | Collector name, including initials
prefix | character | | | Prefix of the number the collector gave to the specimen
nbr | integer | | | Number the collector gave to the specimen
suffix | character | | | Suffix of the number the collector gave to the specimen
colnamsup | character | | | Additional collectors associated with the specimen
coly | integer | | | Year when the specimen was collected
colm | integer | | | Month when the specimen was collected
cold | integer | | | Day when the specimen was collected

DESCRIPTIVE INFORMATION
kind_col | character | Herb, Sili, Observation, Plot_data | | Kind of collection: Herb = herbarium voucher; Sili = silica gel voucher; etc.
dups | character | | | Acronyms of the herbaria where duplicates are found (comma-delimited)
description | character | | | Description of the specimen in the field
pheno_fl | character | st, Fl, Fr | | Phenological state of the specimen: st = sterile; Fl = flowering; Fr = fruiting
pheno_fr | character | yes, no | |
habitat | character | | | Habitat in which the specimen was collected
habit | character | Tr, Li, Sh, He, Ep | | Habit of the specimen: Tr = tree; Li = liana; Sh = shrub; He = herb; Ep = epiphyte

QUALITY FLAGS
verif_iso3 | character | | AUTO | Verification code for the ISO3 country code (columns iso3 and iso3lonlat)
verif_coast | character | | AUTO | Verification code for the coastline
verif_distance | integer | | AUTO | Distance in meters
fktax | integer | | AUTO | ID of the taxon name in Jan's reference table
verif_fktax | character | | AUTO | Verification code for the taxon name
calc_accuracy | integer | | AUTO | BRAHMS code: accuracy of the georeference of the specimen, calculated from ddlon and ddlat

(Legend: Rainbio profile columns / other columns)
Rainbio database Data integration
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE)
Rainbio database Data integration
Number of empty cells (NULL) and non-empty cells (OK) by child database; RAINBIO = total over all child databases, % = share of all Rainbio observations
TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO %
TAXON
INFORMATION
fam
OK 5567 1596 4529 3565 12874 445828 55908 94829 14510 147503 60289 2054 849052 99%
NULL 3681 11 1 2091 5784 1%
gen
OK 5567 1596 4529 3565 12874 436337 55513 94829 14508 141277 55613 2054 828262 97%
NULL 13172 406 2 6227 6767 26574 3%
esp
OK 5380 1571 4512 3565 12874 399557 54075 94344 13744 120826 43315 2054 755817 88%
NULL 187 25 17 49952 1844 485 766 26678 19065 99019 12%
rank01
OK 533 186 18 31049 6863 12641 354 17644 2317 224 71829 8%
NULL 5567 1596 3996 3379 12856 418460 49056 82188 14156 129860 60063 1830 783007 92%
nam01
OK 533 186 18 31049 6863 12641 354 17644 2317 224 71829 8%
NULL 5567 1596 3996 3379 12856 418460 49056 82188 14156 129860 60063 1830 783007 92%
rank02
OK 13 18 1366 416 329 22 22 2186 0%
NULL 5567 1596 4529 3552 12856 448143 55919 94413 14181 147482 62358 2054 852650 100%
nam02
OK 13 18 1366 416 329 22 22 2186 0%
NULL 5567 1596 4529 3552 12856 448143 55919 94413 14181 147482 62358 2054 852650 100%
DETERMINATION
INFORMATION
detok
OK 5567 1587 4529 2 12874 1791 94829 90 1618 2054 124941 15%
NULL 9 3563 447718 55919 14420 147504 60762 729895 85%
detnam
OK 2372 713 2019 253070 31284 37886 5890 94508 52674 2054 482470 56%
NULL 3195 883 4529 1546 12874 196439 24635 56943 8620 52996 9706 372366 44%
dety
OK 1334 571 1924 238194 6449 32162 78330 52896 411860 48%
NULL 4233 1025 4529 1641 12874 211315 49470 62667 14510 69174 9484 2054 442976 52%
detm
OK 94 412 1112 122547 6449 13640 52911 197165 23%
NULL 5473 1184 4529 2453 12874 326962 49470 81189 14510 147504 9469 2054 657671 77%
detd
OK 86 56 1103 64906 6449 5328 52911 130839 15%
NULL 5481 1540 4529 2462 12874 384603 49470 89501 14510 147504 9469 2054 723997 85%
Rainbio database Data integration
Number of empty cells and non-empty cells by child databases
TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO %
LOCATION
INFORMATION
country
OK 5507 1587 4529 3565 12874 449509 55919 94829 14510 147503 62380 2054 854766 100%
NULL 60 9 1 70 0%
maj_area
OK 1751 445 1486 12874 342314 55819 47874 130551 14432 607546 71%
NULL 3816 1151 3043 3565 107195 100 46955 14510 16953 47948 2054 247290 29%
loc_notes
OK 4493 1500 4490 12874 439779 41135 85639 13358 143388 61965 2035 810656 95%
NULL 1074 96 39 3565 9730 14784 9190 1152 4116 415 19 44180 5%
ddlon
OK 4440 1352 4524 3565 12874 359934 55919 60054 14510 129244 45284 2054 693754 81%
NULL 1127 244 5 89575 34775 18260 17096 161082 19%
ddlat
OK 4440 1351 4524 3565 12874 359934 55919 60054 14510 129244 45284 2054 693753 81%
NULL 1127 245 5 89575 34775 18260 17096 161083 19%
accuracy
OK 5567 1596 4529 3516 12874 294256 55919 7593 10 21860 2054 409774 48%
NULL 49 155253 87236 14500 147504 40520 445062 52%
COLLECTION
INFORMATION
colnam
OK 5023 1596 4529 3565 12874 449482 55919 94829 14510 147503 62352 2054 854236 100%
NULL 544 27 1 28 600 0%
prefix
OK 1313 216 136 11 12873 38015 3122 6481 14510 5718 7265 72 89732 10%
NULL 4254 1380 4393 3554 1 411494 52797 88348 141786 55115 1982 765104 90%
nbr
OK 4639 1491 4502 3558 12868 436782 54716 89608 14468 146630 61750 1793 832805 97%
NULL 928 105 27 7 6 12727 1203 5221 42 874 630 261 22031 3%
suffix
OK 116 156 95 53 8152 1197 2335 41 2165 6220 13 20543 2%
NULL 5451 1440 4434 3512 12874 441357 54722 92494 14469 145339 56160 2041 834293 98%
colnamsup
OK 894 280 1244 2044 12511 151515 7803 29 52 47900 41997 1013 267282 31%
NULL 4673 1316 3285 1521 363 297994 48116 94800 14458 99604 20383 1041 587554 69%
coly
OK 4589 1429 2026 3530 12874 435134 48764 81186 142901 60843 1775 795051 93%
NULL 978 167 2503 35 14375 7155 13643 14510 4603 1537 279 59785 7%
colm
OK 2239 1340 1239 3519 12872 420762 48750 75935 140981 60848 1770 770255 90%
NULL 3328 256 3290 46 2 28747 7169 18894 14510 6523 1532 284 84581 10%
cold
OK 1997 1119 1627 3515 12874 392536 48750 61978 135265 60848 1745 722254 84%
NULL 3570 477 2902 50 56973 7169 32851 14510 12239 1532 309 132582 16%
Rainbio database Data integration
Number of empty cells and non-empty cells by child databases
TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO %
DESCRIPTIVE
INFORMATION
kind_col
OK 5567 1596 4529 3565 12874 449509 94829 14510 62380 2054 651413 76%
NULL 55919 147504 203423 24%
dups
OK 5208 1566 4497 449469 55919 39402 89931 645992 76%
NULL 359 30 32 3565 12874 40 55427 14510 57573 62380 2054 208844 24%
description
OK 2265 983 249261 45027 33279 7866 118563 35700 1164 494108 58%
NULL 3302 613 4529 3565 12874 200248 10892 61550 6644 28941 26680 890 360728 42%
pheno_fl
OK 802 1034 2305 8475 6836 21826 41278 5%
NULL 4765 562 2224 3565 12874 449509 55919 86354 14510 140668 40554 2054 813558 95%
pheno_fr
OK 288 435 3893 1480 9822 15918 2%
NULL 5279 1161 4529 3565 12874 449509 55919 90936 14510 146024 52558 2054 838918 98%
habitat
OK 839 716 1048 12874 242986 40738 44872 7743 4080 47375 403271 47%
NULL 4728 880 3481 3565 206523 15181 49957 6767 143424 15005 2054 451565 53%
habit
OK 853 12874 13727 2%
NULL 5567 743 4529 3565 449509 55919 94829 14510 147504 62380 2054 841109 98%
Rainbio database Values out of range csv files
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE)
Values out of range:
1. Skip them, correct them or leave them as they are …
2. Export them as CSV files
Rainbio database Values out of range csv files
1. utf8_error_data_alt_8000_or_0_v05.csv
Altitude out of range: < 0 m or > 8000 m (37 records)
2. utf8_error_data_coly_dety_v05.csv
Year of collection > year of determination (437 records)
3. utf8_error_data_coly_v05.csv
Year of collection > 2014 or < 1200 (10 records)
4. utf8_error_data_lonlat_v05.csv
(longitude = 0 and latitude = 0) or longitude >= 180 or longitude <= -180 or latitude >= 90 or latitude <= -90 (1895 records)
FOLLOW UP: leave them as they are or …
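The four checks can be sketched as one Python function that returns the names of the failed checks for a record. The thresholds are the ones listed above; reading the coly/dety check as "collected after it was determined" (a specimen cannot be identified before it is collected) is an assumption, and the flag names are hypothetical.

```python
from typing import List, Optional


def out_of_range_flags(alt: Optional[float], coly: Optional[int], dety: Optional[int],
                       lon: Optional[float], lat: Optional[float]) -> List[str]:
    """Return which of the four out-of-range checks a record fails (empty list = none)."""
    flags = []
    # 1. Altitude below 0 m or above 8000 m
    if alt is not None and (alt < 0 or alt > 8000):
        flags.append("alt")
    # 2. Year of collection later than year of determination (assumed meaning)
    if coly is not None and dety is not None and coly > dety:
        flags.append("coly_dety")
    # 3. Year of collection after 2014 or before 1200
    if coly is not None and (coly > 2014 or coly < 1200):
        flags.append("coly")
    # 4. Point (0, 0) or coordinates outside the valid ranges
    if lon is not None and lat is not None:
        zero_point = lon == 0 and lat == 0
        outside = not (-180 < lon < 180) or not (-90 < lat < 90)
        if zero_point or outside:
            flags.append("lonlat")
    return flags


print(out_of_range_flags(9000, 1990, 2001, 10.5, 3.9))  # ['alt']
```

Missing values (None) never trigger a flag, so records without altitude or coordinates are not reported as out of range.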
Rainbio database Values out of range csv files
Location: database_csv/errors
Encoding: UTF8
You have to select Unicode (UTF8) to open the files in Excel.
Otherwise special characters like é, à, … will not be converted correctly.
Example
If you do not select UTF8: LÃ©on J.
If you select UTF8: Léon J.
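A short Python illustration of the problem and of reading the exports safely. It assumes the program that misreads the file falls back to the Windows-1252 code page (a common default, but an assumption here) and that the CSV files are comma-delimited; neither detail is stated for the Rainbio files.

```python
import csv
import io

# "Léon J." stored as UTF-8 but decoded as Windows-1252 gives the garbled form.
utf8_bytes = "Léon J.".encode("utf-8")
print(utf8_bytes.decode("cp1252"))  # LÃ©on J.
print(utf8_bytes.decode("utf-8"))   # Léon J.


def read_utf8_csv(raw: bytes) -> list:
    """Parse a CSV export, decoding it explicitly as UTF-8 (a leading BOM is dropped)."""
    text = raw.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text)))


rows = read_utf8_csv("colnam\nLéon J.\n".encode("utf-8"))
print(rows)  # [['colnam'], ['Léon J.']]
```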
Rainbio database Standardization: collector names
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE; Jan's authors reference table + "NEW" names)
Collector names (columns colnam and colnamsup)
1. Link collector names with the reference table from Jan
2. Track changes in a CSV file to improve the standardization
Rainbio database Standardization: collector names
Link the name of the main collector with the reference table from Jan
Standards
Columns "colnam" and "colnamsup"
1. Format: Name, Initials Prefix
Examples: Brenan, J.P.M.; Forestor, H. de
2. Keep the accented version
Example: Bové, N.
Column "colnamsup"
3. Separator of names: ;
Example: Arends, J.C.; Bruijn, J. de
FOLLOW UP: leave them as they are or … Some strange names remain, like 'Peter', [ B.J. Brutt], Agric Department, …
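A minimal Python sketch of the three standards, assuming names arrive as "Surname, INITIALS" or "Surname INITIALS" with optional lower-case particles. The regular expression and function names are hypothetical; reversed forms such as "L. Achten", hyphenated initials and the "&" separators seen in some child databases are not handled and are returned unchanged for manual review.

```python
import re

# Surname, optional comma, initials (capital letters with optional dots/spaces),
# then optional lower-case particles such as "de" or "van de".
NAME_RE = re.compile(
    r"^\s*(?P<surname>[^,]+?)\s*,?\s+"
    r"(?P<initials>(?:[A-Z]\.?\s*)+)"
    r"(?P<prefix>(?:\s+[a-z]+)*)\s*$"
)


def standardize_collector(name: str) -> str:
    """Rewrite one name as 'Surname, I.N.I. prefix'; unrecognized names are kept."""
    m = NAME_RE.match(name)
    if m is None:
        return name.strip()  # e.g. 'Agric Department': left for manual review
    initials = "".join(f"{letter}." for letter in re.findall(r"[A-Z]", m["initials"]))
    prefix = " ".join(m["prefix"].split())
    standardized = f"{m['surname'].strip()}, {initials}"
    return f"{standardized} {prefix}" if prefix else standardized


def standardize_colnamsup(names: str) -> str:
    """Additional collectors: standardize each name and join them with '; '."""
    return "; ".join(standardize_collector(n) for n in names.split(";") if n.strip())


print(standardize_collector("Achten, LTM"))    # Achten, L.T.M.
print(standardize_collector("Acocks J.P.H."))  # Acocks, J.P.H.
```

Accented surnames pass through untouched, which matches the "keep the accented version" rule.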
Rainbio database Standardization: collector names
CSV file to improve the standardization of collector names
fkdb c_colnam colnam colnam_modify refcolnam iso3 type
JW(3) Achomfo Nangasudo, NB Achomfo Nangasudo, N.B. Achomfo Nangasudo, N.B. Achomfo Nangasudo, N.B.
GHA|JW(3) COLSUP
MS(1)||BS(1) Achoundong G. Achoundong, G. Achoundong, G. Achoundong, G. CMR|MS(1)||CMR|BS(1) COL||COL
JW(22) Achoundong, G Achoundong, G. Achoundong, G. Achoundong, G. CMR|JW(22) COL
KW(18) Achoundong, G. Achoundong, G. Achoundong, G. Achoundong, G. CMR|KW(18) COL
AO(2)||JW(1)||JW(1)||JW(1)||JW(1)||JW(13)||JW(2)||JW(3)||KW(1)||KW(1)||TS(1)||TS(44)
G. Achoundong||Achoundong, G; Freddy & Enow||Zapfack, L; Achoundong, G; Onana, J-M; Elad, ME; Aggi; Ndumbe, P & Nguembock, F||Achoundong, G & Nana, Z||Achoun
Achoundong, G. Achoundong, G. Achoundong, G. CMR|AO(2)||CMR|JW(1)||CMR|JW(1)||CMR|JW(1)||GIN|JW(1)||
COLSUP
MS(211) Achten L.T. Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|MS(211) COL
JW(22) Achten, LTM Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|JW(22) COL
TS(2) L. Achten Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|TS(2) COL
AO(2)||KW(2) Martin Achu||Achu, M. Achu, M. Achu, M. CMR|AO(2)||CMR|KW(2) COLSUP
JW(1)||JW(1)||JW(2)||JW(5)||JW(5)
Chouaibou, K; Toh, C; Biye, EH; Tadjouteu, F; Rheede, C van de; Iwanaka; Achu, PF & Garcia, J||Njie, F; Chouaibou, K; Gwellem Abula, J; Wanduku, D; Muma Ngu, N; Fomba, V
Achu, P.F. Achu, P.F. Achu, P.F. CMR|JW(1)||CMR|JW(1)||CMR|JW(2)||CMR|JW(5)||CMR|JW(5)
COLSUP
JW(16)||JW(2)||JW(39) Mackinder, BA; Nana, V & Achuo, F||Achuo, F; Ndene, R & Okon, FI||Mackinder, BA; Nana, V; Achuo, F; Abwe, E & Morgan, B
Achuo, F. Achuo, F. Achuo, F. CMR|JW(16)||CMR|JW(2)||CMR|JW(39) COLSUP
MS(11) Acocks J.P.H. Acocks, J.P.H. Acocks, J.P.H. Acocks, J.P.H. NAM|MS(1),ZAF|MS(10) COL
JW(406)||JW(1) Acocks, JPH Acocks, J.P.H. Acocks, J.P.H. Acocks, J.P.H. NAM|JW(4),ZAF|JW(402)||ZAF|JW(1) COL||COLSUP
Name: utf8_error_data_colnam_colnamsup_vXX.csv (22620 lines)
Location: database_csv/nams
Encoding: UTF8
Columns:
fkdb: child database (number of lines)
c_colnam: initial name in the child database
colnam: current name in the Rainbio database
refcolnam: name in Jan's table (empty if not in Jan's table)
iso3: ISO3 country code + first column
type: COL = main collector; COLSUP = additional collector
If you would like to help improve the standardization:
1. Do not touch the column in red
2. Just enter the name (Name, Initials Prefix) that you would like to retain in the column in green
Rainbio database Link taxonomy reference table
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE; Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara); quality flag taxonomy)
Link taxonomy reference table
1. Link the taxon name of the child database to Jan's taxonomy reference table to get the valid name
2. Export CSV files to track errors
Figure: Number of taxa by taxonomy quality flag (OK; OK BARBARA; TAXNAM NO MATCH; MORE THAN ONE VALID IDTAX FOR TAXNAM; FKTAX NO VALID NAME MATCH) for three sets: taxon names of the child databases after data migration, valid names of all observations, and Jan's reference table. Values shown in the chart: 46269, 41993, 42648.
Rainbio database Link taxonomy reference table
Link the taxon names of the child databases to Jan's taxonomy reference table
Taxonomy link quality flag
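The link and its quality flags can be sketched in Python as follows. The lookup structures (name → idtax set, idtax → valid idtax) are hypothetical stand-ins for Jan's reference table, "OK BARBARA" is assumed to be the flag for hand-curated special cases, and the sketch returns no ID when a name has several valid IDs, leaving that choice to a manual decision.

```python
from typing import Dict, Optional, Set, Tuple


def link_taxon(name: str,
               ref_ids_by_name: Dict[str, Set[int]],
               valid_id_of: Dict[int, Optional[int]],
               special_cases: Dict[str, int]) -> Tuple[str, Optional[int]]:
    """Return (quality flag, valid idtax) for one taxon name from a child database."""
    if name in special_cases:                  # hand-curated special cases
        return "OK BARBARA", special_cases[name]
    ids = ref_ids_by_name.get(name, set())
    if not ids:                                # name absent from Jan's table
        return "TAXNAM NO MATCH", None
    valid_ids = {valid_id_of.get(i) for i in ids} - {None}
    if not valid_ids:                          # no valid name can be reached
        return "FKTAX NO VALID NAME MATCH", None
    if len(valid_ids) > 1:                     # same name, several valid names
        return "MORE THAN ONE VALID IDTAX FOR TAXNAM", None
    return "OK", next(iter(valid_ids))


# Hypothetical excerpt of the reference table (IDs loosely modelled on the slides)
REF_IDS_BY_NAME = {"Acalypha indica": {13880},
                   "Schmidelia velutina": {78014, 103535},
                   "Name caught in a synonym loop": {304123}}
VALID_ID_OF = {13880: 13880, 78014: 197, 103535: 103391, 304123: None}

for taxon in ["Acalypha indica", "Schmidelia velutina",
              "Acalypha integrifolia var. crateriana", "Name caught in a synonym loop"]:
    print(taxon, "->", link_taxon(taxon, REF_IDS_BY_NAME, VALID_ID_OF, {}))
```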
Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: TAXNAM NO MATCH
Name: utf8_error_data_tax_match_ref_tab_tax_vXX
Location: database_csv/tax_match
The file holds all names in use in the Rainbio database from Jan's reference table + 413 non-matching names (TAXNAM NO MATCH).
Columns:
verif_fktax: TAXNAM NO MATCH = taxon name of the child database not in Jan's taxonomy reference table; JAN = names from Jan
fkdb: code of the child databases
idvalid: valid ID from Jan
tax: current name in the Rainbio database
gen, esp, rank01, nam01, rank02, nam02: columns to store the taxon name in the Rainbio database
Example:
verif_fktax | fkdb | idvalid | tax (then gen, esp, rank01, nam01, rank02, nam02)
JAN | | 13880 | Acalypha indica
JAN | | 38314 | Acalypha integrifolia
TAXNAM NO MATCH | TS | | Acalypha integrifolia var. crateriana | Acalypha integrifolia var. crateriana
JAN | | 17536 | Acalypha intermedia
FOLLOW UP: leave them as they are or correct them…
1. Do not touch the column in red
2. Enter the correct information in the columns in green, or enter the valid taxon ID to create a special case
Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: FKTAX NO VALID NAME MATCH
Name: utf8_error_ref_tax_link_idtax_valid_name_no_match_vXX
Location: database_csv/tax_match
14 cases in the Rainbio database
Problem in Jan's taxonomy reference table: simple or more complicated loops in the synonym relationships
idtax (ID from Jan) | relation (relationship inside Jan's table)
7595 | FKTAX FLAGGED AS HOMONYME
304123 | SYN OF SYN OF ID: 304123,304127,304123||SYN OF SYN OF ID: 375298,304127,304123
313097 | SYN OF SYN OF ID: 346134,49195,313097||SYN OF SYN OF ID: 346135,49195,313097
FOLLOW UP: leave them as they are, correct them in Jan's taxonomy reference table or create some special cases…
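Following "synonym of" links until a valid name is reached, and stopping when an ID comes back, is one way to detect such loops. This Python sketch assumes each synonym points to exactly one other ID; the links below are hypothetical, modelled on the 304123 / 304127 loop and on the chain 156194, 230960, 197 quoted in these slides.

```python
from typing import Dict, Optional


def resolve_valid_id(idtax: int, synonym_of: Dict[int, int]) -> Optional[int]:
    """Follow 'synonym of' links to a valid ID; return None if the chain loops."""
    seen = set()
    current = idtax
    while current in synonym_of:    # current is a synonym: follow the link
        if current in seen:
            return None             # an ID was visited twice: loop in the relationships
        seen.add(current)
        current = synonym_of[current]
    return current                  # not a synonym of anything: treated as valid


SYNONYM_OF = {304123: 304127, 304127: 304123, 156194: 230960, 230960: 197}
print(resolve_valid_id(156194, SYNONYM_OF))  # 197
print(resolve_valid_id(304123, SYNONYM_OF))  # None
```

The `seen` set keeps the walk finite however long the loop is, so both simple and more complicated loops end in `None`.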
Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: MORE THAN ONE VALID IDTAX FOR TAXNAM
Name: utf8_error_ref_tax_link_idtax_tax_not_unique_vXX.csv
Location: database_csv/tax_match
191 names in the Rainbio database
Problem in Jan's taxonomy reference table: the same name appears at least twice in Jan's reference table with different relationships
FOLLOW UP: leave them as they are, correct them in Jan's taxonomy reference table or create some special cases…
idtax (valid ID from Jan) | tax (taxon name) | relation (relationship inside Jan's table) | idvalid (ID that we use)
197 | Schmidelia javensis f. genuinus | SYN OF VALID: 77982,197||SYN OF SYN OF ID: 156194,230960,197 | 197
197 | Schmidelia velutina | SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 | 197
103391 | Schmidelia velutina | SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 | 197
Rainbio database Quality flags geographical coordinates
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE; Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara); quality flag taxonomy; quality flag geographical coordinates)
Quality flags geographical coordinates
Columns verif_iso3 and verif_coast
1. Location (lon / lat) inside continent
2. Location (lon / lat) = ISO3 code
Rainbio database Quality flags geographical coordinates
1. Check if the geographical coordinates are on the African continent
verif_coast | Number of observations | %
COORDINATES OUT OF RANGE | 1895 | 0.2%
NO COORDINATES | 161167 | 18.9%
OK | 689757 | 80.7%
ERROR | 2017 | 0.2%
(Map: location of geographical coordinates)
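The "inside the continent" test can be sketched in pure Python with ray casting against land polygons given as lists of (lon, lat) vertices. The slides do not say how verif_coast was computed (a GIS or spatial-database join is more likely); the square "continent" below is hypothetical, and real coastline layers with islands and holes would need a GIS library.

```python
from typing import List, Optional, Sequence, Tuple

Polygon = Sequence[Tuple[float, float]]  # (lon, lat) vertices, ring not closed


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray casting: count crossings of a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def verif_coast(lon: Optional[float], lat: Optional[float], land: List[Polygon]) -> str:
    if lon is None or lat is None:
        return "NO COORDINATES"
    if (lon == 0 and lat == 0) or not (-180 < lon < 180) or not (-90 < lat < 90):
        return "COORDINATES OUT OF RANGE"
    return "OK" if any(point_in_polygon(lon, lat, poly) for poly in land) else "ERROR"


LAND = [[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]]  # hypothetical "continent"
print(verif_coast(5.0, 5.0, LAND))   # OK
print(verif_coast(15.0, 5.0, LAND))  # ERROR
```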
Rainbio database Quality flags geographical coordinates
2. Compare ISO3 country code of observation with ISO3 of geographical coordinates (Lon/Lat)
(Map label: ISO3: GAB)
verif_iso3 | Number of observations | %
COORDINATES OUT OF RANGE | 1895 | 0.2%
NO COORDINATES | 161167 | 18.9%
ISO3 NOT IN AFRICA | 1 | 0.0%
OK | 684773 | 80.1%
NEIGHBOUR | 6213 | 0.7%
ERROR | 786 | 0.1%
OK: ISO3 from Lon/Lat = ISO3 of the observation
(Map legend: location of geographical coordinates; ISO3: XXX = ISO3 country code of the observation)
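A Python sketch of the comparison behind verif_iso3, assuming the country under the coordinates (iso3lonlat) has already been looked up and a neighbour table is available. NEIGHBOUR means the point falls in a country bordering the observation's country and ERROR means any other country; the rules for "ISO3 NOT IN AFRICA" and "LATITUDE MISSING" are not spelled out in the slides and are not modelled. The neighbour table is a hypothetical excerpt.

```python
from typing import Dict, Optional, Set


def verif_iso3(obs_iso3: str, lonlat_iso3: Optional[str],
               lon: Optional[float], lat: Optional[float],
               neighbours: Dict[str, Set[str]]) -> str:
    """Compare the country of the observation with the country under its coordinates."""
    if lon is None or lat is None:
        return "NO COORDINATES"
    if (lon == 0 and lat == 0) or not (-180 < lon < 180) or not (-90 < lat < 90):
        return "COORDINATES OUT OF RANGE"
    if lonlat_iso3 == obs_iso3:
        return "OK"
    if lonlat_iso3 in neighbours.get(obs_iso3, set()):
        return "NEIGHBOUR"
    return "ERROR"


NEIGHBOURS = {"GAB": {"CMR", "GNQ", "COG"}}  # Gabon borders Cameroon, Eq. Guinea, Congo
print(verif_iso3("GAB", "COG", 13.9, -1.2, NEIGHBOURS))  # NEIGHBOUR
```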
Rainbio database Quality flags geographical coordinates
NEIGHBOUR: ISO3 from Lon/Lat = ISO3 of a country neighboring the country of the observation
Draw a line from the location of the geographical coordinates to the nearest boundary of the neighboring country
(Map legend: location of geographical coordinates; ISO3: XXX = ISO3 country code of the observation; map label: ISO3: COG)
Rainbio database Quality flags geographical coordinates
(Map labels: ISO3: COG; ISO3: GNQ)
ERROR: ISO3 from Lon/Lat ≠ ISO3 of the observation and not a neighboring country
Draw a line from the location of the geographical coordinates to the centroid of the country associated with the observation
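The distance stored in verif_distance could be computed, for the ERROR case, as a great-circle (haversine) distance to the country centroid. This Python sketch assumes a spherical Earth and centroids given as (lon, lat); the distance to the nearest boundary used for NEIGHBOUR needs the boundary geometry and is left to a GIS library. Function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_to_centroid_m(lon: float, lat: float, centroid: tuple) -> int:
    """Integer distance in meters (like verif_distance) to a (lon, lat) centroid."""
    return round(haversine_m(lon, lat, centroid[0], centroid[1]))


hypothetical_centroid = (11.6, -0.6)  # illustrative value, not taken from Rainbio
print(distance_to_centroid_m(13.9, -1.2, hypothetical_centroid))
```

One degree of latitude comes out at about 111.2 km, a quick sanity check for the formula.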
Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
(Map legend by child database: AO, BK, BS, DH, JW; GD; KW, MS, OH, TS, VD; UB)
Names: error_tab_data_verif_iso3=error_verif_coast=error.shp, error_tab_data_verif_iso3=error_verif_coast=ok.shp
Location: database_csv/shp
Encoding: UTF8
Verif_iso3 = ERROR and verif_coast = ERROR or OK
Number of observations:
verif_iso3 | verif_coast | After data integration | % | Difference | After elimination of duplicates | %
COORDINATES OUT OF RANGE | COORDINATES OUT OF RANGE | 1895 | 0.2% | -35 | 1860 | 0.2%
NO COORDINATES | NO COORDINATES | 161167 | 18.9% | -17959 | 143208 | 18.4%
ISO3 NOT IN AFRICA | OK | 1 | 0.0% | 0 | 1 | 0.0%
OK | OK | 683236 | 80.1% | -54265 | 628971 | 80.6%
ERROR | OK | 357 | 0.0% | -38 | 319 | 0.0%
ERROR | ERROR | 429 | 0.1% | -28 | 401 | 0.1%
FOLLOW UP: exclude them or correct them…
Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
Names: error_tab_data_verif_iso3=neighbour_verif_coast=error.shp, error_tab_data_verif_iso3=neighbour_verif_coast=ok.shp
Location: database_csv/shp
Encoding: UTF8
Verif_iso3 = NEIGHBOUR and verif_coast = ERROR or OK
Number of observations:
verif_iso3 | verif_coast | After data integration | % | Difference | After elimination of duplicates | %
NEIGHBOUR | OK | 6163 | 0.7% | -751 | 5412 | 0.7%
NEIGHBOUR | ERROR | 50 | 0.0% | -4 | 46 | 0.0%
FOLLOW UP: use distance as a filter, exclude them or correct them…
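"Use distance as a filter" could look like the following Python predicate, written for a mapping use case: keep fully verified points, keep NEIGHBOUR points only when verif_distance is small, drop the rest. The 5 km threshold and the treatment of every other flag combination are assumptions for illustration, not the Rainbio rule.

```python
from typing import Dict


def keep_for_mapping(rec: Dict, max_neighbour_distance_m: int = 5000) -> bool:
    """One possible filter on the geographical quality flags (threshold is hypothetical)."""
    if rec["verif_iso3"] == "OK" and rec["verif_coast"] == "OK":
        return True
    if rec["verif_iso3"] == "NEIGHBOUR":
        distance = rec.get("verif_distance")
        return distance is not None and distance <= max_neighbour_distance_m
    return False  # ERROR, NO COORDINATES, OK with coast ERROR, ...


records = [
    {"verif_iso3": "OK", "verif_coast": "OK"},
    {"verif_iso3": "NEIGHBOUR", "verif_coast": "OK", "verif_distance": 1200},
    {"verif_iso3": "NEIGHBOUR", "verif_coast": "ERROR", "verif_distance": 12000},
]
print([keep_for_mapping(r) for r in records])  # [True, True, False]
```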
Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
Name: error_tab_data_verif_iso3=ok_verif_coast=error.shp
Location: database_csv/shp
Encoding: UTF8
Verif_iso3 = OK and verif_coast = ERROR
Number of observations:
verif_iso3 | verif_coast | After data integration | % | Difference | After elimination of duplicates | %
OK | ERROR | 1537 | 0.2% | -85 | 1452 | 0.2%
FOLLOW UP: use distance as a filter, exclude them or correct them…
Rainbio database Duplicates
(Diagram: child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD → RAINBIO DATABASE; Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara); quality flags taxonomy and geographical coordinates; observations split into duplicates and OK)
Identification of duplicates
1. Same observation
2. Location (lon / lat) = ISO3 code
Rainbio database Duplicates
Unique observation: 6 columns (ISO3 country code, surname of the main collector, prefix, collection number, year of collection, suffix)
Rainbio profile: 32 columns
What is a unique observation?
Same surname of the main collector +
same year of collection +
same ISO3 country code +
same prefix + collection number + suffix
(Figure: Rainbio profile columns by group)
TAXON INFORMATION: fam, gen, tax
DETERMINATION INFORMATION: detok, detnam, dety, detm, detd
LOCATION INFORMATION: iso3, country, maj_area, loc_notes, ddlon, ddlat, accuracy, alt
COLLECTION INFORMATION: colnam, prefix, nbr, suffix, colnamsup, coly, colm, cold
DESCRIPTIVE INFORMATION: kind_col, dups, description, pheno_fl, pheno_fr, habitat, habit
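Grouping records on the six key columns is enough to find candidate duplicates. This Python sketch assumes the surname is the part of colnam before the first comma, compared case-insensitively (so "Sidwell, K." and "Sidwell" match); the helper names are hypothetical. The three records are the Mefou example from these slides; the fourth has a hypothetical idc and number.

```python
from collections import defaultdict
from typing import Dict, List


def main_surname(colnam: str) -> str:
    """'Achten, L.T.M.' -> 'achten'; a name without a comma is kept whole."""
    return colnam.split(",")[0].strip().lower()


def observation_key(rec: Dict) -> tuple:
    """The 6 columns that define a unique observation."""
    return (rec["iso3"], main_surname(rec["colnam"]), rec["coly"],
            rec.get("prefix") or "", rec["nbr"], rec.get("suffix") or "")


def find_duplicate_groups(records: List[Dict]) -> Dict[tuple, List[Dict]]:
    """Return only the keys shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[observation_key(rec)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


records = [
    {"idc": "TS100579839", "iso3": "CMR", "colnam": "Etuge, M.", "coly": 2004, "nbr": 5271},
    {"idc": "KWK000199925", "iso3": "CMR", "colnam": "Etuge, M.", "coly": 2004, "nbr": 5271},
    {"idc": "JW1056921", "iso3": "CMR", "colnam": "Etuge, M.", "coly": 2004, "nbr": 5271},
    {"idc": "JW0000001", "iso3": "CMR", "colnam": "Etuge, M.", "coly": 2004, "nbr": 5272},
]
groups = find_duplicate_groups(records)
print({key: [r["idc"] for r in recs] for key, recs in groups.items()})
```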
Rainbio database Duplicates
Unique observation (6 columns): ISO3 country code, surname of the main collector, prefix, collection number, year of collection, suffix
Workflow, repeated from an initial stage of 32 columns to a final stage of 6 columns:
1. Identify identical lines
2. Rank identical lines within a grading system
3. Export as a CSV file
4. Keep the best line and remove the others
5. Remove one column and start again
First iteration: intra-child database
Second iteration: Rainbio database
Location: database_csv/duplicates
Encoding: UTF8
(Figure: the Rainbio profile columns grouped into taxon, determination, location, collection and descriptive information, at the initial and final stages)
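The loop of identifying, ranking, keeping the best line and dropping a column can be sketched in Python as below. It simplifies the Excel step table to "one column set per step, widest first": every set includes the 6 key columns, so lines are only ever merged within one observation. The column steps, the ranking function and the records are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]


def iterative_dedup(records: Sequence[Record],
                    column_steps: Sequence[Sequence[str]],
                    rank: Callable[[Record], object]) -> Tuple[List[Record], List[tuple]]:
    """At each step, group lines identical on the step's columns, keep the best-ranked
    line of each group and log the others as removed (step, idc kept, idc removed)."""
    current = list(records)
    removed = []
    for step, columns in enumerate(column_steps, start=1):
        groups: Dict[tuple, List[Record]] = {}
        for rec in current:
            groups.setdefault(tuple(rec.get(c) for c in columns), []).append(rec)
        current = []
        for group in groups.values():
            group.sort(key=rank)
            current.append(group[0])
            removed.extend((step, group[0]["idc"], dup["idc"]) for dup in group[1:])
    return current, removed


KEY = ["iso3", "colnam", "coly", "prefix", "nbr", "suffix"]
STEPS = [KEY + ["loc_notes"], KEY]  # hypothetical: one extra column dropped per step
base = {"iso3": "CMR", "colnam": "Etuge", "coly": 2004, "prefix": None, "suffix": None}
records = [
    {**base, "idc": "JW1", "nbr": 5271, "loc_notes": "Mefou"},
    {**base, "idc": "KW1", "nbr": 5271, "loc_notes": "Mefou NP"},
    {**base, "idc": "TS1", "nbr": 5272, "loc_notes": "Mefou"},
]
kept, removed = iterative_dedup(records, STEPS, rank=lambda r: r["idc"])
print([r["idc"] for r in kept], removed)  # ['JW1', 'TS1'] [(2, 'JW1', 'KW1')]
```

The removed log plays the role of the per-step CSV exports, so every elimination can be traced back to the step that caused it.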
Rainbio database Duplicates
Location : database_csv/duplicates/duplicates.xlsx
IDENTIFY DUPLICATES SKIP
COLUMN STEP1 STEP2 STEP3 STEP4 STEP5 STEP6 STEP7 STEP8 STEP9 STEP10 STEP11 STEP12 STEP13 STEP14 STEP15 STEP16 STEP17 STEP18 STEP19 STEP20 STEP21 STEP22 STEP23 STEP40 STEP41 STEP42 STEP43
fam
gen
tax
detok VALUE
detnam VALUE VALUE VALUE VALUE
dety
detm
detd
iso3
country NBR CHAR
maj_area VALUE
loc_notes VALUE VALUE VALUE VALUE VALUE VALUE VALUE
ddlon
ddlat
accuracy VALUE
alt MIN
colnam SURNAME SURNAME SURNAME SURNAME SURNAME SURNAME CHECK SURNAME SURNAME SURNAME SURNAME SURNAME SURNAME
prefix VALUE
nbr
suffix VALUE DESC
colnamsup NBR CHAR
coly MAX
colm MAX VALUE
cold MAX VALUE VALUE
kind_col VALUE
dups VALUE
description VALUE
pheno_fl VALUE
pheno_fr VALUE
habitat VALUE
habit VALUE
CHECK COORDINATES IF LOCATION NAME IS DIFFERENT 16_01_CHECK 17_01_CHECK 18_01_CHECK 19_01_CHECK 20_01_CHECK
CHECK COORDINATES IF DISTANCE > 0 meters 16_02_CHECK 17_02_CHECK 18_02_CHECK 19_02_CHECK 20_02_CHECK 21_02_CHECK 22_02_CHECK 40_02_CHECK 41_02_CHECK 42_02_CHECK 43_02_CHECK
CHECK DATE OF COLLECTION IF MORE THAN ONE FULL DATE (YYYY-MM-DD) IS AVAILABLE 18_03_CHECK 19_03_CHECK 20_03_CHECK 21_03_CHECK 22_03_CHECK 40_03_CHECK 41_03_CHECK 42_03_CHECK 43_03_CHECK
CHECK COLLECTOR NAME 16_04_CHECK 17_04_CHECK 18_04_CHECK 19_04_CHECK 20_04_CHECK 40_04_CHECK 41_04_CHECK 42_04_CHECK 43_04_CHECK
CHECK DETERMINATOR NAME 40_06_CHECK 41_06_CHECK 42_06_CHECK 43_06_CHECK
USE THE VERIFICATION TO IMPROVE THE STANDARDIZATION OF COLLECTOR NAMES 15_00_CHECK 21_00_CHECK 22_00_CHECK 23_00_CHECK 43_00_CHECK
Ranking row: DB RANK DET; DB RANK DET; SKIP; DB RANK DET; DB RANK DET; DB RANK LOC (×7)
Detailed documentation: Excel file
Rainbio database Duplicates
Location: database_csv/duplicates/
Name: dups_rb15_00.csv
Example CSV file: step 15 in the Rainbio database iteration
File name codes: cdb = intra-child database iteration; rb = Rainbio database iteration; 15 = step in the Excel file; _00 = all identified duplicates
Step 15 (example: Sidwell, K.||Sidwell)
Line separator: ||
We keep the first line and remove all other lines.
d_fkdb d_calc_accuracy d_alt d_colnam d_prefix d_nbr d_suffix d_colnamsup d_coldate
JW,MS 5||5 99999||99999 Sidwell, K.||Sidwell ###||### 165||165 ###||### ###||### 1992-10-24||1992-10-24
BK,MS 5||5 1248||1248 Dubois, L.||Dubois, J. ###||### 1494||1494 ###||### ###||### 1949-09-01||1949-09-01
BS,MS 5||5 99999||99999 Savory, H.J.||Savory, L. FHI||FHI 25148||25148 ###||### Keay, R.W.J.||### 1948-12-24||1948-12-24
BS,MS 5||5 99999||99999 Savory, H.J.||Savory, L. FHI||FHI 25138||25138 ###||### Keay, R.W.J.||### 1948-12-23||1948-12-23
JW,MS 3||3 99999||99999 Muller, T.||Muller ###||### 1981||1981 ###||### Pope, G.V.; Russell, E.||### 1971-12-19||1971-12-19
JW,MS 4||4 99999||99999 Thompson, S.A.||Thompson ###||### 1626||1626 ###||### Rawlins, J.E.||### 1984-07-16||1984-07-16
Rainbio database Duplicates
How to rank identical lines within a grading system in order to keep the best line
Step 16: name of location
Step 17: geographical coordinates
Ranking criteria for steps 16 and 17, in order of priority:
1. ISO3 verification ranking: 1 OK; 2 NEIGHBOUR; 3 ISO3 NOT IN AFRICA; 4 ERROR; 5 COORDINATES OUT OF RANGE; 6 LATITUDE MISSING; 7 NO COORDINATES
2. Child database location ranking: 1 VD; 2 AO; 3 BK, BS, DH, GD, JW, KW, MS, OH, TS, UB
3. Calculated accuracy code ranking: 8, 7, 6, 5, 4, 3, 2, 1
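The three location rankings can be combined into one Python sort key, so that the best line of a group of duplicates is simply the minimum. Two assumptions: the numbers in front of each ranking give its priority (ISO3 verification first, then child database, then calculated accuracy), and the accuracy codes are preferred in the order listed (8 first). The records are hypothetical.

```python
from typing import Dict

ISO3_RANK = {"OK": 1, "NEIGHBOUR": 2, "ISO3 NOT IN AFRICA": 3, "ERROR": 4,
             "COORDINATES OUT OF RANGE": 5, "LATITUDE MISSING": 6, "NO COORDINATES": 7}
DB_LOCATION_RANK = {"VD": 1, "AO": 2}        # every other child database: rank 3
ACCURACY_ORDER = [8, 7, 6, 5, 4, 3, 2, 1]    # assumed: listed order = order of preference


def location_rank(rec: Dict) -> tuple:
    """Sort key: ISO3 verification, then child database, then calculated accuracy."""
    iso3_rank = ISO3_RANK.get(rec["verif_iso3"], len(ISO3_RANK) + 1)
    db_rank = DB_LOCATION_RANK.get(rec["idc"][:2], 3)
    acc = rec.get("calc_accuracy")
    acc_rank = ACCURACY_ORDER.index(acc) if acc in ACCURACY_ORDER else len(ACCURACY_ORDER)
    return (iso3_rank, db_rank, acc_rank)


dups = [
    {"idc": "VD1", "verif_iso3": "ERROR", "calc_accuracy": 8},
    {"idc": "JW1", "verif_iso3": "OK", "calc_accuracy": 5},
    {"idc": "KW1", "verif_iso3": "OK", "calc_accuracy": 6},
]
print(min(dups, key=location_rank)["idc"])  # KW1
```

Tuples compare element by element, so a lower-priority criterion only matters when all higher ones are tied.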
(Figure: Rainbio profile columns by information group, with colnam shown as SURNAME)
Rainbio database Duplicates
How to rank identical lines within a grading system in order to keep the best line
Step 40: most recent determination
Step 41: genus identical
Step 42: family identical
Ranking criteria, in order of priority:
1. Child database taxonomy ranking: 1 VD; 2 AO; 3 BK; 4 DH; 5 BS, GD, JW, KW, MS, OH, TS, UB
2. Determination date ranking: 1 most recent
3. Taxonomy rank ranking: 0 F|F; 1 VAR|×; 2 VAR|CV; 3 VAR|F; 4 VAR|UNKNOWN; 5 SUBVAR; 6 SUBSP|×; 7 SUBSP|F; 8 SUBSP|VAR; 9 ×; 10 X; 11 CV; 12 CVGR; 13 F; 14 VAR; 15 SUBSP; 16 ESP; 17 GEN; 18 FAM; 19 UNKNOWN; 20 UNKNOWN|UNKNOWN
4. Valid name source ranking: 1 Jan reference table; 2 child database
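The taxonomy rankings can be sketched the same way, as a Python sort key. It assumes the numbers in front of each ranking give its priority (child database, then determination date, then taxonomic rank, then valid-name source) and that each record already carries a hypothetical `taxon_rank` field holding one of the listed rank codes. The records are hypothetical.

```python
from typing import Dict

DB_TAX_RANK = {"VD": 1, "AO": 2, "BK": 3, "DH": 4}  # all other child databases: rank 5
TAXON_RANK_ORDER = ["F|F", "VAR|×", "VAR|CV", "VAR|F", "VAR|UNKNOWN", "SUBVAR",
                    "SUBSP|×", "SUBSP|F", "SUBSP|VAR", "×", "X", "CV", "CVGR", "F",
                    "VAR", "SUBSP", "ESP", "GEN", "FAM", "UNKNOWN", "UNKNOWN|UNKNOWN"]
VALID_NAME_SOURCE_RANK = {"JAN REFERENCE TABLE": 1, "CHILD DATABASE": 2}


def taxonomy_rank(rec: Dict) -> tuple:
    """Sort key for steps 40-42; the lowest key is the line to keep."""
    db_rank = DB_TAX_RANK.get(rec["idc"][:2], 5)
    # Most recent determination first: negate (year, month, day); missing parts count as 0
    det_date = (rec.get("dety") or 0, rec.get("detm") or 0, rec.get("detd") or 0)
    newest_first = tuple(-part for part in det_date)
    taxon_rank = TAXON_RANK_ORDER.index(rec["taxon_rank"])
    source_rank = VALID_NAME_SOURCE_RANK[rec["valid_name_source"]]
    return (db_rank, newest_first, taxon_rank, source_rank)


dups = [
    {"idc": "JW1", "dety": 1995, "taxon_rank": "ESP", "valid_name_source": "JAN REFERENCE TABLE"},
    {"idc": "JW2", "dety": 2010, "taxon_rank": "GEN", "valid_name_source": "JAN REFERENCE TABLE"},
    {"idc": "MS3", "dety": 2012, "taxon_rank": "ESP", "valid_name_source": "CHILD DATABASE"},
]
print(min(dups, key=taxonomy_rank)["idc"])  # MS3
```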
Figure: Steps involved to eliminate duplicates (share of duplicates, 0-100%, per child database AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD, for all child databases and for RAINBIO). Step categories: all columns of descriptive information; location name; geographical coordinates; collection date; name of determiner or date of determination; family and genus identical, taxon name different; family identical, genus and taxon name different; excluded: family, genus and taxon name different.
Rainbio database Duplicates
Rainbio iteration: 57863 lines
Intra-child database iteration: 13387 lines
Rainbio database Duplicates
STEP 43 determination different on family level
Step 43
File: duplicates/dups_rb43_00_CHECK.csv
1783 cases in rainbio iteration
File: duplicates/dups_cdb43_00_CHECK.csv
882 cases in intra-child database iteration
FOLLOW UP: correct them or use the condition c_step02 LIKE '%43%' to exclude
the observations
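The exclusion can be tried with Python's built-in sqlite3 module. The table and the c_step02 values are hypothetical, since the real format of that column is not shown. Two details matter: `NOT LIKE` on a NULL value is not true, so observations with no step recorded must be kept explicitly, and the pattern '%43%' would also match any other value containing "43" (for example "143").

```python
import sqlite3

# In-memory table with hypothetical c_step02 values
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainbio (idc TEXT, c_step02 TEXT)")
conn.executemany("INSERT INTO rainbio VALUES (?, ?)",
                 [("JW1", "16"), ("MS2", "43"), ("BK3", None)])

# Keep observations not touched by step 43 (NULL kept explicitly)
kept = conn.execute(
    "SELECT idc FROM rainbio "
    "WHERE c_step02 IS NULL OR c_step02 NOT LIKE '%43%' "
    "ORDER BY idc"
).fetchall()
print(kept)  # [('BK3',), ('JW1',)]
```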
Rainbio database Duplicates
Duplicates with different geographical coordinates
Figure: location of the geographical coordinates of the duplicates and their bounding box.
Example:
IDC | ISO3 | LOC_NOTES | DDLON | DDLAT | COLNAM | PREFIX | NBR | SUFFIX | COLY | VERIF_ISO3 | VERIF_COAS | TAX_TAX
TS100579839 | CMR | Mefou Proposed National Park. Mefou National Park, Ndanen 2. | 9.61889 | 4.99 | Etuge, M. |  | 5271 |  | 2004 | OK | OK | Tylophora conspicua
KWK000199925 | CMR | Mefou National Park, Ndanen 2. | 11.58 | 3.63 | Etuge, M. |  | 5271 |  | 2004 | OK | OK | Tylophora
JW1056921 | CMR | Mefou National Park, Ndanen 2. | 11.5833 | 3.61667 | Etuge, M. |  | 5271 |  | 2004 | OK | OK | Tylophora conspicua
Rainbio database: total of 854836 observations
64828 observations (7.6%) have 69367 duplicates
59797 (92%) of the duplicated observations are georeferenced
For 53163 (89%) of the georeferenced duplicated observations, the coordinates come from different sources
The coordinates differ for 15397 of these observations (29%)
Example: Mefou National Park, Ndanen 2
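Each percentage above is a share of the previous subset, not of the whole database. Recomputing them from the counts on the slide makes the chain of denominators explicit:

```python
# Counts taken from the slide above
total = 854836
duplicated = 64828
georeferenced = 59797
different_sources = 53163
different_coords = 15397

# Each share uses the previous subset as its denominator
shares = {
    "duplicated / total": duplicated / total,
    "georeferenced / duplicated": georeferenced / duplicated,
    "different sources / georeferenced": different_sources / georeferenced,
    "different coordinates / different sources": different_coords / different_sources,
}
for label, share in shares.items():
    print(f"{label}: {100 * share:.1f}%")
```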
Rainbio database Duplicates
Bounding boxes of duplicated observations with coordinates from different sources
Figure: all bounding boxes
Rainbio database Duplicates
Bounding boxes of duplicated observations with coordinates from different sources, after the duplicate iterations and ranking (quality flag location)
Figure: all bounding boxes
The coordinates differ, but at least they are in the same country.
Rainbio database Duplicates
Bounding boxes of duplicated observations with coordinates from different sources, after the duplicate iterations and ranking (quality flag location) and after eliminating bounding boxes with width < 10 km and height < 10 km
Figure: all bounding boxes
FOLLOW UP: …
use columns d_x_meters and d_y_meters < 10000 as a filter
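The slides do not say how d_x_meters and d_y_meters are computed. A minimal sketch, assuming they are the width and height of the bounding box of a duplicate group and using a simple equirectangular approximation (longitude spans scaled by the cosine of the mean latitude); `bbox_size_m` and `is_small_bbox` are hypothetical helpers. The Mefou National Park example supplies the coordinates.

```python
import math

# Approximate length of one degree of latitude (mean Earth radius 6371 km)
METERS_PER_DEGREE = 111_195


def bbox_size_m(coords):
    """Return (width_m, height_m) of the bounding box of (lon, lat) pairs.

    Equirectangular approximation: longitude spans are scaled by
    cos(mean latitude); adequate for boxes up to a few hundred kilometres.
    """
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    mean_lat = math.radians((min(lats) + max(lats)) / 2)
    width = (max(lons) - min(lons)) * METERS_PER_DEGREE * math.cos(mean_lat)
    height = (max(lats) - min(lats)) * METERS_PER_DEGREE
    return width, height


def is_small_bbox(coords, limit_m=10_000):
    """True when both sides are below limit_m, like d_x_meters and d_y_meters < 10000."""
    width, height = bbox_size_m(coords)
    return width < limit_m and height < limit_m


# Mefou National Park example (TS100579839, KWK000199925, JW1056921)
mefou = [(9.61889, 4.99), (11.58, 3.63), (11.5833, 3.61667)]
w, h = bbox_size_m(mefou)
print(f"width {w / 1000:.0f} km, height {h / 1000:.0f} km, small: {is_small_bbox(mefou)}")
```

With the TS record included, the box spans roughly two hundred kilometres, so the group stays on the list of location problems; the KW and JW coordinates alone give a box of about 1.5 km.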
Rainbio database Duplicates
Duplicates with different geographical coordinates
Figure: location of the geographical coordinates of the duplicates and their bounding box.
Example: Mefou National Park, Ndanen 2
The current location in the Rainbio database is wrong!
FOLLOW UP: change the database ranking in order to select the "best" location
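A minimal sketch of ranking-based selection, assuming the location of a duplicate group is taken from the best-ranked child database among the records that are georeferenced and pass the quality flags. The ranking `DB_RANK` and the helper `best_location` are hypothetical; the real Rainbio ranking is not given in the slides. The child database is read from the first two characters of idc.

```python
# Hypothetical database ranking: lower number = more trusted source of coordinates
DB_RANK = {"JW": 1, "KW": 2, "TS": 3}


def best_location(duplicates, ranking=DB_RANK):
    """Return (idc, ddlon, ddlat) of the best-ranked usable duplicate, or None.

    Records with missing coordinates or a quality flag other than 'OK' are
    skipped; databases absent from the ranking come last.
    """
    candidates = [
        rec for rec in duplicates
        if rec["ddlon"] is not None and rec["ddlat"] is not None
        and rec.get("verif_iso3") == "OK" and rec.get("verif_coast") == "OK"
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda rec: ranking.get(rec["idc"][:2], len(ranking) + 1))
    return best["idc"], best["ddlon"], best["ddlat"]


mefou = [
    {"idc": "TS100579839", "ddlon": 9.61889, "ddlat": 4.99, "verif_iso3": "OK", "verif_coast": "OK"},
    {"idc": "KWK000199925", "ddlon": 11.58, "ddlat": 3.63, "verif_iso3": "OK", "verif_coast": "OK"},
    {"idc": "JW1056921", "ddlon": 11.5833, "ddlat": 3.61667, "verif_iso3": "OK", "verif_coast": "OK"},
]
print(best_location(mefou))
```

In the Mefou example all three records pass both flags, so only the ranking decides; the flags alone cannot detect the misplaced coordinates, which is why the follow-up targets the ranking.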
Rainbio database Quality flags geographical coordinates
Figure: workflow diagram. The child databases AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB and VD feed the RAINBIO DATABASE. Other boxes: Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara); quality flag taxonomy; quality flag geographical coordinates; observations (Duplicate, OK); export of CSV files. Workflow steps: values out of range, steps, duplicates, problems, taxonomy link, shapefiles, location errors, new version, export CSV file, standardization of collectors.
Rainbio database Rainbio CSV file
Name: utf8_rainbio_vXX.csv
Location: database_csv/db_csv
First series of columns: Rainbio profile + quality flags (47 columns)
Second series of columns: columns from the taxonomy reference table, starting with "tax_" (16 columns)
  values from Jan's table if we have the link, otherwise values from the child database
Third series of columns: identification of duplicates, starting with "d_" (54 columns)
  NULL if there are no duplicated values
  "-" if the values in the duplicated records are identical
  XX||YY: values from the duplicated records if they are different
Fourth series of columns: identification of duplicates, starting with "c_" (6 columns)
  steps that identify duplicates without eliminating them
  Example: step 43, observations with duplicates whose determination differs at family level are just flagged, not treated as duplicated records.
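A minimal sketch for reading these conventions, assuming NULL appears in the CSV as an empty cell or the literal text NULL (the exact export format is an assumption). `parse_duplicate_cell` decodes a "d_" value and `column_series` sorts column names by their prefix; both are hypothetical helpers.

```python
def parse_duplicate_cell(value):
    """Interpret a 'd_' column value from utf8_rainbio_vXX.csv.

    None, '' or 'NULL' -> no duplicated values   -> ("none", [])
    '-'                -> duplicates identical   -> ("identical", [])
    'XX||YY'           -> duplicates different   -> ("different", ["XX", "YY"])
    """
    if value is None or value == "" or value.upper() == "NULL":
        return "none", []
    if value == "-":
        return "identical", []
    return "different", value.split("||")


def column_series(columns):
    """Split column names into the tax_, d_ and c_ series; the rest is the profile."""
    series = {"profile": [], "tax_": [], "d_": [], "c_": []}
    for col in columns:
        for prefix in ("tax_", "d_", "c_"):
            if col.startswith(prefix):
                series[prefix].append(col)
                break
        else:
            series["profile"].append(col)
    return series


print(parse_duplicate_cell("9.61889||11.58||11.5833"))
print(column_series(["idc", "tax", "tax_tax", "d_x_meters", "c_step02"]))
```

Note that the profile column tax (without underscore) stays in the first series, while tax_tax belongs to the taxonomy series.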
Figure: Number of observations by determination rank (sp., gen., fam., no determination), broken down by status: location OK; no coordinates; location error; location error duplicates; difference in determination at family level (duplicates). X axis: number of observations (0 to 800000).
Rainbio database Version 6
Taxonomy
Rank | All | Location OK | Others
family | 383 | 359 | 24
genus | 4520 | 3750 | 770
sp. | 34116 | 26259 | 7857
Locations
Locations | All | Location OK | Others *
Locations | 179349 | 104121 | 75228
* iso3, main collector, year of collection, location notes
Rainbio database Version 6
Figure: Number of species by number of georeferenced locations (x axis: number of georeferenced locations, 0 to 50; y axis: number of species, 0 to 8000).
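A minimal sketch of how such a distribution can be counted, assuming a location is a distinct (ddlon, ddlat) pair rounded to 4 decimals; this definition is an assumption (the Version 6 footnote lists iso3, main collector, year of collection and location notes, which could serve as an alternative key). `locations_per_species` and `species_by_location_count` are hypothetical helpers; records should be restricted to species rank beforehand to match the chart.

```python
from collections import Counter, defaultdict


def locations_per_species(observations):
    """Number of distinct georeferenced locations per taxon name.

    Assumption: a location is a distinct (ddlon, ddlat) pair rounded to
    4 decimals. Taxa seen only without coordinates get 0.
    """
    locations = defaultdict(set)
    for obs in observations:
        locs = locations[obs["tax"]]  # registers the taxon even if it has no coordinates
        if obs["ddlon"] is None or obs["ddlat"] is None:
            continue
        locs.add((round(obs["ddlon"], 4), round(obs["ddlat"], 4)))
    return {tax: len(locs) for tax, locs in locations.items()}


def species_by_location_count(observations):
    """Histogram: number of georeferenced locations -> number of taxa."""
    return Counter(locations_per_species(observations).values())


observations = [
    {"tax": "Tylophora conspicua", "ddlon": 9.61889, "ddlat": 4.99},
    {"tax": "Tylophora conspicua", "ddlon": 11.5833, "ddlat": 3.61667},
    {"tax": "Tylophora conspicua", "ddlon": 11.5833, "ddlat": 3.61667},
    {"tax": "Species X", "ddlon": None, "ddlat": None},  # hypothetical taxon without coordinates
]
print(species_by_location_count(observations))
```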


Zaiss rainbio database

  • 1. Rainbio workflow to build up the database Rainer Zaiss IRD AMAP Rainbio meeting Aix-en-Provence 18-22 may 2015
  • 2. Rainbio database Child databases 5567 1596 4529 3565 12874 449509 55919 94829 14510 147504 62380 2054 AO BK BS DH GD JW KW MS OH TS UB VD 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 Number of observations - child databases Total: 854 836 observations ID DATABASE Owner AO Anne Black-Overgraad BK Barbara McKinder BS Bonaventure Sonkรฉ DH David Harris GD Gilles Dauby JW Jan Wieringa KW KEW GBIF MS Marc Sosef OH Olivier Hardy TS Tariq Stevart UB UB VD Vincent Droissart Child databases Rainbio database: unique ID of an observation (line) ID DATABASE + unique ID of observation in child database Exemples: BK1239, MSBR0000008978882BGB310957, JW64476
  • 3. Rainbio database RAINBIO profile Rainbio profile TYPE COLUMN TYPE EXEMPLES UPDATE DESCRIPTION UNIQUE IDS TRACKING OF DUPLICATES idrb integer serial AUTO Internal unique ID of the observation in Rainbio database idc character GD000001 Unique identifier of record in child dataset idsc character JW1234545,BK12334 SCRIPT Duplicates tax character AUTO Taxon name (fam,gen+esp+rank01+nam01+rank02+gen02) TAXON INFORMATION fam character Family to which specimen belongs gen character Genus to which specimen belongs esp character Epithet of species to which specimen belongs rank01 character var., subsp., โ€ฆ Rank of the first infrataxonomic name nam01 character First infrataxonomic name rank02 character var., subsp., โ€ฆ Rank of the second infrataxonomic name nam02 character Second infrataxonomic name DETERMINATION INFORMATION detok character D, OK Determination status of the specimen: D: doubtful; OK: no doubt indicated detnam character Name of the identifier dety integer, Year of the identification detm integer, Month of the identification detd integer, Day of the identification LOCATION INFORMATION iso3 character AUTO ISO3 country code country character Country where the specimen was collected iso3lonlat character AUTO ISO3 country code from geographical join with country layer maj_area character Major area of the country where the specimen was collected loc_notes character Locality notes telling where the specimen was collected ddlon real, Longitude at which the specimen was collected in DD ddlat real, Latitude at which the specimen was collected in DD accuracy integer, 0,1, โ€ฆ, 8 BRAHMS code, Accuracy of the georeference of the specimen alt integer, Altitude of the specimen in meters Rainbio profile Other columns
  • 4. Rainbio database RAINBIO profile Rainbio profile TYPE COLUMN TYPE EXEMPLES UPDATE DESCRIPTION DETERMINATION INFORMATION colnam character Collector name including initials prefix character Prefix of the number the collector gave to the specimen nbr integer, Number the collector gave to the specimen suffix character Suffix of the number the collector gave to the specimen colnamsup character Additional collectors associated with the specimen coly integer, Year when the specimen was collected colm integer, Month when the specimen was collected cold integer, Day when the specimen was collected DESCRIPTIVE INFORMATION kind_col character Herb, Sili, Observation, Plot_data Kind of collection: Herb: herbarium voucher; sili: silica gel voucher; etc dups character Acronyms of herbaria were duplicates are found (komma-delimited) description character Description of the specimen in the field pheno_fl character st, Fl, Fr Phenological state of the specimen: st: sterile; Fl: flowering; Fr: fruiting pheno_fr character yes, no habitat character Habitat in which specimen was collected habit character Tr, Li, Sh, He, Ep Habit of the specimen: Tr: tree; Li: liana; Sh: shrub; He: herb; Ep: epiphyte QUALITY FLAGS verif_iso3 character AUTO Code verification ISO3 country code (column iso3 and iso3lonlat) verif_coast character AUTO Code verification coastline verif_distance integer, AUTO Distance in meters fktax integer, AUTO ID taxon name Jan reference table verif_fktax character AUTO Code verification taxon name calc_accuracy integer, AUTO BRAHMS code, accuracy of the georeference of the specimen calculated from ddlon and ddlat Rainbio profile Other columns
  • 5. Rainbio database Data integration AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO DATABASE
  • 6. Rainbio database Data integration Number of empty cells and non-empty cells by child databases TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO % TAXON INFORMATION fam OK 5567 1596 4529 3565 12874 445828 55908 94829 14510 147503 60289 2054 849052 99% NULL 3681 11 1 2091 5784 1% gen OK 5567 1596 4529 3565 12874 436337 55513 94829 14508 141277 55613 2054 828262 97% NULL 13172 406 2 6227 6767 26574 3% esp OK 5380 1571 4512 3565 12874 399557 54075 94344 13744 120826 43315 2054 755817 88% NULL 187 25 17 49952 1844 485 766 26678 19065 99019 12% rank01 OK 533 186 18 31049 6863 12641 354 17644 2317 224 71829 8% NULL 5567 1596 3996 3379 12856 418460 49056 82188 14156 129860 60063 1830 783007 92% nam01 OK 533 186 18 31049 6863 12641 354 17644 2317 224 71829 8% NULL 5567 1596 3996 3379 12856 418460 49056 82188 14156 129860 60063 1830 783007 92% rank02 OK 13 18 1366 416 329 22 22 2186 0% NULL 5567 1596 4529 3552 12856 448143 55919 94413 14181 147482 62358 2054 852650 100% nam02 OK 13 18 1366 416 329 22 22 2186 0% NULL 5567 1596 4529 3552 12856 448143 55919 94413 14181 147482 62358 2054 852650 100% DETERMINATION INFORMATION detok OK 5567 1587 4529 2 12874 1791 94829 90 1618 2054 124941 15% NULL 9 3563 447718 55919 14420 147504 60762 729895 85% detnam OK 2372 713 2019 253070 31284 37886 5890 94508 52674 2054 482470 56% NULL 3195 883 4529 1546 12874 196439 24635 56943 8620 52996 9706 372366 44% dety OK 1334 571 1924 238194 6449 32162 78330 52896 411860 48% NULL 4233 1025 4529 1641 12874 211315 49470 62667 14510 69174 9484 2054 442976 52% detm OK 94 412 1112 122547 6449 13640 52911 197165 23% NULL 5473 1184 4529 2453 12874 326962 49470 81189 14510 147504 9469 2054 657671 77% detd OK 86 56 1103 64906 6449 5328 52911 130839 15% NULL 5481 1540 4529 2462 12874 384603 49470 89501 14510 147504 9469 2054 723997 85%
  • 7. Rainbio database Data integration Number of empty cells and non-empty cells by child databases TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO % LOCATION INFORMATION country OK 5507 1587 4529 3565 12874 449509 55919 94829 14510 147503 62380 2054 854766 100% NULL 60 9 1 70 0% maj_area OK 1751 445 1486 12874 342314 55819 47874 130551 14432 607546 71% NULL 3816 1151 3043 3565 107195 100 46955 14510 16953 47948 2054 247290 29% loc_notes OK 4493 1500 4490 12874 439779 41135 85639 13358 143388 61965 2035 810656 95% NULL 1074 96 39 3565 9730 14784 9190 1152 4116 415 19 44180 5% ddlon OK 4440 1352 4524 3565 12874 359934 55919 60054 14510 129244 45284 2054 693754 81% NULL 1127 244 5 89575 34775 18260 17096 161082 19% ddlat OK 4440 1351 4524 3565 12874 359934 55919 60054 14510 129244 45284 2054 693753 81% NULL 1127 245 5 89575 34775 18260 17096 161083 19% accuracy OK 5567 1596 4529 3516 12874 294256 55919 7593 10 21860 2054 409774 48% NULL 49 155253 87236 14500 147504 40520 445062 52% COLLECTION INFORMATION colnam OK 5023 1596 4529 3565 12874 449482 55919 94829 14510 147503 62352 2054 854236 100% NULL 544 27 1 28 600 0% prefix OK 1313 216 136 11 12873 38015 3122 6481 14510 5718 7265 72 89732 10% NULL 4254 1380 4393 3554 1 411494 52797 88348 141786 55115 1982 765104 90% nbr OK 4639 1491 4502 3558 12868 436782 54716 89608 14468 146630 61750 1793 832805 97% NULL 928 105 27 7 6 12727 1203 5221 42 874 630 261 22031 3% suffix OK 116 156 95 53 8152 1197 2335 41 2165 6220 13 20543 2% NULL 5451 1440 4434 3512 12874 441357 54722 92494 14469 145339 56160 2041 834293 98% colnamsup OK 894 280 1244 2044 12511 151515 7803 29 52 47900 41997 1013 267282 31% NULL 4673 1316 3285 1521 363 297994 48116 94800 14458 99604 20383 1041 587554 69% coly OK 4589 1429 2026 3530 12874 435134 48764 81186 142901 60843 1775 795051 93% NULL 978 167 2503 35 14375 7155 13643 14510 4603 1537 279 59785 7% colm OK 2239 1340 1239 3519 12872 420762 48750 75935 140981 60848 1770 770255 90% NULL 
3328 256 3290 46 2 28747 7169 18894 14510 6523 1532 284 84581 10% cold OK 1997 1119 1627 3515 12874 392536 48750 61978 135265 60848 1745 722254 84% NULL 3570 477 2902 50 56973 7169 32851 14510 12239 1532 309 132582 16%
  • 8. Rainbio database Data integration Number of empty cells and non-empty cells by child databases TYPE COLUMNS value AO BK BS DH GD JW KW MS OH TS UB VD RAINBIO % DESCRIPTIVE INFORMATION kind_col OK 5567 1596 4529 3565 12874 449509 94829 14510 62380 2054 651413 76% NULL 55919 147504 203423 24% dups OK 5208 1566 4497 449469 55919 39402 89931 645992 76% NULL 359 30 32 3565 12874 40 55427 14510 57573 62380 2054 208844 24% description OK 2265 983 249261 45027 33279 7866 118563 35700 1164 494108 58% NULL 3302 613 4529 3565 12874 200248 10892 61550 6644 28941 26680 890 360728 42% pheno_fl OK 802 1034 2305 8475 6836 21826 41278 5% NULL 4765 562 2224 3565 12874 449509 55919 86354 14510 140668 40554 2054 813558 95% pheno_fr OK 288 435 3893 1480 9822 15918 2% NULL 5279 1161 4529 3565 12874 449509 55919 90936 14510 146024 52558 2054 838918 98% habitat OK 839 716 1048 12874 242986 40738 44872 7743 4080 47375 403271 47% NULL 4728 880 3481 3565 206523 15181 49957 6767 143424 15005 2054 451565 53% habit OK 853 12874 13727 2% NULL 5567 743 4529 3565 449509 55919 94829 14510 147504 62380 2054 841109 98%
  • 9. Rainbio database: values out of range (CSV files)
    Child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database
    Values out of range: 1. Skip them, correct them or leave them as they are… 2. Export them as CSV files
  • 10. Rainbio database: values out of range (CSV files)
    1. utf8_error_data_alt_8000_or_0_v05.csv: altitude out of range, < 0 or > 8000 (37 records)
    2. utf8_error_data_coly_dety_v05.csv: year of collection later than year of determination (437 records)
    3. utf8_error_data_coly_v05.csv: year of collection > 2014 or < 1200 (10 records)
    4. utf8_error_data_lonlat_v05.csv: longitude = 0 and latitude = 0, or longitude >= 180 or <= -180, or latitude >= 90 or <= -90 (1895 records)
    FOLLOW UP: leave them as they are or …
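The four out-of-range checks can be written as simple predicates. This is a sketch, not the Rainbio code: the flag labels are made up from the CSV file names, the fields follow the Rainbio profile (`alt`, `coly`, `dety`, `ddlon`, `ddlat`), a check is skipped when a value it needs is missing, and the collection/determination check is read as flagging a collection year later than the determination year.

```python
def out_of_range_flags(rec):
    """Return the out-of-range checks a record fails (labels are hypothetical).

    Fields follow the Rainbio profile; a check is skipped when a value it
    needs is missing.
    """
    flags = []
    alt, coly, dety = rec.get("alt"), rec.get("coly"), rec.get("dety")
    lon, lat = rec.get("ddlon"), rec.get("ddlat")
    if alt is not None and (alt < 0 or alt > 8000):
        flags.append("alt_8000_or_0")
    if coly is not None and dety is not None and coly > dety:
        flags.append("coly_dety")  # collected after it was determined
    if coly is not None and (coly > 2014 or coly < 1200):
        flags.append("coly")
    if lon is not None and lat is not None and (
        (lon == 0 and lat == 0)
        or not (-180 < lon < 180)
        or not (-90 < lat < 90)
    ):
        flags.append("lonlat")
    return flags


print(out_of_range_flags({"alt": 9000, "coly": 2010, "dety": 2005}))
```

Returning a list rather than a single flag lets one record appear in several error files, which matches the fact that the four checks are exported separately.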
  • 11. Rainbio database: values out of range (CSV files)
    Location: database_csv/errors
    Encoding: UTF-8
    Select Unicode (UTF-8) when you open the files in Excel; otherwise special characters such as é, à, … will not be converted correctly.
    Example: without UTF-8: LÃ©on J.; with UTF-8: Léon J.
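The encoding problem can be reproduced in a few lines of Python. The sketch assumes the CSV files are UTF-8 and that the wrong reading corresponds to the Windows-1252 code page (`cp1252`), the usual default on Western-European Windows systems; `read_csv_utf8` is a hypothetical helper and assumes a comma delimiter.

```python
import csv

# "Léon J." as UTF-8 bytes, as stored in the exported CSV files.
raw = "Léon J.".encode("utf-8")

# Read with a Western single-byte code page (assumed here: cp1252), the two
# UTF-8 bytes of "é" become two separate characters.
garbled = raw.decode("cp1252")  # 'LÃ©on J.'
correct = raw.decode("utf-8")   # 'Léon J.'
print(garbled, correct)


def read_csv_utf8(path):
    """Read one of the exported CSV files as UTF-8 text."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```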
  • 12. Rainbio database: standardization of collector names
    Child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database, using Jan’s authors reference table + “NEW” names
    Collector names (columns colnam and colnamsup):
    1. Link the collector names with Jan’s reference table
    2. Track changes in a CSV file to improve the standardization
  • 13. Rainbio database: standardization of collector names
    Link the name of the main collector with Jan’s reference table.
    Standards for columns “colnam” and “colnamsup”:
    1. Format: Name, Initials Prefix. Examples: Brenan, J.P.M.; Forestor, H. de
    2. Keep the accented version. Example: Bové, N.
    Column “colnamsup”:
    3. Separator between names: “;”. Example: Arends, J.C.; Bruijn, J. de
    FOLLOW UP: leave them as they are or … Some strange names remain, such as 'Peter', [B.J. Brutt], Agric Department, …
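The naming standard (Name, Initials Prefix, with lowercase prefixes such as "de" kept at the end) can be approximated with a small normalizer. `normalize_collector` is a hypothetical helper, not part of the Rainbio workflow: it only reformats a name and does not match it against Jan’s reference table, and forms it cannot parse (for example a full first name) are returned unchanged for manual review.

```python
import re


def format_initials(part):
    """Turn "JPH", "J.P.H." or "H. de" into "J.P.H." or "H. de"."""
    tokens = part.replace(".", ". ").split()
    # Every uppercase letter of a capitalized token becomes one initial.
    initials = "".join(
        f"{ch}." for tok in tokens if tok[0].isupper() for ch in tok if ch.isupper()
    )
    # Lowercase tokens (de, van, ...) are kept as the name prefix.
    prefix = " ".join(tok for tok in tokens if tok[0].islower())
    return f"{initials} {prefix}".strip()


def normalize_collector(name):
    """Rewrite a collector name as "Name, Initials Prefix" (hypothetical helper).

    Handles "Surname, JPH", "Surname J.P.H." and "J.P.H. Surname"; any other
    form is returned unchanged for manual review.
    """
    name = " ".join(name.split())
    if "," in name:
        surname, rest = (p.strip() for p in name.split(",", 1))
        return f"{surname}, {format_initials(rest)}"
    tokens = name.split(" ")
    if len(tokens) > 1 and re.fullmatch(r"(?:[A-Z]\.?)+", tokens[-1]):
        return f"{' '.join(tokens[:-1])}, {format_initials(tokens[-1])}"
    if len(tokens) > 1 and re.fullmatch(r"(?:[A-Z]\.)+", tokens[0]):
        return f"{' '.join(tokens[1:])}, {format_initials(tokens[0])}"
    return name


print(normalize_collector("Achten, LTM"), normalize_collector("L. Achten"))
```

Accented surnames pass through untouched, in line with the "keep the accented version" rule.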
  • 14. Rainbio database: standardization of collector names
    CSV file to improve the standardization of collector names
    Name: utf8_error_data_colnam_colnamsup_vXX.csv (22620 lines)
    Location: database_csv/nams
    Encoding: UTF-8
    Columns: fkdb = child database (number of lines); c_colnam = initial name in the child database; colnam = current name in the Rainbio database; refcolnam = name in Jan’s table (empty if not in Jan’s table); iso3 = ISO3 country code + first column; type = COL (main collector) or COLSUP (additional collector)
    Example rows:
    fkdb · c_colnam · colnam · colnam_modify · refcolnam · iso3 · type
    JW(3) · Achomfo Nangasudo, NB · Achomfo Nangasudo, N.B. · Achomfo Nangasudo, N.B. · Achomfo Nangasudo, N.B. · GHA|JW(3) · COLSUP
    MS(1)||BS(1) · Achoundong G. · Achoundong, G. · Achoundong, G. · Achoundong, G. · CMR|MS(1)||CMR|BS(1) · COL||COL
    JW(22) · Achoundong, G · Achoundong, G. · Achoundong, G. · Achoundong, G. · CMR|JW(22) · COL
    KW(18) · Achoundong, G. · Achoundong, G. · Achoundong, G. · Achoundong, G. · CMR|KW(18) · COL
    AO(2)||JW(1)||JW(1)||JW(1)||JW(1)||JW(13)||JW(2)||JW(3)||KW(1)||KW(1)||TS(1)||TS(44) · G. Achoundong||Achoundong, G; Freddy & Enow||Zapfack, L; Achoundong, G; Onana, J-M; Elad, ME; Aggi; Ndumbe, P & Nguembock, F||Achoundong, G & Nana, Z||Achoun… · Achoundong, G. · Achoundong, G. · Achoundong, G. · CMR|AO(2)||CMR|JW(1)||CMR|JW(1)||CMR|JW(1)||GIN|JW(1)||… · COLSUP
    MS(211) · Achten L.T. · Achten, L.T.M. · Achten, L.T.M. · Achten, L.T.M. · COD|MS(211) · COL
    JW(22) · Achten, LTM · Achten, L.T.M. · Achten, L.T.M. · Achten, L.T.M. · COD|JW(22) · COL
    TS(2) · L. Achten · Achten, L.T.M. · Achten, L.T.M. · Achten, L.T.M. · COD|TS(2) · COL
    AO(2)||KW(2) · Martin Achu||Achu, M. · Achu, M. · Achu, M. · (empty) · CMR|AO(2)||CMR|KW(2) · COLSUP
    JW(1)||JW(1)||JW(2)||JW(5)||JW(5) · Chouaibou, K; Toh, C; Biye, EH; Tadjouteu, F; Rheede, C van de; Iwanaka; Achu, PF & Garcia, J||Njie, F; Chouaibou, K; Gwellem Abula, J; Wanduku, D; Muma Ngu, N; Fomba, V · Achu, P.F. · Achu, P.F. · Achu, P.F. · CMR|JW(1)||CMR|JW(1)||CMR|JW(2)||CMR|JW(5)||CMR|JW(5) · COLSUP
    JW(16)||JW(2)||JW(39) · Mackinder, BA; Nana, V & Achuo, F||Achuo, F; Ndene, R & Okon, FI||Mackinder, BA; Nana, V; Achuo, F; Abwe, E & Morgan, B · Achuo, F. · Achuo, F. · Achuo, F. · CMR|JW(16)||CMR|JW(2)||CMR|JW(39) · COLSUP
    MS(11) · Acocks J.P.H. · Acocks, J.P.H. · Acocks, J.P.H. · Acocks, J.P.H. · NAM|MS(1),ZAF|MS(10) · COL
    JW(406)||JW(1) · Acocks, JPH · Acocks, J.P.H. · Acocks, J.P.H. · Acocks, J.P.H. · NAM|JW(4),ZAF|JW(402)||ZAF|JW(1) · COL||COLSUP
    If you would like to help improve the standardization:
    1. Do not touch the column in red
    2. Just enter the name (Name, Initials Prefix) that you would like to retain in the column in green
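The `fkdb` and `iso3` columns of this CSV file pack several sources into one cell: `||` separates the original spellings, `DB(n)` gives the child database and the number of lines, and the `iso3` column prefixes each source with a country code. The parsers below assume exactly that format, inferred from the example rows; the function names are hypothetical.

```python
import re

FKDB_PART = re.compile(r"([A-Z]{2})\((\d+)\)")              # e.g. MS(211)
ISO3_PART = re.compile(r"([A-Z]{3})\|([A-Z]{2})\((\d+)\)")  # e.g. ZAF|JW(402)


def parse_fkdb(value):
    """'MS(1)||BS(1)' -> [('MS', 1), ('BS', 1)], one entry per original spelling."""
    return [(db, int(n)) for db, n in FKDB_PART.findall(value)]


def parse_iso3(value):
    """'NAM|JW(4),ZAF|JW(402)||ZAF|JW(1)' -> one list of (iso3, db, lines) per spelling."""
    return [
        [(iso3, db, int(n)) for iso3, db, n in ISO3_PART.findall(part)]
        for part in value.split("||")
    ]


def total_lines(value):
    """Total number of child-database lines behind one fkdb cell."""
    return sum(n for _, n in parse_fkdb(value))


print(parse_iso3("NAM|JW(4),ZAF|JW(402)||ZAF|JW(1)"), total_lines("JW(406)||JW(1)"))
```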
  • 15. Rainbio database: link with the taxonomy reference table
    Child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database, using Jan’s authors reference table + “NEW” names and Jan’s taxonomy reference table + “special cases” (Barbara)
    1. Link the taxon name of the child database to Jan’s taxonomy reference table to get the valid name
    2. Export CSV files to track errors
    Result: quality flag for taxonomy
  • 16. Rainbio database: link with the taxonomy reference table
    Link the taxon names of the child databases to Jan’s taxonomy reference table; taxonomy link quality flag.
    [Chart: number of taxa by taxonomy quality flag (OK, OK BARBARA, TAXNAM NO MATCH, MORE THAN ONE VALID IDTAX FOR TAXNAM, FKTAX NO VALID NAME MATCH) for the taxon names of the child databases after data migration, the valid names of all observations and Jan’s reference table; values shown: 46269, 41993, 42648]
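Linking a taxon name to the reference table and assigning the quality flag can be sketched as a dictionary lookup. Assumptions: each reference row carries `tax` and `idvalid`; a name whose rows resolve to several valid IDs gets MORE THAN ONE VALID IDTAX FOR TAXNAM and no ID (left for review); and the manual special cases (flag OK BARBARA) are passed in as a plain dict that takes precedence. `build_index` and `link_taxon` are hypothetical names.

```python
from collections import defaultdict


def build_index(reference_rows):
    """Map each taxon name to the set of valid IDs its reference rows point to."""
    index = defaultdict(set)
    for row in reference_rows:
        index[row["tax"]].add(row["idvalid"])
    return index


def link_taxon(name, index, special_cases=None):
    """Return (taxonomy quality flag, valid ID or None) for one taxon name."""
    if special_cases and name in special_cases:
        return "OK BARBARA", special_cases[name]
    valid_ids = index.get(name, set())  # .get() does not add a key to the defaultdict
    if not valid_ids:
        return "TAXNAM NO MATCH", None
    if len(valid_ids) > 1:
        return "MORE THAN ONE VALID IDTAX FOR TAXNAM", None
    return "OK", next(iter(valid_ids))
```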
  • 17. Rainbio database: link with the taxonomy reference table
    Taxonomy quality flag: TAXNAM NO MATCH (taxon name of the child database not in Jan’s taxonomy reference table)
    Name: utf8_error_data_tax_match_ref_tab_tax_vXX
    Location: database_csv/tax_match
    The file holds all names from Jan’s reference table that are in use in the Rainbio database + 413 non-matching names (TAXNAM NO MATCH).
    Columns: verif_fktax (TAXNAM NO MATCH); fkdb (code of the child database, or JAN for names from Jan); idvalid (valid ID from Jan); tax (current name in the Rainbio database); gen, esp, rank01, nam01, rank02, nam02 (columns to store the taxon name in the Rainbio database)
    verif_fktax | fkdb | idvalid | tax | gen | esp | rank01 | nam01 | rank02 | nam02
    - | JAN | 13880 | Acalypha indica | - | - | - | - | - | -
    - | JAN | 38314 | Acalypha integrifolia | - | - | - | - | - | -
    TAXNAM NO MATCH | TS | - | Acalypha integrifolia var. crateriana | Acalypha | integrifolia | var. | crateriana | - | -
    - | JAN | 17536 | Acalypha intermedia | - | - | - | - | - | -
    FOLLOW UP: leave them as they are or correct them…
    1. Do not touch the column in red
    2. Enter the correct information in the columns in green, or enter the valid taxon ID to create a special case
  • 18. Rainbio database: link with the taxonomy reference table
    Taxonomy quality flag: FKTAX NO VALID NAME MATCH
    Name: utf8_error_ref_tax_link_idtax_valid_name_no_match_vXX
    Location: database_csv/tax_match
    14 cases in the Rainbio database. Problem in Jan’s taxonomy reference table: simple or more complicated loops in the synonym relationships.
    idtax (ID from Jan) | relation (relationship inside Jan’s table)
    7595 | FKTAX FLAGGED AS HOMONYME
    304123 | SYN OF SYN OF ID: 304123,304127,304123||SYN OF SYN OF ID: 375298,304127,304123
    313097 | SYN OF SYN OF ID: 346134,49195,313097||SYN OF SYN OF ID: 346135,49195,313097
    FOLLOW UP: leave them as they are, correct them in Jan’s taxonomy reference table or create some special cases…
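The synonym loops behind FKTAX NO VALID NAME MATCH can be detected while following "synonym of" links, by remembering the IDs already visited. In this sketch `synonym_of` is a hypothetical dict from an ID to the ID it is a synonym of, valid names are simply absent from it, and the two-step loop in the test data is modelled on the 304123/304127 example rather than taken from Jan’s table.

```python
def resolve_valid(idtax, synonym_of):
    """Follow 'synonym of' links from idtax to a valid ID.

    Returns ("OK", valid_id), or ("FKTAX NO VALID NAME MATCH", None) when the
    chain of synonyms comes back to an ID already visited.
    """
    seen = set()
    current = idtax
    while current in synonym_of:
        if current in seen:
            return "FKTAX NO VALID NAME MATCH", None
        seen.add(current)
        current = synonym_of[current]
    return "OK", current
```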
  • 19. Rainbio database: link with the taxonomy reference table
    Taxonomy quality flag: MORE THAN ONE VALID IDTAX FOR TAXNAM
    Name: utf8_error_ref_tax_link_idtax_tax_not_unique_vXX.csv
    Location: database_csv/tax_match
    191 names in the Rainbio database. Problem in Jan’s taxonomy reference table: the same name appears at least twice with different relationships.
    idtax (valid ID from Jan) | tax (taxon name) | relation (relationship inside Jan’s table) | idvalid (ID that we use)
    197 | Schmidelia javensis f. genuinus | SYN OF VALID: 77982,197||SYN OF SYN OF ID: 156194,230960,197 | 197
    197 | Schmidelia velutina | SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 | 197
    103391 | Schmidelia velutina | SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 | 197
    FOLLOW UP: leave them as they are, correct them in Jan’s taxonomy reference table or create some special cases…
  • 20. Rainbio database: quality flags for geographical coordinates
    Workflow: child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database, with Jan’s authors reference table + “NEW” names, Jan’s taxonomy reference table + “special cases” (Barbara), quality flag taxonomy and quality flag geographical coordinates
    Quality flags for geographical coordinates (columns verif_iso3 and verif_coast):
    1. Location (lon/lat) inside the continent (verif_coast)
    2. Location (lon/lat) consistent with the ISO3 code (verif_iso3)
  • 21. Rainbio database: quality flags for geographical coordinates
    1. Check whether the geographical coordinates are on the African continent (verif_coast)
    verif_coast | Number of observations | %
    COORDINATES OUT OF RANGE | 1895 | 0.2%
    NO COORDINATES | 161167 | 18.9%
    OK | 689757 | 80.7%
    ERROR | 2017 | 0.2%
    [Map: location of the geographical coordinates]
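verif_coast boils down to a point-in-polygon test against the continent outline. The sketch below uses the classic ray-casting test on a plain list of (lon, lat) vertices; a real run would read the polygons from a shapefile with a GIS library, which is not shown, and the square "continent" in the test is purely illustrative.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count the polygon edges crossed by a ray going east from the point.

    polygon is a list of (lon, lat) vertices; an odd number of crossings means inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def verif_coast(lon, lat, land_polygons):
    """Return the verif_coast flag for one observation."""
    if lon is None or lat is None:
        return "NO COORDINATES"
    if (lon == 0 and lat == 0) or not (-180 < lon < 180) or not (-90 < lat < 90):
        return "COORDINATES OUT OF RANGE"
    if any(point_in_polygon(lon, lat, poly) for poly in land_polygons):
        return "OK"
    return "ERROR"
```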
  • 22. Rainbio database: quality flags for geographical coordinates
    2. Compare the ISO3 country code of the observation with the ISO3 code of the country at the geographical coordinates (lon/lat)
    verif_iso3 | Number of observations | %
    COORDINATES OUT OF RANGE | 1895 | 0.2%
    NO COORDINATES | 161167 | 18.9%
    ISO3 NOT IN AFRICA | 1 | 0.0%
    OK | 684773 | 80.1%
    NEIGHBOUR | 6213 | 0.7%
    ERROR | 786 | 0.1%
    OK: ISO3 at lon/lat = ISO3 of the observation
    [Map: location of the geographical coordinates and ISO3 country code of the observation; example ISO3: GAB]
  • 23. Rainbio database: quality flags for geographical coordinates
    2. Compare the ISO3 country code of the observation with the ISO3 code of the country at the geographical coordinates (lon/lat); verif_iso3 counts as in slide 22
    NEIGHBOUR: ISO3 at lon/lat = ISO3 of a country neighbouring the country of the observation. On the map, a line is drawn from the location of the geographical coordinates to the nearest boundary of the neighbouring country.
    [Map: location of the geographical coordinates and ISO3 country code of the observation; example ISO3: COG]
  • 24. Rainbio database: quality flags for geographical coordinates
    2. Compare the ISO3 country code of the observation with the ISO3 code of the country at the geographical coordinates (lon/lat); verif_iso3 counts and NEIGHBOUR as in slides 22 and 23
    ERROR: ISO3 at lon/lat ≠ ISO3 of the observation and not a neighbouring country. On the map, a line is drawn from the location of the geographical coordinates to the centroid of the country associated with the observation.
    [Map examples: ISO3: COG, ISO3: GNQ]
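Once the country containing each point is known, verif_iso3 is a comparison of two ISO3 codes plus a neighbour lookup. In this sketch `point_iso3` is assumed to come from a point-in-country lookup done elsewhere, `neighbours` and `african_iso3` are hypothetical inputs, and ISO3 NOT IN AFRICA is read as "the observation’s own country code is not African"; the NO COORDINATES and COORDINATES OUT OF RANGE cases are assumed to be assigned beforehand, as for verif_coast.

```python
def verif_iso3(obs_iso3, point_iso3, neighbours, african_iso3):
    """Compare the observation's ISO3 code with the ISO3 of the country at its coordinates.

    point_iso3 is None when the coordinates fall in no country polygon.
    """
    if point_iso3 == obs_iso3:
        return "OK"
    if obs_iso3 not in african_iso3:
        return "ISO3 NOT IN AFRICA"
    if point_iso3 in neighbours.get(obs_iso3, set()):
        return "NEIGHBOUR"
    return "ERROR"


# Hypothetical inputs: Gabon borders Cameroon, Equatorial Guinea and the Republic of the Congo.
NEIGHBOURS = {"GAB": {"CMR", "GNQ", "COG"}}
AFRICAN = {"GAB", "CMR", "GNQ", "COG", "COD"}
print(verif_iso3("GAB", "COG", NEIGHBOURS, AFRICAN))
```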
  • 25. Rainbio database: quality flags for geographical coordinates
    2. Shapefiles of the geographical coordinate quality flags: verif_iso3 = ERROR and verif_coast = ERROR or OK
    Name: error_tab_data_verif_iso3=error_verif_coast=error.shp, error_tab_data_verif_iso3=error_verif_coast=ok.shp
    Location: database_csv/shp
    Encoding: UTF-8
    [Maps by child database: AO, BK, BS, DH, JW, GD, KW, MS, OH, TS, VD, UB]
    verif_iso3 | verif_coast | Observations after data integration | % | Difference | Observations after elimination of duplicates | %
    COORDINATES OUT OF RANGE | COORDINATES OUT OF RANGE | 1895 | 0.2% | -35 | 1860 | 0.2%
    NO COORDINATES | NO COORDINATES | 161167 | 18.9% | -17959 | 143208 | 18.4%
    ISO3 NOT IN AFRICA | OK | 1 | 0.0% | 0 | 1 | 0.0%
    OK | OK | 683236 | 80.1% | -54265 | 628971 | 80.6%
    ERROR | OK | 357 | 0.0% | -38 | 319 | 0.0%
    ERROR | ERROR | 429 | 0.1% | -28 | 401 | 0.1%
    FOLLOW UP: exclude them or correct them…
  • 26. Rainbio database: quality flags for geographical coordinates
    2. Shapefiles of the geographical coordinate quality flags: verif_iso3 = NEIGHBOUR and verif_coast = ERROR or OK
    Name: error_tab_data_verif_iso3=neighbour_verif_coast=error.shp, error_tab_data_verif_iso3=neighbour_verif_coast=ok.shp
    Location: database_csv/shp
    Encoding: UTF-8
    [Maps by child database: AO, BK, BS, DH, JW, GD, KW, MS, OH, TS, VD, UB]
    Same table as in slide 25, with two more rows:
    verif_iso3 | verif_coast | Observations after data integration | % | Difference | Observations after elimination of duplicates | %
    NEIGHBOUR | OK | 6163 | 0.7% | -751 | 5412 | 0.7%
    NEIGHBOUR | ERROR | 50 | 0.0% | -4 | 46 | 0.0%
    FOLLOW UP: use the distance as a filter, exclude them or correct them…
  • 27. Rainbio database: quality flags for geographical coordinates
    2. Shapefiles of the geographical coordinate quality flags: verif_iso3 = OK and verif_coast = ERROR
    Name: error_tab_data_verif_iso3=ok_verif_coast=error.shp
    Location: database_csv/shp
    Encoding: UTF-8
    [Maps by child database: AO, BK, BS, DH, JW, GD, KW, MS, OH, TS, VD, UB]
    Same table as in slides 25 and 26, with one more row:
    verif_iso3 | verif_coast | Observations after data integration | % | Difference | Observations after elimination of duplicates | %
    OK | ERROR | 1537 | 0.2% | -85 | 1452 | 0.2%
    FOLLOW UP: use the distance as a filter, exclude them or correct them…
  • 28. Rainbio database: duplicates
    Workflow: child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database, with Jan’s authors reference table + “NEW” names, Jan’s taxonomy reference table + “special cases” (Barbara), quality flag taxonomy and quality flag geographical coordinates → observations: duplicate / OK
    Identification of duplicates: 1. Same observation 2. Location (lon/lat) = ISO3 code
  • 29. Rainbio database: duplicates
    What is a unique observation? Same surname of the main collector + same year of collection + same ISO3 country code + same prefix + collection number + suffix.
    Columns of a unique observation (6 columns): ISO3 country code, surname of the main collector, prefix, collection number, year of collection, suffix
    Rainbio profile (32 columns):
    TAXON INFORMATION: fam, gen, tax
    DETERMINATION INFORMATION: detok, detnam, dety, detm, detd
    LOCATION INFORMATION: iso3, country, maj_area, loc_notes, ddlon, ddlat, accuracy, alt
    COLLECTION INFORMATION: colnam, prefix, nbr, suffix, colnamsup, coly, colm, cold
    DESCRIPTIVE INFORMATION: kind_col, dups, description, pheno_fl, pheno_fr, habitat, habit
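Grouping records on the six columns of a unique observation is enough to find candidate duplicates. A minimal sketch, assuming the main collector’s surname is the text before the first comma of `colnam` (so "Sidwell, K." and "Sidwell" match) and that all key values are compared as stripped strings; the helper names are hypothetical.

```python
from collections import defaultdict


def text(value):
    """Normalize a cell to a stripped string; None becomes ''."""
    return "" if value is None else str(value).strip()


def observation_key(rec):
    """The six columns that define a unique observation."""
    surname = text(rec.get("colnam")).split(",")[0].strip()
    return (
        text(rec.get("iso3")),
        surname,
        text(rec.get("prefix")),
        text(rec.get("nbr")),
        text(rec.get("coly")),
        text(rec.get("suffix")),
    )


def find_duplicates(records):
    """Return the groups of records sharing the same unique-observation key."""
    groups = defaultdict(list)
    for rec in records:
        groups[observation_key(rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]
```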
  • 30. Rainbio database: duplicates
    Procedure, from the initial stage (32 columns of the Rainbio profile) to the final stage (6 columns of a unique observation):
    1. Identify identical lines
    2. Rank the identical lines within a grading system
    3. Export them as a CSV file
    4. Keep the best line and remove the others
    5. Remove one column and repeat
    First iteration: within each child database; second iteration: Rainbio database
    Location: database_csv/duplicates
    Encoding: UTF-8
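The loop on slide 30 (compare on all columns, keep the best line of each group, drop one column, repeat until only the key columns remain) can be sketched as follows. This is a simplified reading, not the Rainbio implementation: the step matrix in duplicates.xlsx compares some columns in special ways (for example by surname only), which is ignored here; columns are dropped from the end of the list the caller supplies; and `rank` is any function where a lower value means a better line.

```python
from collections import defaultdict


def iterate_dedup(records, columns, key_columns, rank):
    """Hypothetical sketch of the column-by-column duplicate elimination.

    At each stage, records identical on every column in `columns` form a group;
    the best record of each group (lowest rank, first one on ties) is kept and
    the others are removed. Then the last non-key column is dropped and the
    comparison is repeated until only `key_columns` remain.
    """
    columns = list(columns)
    removed = []
    while True:
        groups = defaultdict(list)
        for rec in records:
            groups[tuple(rec.get(c) for c in columns)].append(rec)
        records = []
        for group in groups.values():
            best = min(group, key=rank)  # min() returns the first minimal item
            records.append(best)
            removed.extend(r for r in group if r is not best)
        droppable = [c for c in columns if c not in key_columns]
        if not droppable:
            return records, removed
        columns.remove(droppable[-1])
```

Returning the removed lines alongside the kept ones mirrors the CSV export of each step, so the eliminated duplicates stay traceable.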
  • 31. Rainbio database: duplicates
    Detailed documentation: Excel file database_csv/duplicates/duplicates.xlsx
    For each step (STEP1 to STEP23 and STEP40 to STEP43) and each column of the Rainbio profile, the file states whether the column is used to identify duplicates or skipped, with markers such as VALUE, SURNAME, MIN, MAX, NBR CHAR and DESC.
    Checks per step:
    - CHECK COORDINATES IF LOCATION NAME IS DIFFERENT: 16_01_CHECK to 20_01_CHECK
    - CHECK COORDINATES IF DISTANCE > 0 meters: 16_02_CHECK to 22_02_CHECK, 40_02_CHECK to 43_02_CHECK
    - CHECK DATE OF COLLECT IF MORE THAN ONE FULL DATE (YYYY-MM-DD) IS AVAILABLE: 18_03_CHECK to 22_03_CHECK, 40_03_CHECK to 43_03_CHECK
    - CHECK COLLECTOR NAME: 16_04_CHECK to 20_04_CHECK, 40_04_CHECK to 43_04_CHECK
    - CHECK DETERMINATOR NAME: 40_06_CHECK to 43_06_CHECK
    - USE THE VERIFICATION TO IMPROVE THE STANDARDIZATION OF COLLECTOR NAMES: 15_00_CHECK, 21_00_CHECK, 22_00_CHECK, 23_00_CHECK, 43_00_CHECK
    Rankings used by the steps: DB RANK DET and DB RANK LOC
  • 32. Rainbio database: duplicates
    Location: database_csv/duplicates/
    Name: dups_rb15_00.csv (example CSV file for step 15 in the Rainbio iteration)
    File naming: cdb = intra-child-database iteration; rb = Rainbio iteration; 15 = step; _00 = all identified duplicates
    The values of the duplicated lines are joined with the separator ||, e.g. for step 15: Sidwell, K.||Sidwell. We keep the first line and remove all other lines.
    d_fkdb | d_calc_accuracy | d_alt | d_colnam | d_prefix | d_nbr | d_suffix | d_colnamsup | d_coldate
    JW,MS | 5||5 | 99999||99999 | Sidwell, K.||Sidwell | ###||### | 165||165 | ###||### | ###||### | 1992-10-24||1992-10-24
    BK,MS | 5||5 | 1248||1248 | Dubois, L.||Dubois, J. | ###||### | 1494||1494 | ###||### | ###||### | 1949-09-01||1949-09-01
    BS,MS | 5||5 | 99999||99999 | Savory, H.J.||Savory, L. | FHI||FHI | 25148||25148 | ###||### | Keay, R.W.J.||### | 1948-12-24||1948-12-24
    BS,MS | 5||5 | 99999||99999 | Savory, H.J.||Savory, L. | FHI||FHI | 25138||25138 | ###||### | Keay, R.W.J.||### | 1948-12-23||1948-12-23
    JW,MS | 3||3 | 99999||99999 | Muller, T.||Muller | ###||### | 1981||1981 | ###||### | Pope, G.V.; Russell, E.||### | 1971-12-19||1971-12-19
    JW,MS | 4||4 | 99999||99999 | Thompson, S.A.||Thompson | ###||### | 1626||1626 | ###||### | Rawlins, J.E.||### | 1984-07-16||1984-07-16
  • 33. Rainbio database: duplicates
    How to rank identical lines within a grading system in order to keep the best line: steps 16 (name of location) and 17 (geographical coordinates)
    1. ISO3 verification ranking: 1 OK, 2 NEIGHBOUR, 3 ISO3 NOT IN AFRICA, 4 ERROR, 5 COORDINATES OUT OF RANGE, 6 LATITUDE MISSING, 7 NO COORDINATES
    2. Child database location ranking: 1 VD, 2 AO, 3 BK, BS, DH, GD, JW, KW, MS, OH, TS, UB
    3. Calculated accuracy code ranking: 8, 7, 6, 5, 4, 3, 2, 1
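Criteria 1 and 2 of the location ranking map naturally onto a sort key. The sketch keeps the first line after a stable sort, so ties keep their input order as in "we keep the first line"; the calculated accuracy ranking (criterion 3) is left out because how it combines with the other two is not spelled out, and the child-database code is again assumed to be the first two characters of `idc`.

```python
ISO3_RANK = {
    "OK": 1, "NEIGHBOUR": 2, "ISO3 NOT IN AFRICA": 3, "ERROR": 4,
    "COORDINATES OUT OF RANGE": 5, "LATITUDE MISSING": 6, "NO COORDINATES": 7,
}
DB_LOCATION_RANK = {"VD": 1, "AO": 2}  # every other child database ranks 3


def location_rank(rec):
    """Sort key: ISO3 verification first, then child-database location ranking."""
    return (
        ISO3_RANK.get(rec.get("verif_iso3"), len(ISO3_RANK) + 1),
        DB_LOCATION_RANK.get(rec["idc"][:2], 3),
    )


def keep_best(group):
    """Return (kept line, removed lines) for one group of identical lines."""
    ordered = sorted(group, key=location_rank)  # stable: ties keep input order
    return ordered[0], ordered[1:]
```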
  • 34. Rainbio database: duplicates
    How to rank identical lines within a grading system in order to keep the best line: steps 40 (most recent determination), 41 (genus identical) and 42 (family identical); steps 16 and 17: see the location ranking
    1. Child database taxonomy ranking: 1 VD, 2 AO, 3 BK, 4 DH, 5 BS, GD, JW, KW, MS, OH, TS, UB
    2. Determination date ranking: 1 most recent
    3. Taxonomic rank ranking: 0 F|F, 1 VAR|×, 2 VAR|CV, 3 VAR|F, 4 VAR|UNKNOWN, 5 SUBVAR, 6 SUBSP|×, 7 SUBSP|F, 8 SUBSP|VAR, 9 ×, 10 X, 11 CV, 12 CVGR, 13 F, 14 VAR, 15 SUBSP, 16 ESP, 17 GEN, 18 FAM, 19 UNKNOWN, 20 UNKNOWN|UNKNOWN
    4. Valid name source ranking: 1 Jan’s reference table, 2 child database
  • 35. Rainbio database: duplicates
    Rainbio iteration: 57863 lines; intra-child-database iteration: 13387 lines
    [Chart: steps involved in eliminating duplicates (0 to 100%) for each child database (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD), all child databases and the Rainbio iteration. Steps: all columns; descriptive information; location name; geographical coordinates; collection date; name of determiner or date of determination; family and genus identical, taxon name different; family identical, genus and taxon name different; excluded: family, genus and taxon name different]
  • 36. Rainbio database: duplicates
    Step 43: determination different at family level
    File duplicates/dups_rb43_00_CHECK.csv: 1783 cases in the Rainbio iteration
    File duplicates/dups_cdb43_00_CHECK.csv: 882 cases in the intra-child-database iteration
    FOLLOW UP: correct them, or exclude the observations whose column c_step02 matches '%43%'
  • 37. Rainbio database: duplicates
    Duplicates with different geographical coordinates
    Rainbio database: 854836 observations in total
    64828 observations (7.6%) have 69367 duplicates
    59797 (92%) of the duplicated observations are georeferenced
    Coordinates come from different sources for 53163 (89%) of the georeferenced duplicated observations
    Coordinates are different for 15397 of these observations (29%)
    Example: Mefou National Park, Ndanen 2
    IDC | ISO3 | LOC_NOTES | DDLON | DDLAT | COLNAM | PREFIX | NBR | SUFFIX | COLY | VERIF_ISO3 | VERIF_COAS | TAX_TAX
    TS100579839 | CMR | Mefou Proposed National Park. Mefou National Park, Ndanen 2. | 9.61889 | 4.99 | Etuge, M. | - | 5271 | - | 2004 | OK | OK | Tylophora conspicua
    KWK000199925 | CMR | Mefou National Park, Ndanen 2. | 11.58 | 3.63 | Etuge, M. | - | 5271 | - | 2004 | OK | OK | Tylophora
    JW1056921 | CMR | Mefou National Park, Ndanen 2. | 11.5833 | 3.61667 | Etuge, M. | - | 5271 | - | 2004 | OK | OK | Tylophora conspicua
    [Map: location of the geographical coordinates and bounding box]
  • 38. Rainbio database: duplicates
    Bounding boxes of duplicated observations with coordinates from different sources
    [Map: bounding boxes, all observations]
  • 39. Rainbio database: duplicates
    Bounding boxes of duplicated observations with coordinates from different sources, after the duplicate iterations and ranking (quality flag location)
    The coordinates are different, but at least they are in the same country.
    [Map: bounding boxes, all observations]
  • 40. Rainbio database: duplicates
    Bounding boxes of duplicated observations with coordinates from different sources, after the duplicate iterations and ranking (quality flag location) and after eliminating bounding boxes with width < 10 km and height < 10 km
    [Map: bounding boxes, all observations]
    FOLLOW UP: … use columns d_x_meters < 10000 and d_y_meters < 10000 as a filter
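The d_x_meters / d_y_meters filter needs the width and height of each bounding box in metres. This sketch uses an equirectangular approximation (longitude span scaled by the cosine of the mid latitude, mean Earth radius 6371 km), which is adequate for boxes of a few kilometres; whether Rainbio computes these columns the same way is an assumption, and `bbox_size_m` / `within_limit` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def bbox_size_m(points):
    """Approximate (width, height) in metres of the bounding box of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    mid_lat = math.radians((min(lats) + max(lats)) / 2)
    width = math.radians(max(lons) - min(lons)) * EARTH_RADIUS_M * math.cos(mid_lat)
    height = math.radians(max(lats) - min(lats)) * EARTH_RADIUS_M
    return width, height


def within_limit(points, limit_m=10_000):
    """True when both sides of the box are below limit_m (the < 10 km filter)."""
    width, height = bbox_size_m(points)
    return width < limit_m and height < limit_m


# Mefou example: the KW and JW coordinates agree, the TS ones are about 200 km away.
print(within_limit([(11.58, 3.63), (11.5833, 3.61667)]))
print(within_limit([(9.61889, 4.99), (11.58, 3.63), (11.5833, 3.61667)]))
```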
  • 41. Rainbio database: duplicates
    Duplicates with different geographical coordinates. Example: Mefou National Park, Ndanen 2 (same records as in slide 37)
    The current location in the Rainbio database is wrong!
    FOLLOW UP: change the database ranking in order to select the “best” location
    [Map: location of the geographical coordinates, bounding box and current location in the Rainbio database]
  • 42. Rainbio database: quality flags for geographical coordinates
    Workflow: child databases (AO, BK, BS, DH, GD, JW, KW, MS, OH, TS, UB, VD) → Rainbio database, with Jan’s authors reference table + “NEW” names, Jan’s taxonomy reference table + “special cases” (Barbara), quality flag taxonomy and quality flag geographical coordinates → observations: duplicate / OK
    Export of CSV files: Rainbio database; values out of range; duplicate steps; taxonomy link problems; shapefiles of location errors; standardization of collectors
    New version: export of the CSV file
  • 43. Rainbio database: Rainbio CSV file
    Name: utf8_rainbio_vXX.csv
    Location: database_csv/db_csv
    First series of columns: Rainbio profile + quality flags (47 columns)
    Second series of columns: columns from the taxonomy reference table, starting with “tax_” (16 columns); values from Jan’s table if we have the link, otherwise values from the child database
    Third series of columns: identification of duplicates, starting with “d_” (54 columns); NULL if there are no duplicated values or if the values in the duplicated records are identical; XX||YY: values from the duplicated records if they are different
    Fourth series of columns: identification of duplicates, starting with “c_” (6 columns); steps that identify duplicates without eliminating them. Example: in step 43, observations with duplicates whose determination differs at family level are only flagged, not treated as duplicated records.
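The rule for the d_ columns (NULL if there is no duplicate or the duplicated values are identical, otherwise the values joined with ||) is short to express. In this sketch empty values are written as ###, which is an assumption based on the dups_rb15_00.csv example, and `merge_duplicate_values` / `d_columns` are hypothetical names.

```python
def merge_duplicate_values(values):
    """Combine one column's values from a group of duplicated records.

    Returns None (NULL) when there is no duplicate or all values are identical,
    otherwise the values joined with "||" in record order, empty ones as "###".
    """
    if len(values) < 2 or len(set(values)) == 1:
        return None
    return "||".join("###" if v is None else str(v) for v in values)


def d_columns(records, columns):
    """Build the d_<column> values for one group of duplicated records."""
    return {f"d_{c}": merge_duplicate_values([r.get(c) for r in records]) for c in columns}


print(d_columns([{"colnam": "Sidwell, K.", "nbr": "165"}, {"colnam": "Sidwell", "nbr": "165"}], ["colnam", "nbr"]))
```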
  • 44. Rainbio database: version 6
    [Chart: number of observations by determination rank (sp., gen., fam., no determination), split into location OK, no coordinates, location error, location error duplicates, and duplicates with a different determination at family level; values shown: 562683, 51442, 19981, 4732, 119086, 66304]
    Taxonomic rank | All | Location OK | Others
    family | 383 | 359 | 24
    genus | 4520 | 3750 | 770
    sp. | 34116 | 26259 | 7857
    Locations* | All | Location OK | Others
    Locations | 179349 | 104121 | 75228
    * Location: iso3, main collector, year of collection, location notes
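Counting locations per taxon, with a location defined as (iso3, main collector, year of collection, location notes), is a set-per-taxon count. In this sketch "georeferenced" simply means that both ddlon and ddlat are present, which may differ from the Location OK criterion used for the figures; taxa with no qualifying record do not appear in the result, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def locations_per_taxon(records, georeferenced_only=False):
    """Count distinct locations (iso3, colnam, coly, loc_notes) per taxon."""
    locations = defaultdict(set)
    for rec in records:
        if georeferenced_only and (rec.get("ddlon") is None or rec.get("ddlat") is None):
            continue
        key = (rec.get("iso3"), rec.get("colnam"), rec.get("coly"), rec.get("loc_notes"))
        locations[rec["tax"]].add(key)
    return {tax: len(keys) for tax, keys in locations.items()}


def species_by_location_count(counts):
    """How many taxa have 1, 2, 3, ... distinct locations."""
    return Counter(counts.values())
```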
  • 45. Rainbio database: version 6
    [Chart: number of species by number of georeferenced locations (x axis from 0 to 50); bar values shown: 7857, 6790, 2728, 1780, 1334, 1061, 2474]