Zaiss rainbio database
1. Rainbio
workflow to build up the database
Rainer Zaiss
IRD AMAP
Rainbio meeting
Aix-en-Provence, 18-22 May 2015
2. Rainbio database Child databases
Number of observations - child databases
ID  Database owner          Observations
AO  Anne Black-Overgraad    5567
BK  Barbara McKinder        1596
BS  Bonaventure Sonké       4529
DH  David Harris            3565
GD  Gilles Dauby            12874
JW  Jan Wieringa            449509
KW  KEW GBIF                55919
MS  Marc Sosef              94829
OH  Olivier Hardy           14510
TS  Tariq Stevart           147504
UB  UB                      62380
VD  Vincent Droissart       2054
Total: 854 836 observations
Child databases
Rainbio database: unique ID of an observation (line)
= ID of the child database + unique ID of the observation in the child database
Examples: BK1239, MSBR0000008978882BGB310957, JW64476
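The ID scheme above can be sketched in a few lines. This is a minimal Python sketch, assuming every child database ID has exactly two characters (true for the twelve IDs in the table above); the function names make_idc and split_idc are hypothetical.

```python
# Two-letter child database IDs, as listed in the table of child databases
CHILD_DB_IDS = {"AO", "BK", "BS", "DH", "GD", "JW", "KW", "MS", "OH", "TS", "UB", "VD"}


def make_idc(db_id: str, child_id: str) -> str:
    """Build the Rainbio unique ID: child database ID + ID of the record in that child database."""
    if db_id not in CHILD_DB_IDS:
        raise ValueError(f"unknown child database: {db_id!r}")
    return f"{db_id}{child_id}"


def split_idc(idc: str) -> tuple[str, str]:
    """Split an idc back into (database ID, child record ID), assuming 2-character database IDs."""
    db_id, child_id = idc[:2], idc[2:]
    if db_id not in CHILD_DB_IDS:
        raise ValueError(f"unknown child database prefix in {idc!r}")
    return db_id, child_id
```

Because the child record ID may itself start with letters (MSBR0000…), splitting relies on the fixed prefix length rather than on the first digit.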
3. Rainbio database RAINBIO profile
Rainbio profile
GROUP COLUMN TYPE EXAMPLES UPDATE DESCRIPTION
UNIQUE IDS
TRACKING OF DUPLICATES
idrb integer serial AUTO Internal unique ID of the observation in Rainbio database
idc character GD000001 Unique identifier of record in child dataset
idsc character JW1234545,BK12334 SCRIPT Unique IDs of duplicated records (comma-delimited)
tax character AUTO Taxon name (fam, gen+esp+rank01+nam01+rank02+nam02)
TAXON INFORMATION
fam character Family to which specimen belongs
gen character Genus to which specimen belongs
esp character Epithet of species to which specimen belongs
rank01 character var., subsp., … Rank of the first infrataxonomic name
nam01 character First infrataxonomic name
rank02 character var., subsp., … Rank of the second infrataxonomic name
nam02 character Second infrataxonomic name
DETERMINATION INFORMATION
detok character D, OK Determination status of the specimen: D: doubtful; OK: no doubt indicated
detnam character Name of the identifier
dety integer Year of the identification
detm integer Month of the identification
detd integer Day of the identification
LOCATION INFORMATION
iso3 character AUTO ISO3 country code
country character Country where the specimen was collected
iso3lonlat character AUTO ISO3 country code from the geographical join with the country layer
maj_area character Major area of the country where the specimen was collected
loc_notes character Locality notes describing where the specimen was collected
ddlon real Longitude at which the specimen was collected, in decimal degrees (DD)
ddlat real Latitude at which the specimen was collected, in decimal degrees (DD)
accuracy integer 0, 1, …, 8 BRAHMS code: accuracy of the georeference of the specimen
alt integer Altitude of the specimen in meters
Rainbio profile Other columns
4. Rainbio database RAINBIO profile
Rainbio profile
GROUP COLUMN TYPE EXAMPLES UPDATE DESCRIPTION
COLLECTION INFORMATION
colnam character Collector name including initials
prefix character Prefix of the number the collector gave to the specimen
nbr integer Number the collector gave to the specimen
suffix character Suffix of the number the collector gave to the specimen
colnamsup character Additional collectors associated with the specimen
coly integer Year when the specimen was collected
colm integer Month when the specimen was collected
cold integer Day when the specimen was collected
DESCRIPTIVE INFORMATION
kind_col character Herb, Sili, Observation, Plot_data Kind of collection: Herb: herbarium voucher; Sili: silica gel voucher; etc.
dups character Acronyms of herbaria where duplicates are found (comma-delimited)
description character Description of the specimen in the field
pheno_fl character st, Fl, Fr Phenological state of the specimen: st: sterile; Fl: flowering; Fr: fruiting
pheno_fr character yes, no
habitat character Habitat in which the specimen was collected
habit character Tr, Li, Sh, He, Ep Habit of the specimen: Tr: tree; Li: liana; Sh: shrub; He: herb; Ep: epiphyte
QUALITY FLAGS
verif_iso3 character AUTO Verification code: comparison of the ISO3 country codes in columns iso3 and iso3lonlat
verif_coast character AUTO Verification code: coordinates inside the continent (coastline check)
verif_distance integer AUTO Distance in meters
fktax integer AUTO ID of the taxon name in Jan's taxonomy reference table
verif_fktax character AUTO Verification code for the link to the taxonomy reference table
calc_accuracy integer AUTO BRAHMS code: accuracy of the georeference of the specimen, calculated from ddlon and ddlat
Rainbio profile Other columns
5. Rainbio database Data integration
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE
9. Rainbio database Values out of range csv files
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE
Values out of range
1. Skip, correct or leave them as they are …
2. Export as csv files
10. Rainbio database Values out of range csv files
1. utf8_error_data_alt_8000_or_0_v05.csv
Altitude out of range: < 0 or > 8000 m (37 records)
2. utf8_error_data_coly_dety_v05.csv
Year of collection > year of determination (437 records)
3. utf8_error_data_coly_v05.csv
Year of collection > 2014 or year of collection < 1200 (10 records)
4. utf8_error_data_lonlat_v05.csv
(longitude = 0 and latitude = 0) or longitude >= 180 or longitude <= -180 or latitude >= 90 or latitude <= -90 (1895 records)
FOLLOW UP: leave them as they are or …
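The four conditions can be expressed as a per-record check. A minimal sketch, assuming a record is a dict keyed by the Rainbio column names with None for missing values; the check names mirror the CSV file names. It assumes the coly/dety check flags records whose year of collection is later than the year of determination, since a specimen cannot be identified before it was collected.

```python
def out_of_range_checks(rec: dict) -> list[str]:
    """Return the names of the out-of-range checks a record fails.

    rec uses Rainbio column names (alt, coly, dety, ddlon, ddlat); missing values are None.
    """
    failed = []
    alt = rec.get("alt")
    if alt is not None and (alt < 0 or alt > 8000):
        failed.append("alt_8000_or_0")
    coly, dety = rec.get("coly"), rec.get("dety")
    # Assumption: a determination year earlier than the collection year is an error
    if coly is not None and dety is not None and coly > dety:
        failed.append("coly_dety")
    if coly is not None and (coly > 2014 or coly < 1200):
        failed.append("coly")
    lon, lat = rec.get("ddlon"), rec.get("ddlat")
    if lon is not None and lat is not None and (
        (lon == 0 and lat == 0) or lon >= 180 or lon <= -180 or lat >= 90 or lat <= -90
    ):
        failed.append("lonlat")
    return failed
```

A record can fail several checks at once, which is why the sketch returns a list rather than a single flag.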
11. Rainbio database Values out of range csv files
Location: database_csv/errors
Encoding: UTF8
You have to select Unicode (UTF8) to open the files in Excel.
Otherwise special characters such as é, à, … will not be converted correctly.
Example
If you do not select UTF8: LÃ©on J.
If you select UTF8: Léon J.
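Outside Excel, the same rule applies: open the files explicitly as UTF-8. A minimal Python sketch; the function name read_rainbio_csv is hypothetical and the comma delimiter is an assumption (the slides do not state the delimiter). The last line reproduces the "LÃ©on J." effect by decoding UTF-8 bytes with the Windows Western code page cp1252.

```python
import csv


def read_rainbio_csv(path: str, delimiter: str = ",") -> list[dict]:
    """Read one of the exported CSV files as UTF-8 (the delimiter is an assumption)."""
    # newline="" lets the csv module handle line breaks inside quoted fields
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f, delimiter=delimiter))


# What happens when UTF-8 bytes are decoded with a Western single-byte code page,
# as a spreadsheet may do when UTF-8 is not selected:
print("Léon J.".encode("utf-8").decode("cp1252"))  # LÃ©on J.
```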
12. Rainbio database Standardization: collector names
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE ← Jan's authors reference table + "NEW" names
Collector names (columns colnam and colnamsup)
1. Link collector names with Jan's reference table
2. Track changes in a CSV file to improve the standardization
13. Rainbio database Standardization: collector names
Link the name of the main collector with Jan's reference table
Standards
Columns "colnam" and "colnamsup"
1. Format: Name, Initials Prefix
Examples: Brenan, J.P.M.; Forestor, H. de
2. Keep the accented version
Example: Bové, N.
Column "colnamsup"
3. Separator between names: ;
Example: Arends, J.C.; Bruijn, J. de
FOLLOW UP: leave them as they are or … Some strange names remain, such as 'Peter', [ B.J. Brutt], Agric Department, …
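The three standards can be applied automatically to the simple cases. A minimal sketch, not the script used for Rainbio: it handles "Name, initials [prefix]" and "Name initials" forms and returns every other form (for example "G. Achoundong" or hyphenated initials such as "J-M") unchanged for manual review. The PARTICLES set and both function names are hypothetical.

```python
import re

# Hypothetical list of lowercase name particles written after the initials ("Bruijn, J. de")
PARTICLES = {"de", "da", "du", "van", "von", "der", "den", "le", "la"}
# One or more capital letters, each optionally followed by a dot: "G", "LTM", "J.P.H."
INITIALS_RE = re.compile(r"^(?:[A-Z]\.?)+$")


def standardize_collector(name: str) -> str:
    """Rewrite a collector name as 'Name, I.N.I. prefix', or return it unchanged."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        surname, rest = (part.strip() for part in name.split(",", 1))
        tokens = rest.split()
    else:
        parts = name.split()
        if len(parts) < 2 or not INITIALS_RE.match(parts[-1]):
            return name  # e.g. "G. Achoundong", "Martin Achu", "Sidwell"
        surname, tokens = " ".join(parts[:-1]), parts[-1:]
    particles = [t for t in tokens if t in PARTICLES]
    initials = [t for t in tokens if t not in PARTICLES]
    if not all(INITIALS_RE.match(t) for t in initials):
        return name  # e.g. hyphenated initials such as "J-M"
    letters = re.findall(r"[A-Z]", "".join(initials))
    result = surname + (", " + "".join(f"{c}." for c in letters) if letters else "")
    return " ".join([result] + particles)  # accents in the surname are kept as-is


def standardize_colnamsup(raw: str) -> str:
    """Split additional collectors on ';' or '&' and rejoin them with the separator '; '."""
    names = [n for n in re.split(r"\s*[;&]\s*", raw) if n.strip()]
    return "; ".join(standardize_collector(n) for n in names)
```

For example, "Achten, LTM" and "Acocks J.P.H." become "Achten, L.T.M." and "Acocks, J.P.H.", matching rows of the CSV file below.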
14. Rainbio database Standardization: collector names
CSV file to better the standardization of collector names
fkdb c_colnam colnam colnam_modify refcolnam iso3 type
JW(3) Achomfo Nangasudo, NB Achomfo Nangasudo, N.B. Achomfo Nangasudo, N.B. Achomfo Nangasudo, N.B. GHA|JW(3) COLSUP
MS(1)||BS(1) Achoundong G. Achoundong, G. Achoundong, G. Achoundong, G. CMR|MS(1)||CMR|BS(1) COL||COL
JW(22) Achoundong, G Achoundong, G. Achoundong, G. Achoundong, G. CMR|JW(22) COL
KW(18) Achoundong, G. Achoundong, G. Achoundong, G. Achoundong, G. CMR|KW(18) COL
AO(2)||JW(1)||JW(1)||JW(1)||JW(1)||JW(13)||JW(2)||JW(3)||KW(1)||KW(1)||TS(1)||TS(44)
G. Achoundong||Achoundong, G; Freddy & Enow||Zapfack, L; Achoundong, G; Onana, J-M; Elad, ME; Aggi; Ndumbe, P & Nguembock, F||Achoundong, G & Nana, Z||Achoun
Achoundong, G. Achoundong, G. Achoundong, G. CMR|AO(2)||CMR|JW(1)||CMR|JW(1)||CMR|JW(1)||GIN|JW(1)||
COLSUP
MS(211) Achten L.T. Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|MS(211) COL
JW(22) Achten, LTM Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|JW(22) COL
TS(2) L. Achten Achten, L.T.M. Achten, L.T.M. Achten, L.T.M. COD|TS(2) COL
AO(2)||KW(2) Martin Achu||Achu, M. Achu, M. Achu, M. CMR|AO(2)||CMR|KW(2) COLSUP
JW(1)||JW(1)||JW(2)||JW(5)||JW(5)
Chouaibou, K; Toh, C; Biye, EH; Tadjouteu, F; Rheede, C van de; Iwanaka; Achu, PF & Garcia, J||Njie, F; Chouaibou, K; Gwellem Abula, J; Wanduku, D; Muma Ngu, N; Fomba, V
Achu, P.F. Achu, P.F. Achu, P.F. CMR|JW(1)||CMR|JW(1)||CMR|JW(2)||CMR|JW(5)||CMR|JW(5)
COLSUP
JW(16)||JW(2)||JW(39) Mackinder, BA; Nana, V & Achuo, F||Achuo, F; Ndene, R & Okon, FI||Mackinder, BA; Nana, V; Achuo, F; Abwe, E & Morgan, B
Achuo, F. Achuo, F. Achuo, F. CMR|JW(16)||CMR|JW(2)||CMR|JW(39) COLSUP
MS(11) Acocks J.P.H. Acocks, J.P.H. Acocks, J.P.H. Acocks, J.P.H. NAM|MS(1),ZAF|MS(10) COL
JW(406)||JW(1) Acocks, JPH Acocks, J.P.H. Acocks, J.P.H. Acocks, J.P.H. NAM|JW(4),ZAF|JW(402)||ZAF|JW(1) COL||COLSUP
Name: utf8_error_data_colnam_colnamsup_vXX.csv (22620 lines)
Location: database_csv/nams
Encoding: UTF8
Legend:
- Initial name in the child database
- Child database (number of lines)
- Current name in the Rainbio database
- Name in Jan's table (empty if not in Jan's table)
- COL: main collector; COLSUP: additional collector
- ISO3 country code + first column
If you would like to help improve the standardization:
1. Do not touch the column in red
2. Just enter the name (Name, Initials Prefix) that you would like to retain in the column in green
15. Rainbio database Link taxonomy reference table
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE ← Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara)
Link taxonomy reference table
1. Link the taxon name of the child database to Jan's taxonomy reference table to get the valid name
2. Export CSV files to track errors
Quality flag: taxonomy
16. Rainbio database Link taxonomy reference table
Link taxon names of the child databases to Jan's taxonomy reference table
Taxonomy link quality flag
[Figure: Number of taxa by taxonomy quality flag (OK, OK BARBARA, TAXNAM NO MATCH, MORE THAN ONE VALID IDTAX FOR TAXNAM, FKTAX NO VALID NAME MATCH) for Jan's reference table (valid names), the taxon names of the child databases, and all observations after data migration. Values shown: 46269, 41993, 42648.]
17. Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: TAXNAM NO MATCH
Name: utf8_error_data_tax_match_ref_tab_tax_vXX
Location : database_csv/tax_match
Legend:
- Code: TAXNAM NO MATCH = child databases; JAN = names from Jan
- Current name in the Rainbio database
- Columns that store the taxon name in the Rainbio database
- Valid ID from Jan
FOLLOW UP: leave them as they are or correct them …
1. Do not touch the column in red
2. Enter the correct information in the columns in green, or enter the valid taxon ID to create a special case
The file holds all names from Jan's reference table that are in use in the Rainbio database + 413 non-matching names (TAXNAM NO MATCH)
verif_fktax fkdb idvalid tax gen esp rank01 nam01 rank02 nam02
JAN 13880 Acalypha indica
JAN 38314 Acalypha integrifolia
TAXNAM NO MATCH TS Acalypha integrifolia var. crateriana Acalypha integrifolia var. crateriana
JAN 17536 Acalypha intermedia
Taxon name of the child database not in Jan's taxonomy reference table
18. Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: FKTAX NO VALID NAME MATCH
Name: utf8_error_ref_tax_link_idtax_valid_name_no_match_vXX
Location : database_csv/tax_match
14 cases in Rainbio database
Problem in Jan's taxonomy reference table
Simple or more complicated loops in synonyms relationship
idtax relation
7595 FKTAX FLAGGED AS HOMONYME
304123 SYN OF SYN OF ID: 304123,304127,304123||SYN OF SYN OF ID: 375298,304127,304123
313097 SYN OF SYN OF ID: 346134,49195,313097||SYN OF SYN OF ID: 346135,49195,313097
Legend: ID from Jan; relationship inside Jan's table
FOLLOW UP: leave them as they are, correct them in Jan's taxonomy reference table or create some special cases …
19. Rainbio database Link taxonomy reference table
1. Taxonomy quality flag: MORE THAN ONE VALID IDTAX FOR TAXNAM
Name: utf8_error_ref_tax_link_idtax_tax_not_unique_vXX.csv
Location : database_csv/tax_match
191 names in Rainbio database
Problem in Jan's taxonomy reference table
Same name at least twice in Jan's reference table with different relationships
FOLLOW UP: leave them as they are, correct them in Jan's taxonomy reference table or create some special cases …
Legend: taxon name; relationship inside Jan's table
idtax tax relation idvalid
197 Schmidelia javensis f. genuinus SYN OF VALID: 77982,197||SYN OF SYN OF ID: 156194,230960,197 197
197 Schmidelia velutina SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 197
103391 Schmidelia velutina SYN OF VALID: 78014,197||SYN OF VALID: 103535,103391 197
Legend: ID that we use; valid ID from Jan
20. Rainbio database Quality flags geographical coordinates
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE ← Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara)
Quality flag: taxonomy; quality flag: geographical coordinates
Quality flags geographical coordinates
Columns verif_iso3 and verif_coast
1. Location (lon / lat) inside the continent
2. Location (lon / lat) = ISO3 code
21. Rainbio database Quality flags geographical coordinates
1. Check if the geographical coordinates are on the African continent
verif_coast Number of observations %
COORDINATES OUT OF RANGE 1895 0.2%
NO COORDINATES 161167 18.9%
OK 689757 80.7%
ERROR 2017 0.2%
Location of geographical coordinates
22. Rainbio database Quality flags geographical coordinates
2. Compare ISO3 country code of observation with ISO3 of geographical coordinates (Lon/Lat)
ISO3: GAB
verif_iso3 Number of observations %
COORDINATES OUT OF RANGE 1895 0.2%
NO COORDINATES 161167 18.9%
ISO3 NOT IN AFRICA 1 0.0%
OK 684773 80.1%
NEIGHBOUR 6213 0.7%
ERROR 786 0.1%
OK: ISO3 of Lon/Lat = ISO3 of the observation
Location of geographical coordinates
ISO3 : XXX ISO3 country code of observation
23. Rainbio database Quality flags geographical coordinates
2. Compare ISO3 country code of observation with ISO3 of geographical coordinates (Lon/Lat)
NEIGHBOUR: ISO3 of Lon/Lat = ISO3 of a country neighboring the observation's country
Draw a line from the location of the geographical coordinates to the nearest boundary of the neighboring country
Location of geographical coordinates
ISO3: XXX ISO3 country code of observation
ISO3: COG
24. Rainbio database Quality flags geographical coordinates
ISO3: COG
ISO3: GNQ
ERROR: ISO3 of Lon/Lat ≠ ISO3 of the observation or of a neighboring country
Draw a line from the location of the geographical coordinates to the centroid of the country associated with the observation
2. Compare ISO3 country code of observation with ISO3 of geographical coordinates (Lon/Lat)
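The verif_iso3 categories can be sketched as a small classification function. A minimal sketch, assuming iso3lonlat has already been obtained from the geographical join with the country layer and that neighbours is a hypothetical dict mapping each ISO3 code to the codes of bordering countries. The ISO3 NOT IN AFRICA category and the distance calculation are not covered.

```python
def verif_iso3(iso3, iso3lonlat, ddlon, ddlat, neighbours):
    """Classify an observation by comparing its ISO3 code with the ISO3 code at its coordinates.

    iso3lonlat: ISO3 code returned by the geographical join (assumed precomputed);
    neighbours: dict mapping an ISO3 code to the set of ISO3 codes of bordering countries.
    """
    if ddlon is None or ddlat is None:
        return "NO COORDINATES"
    # Same rule as the values-out-of-range export
    if (ddlon == 0 and ddlat == 0) or not (-180 < ddlon < 180) or not (-90 < ddlat < 90):
        return "COORDINATES OUT OF RANGE"
    if iso3lonlat == iso3:
        return "OK"
    if iso3lonlat in neighbours.get(iso3, set()):
        return "NEIGHBOUR"
    return "ERROR"


# Gabon and its neighbours, as an example entry
NEIGHBOURS = {"GAB": {"CMR", "GNQ", "COG"}}
```

With this table, a GAB observation whose coordinates fall in COG is NEIGHBOUR, and one whose coordinates fall in NGA is ERROR.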
25. Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
AO BK BS DH JW
GD
KW MS OH TS VD
UB
Name: error_tab_data_verif_iso3=error_verif_coast=error.shp
error_tab_data_verif_iso3=error_verif_coast=ok.shp
Location : database_csv/shp
Encoding : UTF8
Verif_iso3 = ERROR and verif_coast = ERROR or OK
verif_iso3 | verif_coast | After data integration: number of observations, % | difference | After elimination of duplicates: number of observations, %
COORDINATES OUT OF RANGE COORDINATES OUT OF RANGE 1895 0.2% -35 1860 0.2%
NO COORDINATES NO COORDINATES 161167 18.9% -17959 143208 18.4%
ISO3 NOT IN AFRICA OK 1 0.0% 0 1 0.0%
OK OK 683236 80.1% -54265 628971 80.6%
ERROR OK 357 0.0% -38 319 0.0%
ERROR ERROR 429 0.1% -28 401 0.1%
FOLLOW UP: exclude them or correct them …
26. Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
AO BK BS DH JW
GD
KW MS OH TS VD
UB
Name: error_tab_data_verif_iso3=neighbour_verif_coast=error.shp
error_tab_data_verif_iso3=neighbour_verif_coast=ok.shp
Location : database_csv/shp
Encoding : UTF8
FOLLOW UP: use distance as a filter, exclude them or correct them …
Verif_iso3 = NEIGHBOUR and verif_coast = ERROR or OK
verif_iso3 | verif_coast | After data integration: number of observations, % | difference | After elimination of duplicates: number of observations, %
NEIGHBOUR OK 6163 0.7% -751 5412 0.7%
NEIGHBOUR ERROR 50 0.0% -4 46 0.0%
27. Rainbio database Quality flags geographical coordinates
2. Shapefiles quality flag geographical coordinates
AO BK BS DH JW
GD
KW MS OH TS VD
UB
Name: error_tab_data_verif_iso3=ok_verif_coast=error.shp
Location : database_csv/shp
Encoding : UTF8
FOLLOW UP: use distance as a filter, exclude them or correct them …
Verif_iso3 = OK and verif_coast = ERROR
verif_iso3 | verif_coast | After data integration: number of observations, % | difference | After elimination of duplicates: number of observations, %
OK ERROR 1537 0.2% -85 1452 0.2%
28. Rainbio database Duplicates
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE ← Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara)
Quality flag: taxonomy; quality flag: geographical coordinates
Observations: Duplicate / OK
Identification of duplicates
1. Same observation
2. Location (lon / lat) = ISO3 code
29. Rainbio database Duplicates
What is a unique observation?
Same surname of main collector +
Same year of collection +
Same ISO3 country code +
Same prefix + collection number + suffix
[Diagram: the Rainbio profile (32 columns: fam, gen, tax, detok, detnam, dety, detm, detd, iso3, country, maj_area, loc_notes, ddlon, ddlat, accuracy, alt, colnam, prefix, nbr, suffix, colnamsup, coly, colm, cold, kind_col, dups, description, pheno_fl, pheno_fr, habitat, habit; grouped into TAXON, DETERMINATION, LOCATION, COLLECTION and DESCRIPTIVE INFORMATION) is reduced to the 6 columns of a unique observation: ISO3 country code, surname of main collector, prefix, collection number, year of collection, suffix.]
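The definition of a unique observation translates directly into a grouping key. A minimal sketch, assuming records are dicts keyed by Rainbio column names and that the surname is the part of colnam before the first comma; the slides do not say whether surnames are compared case-sensitively, so this sketch compares them exactly. Function names are hypothetical.

```python
from collections import defaultdict


def main_surname(colnam):
    """Surname of the main collector: the part before the first comma ('Sidwell, K.' -> 'Sidwell')."""
    return (colnam or "").split(",", 1)[0].strip()


def observation_key(rec):
    """The six values that define a unique observation."""
    return (
        rec.get("iso3"),
        main_surname(rec.get("colnam")),
        rec.get("prefix"),
        rec.get("nbr"),
        rec.get("suffix"),
        rec.get("coly"),
    )


def group_duplicates(records):
    """Return only the groups of records that share the same observation key."""
    groups = defaultdict(list)
    for rec in records:
        groups[observation_key(rec)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Comparing surnames only is what makes "Sidwell, K." and "Sidwell" (or "Dubois, L." and "Dubois, J.") fall into the same group, as in the step 15 example further on.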
30. Rainbio database Duplicates
Unique observation: ISO3 country code, surname of main collector, prefix, collection number, year of collection, suffix
1. Identify identical lines
2. Rank identical lines within a grading system
3. Export as csv file
4. Keep the best line and remove the others
5. Remove one column and start again
Initial stage: 32 columns; final stage: 6 columns
Location: database_csv/duplicates
Encoding: UTF8
First iteration: intra-child database
Second iteration: Rainbio database
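The step-by-step procedure can be sketched as a loop over column sets, each step comparing fewer columns than the previous one. A minimal sketch under assumptions: the column sets passed as steps are hypothetical (the real steps are documented in duplicates.xlsx), rank_key returns a sortable value where lower means better, and instead of exporting a CSV file per step the sketch only logs how many lines each step removed.

```python
from collections import defaultdict


def dedup_step(records, columns, rank_key):
    """Group records identical on `columns`; keep the best-ranked line of each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec.get(c) for c in columns)].append(rec)
    kept, removed = [], []
    for group in groups.values():
        group = sorted(group, key=rank_key)  # best line first
        kept.append(group[0])
        removed.extend(group[1:])
    return kept, removed


def dedup(records, steps, rank_key):
    """Apply the steps in order and log (step number, number of removed lines)."""
    log = []
    for step_no, columns in enumerate(steps, start=1):
        records, removed = dedup_step(records, columns, rank_key)
        log.append((step_no, len(removed)))
    return records, log
```

The same function can be run first within each child database and then on the whole Rainbio database, matching the two iterations above.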
31. Rainbio database Duplicates
Location : database_csv/duplicates/duplicates.xlsx
IDENTIFY DUPLICATES SKIP
COLUMN STEP1 STEP2 STEP3 STEP4 STEP5 STEP6 STEP7 STEP8 STEP9 STEP10 STEP11 STEP12 STEP13 STEP14 STEP15 STEP16 STEP17 STEP18 STEP19 STEP20 STEP21 STEP22 STEP23 STEP40 STEP41 STEP42 STEP43
fam
gen
tax
detok VALUE
detnam VALUE VALUE VALUE VALUE
dety
detm
detd
iso3
country NBR CHAR
maj_area VALUE
loc_notes VALUE VALUE VALUE VALUE VALUE VALUE VALUE
ddlon
ddlat
accuracy VALUE
alt MIN
colnam SURNAME SURNAME SURNAME SURNAME SURNAME SURNAME CHECK SURNAME SURNAME SURNAME SURNAME SURNAME SURNAME
prefix VALUE
nbr
suffix VALUE DESC
colnamsup NBR CHAR
coly MAX
colm MAX VALUE
cold MAX VALUE VALUE
kind_col VALUE
dups VALUE
description VALUE
pheno_fl VALUE
pheno_fr VALUE
habitat VALUE
habit VALUE
CHECK COORDINATES IF LOCATION NAME IS DIFFERENT 16_01_CHECK 17_01_CHECK 18_01_CHECK 19_01_CHECK 20_01_CHECK
CHECK COORDINATES IF DISTANCE > 0 meters 16_02_CHECK 17_02_CHECK 18_02_CHECK 19_02_CHECK 20_02_CHECK 21_02_CHECK 22_02_CHECK 40_02_CHECK 41_02_CHECK 42_02_CHECK 43_02_CHECK
CHECK DATE OF COLLECTION IF MORE THAN ONE FULL DATE (YYYY-MM-DD) IS AVAILABLE 18_03_CHECK 19_03_CHECK 20_03_CHECK 21_03_CHECK 22_03_CHECK 40_03_CHECK 41_03_CHECK 42_03_CHECK 43_03_CHECK
CHECK COLLECTOR NAME 16_04_CHECK 17_04_CHECK 18_04_CHECK 19_04_CHECK 20_04_CHECK 40_04_CHECK 41_04_CHECK 42_04_CHECK 43_04_CHECK
CHECK DETERMINATOR NAME 40_06_CHECK 41_06_CHECK 42_06_CHECK 43_06_CHECK
USE THE VERIFICATION TO IMPROVE THE STANDARDIZATION OF COLLECTOR NAMES 15_00_CHECK 21_00_CHECK 22_00_CHECK 23_00_CHECK 43_00_CHECK
DB RANK DET | DB RANK DET | SKIP | DB RANK DET | DB RANK DET | DB RANK LOC | DB RANK LOC | DB RANK LOC | DB RANK LOC | DB RANK LOC | DB RANK LOC | DB RANK LOC
Detailed documentation Excel file
32. Rainbio database Duplicates
Location : database_csv/duplicates/
Name: dups_rb15_00.csv
Example: CSV file for step 15 in the Rainbio database
cdb: intra-child database iteration
rb: Rainbio database iteration
15: step in the Excel file; _00: all identified duplicates
Step 15
Sidwell, K.||Sidwell
We keep the first line and remove all other lines
Line separator: ||
d_fkdb d_calc_accuracy d_alt d_colnam d_prefix d_nbr d_suffix d_colnamsup d_coldate
JW,MS 5||5 99999||99999 Sidwell, K.||Sidwell ###||### 165||165 ###||### ###||### 1992-10-24||1992-10-24
BK,MS 5||5 1248||1248 Dubois, L.||Dubois, J. ###||### 1494||1494 ###||### ###||### 1949-09-01||1949-09-01
BS,MS 5||5 99999||99999 Savory, H.J.||Savory, L. FHI||FHI 25148||25148 ###||### Keay, R.W.J.||### 1948-12-24||1948-12-24
BS,MS 5||5 99999||99999 Savory, H.J.||Savory, L. FHI||FHI 25138||25138 ###||### Keay, R.W.J.||### 1948-12-23||1948-12-23
JW,MS 3||3 99999||99999 Muller, T.||Muller ###||### 1981||1981 ###||### Pope, G.V.; Russell, E.||### 1971-12-19||1971-12-19
JW,MS 4||4 99999||99999 Thompson, S.A.||Thompson ###||### 1626||1626 ###||### Rawlins, J.E.||### 1984-07-16||1984-07-16
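The file naming and the "||" cells can be read back with a few lines of Python. A minimal sketch, assuming file names always follow the pattern dups_<cdb|rb><step>_<two-digit code>.csv seen in dups_rb15_00.csv (the cdb form is inferred from the legend, not shown in a file name); the function names are hypothetical.

```python
import re

FILENAME_RE = re.compile(r"^dups_(cdb|rb)(\d+)_(\d{2})\.csv$")


def parse_dups_filename(name):
    """'dups_rb15_00.csv' -> iteration 'rb', step 15, code '00' (00 = all identified duplicates)."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a duplicates file name: {name!r}")
    iteration, step, code = m.groups()
    return {"iteration": iteration, "step": int(step), "code": code}


def split_cell(cell):
    """Split a cell such as 'Sidwell, K.||Sidwell'; the first value belongs to the kept line."""
    return cell.split("||")
```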
33. Rainbio database Duplicates
How to rank identical lines within a grading system in order to keep the best line
Step 16: name of location
Step 17: geographical coordinates
Ranking for steps 16 and 17, in order of priority:
1. ISO3 verification ranking: 1 OK; 2 NEIGHBOUR; 3 ISO3 NOT IN AFRICA; 4 ERROR; 5 COORDINATES OUT OF RANGE; 6 LATITUDE MISSING; 7 NO COORDINATES
2. Child database location ranking: 1 VD; 2 AO; 3 BK, BS, DH, GD, JW, KW, MS, OH, TS, UB
3. Calculated accuracy code ranking: 8, 7, 6, 5, 4, 3, 2, 1
[Diagram: the Rainbio profile columns, with colnam replaced by the SURNAME of the main collector.]
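The three rankings combine naturally into a tuple sort key, compared criterion by criterion. A minimal sketch, assuming each line carries verif_iso3, fkdb (child database ID) and calc_accuracy, and assuming the accuracy codes are best-first in the order listed on the slide (8 first); all names are hypothetical.

```python
# Rankings as listed on the slide; lower number = better
ISO3_RANK = {
    "OK": 1, "NEIGHBOUR": 2, "ISO3 NOT IN AFRICA": 3, "ERROR": 4,
    "COORDINATES OUT OF RANGE": 5, "LATITUDE MISSING": 6, "NO COORDINATES": 7,
}
DB_RANK = {"VD": 1, "AO": 2}  # every other child database ranks 3
ACCURACY_ORDER = [8, 7, 6, 5, 4, 3, 2, 1]  # assumption: best-first, as listed


def location_rank_key(rec):
    """Sort key: ISO3 verification first, then child database, then calculated accuracy."""
    iso3_rank = ISO3_RANK.get(rec.get("verif_iso3"), len(ISO3_RANK) + 1)
    db_rank = DB_RANK.get(rec.get("fkdb"), 3)
    acc = rec.get("calc_accuracy")
    acc_rank = ACCURACY_ORDER.index(acc) if acc in ACCURACY_ORDER else len(ACCURACY_ORDER)
    return (iso3_rank, db_rank, acc_rank)


def best_line(duplicates):
    """Line to keep among identical lines."""
    return min(duplicates, key=location_rank_key)
```

Because tuples compare element by element, the child database only decides between lines with the same ISO3 verification, and accuracy only breaks remaining ties.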
37. Rainbio database Duplicates
Duplicates with different geographical coordinates
Location of geographical coordinates
Bounding box
Example
IDC ISO3 LOC_NOTES DDLON DDLAT COLNAM PREFIX NBR SUFFIX COLY VERIF_ISO3 VERIF_COAS TAX_TAX
TS100579839 CMR Mefou Proposed National Park. Mefou National Park, Ndanen 2. 9.61889 4.99 Etuge, M. 5271 2004 OK OK Tylophora conspicua
KWK000199925 CMR Mefou National Park, Ndanen 2. 11.58 3.63 Etuge, M. 5271 2004 OK OK Tylophora
JW1056921 CMR Mefou National Park, Ndanen 2. 11.5833 3.61667 Etuge, M. 5271 2004 OK OK Tylophora conspicua
64828 observations (7.6%) have 69367 duplicates
59797 (92%) of the duplicated observations are georeferenced
Rainbio database: total 854836 observations
Coordinates come from different sources for 53163 (89%) of the georeferenced duplicated observations
Coordinates are different for 15397 of these observations (29%)
Example: Mefou National Park, Ndanen 2
39. Rainbio database Duplicates
Bounding boxes of duplicated observations with coordinates from different sources
Bounding box
After duplicates iterations and ranking (quality flag location)
ALL
Coordinates are different but at least they are in the same country
40. Rainbio database Duplicates
Bounding boxes of duplicated observations with coordinates from different sources
Bounding box
After duplicates iterations and ranking (quality flag location)
After elimination bbox width < 10km and bbox height < 10km
ALL
FOLLOW UP: …
use columns d_x_meters < 10000 and d_y_meters < 10000 as a filter
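The 10 km filter needs the width and height of each bounding box in meters. The slides do not say how d_x_meters and d_y_meters are computed; this sketch assumes an equirectangular approximation (about 111 320 m per degree of longitude at the equator, scaled by the cosine of the mean latitude, and about 110 574 m per degree of latitude). Function names are hypothetical.

```python
import math

M_PER_DEG_LON_EQUATOR = 111_320  # approximate meters per degree of longitude at the equator
M_PER_DEG_LAT = 110_574  # approximate meters per degree of latitude


def bbox_size_m(coords):
    """Approximate width and height in meters of the bounding box of (lon, lat) pairs."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    mean_lat = math.radians((min(lats) + max(lats)) / 2)
    width = (max(lons) - min(lons)) * M_PER_DEG_LON_EQUATOR * math.cos(mean_lat)
    height = (max(lats) - min(lats)) * M_PER_DEG_LAT
    return width, height


def same_place(coords, max_m=10_000):
    """True if the bounding box is narrower and lower than max_m."""
    width, height = bbox_size_m(coords)
    return width < max_m and height < max_m


# Coordinates of the three Etuge, M. 5271 duplicates (Mefou National Park, Ndanen 2)
mefou = [(9.61889, 4.99), (11.58, 3.63), (11.5833, 3.61667)]
print(same_place(mefou))  # False: the TS coordinates lie about 2 degrees further west
```

At the scale of tropical Africa and a 10 km threshold, this approximation should be adequate; a geodesic library would be preferable for precise distances.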
41. Rainbio database Duplicates
Duplicates with different geographical coordinates
Location of geographical coordinates
Bounding box
Example: Mefou National Park, Ndanen 2
Current location in the Rainbio database
The current location in the Rainbio database is wrong!
FOLLOW UP: change the database ranking in order to select the "best" location
IDC ISO3 LOC_NOTES DDLON DDLAT COLNAM PREFIX NBR SUFFIX COLY VERIF_ISO3 VERIF_COAS TAX_TAX
TS100579839 CMR Mefou Proposed National Park. Mefou National Park, Ndanen 2. 9.61889 4.99 Etuge, M. 5271 2004 OK OK Tylophora conspicua
KWK000199925 CMR Mefou National Park, Ndanen 2. 11.58 3.63 Etuge, M. 5271 2004 OK OK Tylophora
JW1056921 CMR Mefou National Park, Ndanen 2. 11.5833 3.61667 Etuge, M. 5271 2004 OK OK Tylophora conspicua
42. Rainbio database Quality flags geographical coordinates
Child databases AO BK BS DH GD JW KW MS OH TS UB VD → RAINBIO DATABASE ← Jan's authors reference table + "NEW" names; Jan's taxonomy reference table + "Special cases" (Barbara)
Quality flag: taxonomy; quality flag: geographical coordinates
Observations: Duplicate / OK
EXPORT OF CSV FILES
- RAINBIO DATABASE: NEW VERSION, EXPORT CSV FILE
- VALUES OUT OF RANGE
- STEPS DUPLICATES
- PROBLEMS TAXONOMY LINK
- SHAPEFILES LOCATION ERRORS
- STANDARDIZATION COLLECTORS
43. Rainbio database Rainbio CSV file
Name: utf8_rainbio_vXX.csv
Location: database_csv/db_csv
First series of columns: Rainbio profile + quality flags (47 columns)
Second series of columns: columns from the taxonomy reference table, starting with "tax_" (16 columns)
Values from Jan's table if we have the link, otherwise values from the child database
Third series of columns: identification of duplicates, starting with "d_" (54 columns)
NULL if there are no duplicated values
- if the values in the duplicated records are identical
XX||YY: values from the duplicated records if they are different
Fourth series of columns: identification of duplicates, starting with "c_" (6 columns)
Steps that identify duplicates without eliminating them
Example: step 43
Observations with duplicates whose determination differs at family level are only flagged; they are not treated as duplicated records.
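The three possible values of a "d_" column can be written as one small function. A minimal sketch, assuming the input is the list of the column's values across an observation and its duplicates (a single value or an empty list meaning no duplicates); the function name d_value is hypothetical.

```python
def d_value(values):
    """Value of a 'd_' column for one observation, from the values of its duplicated records."""
    if len(values) < 2:
        return None  # no duplicated values: NULL
    if len(set(values)) == 1:
        return "-"  # identical in all duplicated records
    return "||".join(str(v) for v in values)  # XX||YY when they differ
```

For the Mefou example, the three longitudes give "9.61889||11.58||11.5833", while the collector names give "-".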
44. Rainbio database Version 6
[Figure: Number of observations by determination rank (sp., gen., fam., no determination), split into location OK, no coordinates, location error, location error duplicates, and difference in determination at family level (duplicates). Values shown: 562683, 51442, 19981, 4732, 119086, 66304.]
Taxonomy
Rank      All     Location OK   Others
family    383     359           24
genus     4520    3750          770
sp.       34116   26259         7857
Locations
          All     Location OK   Others *
Locations 179349  104121        75228
* Locations are defined by iso3, main collector, year of collection and location notes
45. Rainbio database Version 6
[Figure: Number of species by number of georeferenced locations (x axis: number of georeferenced locations, 0 to 50; y axis: number of species). Values shown: 7857, 6790, 2728, 1780, 1334, 1061, 2474.]