Format for the population data in forensic genetics ppt

PROPOSALS FOR THE FORMAT
FOR POPULATION DATA BASES
AND THEIR ANALYSIS
A. G. Smolyanitsky1, N. N. Khromov-Borisov1, G. B.
Lazzarotto2 and T. B. L. Kist2
1Forensic Medicine Bureau of Leningrad District, Saint
Petersburg, Russia
2Institute of Biosciences, Federal University of Rio Grande do
Sul, Porto Alegre, Brazil
Andrew.Smolyanitsky@yandex.ru
Nikita.KhromovBorisov@gmail.com
Gustavo.Lazzarotto@terra.com.br
Kist@molgen.mpg.de

DNA-PCR Data Banks
DNA-PCR Databank:
http://www.uni-duesseldorf.de/WWW/MedFak/Serology/database.html
DB on Nuclear DNA:
http://www.ertzaintza.net/cgi-bin/db2www.exe/adn.d2w/INPUT?IDIOMA=INGLES
World population data
J. Forensic Sci. 45 (1) 118-146 (2000)
CODIS STR loci data
J. Forensic Sci. 46 (3) 453-489 (2001)

Precision and accuracy
Relative allele frequencies are sometimes
calculated or presented inaccurately
A precision of three significant
digits appears to be insufficient

Round-off
Sometimes the sum of the relative frequencies is not
equal to unity because of low precision or round-off
errors, giving, e.g., 0.879 or 1.123
Sometimes it is difficult to round correctly
the recalculated absolute frequencies, such as,
e.g., 18.51 or 75.48
As a result, their sum may be an odd number or
differ from the published value

Uncertainties
Some data sets appear to be completely
identical
Such duplications may arise because
the same data are reproduced in
different publications
SANCT software makes it possible to identify
them automatically in very large databases
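
A minimal sketch of how exact duplicates could be flagged, assuming each data set is stored as a mapping from allele name to absolute count. This is not SANCT's algorithm, which the slides do not describe. The function names and the sample data below are hypothetical.

```python
from collections import defaultdict


def fingerprint(allele_counts):
    """Order-independent key: sorted (allele, count) pairs."""
    return tuple(sorted(allele_counts.items()))


def find_duplicates(datasets):
    """Group data set names whose allele counts are exactly identical."""
    groups = defaultdict(list)
    for name, counts in datasets.items():
        groups[fingerprint(counts)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical data sets, for illustration only
datasets = {
    "paper_1": {"14": 10, "15": 25, "16": 5},
    "paper_2": {"16": 5, "14": 10, "15": 25},  # same counts, different order
    "paper_3": {"14": 12, "15": 20, "16": 8},
}
duplicates = find_duplicates(datasets)
```

Comparing absolute counts rather than rounded relative frequencies avoids false matches caused by round-off.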

Independence
Some data sets seem to be non-independent:
preliminary data published earlier are
then combined with new data in subsequent
publications
SANCT software facilitates their
detection

Collapsibility
Sometimes rare alleles are combined with
the nearest ones, e.g., 14+15+16
SANCT puts this manipulation on solid
statistical ground:
categories (alleles and/or samples)
are combined (collapsed) not arbitrarily,
but only when they are statistically
homogeneous, e.g., 14+21

Precision
Compute relative frequencies with at least
four significant digits (GDA)
Check that their sum equals unity:
Sum (pi) = 1.0000
Check the “re-computability” of the initial
absolute counts:
Sum (pi × N) = N
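
The two checks above can be sketched as follows. The function name, the tolerance rule, and the two-digit frequencies are illustrative assumptions. The four-digit frequencies and N = 396 are taken from the GC locus template in this presentation.

```python
def check_frequencies(freqs, total, digits=4):
    """Return (sum_ok, counts_ok, recomputed_counts).

    freqs  : published relative frequencies pi
    total  : published total count N
    digits : number of decimal places the frequencies were rounded to
    """
    # Each rounded frequency can be off by at most half a unit in the last place
    tol = 0.5 * 10 ** (-digits) * len(freqs)
    sum_ok = abs(sum(freqs) - 1.0) <= tol
    # Recompute absolute counts and test whether they add up to N
    recomputed = [round(p * total) for p in freqs]
    counts_ok = sum(recomputed) == total
    return sum_ok, counts_ok, recomputed


# Four decimal places: the counts 131, 45, 220 are recovered
good = check_frequencies([0.3308, 0.1136, 0.5556], total=396, digits=4)
# Two decimal places (hypothetical): the recomputed counts no longer sum to N
bad = check_frequencies([0.33, 0.11, 0.55], total=396, digits=2)
```

With two decimal places the recomputed counts are 131, 44 and 218, which sum to 393 instead of 396.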

Show individual genotypes
when feasible
ID       Locus A   Locus B   Locus Z
Xx-xxx   3.2/7     --/--     6/6
1        3207      0000      0606
Yy-yyy   6/14      17/18     9/9.3
2        0614      1718      0093
FSTAT is able to detect 0093 as an error
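
A minimal sketch of the four-digit coding, inferred only from the examples in the table: the decimal point is dropped, each allele is zero-padded to two digits, and a missing allele is written as 00. Under this assumed rule 9/9.3 would be coded 0993, so 0093 reads as a genotype with one allele missing. The check below is one plausible consistency test, not a description of how FSTAT works internally. All function names are hypothetical.

```python
def encode_allele(allele):
    """Encode one allele as two digits (assumed rule: '3.2' -> '32', '7' -> '07')."""
    if allele in ("--", ""):
        return "00"  # missing allele
    digits = allele.replace(".", "")
    if not digits.isdigit() or len(digits) > 2:
        raise ValueError(f"cannot encode allele {allele!r} in two digits")
    return digits.zfill(2)


def encode_genotype(genotype):
    """Encode 'a/b' as a four-digit string."""
    first, second = genotype.split("/")
    return encode_allele(first) + encode_allele(second)


def is_half_missing(code):
    """True when exactly one of the two alleles is coded as missing (00)."""
    first, second = code[:2], code[2:]
    return (first == "00") != (second == "00")


codes = [encode_genotype(g) for g in ("3.2/7", "--/--", "6/6", "6/14", "17/18")]
suspicious = is_half_missing("0093")
```

Note that this coding is lossy: 32 could stand for allele 3.2 or allele 32, which is one more reason to publish the original genotypes alongside any coded form.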

Convertibility
Program   Import      Export
GDA       BIOSYS      BIOSYS
          GeneStrut   GeneStat-PC
          Weir        GeneStrut
                      Nexus
                      SAS
                      Weir

Convertibility
Program   Import    Export
GENETIX   GENEPOP   Arlequin
          FSTAT     BIOSYS
          Text      GENEPOP
                    FSTAT

Show absolute counts
Present genotype counts in the form of
a triangular matrix.
Such a presentation visualizes the
“saturation” of the data and makes it possible to
present important information on the
partial fixation indices compactly
in the same matrix.

Template for genotype and allele counts,
partial fixation indices and relative allele
frequencies
Locus: GC   n = 196
Allele   A     B      C      fii     Ni    pi
A        25    0.06   0.08   0.08    131   0.3308
B        14    2      0.06   -0.03   45    0.1136
C        67    27     63     0.04    220   0.5556
Total                        0.044   396   1.0000
GDA software computes fii
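
The allele counts Ni and relative frequencies pi in the template can be recomputed from the genotype counts in the lower triangle, as sketched below. The genotype counts as given sum to 198 individuals (396 alleles), which agrees with the Total row. The per-allele index is computed here as (Pii - pi^2) / (pi (1 - pi)), a simple moment estimator chosen for illustration. It reproduces the fii column to two decimals, but the slides do not say which estimator GDA uses. The function and variable names are hypothetical.

```python
alleles = ["A", "B", "C"]

# Genotype counts from the lower triangle of the template (diagonal = homozygotes)
genotype_counts = {
    ("A", "A"): 25,
    ("A", "B"): 14, ("B", "B"): 2,
    ("A", "C"): 67, ("B", "C"): 27, ("C", "C"): 63,
}


def summarize(alleles, genotype_counts):
    """Return (n, allele counts Ni, relative frequencies pi, per-allele index fii)."""
    n = sum(genotype_counts.values())  # number of genotyped individuals
    allele_counts = {a: 0 for a in alleles}
    for (first, second), count in genotype_counts.items():
        # A homozygote contributes two copies of the same allele
        allele_counts[first] += count
        allele_counts[second] += count
    freqs = {a: allele_counts[a] / (2 * n) for a in alleles}
    fixation = {}
    for a in alleles:
        p = freqs[a]
        homozygote_freq = genotype_counts.get((a, a), 0) / n
        # Assumed estimator: excess of homozygotes relative to p^2
        fixation[a] = (homozygote_freq - p * p) / (p * (1 - p))
    return n, allele_counts, freqs, fixation


n, allele_counts, freqs, fixation = summarize(alleles, genotype_counts)
```

Publishing the triangle of counts makes every derived column re-computable by the reader.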

Availability
“Open and show all your data”,
visualization and “statistification”,
or GSP (Good Statistics Practice),
must be the main principles in
databasing.
Make all your data available to
users, preferably online or on
request from the authors.

Sincere thanks
Drs.
Carsten Hohoff
Edwin Ehrlich
Kurt Trübner
for the invitation, help and support
