The fundamental problem of
Forensic Statistics
How to assess the evidential value
of a rare type match
Giulia Cereda, Université de Lausanne
Richard D. Gill, University of Leiden
The problem
• A crime
• A piece of evidence found at the crime scene
(DNA, fingerprint, footprint, handwriting, etc.)
• A suspect (identified independently)
• A match between suspect’s characteristic and
evidence’s characteristic.
• A database which counts the frequency of each
characteristic.
• The database frequency of the crime-scene (and the
suspect’s) characteristic is 0
Example
• A DNA stain is found on the victim’s body.
• Y-STR profile of type h.
• A suspect is identified, who is also of Y-STR type h.
• The Y-STR reference database does not contain
type h
Small databases
How to evaluate this kind of evidence?
• Generalized-Good (GG) method: a non-parametric Good-type
estimator based on Good (1953).
• DiscLap method (Andersen et al. 2013)
• Explore other methods (Brenner 2010, Roewer
2000, …)
The Likelihood Ratio
E is the evidence to be evaluated
B is the background information
Hp: the suspect left the stain
Hd: someone else left the stain
Many choices of E and B are possible:
THE likelihood ratio does not exist
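The formula on this slide did not survive extraction. As a sketch only, the standard definition of a likelihood ratio, written with the symbols E, B, Hp and Hd introduced above, is:

```latex
\[
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_p,\, B)}{\Pr(E \mid H_d,\, B)}
\]
```

Each choice of what counts as E and what counts as B gives a different ratio, which is why no single likelihood ratio exists.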
Typical choice
• E = the particular haplotype of the suspect
and of the crime stain
• B = the list of haplotypes in the database
The population frequency of this haplotype is not known. It can only be
estimated, e.g. with the Discrete Laplace (DiscLap) method, and the
estimate carries uncertainty.
A different choice
• E = the number of times the haplotype of the
suspect (hs) and the haplotype of the crime stain
(hc) occur in the database, and whether or
not they are the same haplotype.
• B = the frequencies of frequencies of the
database.
This choice ignores information about the particular
haplotype.
• D database
Gotham City, 12,13,30,24,10,11,13
Gotham City, 12,13,30,24,10,11,14
Gotham City, 13,12,30,24,10,11,13
Gotham City, 13,13,29,23,10,11,13
Gotham City, 13,13,29,24,10,11,14
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
D’ database count
Gotham City, 12,13,30,24,10,11,13 1
Gotham City, 12,13,30,24,10,11,14 1
Gotham City, 13,12,30,24,10,11,13 1
Gotham City, 13,13,29,23,10,11,13 1
Gotham City, 13,13,29,24,10,11,14 1
Gotham City, 13,13,29,24,11,13,13 2
Gotham City, 13,13,30,24,10,11,13 4
Df frequencies of frequencies
N1 5
N2 1
N3 0
N4 1
N1 is the number of haplotypes which occur
once in D (singletons),
N2 is the number of haplotypes which occur twice (doublets),
etc.
Information is discarded: Df no longer records which haplotype has which count.
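The reduction from D to D’ to Df can be sketched in a few lines of Python. The haplotype strings are copied from the example database above; the variable names are illustrative only.

```python
from collections import Counter

# D: the example database from the slide (11 Y-STR profiles).
database = [
    "12,13,30,24,10,11,13",
    "12,13,30,24,10,11,14",
    "13,12,30,24,10,11,13",
    "13,13,29,23,10,11,13",
    "13,13,29,24,10,11,14",
    "13,13,29,24,11,13,13",
    "13,13,29,24,11,13,13",
    "13,13,30,24,10,11,13",
    "13,13,30,24,10,11,13",
    "13,13,30,24,10,11,13",
    "13,13,30,24,10,11,13",
]

# D': how many times each distinct haplotype occurs.
counts = Counter(database)

# Df: how many haplotypes occur exactly j times (frequencies of frequencies).
freq_of_freq = Counter(counts.values())
N = {j: freq_of_freq.get(j, 0) for j in range(1, max(counts.values()) + 1)}

print(N)  # {1: 5, 2: 1, 3: 0, 4: 1}
```

Note that `N` no longer says which haplotype is the one seen four times; that is the information discarded in the last step.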
A database D of size N
Gotham City, 12,13,30,24,10,11,13
Gotham City, 12,13,30,24,10,11,14
Gotham City, 13,12,30,24,10,11,13
Gotham City, 13,13,29,23,10,11,13
Gotham City, 13,13,29,24,10,11,14
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,29,24,11,13,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
Gotham City, 13,13,30,24,10,11,13
can be considered as an i.i.d. sample (Y1, Y2, …, YN) from
species {1, 2, …, s} with probabilities (p1, p2, …, ps).
The database count
Gotham City, 12,13,30,24,10,11,13 1
Gotham City, 12,13,30,24,10,11,14 1
Gotham City, 13,12,30,24,10,11,13 1
Gotham City, 13,13,29,23,10,11,13 1
Gotham City, 13,13,29,24,10,11,14 1
Gotham City, 13,13,29,24,11,13,13 2
Gotham City, 13,13,30,24,10,11,13 4
is a realization of the random vector (X1, X2, …, Xs),
defined by Xj = #{i | Yi = j}.
The frequencies of frequencies
consist of (N1, N2, …),
where Nj = #{i | Xi = j}
N1 5
N2 1
N3 0
N4 1
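As a quick consistency check on these definitions (a worked example added here, not taken from the slides), the frequencies of frequencies of the example database satisfy two simple identities:

```latex
\[
\sum_{j \ge 1} N_j = 5 + 1 + 0 + 1 = 7 \quad \text{(distinct haplotypes observed)},
\qquad
\sum_{j \ge 1} j\, N_j = 1\cdot 5 + 2\cdot 1 + 3\cdot 0 + 4\cdot 1 = 11 = N .
\]
```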
• E = the number of times the haplotype of the
suspect (hs) and the haplotype of the crime stain
(hc) occur in the database, and whether or
not they are the same haplotype.
• B = the frequencies of frequencies of the
database (Df)
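• [formula missing]: unbiased estimator for the numerator
• [formula missing]: unbiased estimator for the denominator
It is more sensible to estimate [formula missing] instead of [formula missing].
[formula missing] is approximately unbiased for [formula missing].
This suggests using [formula missing]
as an estimator for [formula missing].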
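The estimators on this slide did not survive extraction, and they are not reconstructed here. As background only: the classic result of Good (1953), on which the GG method is said to be based, estimates the total probability of all types not yet seen in a sample of size N by N1/N. The sketch below computes that quantity for a toy sample; it is not necessarily the slide's estimator, and the function name is hypothetical.

```python
from collections import Counter


def good_unseen_mass(sample):
    """Good (1953) estimate of the total probability of unseen types: N1 / N."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)  # number of singletons
    return n1 / len(sample)


# Toy sample with the same frequencies of frequencies as the slide example:
# five singletons (a-e), one type seen twice (f), one type seen four times (g).
toy_sample = list("abcdeffgggg")
unseen = good_unseen_mass(toy_sample)
print(unseen)  # 5/11, about 0.4545
```

With a database this small, almost half of the probability mass is estimated to lie on haplotypes that have not been observed, which is why a match on an unseen type is the typical case rather than the exception.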
How well does the estimator recover the true (unknown) value?
We look at the distribution of the estimate over many replications of small
databases (size N=100) sampled from a bigger one (size N=12,727),
which we pretend is the population.
To obtain the true value, take the big database of size 12,727 and
consider it as the world population. Set C1=0, C2=0.
Then,
1. Sample a small database of size N=100+1+1.
2. If the 101st type is new (absent from the first 100), increase
C1=C1+1.
3. If the 101st type is new and equal to the 102nd, increase C2=C2+1.
4. Repeat steps 1-3 M=10,000 times.
Set P1=C1/M, P2=C2/M,
from which we obtain the reference value 2.603.
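The four steps above can be sketched as follows. The real 12,727-profile database is not part of this text, so `population` is a synthetic, hypothetical stand-in; sampling without replacement and the function name are likewise assumptions. The sketch stops at P1 and P2 because the slides do not show how the two are combined into the reported value.

```python
import random


def estimate_p1_p2(population, n=100, reps=10_000, seed=0):
    """Monte Carlo estimates of P1 (101st type is new) and P2 (new and equal to 102nd)."""
    rng = random.Random(seed)
    c1 = c2 = 0
    for _ in range(reps):
        # Step 1: draw n + 1 + 1 individuals (assumed: without replacement).
        draw = rng.sample(population, n + 2)
        small_db, t101, t102 = set(draw[:n]), draw[n], draw[n + 1]
        # Step 2: the 101st type does not occur among the first n.
        if t101 not in small_db:
            c1 += 1
            # Step 3: it is new AND the 102nd individual has the same type.
            if t102 == t101:
                c2 += 1
    # Step 4 is the loop; afterwards P1 = C1 / M and P2 = C2 / M.
    return c1 / reps, c2 / reps


# Hypothetical heavy-tailed population of 12,727 type labels (NOT the real data).
pop_rng = random.Random(1)
population = [int(pop_rng.paretovariate(1.0)) for _ in range(12_727)]

p1, p2 = estimate_p1_p2(population, reps=2_000)
print(p1, p2)
```

Because C2 is only incremented when C1 is, P2 can never exceed P1.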
Performance of the GG-method
We know the true value.
We sample 1000 databases of size 100 from the big one, and for
each we calculate the estimate.
How well does the DiscLap estimate recover the true (unknown) value?
We look at the distribution over many replications of small databases (size N=100)
and a new haplotype, sampled from a bigger one (size N=12,727).
For each database sampled, the true frequency of the new
haplotype h is taken equal to its frequency in the big database.
The estimated frequency is calculated using the Discrete
Laplace method with default options (iterations, init_y …).
We calculate the distribution of [two expressions missing] for each
database and new haplotype sampled.
Performance of the DiscLap-method
[Figure: comparing the distribution of (expression missing)]
Comparing the errors of the two methods
[Figures: two panels, DiscLap-method and GG-method; axis label log10(Ratio_Gill); tick ranges 0 to 1000 and −1 to 6]
Remarks
Basic uncertainty:
• whether or not the trace comes from the
suspect
Two more levels of uncertainty:
• whether or not the model M that we are
assuming for Pr is “correct enough”
• whether or not the parameters of Pr in the model
M are “correct enough”
Maybe DiscLap was never intended to be used for such
small databases.
Maybe DiscLap does better when used in
cleverer ways, targeted at our purpose.
The error in the DiscLap method comes from two levels of
uncertainty:
• Population vs DiscLap model
• Parameter estimation (within DiscLap)
The GG method is “model-free” and thus has only one
level of uncertainty.
Conclusions
• The situation is more complex than it appears.
• Using more information gives a less accurate LR.
• Assuming less gives a more reliable LR.
You want to discuss? Know more?
Collaborate? Give suggestions?
You are welcome!
Giulia.cereda@unil.ch
