Defining the relevant population: can the DNA approach work for speech?

Hughes, V. (2014) Defining the relevant population: can the DNA approach work for speech? Glasgow University Laboratory of Phonetics (GULP) Colloquium, University of Glasgow, Glasgow, UK. 30 January 2014. (INVITED TALK)

  1. Defining the relevant population: can the DNA approach work for speech?
     Vincent Hughes
     Department of Language and Linguistic Science
     Glasgow University Laboratory of Phonetics, University of Glasgow, 30th January 2014
  2. 0. Outline
     focus of this talk:
     • introduction to forensic voice comparison (FVC) and the likelihood ratio (LR)
     • what is meant by the 'relevant population'?
       – application in forensic DNA analysis
       – complexities of language variation
     • current approaches in FVC
     • three potential alternatives
  3. 1. Introduction
  4. 1.1 Forensic voice comparison
     • forensic voice comparison (FVC) = voice of offender (unknown) vs. voice of suspect (known)
       – is the person on the criminal recording the same as the person on the suspect recording?
     • auditory-acoustic linguistic-phonetic analysis
       – analysis of a range of segmental (vowels, consonants), suprasegmental (f0, intonation, AR),
         higher-order linguistic (lexical choice, syntax) and VQ/vocal setting features
  5. 1.2 Expressing conclusions
     what can't the expert say? ✗
     [Conclusion scale from French & Baldwin (1990): two columns, POSITIVE IDENTIFICATION
     ("… that they are the same person") and NEGATIVE IDENTIFICATION ("… that they are
     different people"), each with graded descriptors ranging from "sure" / "beyond reasonable
     doubt" down to "quite possible" / "possible".]
  6. 1.2 Expressing conclusions
     why?
     • "how likely is it that the suspect is the offender, given the evidence?"
       – it's an assessment of guilt
       – this is the job of the judge/jury (trier of fact)
     • requires access to all of the evidence
     • possibility doesn't tell you about probability
       – the continuum isn't equal on both sides (bias towards positive identification?)
  7. 1.3 Likelihood ratio (LR)
     what can the expert say? ✓
     • provides a gradient assessment of the strength/weight of evidence
     • ratio = value centred on 1, where:
       – support for prosecution = LR > 1
       – support for defence = LR < 1
     • LR = p(E|Hp) / p(E|Hd)
       where p = probability, E = evidence, | = 'given', Hp = prosecution hypothesis,
       Hd = defence hypothesis
  8. 1.3 Likelihood ratio (LR)
     why?
     • evaluation of the evidence, rather than the hypotheses (innocence vs. guilt)
       – separates the roles of the expert and the trier of fact
     • explicit consideration of both prosecution and defence hypotheses (objective)
     • clear (??) probabilistic statement presented to the Court
  9. 1.4 Computing a LR
     • LR = similarity and typicality
       – it matters "whether the values found matching (…) are vanishingly rare (…) or near
         universal" (Nolan 2001:16)
       – typicality of values within- and between-speakers
     • typicality = dependent on patterns in the "relevant population" (Aitken & Taroni 2004)
       – reference data = sample of that population
       – distributions modelled statistically to generate a numerical output
  10. 1.4 Computing a LR
      LR = p(E|Hp) / p(E|Hd) = 0.047 / 0.0115 ≈ 4.08
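As a concrete illustration of how a numerical LR like the one above might be computed, the sketch below scores a single measurement against a Gaussian model of the suspect's values (similarity) and a Gaussian model of the relevant population (typicality). This is only a toy univariate version: the studies discussed in this talk use the multivariate kernel density (MVKD) formula of Aitken & Lucy (2004), and all names and numbers below are invented.

```python
import numpy as np
from scipy.stats import norm

def toy_lr(offender_value, suspect_values, population_values):
    """Toy univariate Gaussian LR: similarity / typicality.

    Numerator:   how well the offender value fits the suspect's distribution.
    Denominator: how typical the offender value is in the relevant population.
    """
    p_same = norm.pdf(offender_value,
                      loc=np.mean(suspect_values),
                      scale=np.std(suspect_values, ddof=1))
    p_diff = norm.pdf(offender_value,
                      loc=np.mean(population_values),
                      scale=np.std(population_values, ddof=1))
    return p_same / p_diff

# Invented numbers, purely for illustration
suspect = [1020, 1055, 1040, 1030]               # e.g. F2 midpoint values (Hz)
population = np.random.normal(1200, 150, 500)    # reference sample of the relevant population
print(toy_lr(1035, suspect, population))         # > 1 supports Hp, < 1 supports Hd
```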
  11. Verbal scale for interpreting LRs:

      Raw LR          | Log10 LR | Strength of evidence | Support
      ----------------|----------|----------------------|------------
      > 10,000        |  4 → 5   | very strong          | Prosecution
      1000 → 10,000   |  3 → 4   | strong               | Prosecution
      100 → 1000      |  2 → 3   | moderately strong    | Prosecution
      10 → 100        |  1 → 2   | moderate             | Prosecution
      1 → 10          |  0 → 1   | limited              | Prosecution
      1               |  0       | neutral              | Neither
      0.1 ← 1         | -1 ← 0   | limited              | Defence
      0.01 ← 0.1      | -2 ← -1  | moderate             | Defence
      0.001 ← 0.01    | -3 ← -2  | moderately strong    | Defence
      0.0001 ← 0.001  | -4 ← -3  | strong               | Defence
      < 0.0001        | -5 ← -4  | very strong          | Defence

      LR = 4.08 → limited support for the prosecution
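A small helper mapping a log10 LR onto the verbal scale above makes the banding explicit; the thresholds are simply those given in the table, and the function itself is just an illustrative sketch.

```python
import math

def verbal_strength(log10_lr):
    """Map a log10 LR onto the verbal scale in the table above (sketch)."""
    if log10_lr == 0:
        return "neutral (support for neither)"
    side = "prosecution" if log10_lr > 0 else "defence"
    magnitude = abs(log10_lr)
    bands = [(1, "limited"), (2, "moderate"), (3, "moderately strong"),
             (4, "strong"), (float("inf"), "very strong")]
    for upper, label in bands:
        if magnitude <= upper:
            return f"{label} support for the {side}"

print(verbal_strength(math.log10(4.08)))   # 'limited support for the prosecution'
```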
  12. 2. The relevant population
  13. 2.1 The defence hypothesis (Hd)
      • relevant population = depends on the question
        – defined by the defence hypothesis
      • BUT impossible to assess the probability of the evidence, p(E|Hd), with a vague (or no) Hd
        "it wasn't our client (the suspect), it was… …someone else"
  14. 2.2 DNA approach
      • "logical relevance" (Kaye 2004, 2008)
        – factors which affect the distribution of a variable in the population (determine
          sub-populations)
      • ethnicity = logically relevant for DNA
        – allele frequencies differ between racial groups
        – 3 databases used for forensic DNA analysis in the UK (Fraser & Williams 2009)
      • multiple LRs based on different Hd assumptions
        – 3 LRs using the 3 different databases
  15. 2.3 Issues for speech
      • multidimensionality of between-speaker variation in spontaneous speech
        – social stratification = much more complex than DNA
        – mass of evidence from socio-linguistics/-phonetics
      • linguistic-phonetic variables affected by different social factors to different extents,
        within- and between-dialects
      • logically relevant information is accessible
        – we can make numerous inferences about the offender from a sample of his/her speech
  16. 3. Definitions of the relevant population in FVC: logical relevance
  17. 3.1 Logical relevance
      • Rose (2004:4): "quite often (Hd) will simply be that the voice of the unknown speaker
        does not belong to the accused, but to another same-sex speaker of the language"
      • reflected in the majority of LR-based studies:
        – Kinoshita (2002), Aldermann (2004), Kinoshita (2005), Rose (2006), Rose et al. (2006),
          Rose (2007), Morrison & Kinoshita (2008), Morrison (2009), Morrison et al. (2011),
          Morrison (2011), Zhang et al. (2011)…
  18. 3.2 Issues
      • why just sex and language?
        (i) sex/language are easily accessible in the speech signal (but see French et al.
            (2010:145), Foulkes & French (2012:569))
        (ii) sex/language are the most significant sources of social variation defining
            sub-populations
            – lack of understanding of the complexity of socially stratified variation in speech
      • paradox = without knowing who the offender is, we can't know (for sure) the population
        of which (s)he is a member
  19. 3.3 Empirical testing
      • general structure:
        – 1 set of test data vs. multiple sets of reference data:
          • matched with the test data for the social factor of interest
          • mismatched with the test data for the social factor of interest
          • mixed: no control over the social factor of interest
      • system error evaluated using the log LR cost (Cllr); see the sketch below
        – penalises the system for high-magnitude contrary-to-fact LRs
        – theory = an incorrect LR close to unity is much less important than an incorrect LR
          with very high magnitude
      • MVKD formula used to compute LRs (Aitken & Lucy 2004)
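For concreteness, here is a sketch of the log-LR cost as standardly defined: same-speaker LRs are penalised for being small, different-speaker LRs for being large, and high-magnitude contrary-to-fact LRs are penalised most heavily. The example values are invented and are not taken from the experiments reported here.

```python
import numpy as np

def cllr(ss_lrs, ds_lrs):
    """Log-LR cost (Cllr), standard definition (sketch).

    ss_lrs: LRs from same-speaker (target) comparisons
    ds_lrs: LRs from different-speaker (non-target) comparisons
    Lower is better; a system that always outputs LR = 1 scores Cllr = 1.
    """
    ss_lrs = np.asarray(ss_lrs, dtype=float)
    ds_lrs = np.asarray(ds_lrs, dtype=float)
    ss_term = np.mean(np.log2(1 + 1 / ss_lrs))   # penalise small same-speaker LRs
    ds_term = np.mean(np.log2(1 + ds_lrs))       # penalise large different-speaker LRs
    return 0.5 * (ss_term + ds_term)

# Invented LRs, purely for illustration
print(cllr(ss_lrs=[12.0, 3.5, 0.8], ds_lrs=[0.02, 0.4, 1.5]))
```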
  20. 3.3.1 Empirical testing: PRICE
      • formant dynamics (F1, F2 + F3) for /aɪ/
        – measured at +10% steps through the vowel
      • fitted with cubic polynomial curves
        – coefficients used as input for LRs (see the sketch below)
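A sketch of the feature extraction this slide describes: formant values sampled at 10% steps through /aɪ/ are fitted with cubic polynomials, and the fitted coefficients (four per formant) become the input to the LR computation. The array layout and the token values are invented for illustration.

```python
import numpy as np

def formant_trajectory_features(formant_tracks):
    """Fit a cubic polynomial to each formant track and return the coefficients.

    formant_tracks: array of shape (3, 9), i.e. F1, F2, F3 sampled at
    10%, 20%, ..., 90% of the vowel's duration (illustrative layout).
    Returns a 12-dimensional feature vector (4 coefficients x 3 formants).
    """
    steps = np.linspace(0.1, 0.9, formant_tracks.shape[1])
    coeffs = [np.polyfit(steps, track, deg=3) for track in formant_tracks]
    return np.concatenate(coeffs)

# One invented /aɪ/ token: rising F2, falling F1, fairly flat F3 (Hz)
token = np.array([
    [700, 690, 670, 640, 600, 560, 520, 490, 470],            # F1
    [1100, 1180, 1300, 1450, 1600, 1750, 1880, 1980, 2050],   # F2
    [2500, 2510, 2520, 2540, 2560, 2580, 2600, 2610, 2620],   # F3
])
print(formant_trajectory_features(token))
```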
  21. 3.3.1 Empirical testing: PRICE
      • test data = 20 speakers of Standard Southern British English (SSBE)
        – DyViS database (Nolan et al. 2009): Task 1
        – young (18-25 yrs), male
        – 10 tokens per speaker
      • reference data
        – matched: 32 DyViS (SSBE)
        – mixed (BrEng): 8 DyViS / 8 Manchester / 8 York / 8 Newcastle
  22. 3.3.1 Empirical testing: PRICE [results figure]
  23. 3.3.1 Empirical testing: PRICE [results figure]
  24. 3.3.3 Empirical testing: FACE
      • investigating (i) socio-economic class, (ii) age and (iii) class*age (interaction)
        – class: professional vs. non-professional
        – age: younger (born after 1960) vs. older (born before 1960)
      • data for 101 male speakers of NZE (ONZE)
        – 8 tokens per speaker
        – cubic polynomial coefficients of F1, F2 and F3
      • for each experiment, reference data: (a) matched, (b) mismatched, (c) mixed
  25. 3.3.3 Empirical testing: FACE
      why NZE? [figure adapted from Hay et al. (2008: 97)]
  26. Experiment 1: Class
      [Figure: Log10 LR distributions for same-speaker (SS) and different-speaker (DS)
      comparisons under Matched, Mismatched and Mixed reference data; y-axis Log10 LR,
      approximately -7 to 1]
  27. Experiment 2: Age
      [Figure: Log10 LR distributions for same-speaker (SS) and different-speaker (DS)
      comparisons under Matched, Mismatched and Mixed reference data; y-axis Log10 LR,
      approximately -4 to 1]
  28. General patterns: Cllr [results figure]
  29. 3.4 Issues
      • so many sources of between-speaker variation: which to control?
        – inevitable mismatch between evidential recordings and any set of reference data
      • don't want to make the relevant population too narrow:
        – reductio ad absurdum
        – issue with prior odds (Rose 2013)
      • paradox remains: still don't know if we're right
  30. 4. Definitions of the relevant population in FVC: speaker similarity
  31. 4.1 Speaker similarity
      "it wasn't our client (the suspect), it was… …a member of a population of speakers who
      sound sufficiently similar that an investigator or prosecutor would submit recordings of
      these speakers for forensic analysis" (from Morrison et al. 2012)
  32. 4.1 Speaker similarity
      • similar-sounding speakers to the offender, as judged by lay listeners
        – lay listener (police officer) who made the decision to submit the samples for analysis
        – ∴ it can include males + females, different accents… as long as they 'sound similar'
          (Morrison et al. 2012)
      • listeners match characteristics of the person who made the original decision:
        – e.g. young, male police officer from X…
  33. 4.1 Speaker similarity
      problems ✗
      • limited view of variation in production and perception
      • what factors do we control in our listeners?
      • what do the listeners hear?
        – how to replicate the conditions of the original decision?
        – some controls over what is played to the listeners (usually sex and language again)
      • lack of replicability
      • lay listeners are linguistically erratic when it comes to assessing speaker similarity
        (McDougall 2011)
  34. 5. Discussion
  35. 5. Discussion
      • direct application of logical relevance (DNA) is clearly inappropriate
      • but speaker similarity is as problematic, if not more so, on linguistic grounds
      • need new ways of defining the relevant population
        – logically/legally/linguistically appropriate
      • there might be elements of the DNA approach which can be applied to speech
  36. 5.1 Multiple Hd (from DNA)
      • offer multiple LRs based on different definitions of the relevant population
        – "if the relevant population is x, then the LR is y" (see the sketch below)
      • BUT:
        – still have to control some factors and ignore others
        – need multiple sets of reference data = impractical
          • easier in DNA, with one grouping factor (ethnicity) and available databases
        – the outcome isn't particularly clear for the Court
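In practice, the multiple-Hd idea amounts to running the same comparison against several reference databases and reporting a conditional LR for each. A minimal sketch, reusing the toy Gaussian scoring from earlier; the database labels and all values are invented.

```python
import numpy as np
from scipy.stats import norm

def toy_lr(offender_value, suspect_values, reference_values):
    """Toy univariate Gaussian LR (similarity / typicality), as in the earlier sketch."""
    num = norm.pdf(offender_value, np.mean(suspect_values), np.std(suspect_values, ddof=1))
    den = norm.pdf(offender_value, np.mean(reference_values), np.std(reference_values, ddof=1))
    return num / den

rng = np.random.default_rng(1)
offender_value = 1035.0
suspect_values = [1020, 1055, 1040, 1030]

# Invented reference databases standing in for different relevant-population definitions
reference_databases = {
    "SSBE males, 18-25":            rng.normal(1200, 120, 200),
    "mixed British English males":  rng.normal(1250, 180, 200),
    "same-sex, same-language only": rng.normal(1300, 220, 200),
}

for population, db in reference_databases.items():
    lr = toy_lr(offender_value, suspect_values, db)
    print(f"If the relevant population is '{population}', then the LR is {lr:.2f}")
```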
  37. 5.2 Normalisation (from DNA)
      • control big sources of variation (e.g. regional background, sex) in the database and use
        a correction factor to normalise for lower-level variation (see Balding & Nichols 1994)
      • BUT:
        – requires a priori knowledge of the type of variation in the dataset
        – not clear mathematically how this should be done for such multidimensional data
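For background, the correction the slide points to (Balding & Nichols 1994) replaces raw allele frequencies with θ-adjusted probabilities, where θ (often written F_ST) captures how much allele frequencies vary across sub-populations. The heterozygote form is quoted below from the standard DNA literature rather than from the talk itself, and, as the slide notes, there is no obvious analogue of θ for multidimensional speech data.

```latex
% Balding & Nichols (1994) sub-population correction for a heterozygote A_i A_j,
% with \theta (F_ST) absorbing between-sub-population variation in allele frequencies
% (p_i, p_j are the population allele frequencies):
P_{\text{match}}(A_i A_j) \;=\;
  \frac{2\,[\theta + (1-\theta)p_i]\,[\theta + (1-\theta)p_j]}
       {(1+\theta)(1+2\theta)}
```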
  38. 5.3 Speaker similarity
      • best to develop the speaker similarity approach in Morrison et al. (2012)
      • probably the underlying assumption should be that "it wasn't our client, it was someone
        else who sounds like the offender"
      • but decisions relating to linguistic evidence are best made by linguists!
  39. 5.3 Speaker similarity
      • speaker similarity based on objective similarity (rather than lay listener judgments)
        – using distance scores (Euclidean distances??) based on auditory judgments/acoustic
          measurements (see the sketch below)
        – *should* capture speakers of the same sociolinguistic background too!
      • approach used by ASR systems
        – BatVox identifies the 30 'closest' speakers to the suspect (but this should be based
          on the offender)
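One way the objective-similarity idea could be implemented is sketched below: represent each reference speaker by a mean feature vector (for example, the polynomial coefficients used earlier), compute Euclidean distances to the offender sample, and keep the N closest speakers as a candidate relevant population. The use of plain Euclidean distance, the feature dimensionality and all names are assumptions for illustration; standardised features or Mahalanobis distance would be equally plausible.

```python
import numpy as np

def closest_speakers(offender_features, reference_speakers, n=30):
    """Rank reference speakers by Euclidean distance to the offender sample.

    offender_features:  1-D feature vector for the offender sample
    reference_speakers: dict mapping speaker ID -> mean feature vector
    Returns the n closest speaker IDs (a candidate 'relevant population').
    """
    distances = {
        speaker: np.linalg.norm(np.asarray(features) - offender_features)
        for speaker, features in reference_speakers.items()
    }
    return sorted(distances, key=distances.get)[:n]

# Invented 12-dimensional feature vectors for a pool of reference speakers
rng = np.random.default_rng(0)
reference = {f"spk{i:03d}": rng.normal(0, 1, 12) for i in range(100)}
offender = rng.normal(0, 1, 12)
print(closest_speakers(offender, reference, n=5))
```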
  40. 6. Conclusions
      • speech is complex and multivariate:
        – both in terms of the things we analyse and the degree of within- and between-speaker
          variation
      • defining the relevant population for speech is a difficult issue
        – current approaches are inadequate
        – they reflect a lack of awareness of the complexity of speech
        – preference for logical correctness over linguistic correctness
  41. 6. Conclusions
      • the DNA model is problematic for speech:
        – but elements of it may be adaptable to improve the situation
      • probably best to look at speaker similarity
        – the expert doesn't have to commit to saying that the offender is a white, middle-class
          male from X
        – simpler, but remaining logically/legally/linguistically appropriate
  42. Thanks! Questions?
      Acknowledgements: Paul Foulkes, Erica Gold, Peter French, Dom Watt, Ashley Brereton,
      FSS Research Group (York)
