(2008) Statistical Analysis Framework for Biometric System Interoperability Testing



Abstract—Biometric systems are increasingly deployed in networked environments, and issues related to interoperability are bound to arise as single-vendor, monolithic architectures become less desirable. Interoperability issues affect every subsystem of the biometric system, and a statistical framework to evaluate interoperability is proposed. The framework was applied to the acquisition subsystem for a fingerprint recognition system and the results were evaluated using the framework. Fingerprints were collected from 100 subjects on 6 fingerprint sensors. The results show that performance of interoperable fingerprint datasets is not easily predictable and the proposed framework can aid in removing unpredictability to some degree.


5th International Conference on Information Technology and Applications (ICITA 2008), ISBN: 978-0-9803267-2-7

Shimon K. Modi, Stephen J. Elliott, Ph.D., H. Kim, Ph.D., and Eric P. Kukula

S. K. Modi is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: shimon@purdue.edu).
S. J. Elliott is Director of the Biometrics Standards, Performance, and Assurance Laboratory and Associate Professor in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: elliott@purdue.edu).
H. Kim is Professor in the School of Information & Communication Engineering at Inha University, and a member of the Biometrics Engineering Research Center (BERC) at Yonsei University, Seoul, Korea (e-mail: hikim@inha.ac.kr).
E. P. Kukula is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: kukula@purdue.edu).

Index Terms—fingerprint recognition, biometrics, statistics, fingerprint sensor interoperability, analysis framework.

I. INTRODUCTION
Establishing and maintaining the identity of individuals is becoming ever more important in today's networked world, and the complexity of these tasks has increased along with advances in automated recognition. There are three main methods of authenticating an individual: 1) using something that only the authorized individual knows, e.g., passwords; 2) using something that only the authorized individual possesses, e.g., physical tokens; and 3) using physiological or behavioral characteristics that only the authorized individual can reproduce, i.e., biometrics. The increasing use of information technology systems has created the concept of digital identities, which can be used with any of these authentication mechanisms. Digital identities and electronic credentialing have changed the way authentication system architectures are designed. Instead of the stand-alone, monolithic authentication architectures of the past, today's networked world offers the advantage of distributed and federated authentication architectures. The development of distributed authentication architectures can be seen as an evolutionary step, but it raises an issue that always accompanies an attempt to mix disparate systems: interoperability. What is the effect on the performance of the authentication process if an individual establishes his/her credential on one system, and then authenticates him/herself on a different system of the same modality? This issue is relevant to all kinds of authentication mechanisms, and particularly relevant to biometric recognition systems.

The last decade has witnessed a huge increase in the deployment of biometric systems. Most of these systems have been single-vendor systems, but the issue of interoperability is bound to arise as distributed architectures come to be considered the norm rather than the exception. Fingerprint recognition systems are the most widely deployed biometric systems, and most commercially deployed fingerprint systems are single-vendor systems [1]. The user's single point of interaction with a fingerprint system is the acquisition stage, and this stage has the highest probability of introducing inconsistencies into the biometric data. Fingerprint sensors are based on a variety of technologies, such as electrical, optical, and thermal. The physics behind these technologies introduces distortions and variations in the feature set of the captured image that are not consistent across technologies, which makes the goal of interoperability even more challenging.

Performance analysis of biometric systems can be achieved using several different techniques; one such technique involves analyzing detection error tradeoff (DET) curves. This methodology can also be applied to testing the performance rates of native and interoperable systems. Although this method allows a researcher to visually compare the different error rates, a statistical methodology for testing the interoperability of biometric systems is also of great importance. A formalized process for testing the interoperability of biometric systems will be required as the issues related to interoperability become more prominent. This research proposes and tests a basic statistical framework for analyzing matching scores and error rates for fingerprints collected from 6 different fingerprint sensors.

II. REVIEW OF LITERATURE
The acquisition of fingerprint images is heavily affected by interaction and contact issues, such as inconsistent contact, non-uniform contact, and irreproducible contact [2]. When a finger is placed on a sensor, its 3D shape is mapped onto a 2D surface.
This mapping is not controlled, and the mapping for the same finger can differ from one sensor to another, adding sensor-specific deformations. The area of contact between the finger surface and the sensor is also not the same for different sensors, and this non-uniform contact can influence the amount and consistency of detail captured from the finger surface. Jain and Ross examined the issue of matching feature sets originating from an optical sensor and a capacitive sensor [3]. Their results showed that the minutiae count for the dataset collected from the optical sensor was higher than that for the dataset collected from the capacitive sensor, and that the Equal Error Rate (EER) for the two native databases was 6.14% and 10.39%, while the EER for the interoperable database was 23.13%. Ko and Krishnan [4] present a methodology for measuring and monitoring the quality of the fingerprint database and the fingerprint match performance of the Department of Homeland Security's Biometric Identification System. One of the findings of their research was the importance of understanding the impact on performance if fingerprints captured by a new fingerprint sensor were integrated into an identification application with images captured from an existing fingerprint sensor. Han et al. [5] examined how image resolution and distortion, caused by differences in fingerprint sensor technologies, influence matching performance. They proposed a compensation algorithm that works on raw images and templates so that all fingerprint images and templates processed by the system are normalized against a pre-specified baseline, and they performed statistical analysis on the basic fingerprint features of the original and transformed images to test for differences between the two.

The International Labour Organization (ILO) commissioned a biometric testing campaign in an attempt to understand the causes of the lack of interoperability [6]. Ten products were tested, where each product consisted of a sensor paired with an algorithm capable of enrollment and verification. Native and interoperable False Accept Rate (FAR) and False Reject Rate (FRR) were computed for all datasets, and a mean FRR of 0.92% for genuine matching scores was observed at an FAR of 1%. The objectives of this test were twofold: to test conformance of the products in producing biometric information records complying with ILO SID-0002, and to test which products could interoperate at levels of less than 1% FRR at a fixed 1% FAR. The results showed that out of the 10 products, only 2 were able to interoperate at the mandated levels.

NIST conducted the minutiae template interoperability test (MINEX 2004) to assess the interoperability of fingerprint templates generated by different extractors and then matched with different matchers. Four datasets were used, referred to as DOS, DHS, POE, and POEBVA. The performance evaluation framework calculated FNMR at a fixed FMR of 0.01% for all the matchers. Performance matrices were created which represented the FNMR of all native and interoperable datasets and provided a means for quick comparison. Their results showed that the FNMR for native datasets was far lower than the FNMR for interoperable datasets.

In [7] the authors conduct an image quality and minutiae count analysis on fingerprints collected from three different sensors and assess the performance rates of the native and interoperable fingerprint datasets using different enrollment methodologies. They also describe an ANOVA test for differences in mean genuine matching scores between the three datasets. Their preliminary statistical analysis showed statistically significant differences between the genuine matching scores of the native and interoperable datasets. The previous body of literature points to the growing importance of interoperability for biometric systems. Previous research has concentrated on interoperability error rate matrices and on comparing EER between native and interoperable datasets. Although analysis of error rates serves as a good indicator of performance, an alternate approach that uses statistical techniques would be beneficial. A formalized statistical analysis framework for testing interoperability is lacking, and this problem needs to be addressed. The researchers in this experiment build on previous work in this area and propose a statistical analysis framework for testing interoperability.

III. STATISTICAL ANALYSIS FRAMEWORK
Interoperability of biometric systems is going to become an important issue, and the need for an analysis framework will become imperative. Frameworks for testing biometric systems and biometric algorithms can be found in the literature; in this paper, the researchers propose a framework for testing the interoperability of biometric systems. For the purposes of this research, the framework was adapted for testing the interoperability of fingerprint sensors. The framework is based on the concept that if two fingerprint sensors are interoperable, then the resulting interoperable fingerprint datasets should have error rates similar to those of the fingerprint datasets collected from either sensor alone. The framework for testing interoperability of fingerprint sensors is based on three steps:
1. Statistical analysis of basic fingerprint features.
2. Error rates analysis of native and interoperable fingerprint datasets.
3. Statistical analysis of matching scores of native and interoperable fingerprint datasets.
This framework was evaluated using a dataset of fingerprints collected from 6 fingerprint sensors, and the methodology and results are discussed in the following sections.
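As a rough illustration of the datasets that step 2 compares, the sketch below enumerates native and interoperable pairings for six sensors (the number used in this study). It assumes, as Section VI does for the FNMR matrix, that matching is symmetric, so sensor pairs are treated as unordered. The labels S1 to S6 follow Tables III and IV; the helper name `dataset_pairings` is hypothetical.

```python
from itertools import combinations


def dataset_pairings(sensors):
    """Split sensor pairings into native (same sensor) and interoperable ones.

    Hypothetical helper: interoperable pairs are unordered because matching
    is treated as symmetric, mirroring the upper-triangular FNMR matrices.
    """
    native = [(s, s) for s in sensors]
    interoperable = list(combinations(sensors, 2))
    return native, interoperable


sensors = [f"S{k}" for k in range(1, 7)]
native, interoperable = dataset_pairings(sensors)
print(len(native), len(interoperable))  # 6 15
```

With six sensors this gives 6 native and 15 interoperable pairings, i.e., the 21 cells of each upper-triangular FNMR matrix.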
Fig. 1. Statistical Analysis Framework for Interoperability Testing.

IV. DATA COLLECTION
The dataset used in this research is part of KFRIA-DB (Korea Fingerprint Recognition Interoperability Alliance Database). Fingerprints were collected from 100 subjects using 6 different fingerprint sensors. Each subject provided 6 fingerprint images from each of four fingers (right index, right middle, left index, and left middle), so 2,400 fingerprint images (100 subjects × 4 fingers × 6 images) were collected using each sensor. Table I gives a specifications overview for all the fingerprint sensors used in the study.

Table I. Sensor Specifications
Sensor     Technology Type   Resolution (DPI)   Interaction Type   Image Size (pixels)
Sensor 1   Thermal           500                Swipe              360 x 500
Sensor 2   Optical           500                Touch              280 x 320
Sensor 3   Optical           500                Touch              248 x 292
Sensor 4   Polymer           620                Touch              480 x 640
Sensor 5   Capacitive        508                Touch              256 x 360
Sensor 6   Optical           500                Touch              224 x 256

All analysis for this study was performed on raw fingerprint images collected from the 6 sensors. Sensor characteristics such as capture area, capture technology, and aspect ratio influence the resulting image. Fig. 2 shows sample images collected from the different sensors.

Fig. 2. Example Fingerprint Images (one sample each from Sensor 1 to Sensor 6).

V. FINGERPRINT FEATURE ANALYSIS
An important factor to consider when examining interoperability is the ability of different fingerprint sensors to capture similar fingerprint features from the same fingerprint. Human interaction with the sensor, levels of habituation, finger skin characteristics, and sensor characteristics each introduce their own sources of variability, and all of these factors affect the consistency of fingerprint features in images acquired from different sensors. It is therefore important to analyze the amount of variance in image quality and minutiae count of fingerprints captured from different sensors. This analysis was performed by computing image quality scores and minutiae counts for all fingerprint images using commercially available software. Table II shows the average image quality scores and minutiae counts for all the datasets.

Table II. Average Image Quality Scores & Minutiae Count
Fingerprint Dataset   Quality Score (range 0-100)   Minutiae Count
Sensor 1              15.24                          94.13
Sensor 2              74.97                          45.89
Sensor 3              71.15                          38.77
Sensor 4              6.62                           52.44
Sensor 5              68.92                          39.21
Sensor 6              62.58                          31.25

The datasets for Sensor 1 and Sensor 4 showed very low quality scores. Note that images captured from Sensor 4 had a very dark background, which could have contributed to the very low quality score. In addition, Sensor 1 and Sensor 4 use different technologies (thermal swipe and polymer, respectively) from the other sensors, which are more commonly available. An analysis of variance (ANOVA) was performed on all the datasets to test for differences in mean image quality score and mean minutiae count between the datasets. The hypothesis stated in (1) was tested using the ANOVA test.
$$H_{10}: \mu_1 = \mu_2 = \cdots = \mu_6 \qquad H_{1A}: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j \tag{1}$$

The p-value of this hypothesis test on image quality scores was less than 0.05, so the null hypothesis was rejected: the mean quality scores differed significantly across the datasets. The same hypothesis test was conducted on minutiae count for all the datasets, and its p-value was also less than 0.05, indicating significant differences in mean minutiae count. Together with the averages in Table II, these results show that image quality scores for fingerprints collected from different sensors were significantly different. Previous research has shown that image quality has an impact on the performance of fingerprint matching systems [8]. The next step of the research involved evaluating the impact of these inconsistencies in basic fingerprint features on the performance of fingerprint datasets collected from different sensors.

VI. PERFORMANCE RATES ANALYSIS
A. Performance Rates: ROC and Error Rates Matrix
In order to evaluate the performance of the fingerprint datasets, False Non-Match Rates (FNMR) were computed for all datasets using a commercially available fingerprint feature extractor and matcher. FNMR were generated for native and interoperable datasets, where a native dataset refers to comparisons of fingerprint images collected from the same fingerprint sensor, and an interoperable dataset refers to comparisons of fingerprint images collected from different sensors. The first three fingerprint images provided by the subject were used to create the enrollment database, and the last three images were used to create the verification database. Enrollment and verification databases were created for each of the 6 sensors. Matching scores were generated by comparing every enrollment template from each enrollment database against every fingerprint image from each verification database, which resulted in a set of scores S for every combination of enrollment and verification databases:

$$S = \{E_{ix}, V_{jy}, \mathrm{score}_{ijxy}\}$$

where
i = 1, ..., number of enrolled templates
j = 1, ..., number of verification images
x = 1, ..., number of enrollment datasets
y = 1, ..., number of verification datasets
score_ijxy = match score between enrollment template E_ix and verification image V_jy

Using this set of scores, an FNMR matrix was generated: FNMR was calculated for each set of scores at a fixed False Match Rate (FMR) of 0.1% and of 1%. The results are shown in Tables III and IV. The diagonal cells of the FNMR matrix give the rates for the native datasets, and the off-diagonal cells give the rates for the interoperable datasets. Since fingerprint matching is treated as a symmetric process, the matrix is symmetric as well, and only its upper triangle is shown.

The FNMR matrix at a fixed FMR of 0.1% showed a wide range of FNMR values for native and interoperable datasets. All the native datasets had a lower FNMR than the interoperable datasets. For example, S4 showed a native FNMR of 0.1% and a lowest interoperable FNMR of 35%, which indicates a very low level of interoperability between the datasets. When the FNMR are analyzed in the context of image quality scores, Sensor 4 had the lowest image quality scores, suggesting that image quality was an important factor in the high FNMR of its interoperable datasets. Interestingly, the native dataset of fingerprints captured with S4 nevertheless had a relatively low FNMR. It was also observed that the FNMR of interoperable datasets created from sensors of the same acquisition technology and interaction type, for example S2 and S3, was comparable to the FNMR of their native datasets.

Table III. FNMR (%) at fixed 0.1% FMR (diagonal: native datasets; off-diagonal: interoperable datasets)
      S1    S2    S3    S4     S5     S6
S1    0.8   5.0   8.0   35.0   18.0   7.0
S2          0     0.6   38.0   2.0    0.12
S3                0.1   38.0   1.9    0.12
S4                      0.1    58.0   32.0
S5                             0.1    6.0
S6                                    0.1

Table IV. FNMR (%) at fixed 1% FMR (diagonal: native datasets; off-diagonal: interoperable datasets)
      S1    S2    S3    S4    S5    S6
S1    0     0     0     0     0     0
S2          0     0     0     0     0
S3                0     0     0     0
S4                      0     0     0
S5                            0     0
S6                                  0

Detection Error Tradeoff (DET) curves were also generated for all the datasets. DET curves are a modification of Receiver Operating Characteristic (ROC) curves, which are a means of representing the performance of diagnostic, detection, and pattern matching systems [9]. A DET curve plots FMR on the x-axis and FNMR on the y-axis as a function of the decision threshold, so DET curves for different combinations of enrollment/verification databases allow error rates to be compared at different thresholds:

$$\mathrm{DET}(T) = \big(\mathrm{FMR}(T), \mathrm{FNMR}(T)\big) \tag{2}$$

where T is the decision threshold.

Fig. 3 shows three superimposed DET curves. The DET curve for the interoperable S2/S3 dataset performs worse than the two native datasets at every possible threshold.
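The FNMR-at-fixed-FMR and DET computations described in this subsection can be sketched as below. This is not the commercial matcher's procedure, which the paper does not describe. It assumes that higher scores mean greater similarity, that a comparison is accepted when its score is at least the threshold T, and that impostor (different-finger) scores are available alongside the genuine scores for each enrollment/verification combination. Function names and the toy scores are hypothetical.

```python
import math


def fmr(impostor, t):
    """False match rate: share of impostor scores accepted at threshold t."""
    return sum(s >= t for s in impostor) / len(impostor)


def fnmr(genuine, t):
    """False non-match rate: share of genuine scores rejected at threshold t."""
    return sum(s < t for s in genuine) / len(genuine)


def candidate_thresholds(genuine, impostor):
    """Every distinct score, plus one value just above the maximum.

    Between two adjacent distinct scores no accept/reject decision changes,
    so these candidates cover every achievable (FMR, FNMR) pair.
    """
    scores = sorted(set(genuine) | set(impostor))
    return scores + [math.nextafter(scores[-1], math.inf)]


def fnmr_at_fixed_fmr(genuine, impostor, target_fmr):
    """Lowest threshold whose FMR does not exceed target_fmr, with its rates."""
    if target_fmr < 0:
        raise ValueError("target_fmr must be non-negative")
    for t in candidate_thresholds(genuine, impostor):
        if fmr(impostor, t) <= target_fmr:
            return t, fmr(impostor, t), fnmr(genuine, t)


def det_points(genuine, impostor):
    """DET(T) = (FMR(T), FNMR(T)) for every candidate threshold, as in Eq. (2)."""
    return [(fmr(impostor, t), fnmr(genuine, t))
            for t in candidate_thresholds(genuine, impostor)]


genuine = [0.9, 0.8, 0.75, 0.6, 0.3]
impostor = [0.05, 0.1, 0.15, 0.2, 0.25, 0.35, 0.4, 0.5, 0.55, 0.65]
print(fnmr_at_fixed_fmr(genuine, impostor, 0.1))  # (0.6, 0.1, 0.2)
```

Choosing the lowest qualifying threshold yields the smallest FNMR among thresholds that meet the FMR constraint, because FNMR can only grow as T rises. Run once per enrollment/verification combination (x, y) with a target FMR of 0.001 or 0.01, the returned FNMR would fill one cell of a matrix laid out like Table III or Table IV.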
Fig. 4 also shows three superimposed DET curves. The DET curve for the interoperable S2/S4 dataset shows much worse performance than the native datasets. Comparing Fig. 3 and Fig. 4, the native datasets perform comparably in both figures, whereas the performance of the two interoperable datasets differs markedly. This illustrates how unpredictable the performance of interoperable datasets is when it is estimated solely from the performance of the native datasets.

Fig. 3. DET Curve for S2 and S3 datasets.
Fig. 4. DET Curve for S2 and S4 datasets.

B. Statistical Analysis of Matching Scores
The DET curves and FNMR matrix provide insight into differences in FNMR between native and interoperable databases, but they do not provide a statistical basis for testing those differences. A statistical analysis of the results could help uncover underlying patterns that contribute to the unpredictability observed when comparing the DET curves. To assess interoperability at the matching score level, matching scores from the genuine comparisons of the native dataset were compared to matching scores from the genuine comparisons of the interoperable datasets. For true interoperability, the performance of interoperable datasets should be statistically similar to the performance of the native dataset. An ANOVA test was used to test for differences in mean genuine matching scores between the native dataset and the interoperable datasets at a significance level of 0.05. This test was performed for each of the six native datasets, which resulted in six sets of hypotheses as stated in (3).

$$H_{20}: \mu_{\text{native}} = \mu_{\text{interoperable}1} = \cdots = \mu_{\text{interoperable}5} \qquad H_{2A}: \text{at least one of these means differs} \tag{3}$$

The ANOVA test for all six hypotheses had a p-value of less than 0.05, which resulted in rejecting the null hypothesis and concluding that native genuine matching scores were significantly different from interoperable genuine matching scores.

In many experiments such as this, one of the treatments is a control and the other treatments are comparison treatments, and a statistical test can be devised that compares each treatment to the control. Such a test can be performed using Dunnett's test, which is a modified form of the t-test [10]. For this experiment, the native dataset genuine match scores were used as the control and the interoperable dataset genuine match scores as the comparison treatments. For each native dataset, there were 5 comparison treatments, corresponding to the 5 interoperable datasets. The mean genuine match score of each interoperable dataset was tested against the control (i.e., the native dataset). According to Dunnett's test, the null hypothesis $H_0: \mu_{\text{native}} = \mu_{\text{interoperable}}$ is rejected at α = 0.05 if

$$\left|\bar{y}_{i\cdot} - \bar{y}_{a\cdot}\right| > d_\alpha(a-1, f)\,\sqrt{\mathrm{MSE}\left(\frac{1}{n_i} + \frac{1}{n_a}\right)} \tag{4}$$

where
d_α(a − 1, f) = Dunnett's constant
a − 1 = number of interoperable datasets (comparison treatments)
f = number of error degrees of freedom
MSE = mean square of the error terms
ȳ_i., n_i = mean and number of samples of interoperable dataset i
ȳ_a., n_a = mean and number of samples of the control (native) dataset

Dunnett's test was performed on all possible combinations of native and interoperable datasets, and it showed that the genuine matching scores of every interoperable dataset differed from the genuine matching scores of the control dataset. Table V shows the average genuine matching score of each control dataset and the average genuine matching score of the interoperable dataset with the smallest absolute difference from the control. The results in Table V show that S2 and S3 had the best interoperable genuine matching scores. When the interoperable matching scores are analyzed in the context of image quality scores and minutiae counts, S2 and S3 also had the smallest absolute difference between their image quality scores and minutiae counts. Combining these results provides a positive indicator for improving the predictability of FNMR for interoperable datasets.
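The matching-score tests in this subsection can be sketched with the standard library alone. The code below is a minimal illustration, assuming one list of genuine scores per dataset: it computes the one-way ANOVA F statistic and MSE behind hypotheses of the form (3), then applies the Dunnett criterion of Eq. (4) with the native dataset as the control. Dunnett's constant d_α(a − 1, f) is not computed here; it has to be taken from a published table of Dunnett critical values, and the value in the usage line is only a placeholder. Function names and sample data are hypothetical. For p-values, SciPy's `scipy.stats.f_oneway` (and, in recent releases, `scipy.stats.dunnett`) could be used instead.

```python
import math
from statistics import fmean


def one_way_anova(groups):
    """One-way ANOVA: return (F, df_between, df_within, MSE)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    mse = ss_within / df_within
    return (ss_between / df_between) / mse, df_between, df_within, mse


def dunnett_vs_control(control, comparisons, d_alpha):
    """Apply Eq. (4): flag each comparison mean that differs from the control.

    comparisons maps a dataset name to its genuine scores; d_alpha must be
    Dunnett's constant for a-1 = len(comparisons) and f error degrees of freedom.
    """
    _, _, _, mse = one_way_anova([control, *comparisons.values()])
    control_mean = fmean(control)
    results = {}
    for name, sample in comparisons.items():
        diff = abs(fmean(sample) - control_mean)
        critical = d_alpha * math.sqrt(mse * (1 / len(sample) + 1 / len(control)))
        results[name] = (diff, critical, diff > critical)
    return results


native = [10, 12, 11, 13]
interop = {"A": [10, 11, 12, 11], "B": [20, 21, 19, 22]}
f_stat, df_b, df_w, mse = one_way_anova([native, *interop.values()])
results = dunnett_vs_control(native, interop, d_alpha=2.61)  # placeholder constant
closest = min(results, key=lambda name: results[name][0])
print(round(f_stat, 2), df_b, df_w, closest)  # 85.75 2 9 A
```

Here dataset "A" does not differ significantly from the control while "B" does, and "A" is also the interoperable set with the smallest absolute mean difference, which is the quantity Table V reports for each native dataset.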
Table V. Matching Scores for Control Dataset and Interoperable Dataset
Control Dataset   Average Matching Score   Interoperable Dataset with Least Difference   Average Matching Score
Sensor 1          319.3                    Sensor 3                                      294.2
Sensor 2          749.2                    Sensor 3                                      609.2
Sensor 3          789                      Sensor 2                                      609.2
Sensor 4          575.5                    Sensor 3                                      281.6
Sensor 5          631.9                    Sensor 3                                      390.5
Sensor 6          652.7                    Sensor 2                                      521.9

VII. CONCLUSIONS AND FUTURE WORK
Previous research has shown that image quality has a significant impact on the performance of native fingerprint datasets, and this research showed that image quality and minutiae count have an impact on the performance of interoperable fingerprint datasets. The type of capture technology did not have a consistent effect on the FNMR of interoperable fingerprint datasets, as seen in the difference in FNMR between the S2/S3 and S2/S4 interoperable datasets. It is important to understand the effect of these factors, since they can be used to reduce the unpredictability of the performance of interoperable datasets. Interoperability depends on several factors, and this research uncovered important factors and illustrated their significance using statistical tests and analysis methodologies. These findings can be used in designing fingerprint matching algorithms that specifically take advantage of this new knowledge.

The results discussed in this paper indicate several avenues of research that could be followed to improve the statistical analysis framework. Along with the comparison of genuine matching scores using Dunnett's test, a comparison of proportions could also be applied to statistically test the FNMR between native and interoperable datasets, adding one more test to the collection of interoperability tests. Application of this framework to a different modality would also be an interesting study. In this research the framework was applied exclusively to interoperability tests for fingerprint recognition, where it helped synthesize the results in a novel way. Other biometric modalities will face the same interoperability problems as fingerprint recognition, and it will become imperative to understand these issues and try to solve them. Applying this framework to other modalities could provide insight into solving the problems of interoperability in a larger context. An investigative multivariate analysis that uses basic fingerprint features as predictor variables and matching scores as response variables is another avenue of future work. Understanding the effect of these predictor variables on interoperable matching scores could be used to create a model capable of describing the interactions and effects.

ACKNOWLEDGMENT
The authors would like to thank KFRIA (Korea Fingerprint Recognition Interoperability Alliance) for supporting this research and providing the fingerprint database for analysis.

REFERENCES
[1] IBG, Biometrics Market and Industry Report. 2007, IBG: NY. p. 224.
[2] Haas, N., S. Pankanti, and M. Yao, Fingerprint Quality Assessment. In Automatic Fingerprint Recognition Systems. 2004, NY: Springer-Verlag. 55-66.
[3] Jain, A. and A. Ross, eds. Biometric Sensor Interoperability. BioAW 2004, ed. A. Jain and D. Maltoni. Vol. 3067. 2004, Springer-Verlag: Berlin. 134-145.
[4] Ko, T. and R. Krishnan. Monitoring and Reporting of Fingerprint Image Quality and Match Accuracy for a Large User Application. In Applied Imagery Pattern Recognition Workshop. 2004. Washington, D.C.: IEEE Computer Society.
[5] Han, Y., et al. Resolution and Distortion Compensation based on Sensor Evaluation for Interoperable Fingerprint Recognition. In 2006 International Joint Conference on Neural Networks. 2006. Vancouver, Canada.
[6] Campbell, J. and M. Madden, ILO Seafarers' Identity Documents Biometric Interoperability Test Report. 2006, International Labour Organization: Geneva. p. 170.
[7] Modi, S., S. Elliott, and H. Kim. Performance Analysis for Multi Sensor Fingerprint Recognition System. In International Conference on Information Systems Security. 2007. Delhi, India: Springer Verlag.
[8] Elliott, S.J. and S.K. Modi. Impact of Image Quality on Performance: Comparison of Young and Elderly Fingerprints. In 6th International Conference on Recent Advances in Soft Computing (RASC). 2006. Canterbury, UK.
[9] Mansfield, A. and J. Wayman, Best Practices. 2002, National Physical Laboratory: Middlesex. p. 32.
[10] Montgomery, D.C., Design and Analysis of Experiments. 4th ed. 1997, New York: John Wiley & Sons. 704.