Keystroke Dynamics Verification Using a Spontaneously Generated
                                   Password

(2006) Keystroke Dynamics Verification Using a Spontaneously Generated Password

Current keystroke dynamics applications have tackled the problem of traditional knowledge-based static password verification, but the problem of spontaneous
password verification persists. The intent of this study was to examine the predictive strength of typing patterns for spontaneous passwords. The typing patterns of an individual typing at a DELL® keyboard on a DELL OptiPlex® GX260 machine were recorded. Variables collected included keystroke press time and keystroke latency. Computed performance measures included false match rates (FMR) and false non match rates (FNMR) at various threshold levels.


Shimon Modi, Industrial Technology, Purdue University, West Lafayette, IN 47907, USA
Dr. Stephen J. Elliott, Associate Professor, Industrial Technology, Purdue University, West Lafayette, IN 47907, USA

Abstract - Current keystroke dynamics applications have tackled the problem of traditional knowledge-based static password verification, but the problem of spontaneous password verification persists. The intent of this study was to examine the predictive strength of typing patterns for spontaneous passwords. The typing patterns of an individual typing at a DELL® keyboard on a DELL OptiPlex® GX260 machine were recorded. Variables collected included keystroke press time and keystroke latency. Computed performance measures included false match rates (FMR) and false non match rates (FNMR) at various threshold levels.

Index Terms: keystroke dynamics, biometrics, spontaneous password, static verification, dynamics verification.

I. INTRODUCTION

Computer systems are now used in almost all commercial, industrial, and individual activities. Businesses rely heavily on the effective operation of computer systems to ensure their businesses run smoothly and successfully. Corporations safeguard intellectual property, customer transactions, employee information, and sensitive data from unauthorized access, which might result in intellectual property and financial loss. In a 2001 survey conducted by the Computer Security Institute and the San Francisco Federal Bureau of Investigation’s Computer Intrusion Squad, 34 respondents reported financial losses of $151,230,100 due to loss of proprietary information. Traditionally, computer security for personal system accounts has depended on knowledge-based passwords or unique tokens possessed only by the individual authorized to access those accounts. Keystroke dynamics is a concept to address the inherent weakness of knowledge-based passwords. The current keystroke dynamics solution only works with static passwords. This concept adds a layer of security to password protection, but the user is still obligated to remember the password. The password protection scheme is less vulnerable, but it is not inviolable. Keystroke dynamics is the only behavioral biometric that works as a software-based solution and does not require additional hardware other than what accompanies a regular computer.

In the past, research in the field of keystroke dynamics has focused on static verification. There is no argument that static verification reduces the success rate of an imposter gaining unauthorized access under the genuine identity, but the problem of authorized users forgetting their passwords persists. A keystroke dynamics solution that does not require users to remember their passwords would be a better option for commercial and business users than currently commercially available keystroke dynamics solutions. The intent of this study is to examine the predictive strength of typing patterns for spontaneous passwords. The typing patterns of an individual typing at a DELL® keyboard on a DELL OptiPlex® GX260 machine are recorded. Variables collected include keystroke press times and keystroke interval times. Performance measures included false match rates (FMR) and false non match rates (FNMR) computed at various threshold levels.

II. BACKGROUND

Keystroke dynamics traces its roots to the telegraph. When the telegraph was the preferred means of communication, telegraphers could transmit messages across long distances; they came to recognize their sending colleagues by their unique and familiar rhythm and timing. Keystroke dynamics is the process of authenticating individuals based on their typing patterns. This premise stems from the observation that similar neurophysical factors that make written signatures unique also make typing patterns unique [3]. The latencies between successive keystrokes and the keystroke press time can be used to create a unique template for an individual.

Keystroke dynamics, as it is known today, was first proposed in a RAND report in 1980, as part of an NSF-funded project to authenticate individuals. Although this approach used an external device, it proved that keystroke dynamics could be used to identify individuals in a 1:1 setting. Patents were filed in 1985 and issued in 1989 for key interval timing. Gaines et al. described the RAND experiment in which 7 secretaries were recruited as subjects [4]. Each secretary was asked to type three passages, each comprising 300-400 words, at two different sessions held four months apart. The keystroke interval time was recorded for the experiment, and t-tests were carried out to check if the means of the keystroke interval times were the same at the two sessions.

The RAND experiment yielded encouraging results, but the small size of the study’s participant pool was a drawback. Umphress and Williams conducted two different experiments that involved more subjects than the RAND experiment [5]. The first experiment included 17 programmers typing one passage of 1400 words and another passage of 300 words. The 1400-word passage served as a template and the 300-word passage was used as a verification attempt. The second experiment included 36 subjects typing a 530-word passage
twice. The objective of these experiments was to study the mean of the keystroke interval timings for a larger population. The results showed that if 5 keystroke interval timings were used, the false reject rate was approximately 30% and the false accept rate was approximately 17%. The best results were obtained using the keystroke interval timings for lowercase keystrokes. The false reject rate was approximately 5.5% and the false accept rate was approximately 5%.

Garcia introduced the term “electronic signature” in the patent for authenticating an individual. His approach had the subjects type their own name, since there is less variance in the keystroke interval timings. First, participants typed their names a number of times, which created a template using the mean of the keystroke interval times and a covariance matrix of the keystroke interval times. When the individuals sought verification, they would be required to type their name and a verification vector would be created. If the verification vector was statistically similar to the template vector, then the attempt would be classified as an authentic attempt.

Haider, Abbas and Zaidi describe a multi-technique approach rather than merely a statistical approach. They used fuzzy logic, neural networks, statistical techniques, and combinations of these strategies to verify individuals [6]. They had all the subjects type in a 7-letter password and then used all of their technical strategies to verify individuals.

Joyce and Gupta describe a classifier based on comparing signature shapes for verification. They used keystroke press times and keystroke interval times as measures in their experiment. They achieved results of 16.67% FRR and 0.25% FAR [7].

III. DATA COLLECTION

The population for this study consisted of 18- to 25-year-old students at Purdue University, West Lafayette, Indiana, U.S.A. Data were collected from 42 subjects; data from two subjects were not used because of system malfunction. Each subject for this study was required to follow the same testing protocol. Each subject typed in a group of 15 words 10 times. This resulted in 150 attempts that had only 15 unique words. The subject never typed the same word in succession. After the subject finished typing all the words, the subject was instructed to type in a word that had not been typed before as part of the study. This word was created from a combination of the 15 words that had been typed earlier, and served as the spontaneous or dynamic word for this study. The typing pattern of the last word, also referred to as the spontaneous password, was used to validate the ability to predict typing patterns from the group of 15 words.

The keystroke timing recording software created with the study’s programming form was the same for all the subjects. The computer platform used the Microsoft® Windows® XP operating system. If the keystroke timing recording software malfunctioned during a session, the session was aborted and the subject was not obligated to participate in the study again. All the data collected during that aborted session were destroyed at the session’s end. The keystroke timing recording software used the operating system’s QueryPerformanceCounter API to get a precise measurement of the keystroke timing because this API’s frequency component is dependent on the CPU frequency. The frequency component measures how many times per second the counter is updated. The resolution of the timing can be detailed to the level of a microsecond.

Figure 1. Protocol for enrollment using the keystroke timing recording software (flowchart: type group of 15 words, completed 10 times; type spontaneous password; finish).

A. Data Extraction

Data extraction was performed on data collected from each valid participant’s enrollment data. The group of 15 different words, plus the spontaneously generated password word, which for this case happened to be the word “keystroke”, was part of the data extraction pool. The first step in the process was determining the number of common successive keystrokes from the group of words that made up the keystroke interval times. Several combinations of successive keystrokes and individual keystrokes had more than a single instance. For example, there were two keystroke latencies for “tr”, the first from the word “abstract” and the second from the word “petrol”. The test of normality failed when it was carried out on a data set containing each successive keystroke combination and individual keystrokes. The raw data were then transformed using the natural log function, but the data set still failed the test of normality, so it was decided to maintain the dataset in its raw format. Results from the Kruskal-Wallis non-parametric test for similarity between groups containing multiple instances of the same successive keystroke combination and individual keystrokes showed that the different groups were not similar.

The next step required selection of a particular instance from the multiple instances of successive keystroke combinations and keystroke letters to create the template. Two different criteria were used for selecting the base reference for successive keystroke combinations and keystroke letters:

1. The base reference for successive keystroke combinations and the keystroke letters be at the same offset (position) from the beginning of the dynamic word chosen, “keystroke”.
2. The base reference for successive keystroke combinations and the keystroke letters be in the same region relative to the dynamic word, “keystroke”. For example, the combination “ke” occurs in the 7th position from the start of the dynamic word, “keystroke”. The combination “ke” occurs in the 5th position from the start of the word “karaoke”. Since both of the combinations are near the end of their respective words, the decision was made to use “ke” from the word “kennel” as the base reference.

The SAS/STAT® PROC MIXED procedure was used to locate different instances of keystroke latencies and keystroke press times that were statistically similar to the base reference. An α level of .05 was used for this test. This procedure is specifically designed for analysis of repeated measures and assumes that data from subjects are statistically independent [8].

To prove independence of data samples for the same subject, a visual analysis of scatter plots of timings against the attempts was performed. The conclusion from the visual analysis of the scatter plots was that there was no habituation between different attempts from the same individual, and that the data could be considered to be independent samples.

The final enrollment dataset consisted of keystroke latencies and keystroke press times that were chosen, as indicated in Table 1 and Table 2, based on results of the PROC MIXED procedure.

Table 1. Selection of instances of keystroke latencies

Bank of words                    Keystroke latency chosen   Total instances selected   Total data points
kennel                           ke                         1                          1*10=10
vineyard                         ey                         1                          1*10=10
systematic                       ys                         1                          1*10=10
systematic, existance            st                         2                          2*10=20
abstract, petrol                 tr                         2                          2*10=20
petrol                           ro                         1                          1*10=10
karaoke                          ok                         1                          1*10=10
karaoke                          ke                         1                          1*10=10

Table 2. Selection of instances of keystroke press times

Bank of words                    Keystroke press time chosen   Total instances selected   Total data points
karaoke, kennel                  k                             3                          3*10=30
ambivalent, kennel, organize     e                             2                          2*10=20
vineyard                         y                             1                          1*10=10
abstract, existance, plausible   s                             2                          2*10=20
theater, existance               t                             2                          2*10=20
fingerprint                      r                             1                          1*10=10
karaoke, petrol                  o                             2                          2*10=30
karaoke, kennel                  k                             3                          3*10=30
existance                        e                             1                          1*10=30

IV. CLASSIFIERS AND RESULTS

This section describes the classifiers that were implemented in order to test the efficiency of the classification performed on the final dataset. The efficiency of the classification algorithm was judged in terms of FAR and FRR.

B. Classifier 1

This classifier was implemented to test the efficiency of the classifier for spontaneous password verification using keystroke dynamics. In the ensuing discussion, M is the mean vector, T is the timing vector from the single verification attempt, and K, E, Y, S, T, R, O, K1, E1, KE, EY, YS, ST, TR, RO, OK, KE1 are the 17 timing vectors, where the number of elements in each vector was decided based on results shown in Table 1 and Table 2. The template for the classifier was calculated for each individual using the following steps:

1. Calculate M for the enrollment dataset, where $M = (\mu_k, \mu_e, \mu_y, \mu_s, \mu_t, \mu_r, \mu_o, \mu_{k1}, \mu_{e1}, \mu_{ke}, \mu_{ey}, \mu_{ys}, \mu_{st}, \mu_{tr}, \mu_{ro}, \mu_{ok}, \mu_{ke1})$ and

   $$\mu_k = \frac{1}{n} \sum_{i=1}^{n} K_i \qquad (1)$$

2. Calculate $R_k$ where

   $$R_k = \frac{1}{n} \sum_{i=1}^{n} |\mu_k - K_i| \qquad (2)$$

   Repeat the same calculation for $R_e, R_y, R_s, R_t, R_r, R_o, R_{k1}, R_{e1}, R_{ke}, R_{ey}, R_{ys}, R_{st}, R_{tr}, R_{ro}, R_{ok}, R_{ke1}$.

3. Calculate r where

   $$r = R_k + R_e + R_y + R_s + R_t + R_r + R_o + R_{k1} + R_{e1} + R_{ke} + R_{ey} + R_{ys} + R_{st} + R_{tr} + R_{ro} + R_{ok} + R_{ke1} \qquad (3)$$

4. Calculate verification value v where

   $$v = \sum_{i=1}^{17} |M_i - T_i| \qquad (4)$$

The reference value for all 40 genuine individuals was calculated. For every genuine individual, there were 39 other attempts that were imposter attempts. The absolute value of the difference between the reference value r and the test value v for the verification attempt was calculated: |r-v|. The FAR and FRR were calculated using the rule:
    if |r-v| < threshold
        accepted as genuine user
    else
        rejected as imposter

The results computed for Classifier 1 are shown in Table 3.

Table 3. Performance Results of Classifier 1

Threshold   FAR     FRR
0.05*r      0.47%   94.87%
0.1*r       0.94%   94.87%
0.15*r      1.95%   94.87%
0.2*r       2.83%   94.87%

The results from the classification performed by Joyce and Gupta showed a FRR of 13.3% and a FAR of 0.17% [7]. They had used a threshold of 1.5 standard deviations from the mean. The results from their experiment showed their classification method for static password keystroke dynamics verification could be used with a high level of confidence.

C. Classifier 2

This classifier was specifically implemented to take into consideration the difference in shapes of typing patterns of individuals. According to Joyce and Gupta, “Digital signatures often show sharp changes between successive latencies as a result of an individual’s unique typing pattern.” The shape of the typing pattern is defined as the difference in successive keystroke latencies and keystroke press timings (Figure 2). For the purpose of this classifier, the difference in successive keystroke latencies and keystroke press timings can also be called the slopes of shape. Since the differences in slopes between the reference shape and the test shape were considered distinctive features of the typing pattern, they were weighted by the amount of slope change in the reference shape [9]. The template for this classifier was calculated for each individual using the following steps:

1. Calculate the vector of slopes I from mean vector M, where

   $$I = (\mu_e - \mu_k,\ \mu_y - \mu_e,\ \mu_s - \mu_y,\ \mu_t - \mu_s,\ \mu_r - \mu_t,\ \mu_o - \mu_r,\ \mu_{e1} - \mu_{k1},\ \mu_{ey} - \mu_{ke},\ \mu_{ys} - \mu_{ey},\ \mu_{st} - \mu_{ys},\ \mu_{tr} - \mu_{st},\ \mu_{ro} - \mu_{tr},\ \mu_{ok} - \mu_{ro},\ \mu_{ke1} - \mu_{ok}) \qquad (5)$$

2. Calculate the vector of slopes R' where

   $$R' = (R_e - R_k,\ R_y - R_e,\ R_s - R_y,\ R_t - R_s,\ R_r - R_t,\ R_o - R_r,\ R_{e1} - R_{k1},\ R_{ey} - R_{ke},\ R_{ys} - R_{ey},\ R_{st} - R_{ys},\ R_{tr} - R_{st},\ R_{ro} - R_{tr},\ R_{ok} - R_{ro},\ R_{ke1} - R_{ok}) \qquad (6)$$

3. A measure of difference of shapes between I and R' was calculated using the formula:

   $$g = \sum_{i=1}^{8} |I_i - R'_i| \, W_{1i} + \sum_{j=9}^{16} |I_j - R'_j| \, W_{2j} \qquad (7)$$

   $$W_{1i} = \frac{|I_{i+1} - I_i|}{\max(|I_{i+1} - I_i|)}, \qquad W_{2j} = \frac{|I_{j+1} - I_j|}{\max(|I_{j+1} - I_j|)}$$

4. Calculate the vector of slopes T' from the verification attempt where

   $$T' = (T_k - T_e,\ T_e - T_y,\ T_y - T_s,\ T_s - T_t,\ T_t - T_r,\ T_r - T_o,\ T_o - T_{k1},\ T_{k1} - T_{e1},\ T_{ke} - T_{ey},\ T_{ey} - T_{ys},\ T_{ys} - T_{st},\ T_{st} - T_{tr},\ T_{tr} - T_{ro},\ T_{ro} - T_{ok},\ T_{ok} - T_{ke1}) \qquad (8)$$

5. A measure of the difference of shapes between I and T' was calculated using the formula:

   $$v = \sum_{i=1}^{8} |I_i - T'_i| \, W_{1i} + \sum_{j=9}^{16} |I_j - T'_j| \, W_{2j} \qquad (9)$$

   with $W_{1i}$ and $W_{2j}$ defined as in step 3.

The reference value for all 40 individuals was calculated. For every genuine attempt, there were 39 imposter attempts. The measure of difference |g-v| between all the attempts was calculated. The FAR and FRR were calculated using the rule:

    if (|g-v| < threshold) then
        accepted as genuine user
    else
        rejected as imposter

The results computed for Classifier 2 are shown in Table 4.

Table 4. Performance Results of Classifier 2

Threshold   FAR     FRR
0.1*r       3.03%   97.43%
0.2*r       7.22%   97.43%

The results from the classification performed by Joyce and Gupta showed a FRR of approximately 5% and a FAR of approximately 5% [7]. They had used a threshold of two standard deviations from the mean. The results from their experiment showed their classification method for static password keystroke dynamics verification could be combined with other classification methods with a high level of confidence.
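As an illustration of the Classifier 1 procedure (Eqs. 1 to 4 followed by the |r-v| threshold rule), the following Python sketch may help. It is not the implementation used in the study: the function names, the dictionary layout keyed by feature label, and the toy timings are all assumptions introduced here for illustration, and the threshold is expressed as a fraction of r in the manner of Table 3.

```python
from statistics import mean


def build_template(enrollment):
    """Build a Classifier 1 style template (hypothetical helper).

    enrollment maps a feature label (e.g. "k" or "ke") to the list of
    enrollment timings recorded for that feature.
    """
    # Eq. (1): mean timing per feature.
    means = {f: mean(ts) for f, ts in enrollment.items()}
    # Eq. (2): mean absolute deviation from the mean, per feature.
    deviations = {
        f: mean(abs(means[f] - t) for t in ts) for f, ts in enrollment.items()
    }
    # Eq. (3): reference value r is the sum of the per-feature deviations.
    r = sum(deviations.values())
    return means, r


def verification_value(means, attempt):
    """Eq. (4): sum of absolute differences between means and one attempt."""
    return sum(abs(means[f] - attempt[f]) for f in means)


def accept(means, r, attempt, factor=0.1):
    """Decision rule: accept when |r - v| is below factor * r."""
    v = verification_value(means, attempt)
    return abs(r - v) < factor * r


# Toy data with two features only (the study used 17); values are invented.
enrollment = {"k": [0.10, 0.12, 0.14], "ke": [0.20, 0.20, 0.26]}
means, r = build_template(enrollment)
genuine_like = {"k": 0.14, "ke": 0.24}
imposter_like = {"k": 0.30, "ke": 0.50}
print(accept(means, r, genuine_like), accept(means, r, imposter_like))
```

Note that the rule compares the attempt's total deviation v against the enrollment's own total deviation r, so an attempt that matches the means too closely (v near 0) is rejected just as one that deviates too much is.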
Figure 2. Using slopes to measure shape difference for keystroke latencies (plot titled “Shape Difference”: time, from 0 to 0.7, against the latencies ke, ey, ys, st, tr, ro, ok, ke, for the mean vector M, a genuine test vector T, and an imposter test vector T).

D. Classifier 3

Classifiers 1 and 2 take into account the mean of the keystroke press times and keystroke latencies, and the differences between shapes of the keystroke typing patterns. They do not take into account the variability that might be associated with the individual keystroke press times and keystroke latencies. This classifier was created to take into account the variabilities of keystroke press times and keystroke latencies that could be unique to the genuine individual. The vector of standard deviations S was calculated using the enrollment dataset for each individual, where $S = (\sigma_k, \sigma_e, \sigma_y, \sigma_s, \sigma_t, \sigma_r, \sigma_o, \sigma_{k1}, \sigma_{e1}, \sigma_{ke}, \sigma_{ey}, \sigma_{ys}, \sigma_{st}, \sigma_{tr}, \sigma_{ro}, \sigma_{ok}, \sigma_{ke1})$.

The vector of weighted values W was calculated using the following steps:

1. Calculate vector L where

   $$L = \left( \frac{|S|}{\sigma_k}, \frac{|S|}{\sigma_e}, \frac{|S|}{\sigma_y}, \frac{|S|}{\sigma_s}, \frac{|S|}{\sigma_t}, \frac{|S|}{\sigma_r}, \frac{|S|}{\sigma_o}, \frac{|S|}{\sigma_{k1}}, \frac{|S|}{\sigma_{e1}}, \frac{|S|}{\sigma_{ke}}, \frac{|S|}{\sigma_{ey}}, \frac{|S|}{\sigma_{ys}}, \frac{|S|}{\sigma_{st}}, \frac{|S|}{\sigma_{tr}}, \frac{|S|}{\sigma_{ro}}, \frac{|S|}{\sigma_{ok}}, \frac{|S|}{\sigma_{ke1}} \right) \qquad (10)$$

2. Calculate vector W, whose 17 elements are the elements of L, in the same order, each divided by |L|:

   $$W = \left( \frac{|S|/\sigma_k}{|L|}, \frac{|S|/\sigma_e}{|L|}, \ldots, \frac{|S|/\sigma_{ok}}{|L|}, \frac{|S|/\sigma_{ke1}}{|L|} \right) \qquad (11)$$

The template for the classifier was calculated using the following steps:

1. Calculate vector R where $R = (R_k, R_e, R_y, R_s, R_t, R_r, R_o, R_{k1}, R_{e1}, R_{ke}, R_{ey}, R_{ys}, R_{st}, R_{tr}, R_{ro}, R_{ok}, R_{ke1})$.

2. A reference value g was calculated using the following rule:

       g = 0; Gprev = 0;
       for i = 1 until 17
           if (Ri < Si) then
               g = Gprev + Wi
               Gprev = g
       end.

3. Calculate vector T' where

   $$T' = (|\mu_k - T_k|, |\mu_e - T_e|, |\mu_y - T_y|, |\mu_s - T_s|, |\mu_t - T_t|, |\mu_r - T_r|, |\mu_o - T_o|, |\mu_{k1} - T_{k1}|, |\mu_{e1} - T_{e1}|, |\mu_{ke} - T_{ke}|, |\mu_{ey} - T_{ey}|, |\mu_{ys} - T_{ys}|, |\mu_{st} - T_{st}|, |\mu_{tr} - T_{tr}|, |\mu_{ro} - T_{ro}|, |\mu_{ok} - T_{ok}|, |\mu_{ke1} - T_{ke1}|) \qquad (12)$$

4. Calculate value y for each attempt using the following rule:

       y = 0; Yprev = 0;
       for i = 1 until 17
           if (T'i < Si) then
               y = Yprev + Wi
               Yprev = y
       end.

The reference value for all 40 individuals was calculated. For every genuine attempt, there were 39 imposter attempts. The measure of difference |g-y| between all the attempts was calculated. The FAR and FRR were calculated using the rule:

    if (|g-y| < threshold) then
        accepted as genuine user
    else
        rejected as imposter

The results computed for Classifier 3 are shown in Table 5.

Table 5. Performance Results of Classifier 3

Threshold   FAR     FRR
0.1*g       0.33%   94.87%

V. CONCLUSION

Previous research concerning static verification resulted in performance acceptable as an additional layer of security to the knowledge-based password. The dynamic verification classifiers implemented as part of this study gave a lowest FRR of approximately 94%, which is not acceptable in a real-world scenario. This study, in its present form, did not indicate that keystroke interval times and keystroke press times can be used to verify an individual who types a spontaneous password. Nonetheless, this failure should not be considered a deterrent to further research in this area.

There are several recommendations for further research. One is to conduct outlier analysis as part of the enrollment procedure. Commercial products that use keystroke dynamics for static password verification use outlier analysis as part of the enrollment procedure; the subject is asked to type the words used in enrollment more than the required number of times if their typing patterns are deemed to be too variable. Another is, instead of using a completely spontaneous password, to use a variation of a word that the subject is habituated to typing as the spontaneous password. Further research can also be conducted to examine the similarity of keystroke latency times and keystroke press times that have the same instance in different locations of the word. Verification techniques using spontaneous passwords hold significant potential and further
research should be carried out to tap into the benefits that keystroke dynamics may offer.

VI. REFERENCES

[1] M. Bishop, Computer Security. Boston: Addison-Wesley, 2003.
[2] X. Ke, A. Peacock, and M. Wilkerson, “Typing Patterns: A Key to User Identification,” IEEE Security & Privacy, vol. 2, no. 5, pp. 40-47.
[3] F. Monrose and D. Rubin, “Keystroke Dynamics as a Biometric for Authentication,” Future Generation Computing Systems (FGCS) Journal: Security on the Web, vol. 16, no. 4, pp. 351-359, March 2000.
[4] R. Gaines, W. Lisowski, S. Press, and N. Shapiro, “Authentication by keystroke timing: Some preliminary results,” Rand Report R-256-NSF. Rand Corporation, Santa Monica, CA, 1980.
[5] D. Umphress and G. Williams, “Identity verification through keyboard characteristics,” Int. J. Man-Machine Studies, vol. 23, no. 3, pp. 263-273, Sept. 1985.
[6] A. Abbas, S. Haider, and K. Zaidi, “A Multi-Technique Approach for User Identification through Keystroke Dynamics,” in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, vol. 2, 2000, pp. 1336-1341.
[7] R. Joyce and G. Gupta, “Identity authorization based on keystroke latencies,” Commun. ACM, vol. 33, no. 2, pp. 168-176, 1990.
[8] M. Chang and R. Wolfinger, “Comparing the SAS GLM and Mixed Procedures for Repeated Measurements Analysis,” SUGI Proceedings.
[9] S. Bleha, C. Slivinsky, and B. Hussein, “Computer-access security systems using keystroke dynamics,” IEEE Trans. Pattern Anal. Mach. Intelligence, vol. 12, no. 12, pp. 1217-1222, Dec. 1990.
