Ecological Validity is the degree to which the behaviors observed and recorded in a study reflect the behaviors that actually occur in natural settings. The more control over subjects in a study, the less ecological validity and thus, the less they may be able to generalize.
On the ecological validity of a password study
1. ON THE ECOLOGICAL VALIDITY OF A PASSWORD STUDY
Alexandria Farar
2. WHAT IS ECOLOGICAL VALIDITY?
Definitions:
http://www.thefreedictionary.com/ecological
The relationship between organisms
and their environment.
3. WHAT IS ECOLOGICAL VALIDITY?
Definitions:
How well a study can be related to
or reflects everyday, real life.
http://holah.co.uk/page/ecologicalvalidity/
4. WHAT IS ECOLOGICAL VALIDITY?
Definitions:
http://www.alleydog.com/glossary/definition.php?term=Ecological%20Validity
http://study.com/academy/lesson/ecological-validity-in-psychology-definition-lesson-quiz.html
Trade-off: Experimental Control vs. Ecological Validity
5. MOTIVATION
• Problems with Ecological Validity in Password Studies
• Complex & Difficult to Quantify
• Hard to Study ~ Lack of “Ground Truth”
6. BACKGROUND
• Studies on Password Security & Usability
Real-World Data: real leaked / stolen passwords
Controlled: user studies
7. BACKGROUND
• Studies on Password Security & Usability
Types of User Studies
Online Surveys
• Increase sample size & diversity
Laboratory Studies
• Not in natural environment
• Aware of being studied
Pen & Paper-based
8. METHODOLOGY
Study Design
• Five unique passwords stored with asymmetric cryptography; password decryption for analysis
• Five university-wide services: IDM (Identity Management), Email, Wifi, Campus Login, Single Sign-on (SSO)
• Anonymized dump of decrypted passwords
• Mirrored university password policy
• Study design mirrored enrollment process
• Able to compare study passwords to real passwords
• Student role-play
• Informed consent
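The storage scheme on this slide (study passwords encrypted at collection with a public key, decrypted only later for offline analysis) might be sketched as follows, using RSA-OAEP from the third-party `cryptography` package. The key handling and function names are illustrative assumptions, not the study's actual code.

```python
# Sketch of the encrypt-on-collection / decrypt-offline pattern.
# Assumes the third-party `cryptography` package is installed.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# In a real deployment only the public key would live on the
# collection server; the private key stays offline with the analysts.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

def store(password):
    """Encrypt a study password so the server cannot read it back."""
    return public_key.encrypt(password.encode(), OAEP)

def decrypt_offline(ciphertext):
    """Offline analysis step: recover the plaintext password."""
    return private_key.decrypt(ciphertext, OAEP).decode()
```

With this split, a compromise of the collection server exposes only ciphertexts, which matches the slide's point that the anonymized dump is produced only after offline decryption.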
9. METHODOLOGY
Study Design
• Analysis conducted offline without demographic information
• Account information never revealed for real or study passwords
• Results of the password data analysis never linked to demographic data
• Results shared with the Privacy Officer before publication
10. METHODOLOGY
• Study Design
Do passwords generated by participants asked to role-play a scenario in which they have to create passwords for fictitious accounts resemble their real passwords?
Do participants behave so differently because of the study that the results of the study should not be used to make inferences about their real behavior?
11. METHODOLOGY
• Study Design
• University password policy required strong passwords
• Independent variables / conditions:
Within-subjects: real vs. study passwords
Between-subjects: lab vs. online study; password priming (study openly mentioned to be about passwords) vs. obfuscation
12. METHODOLOGY
• Study Design
ROLE
PLAY
Enroll in University
Register for Services
Mirrored University Password Policy
• A password’s minimum length is 8 characters; its maximum length is 16 characters.
• Password characters are split into four groups: upper-case letters, lower-case letters, digits, and the special characters ,.:;!?#%$@+-/_><=()[]{}*. Passwords shorter than 12 characters must include characters from three of the four groups; passwords of 12 characters or longer need only include characters from two of the four groups.
• Neither the student’s first/last name nor the student’s ID number may be part of a password.
• Users must use different passwords for all accounts.
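The policy above can be expressed as a short validity check. The following is a minimal sketch; the function and parameter names are our own, not from the study.

```python
# Sketch of the mirrored university password policy (names are ours).
SPECIAL = set(",.:;!?#%$@+-/_><=()[]{}*")

def check_policy(password, first_name="", last_name="", student_id=""):
    """Return True if `password` satisfies the mirrored policy."""
    # Length: 8-16 characters.
    if not 8 <= len(password) <= 16:
        return False
    # Count how many of the four character groups are used.
    groups = sum([
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        any(c in SPECIAL for c in password),
    ])
    # Shorter than 12 characters: three groups; 12 or longer: two.
    required = 3 if len(password) < 12 else 2
    if groups < required:
        return False
    # Neither name nor student ID may be part of the password.
    lowered = password.lower()
    for part in (first_name, last_name, student_id):
        if part and part.lower() in lowered:
            return False
    return True
```

For example, the study password “PwdIDM11.” from slide 15 passes this check, while an 8-character all-lower-case password fails the three-group rule.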
13. METHODOLOGY
Study Design
• 16,500 students invited via email
• Two-part online study creating online accounts
• 15-20 minute questionnaire
• Second part two days after the first
• Raffle: 3 × 100-Euro Amazon vouchers
• Two introductory texts (prime, non-prime)
• Prime – important for passwords to be available
• Participants created accounts as normal
• Act as if passwords created were real passwords
• Must log in to accounts two days later to complete
• Redirected to survey after creating accounts
14. METHODOLOGY
Study Design
• 740 students invited via email
• 68 attended
• Same rules as online study
• Lab environment: PC, supervised
• First part completed in lab
• Second part completed at home
• Incentive – 20 Euros
• Opportunity to ask questions
• Assistance with technical issues
15. METHODOLOGY
• Password Analysis
Manual Scoring
• Categorized participants by how similar the metrics of their study passwords were to those of their real passwords
• User behavior considered
Example:
Study: “PwdIDM11.”, “PwdMail11.”, “PwdWifi11.”, “PwdPC11.”
Real: “B0ru$$ia09”, “16.Januar”, “(australien)”, “314159Pi”
20. METHODOLOGY
• Password Analysis
Scoring categories (ranging from realistic, similar passwords to unrealistic, inconsistent ones): Full, System, Single, Null / Derogatory
Password composition metrics: password length, #uppercase, #lowercase, #digits, #special characters, entropy, NIST entropy
• Computed for every password from real accounts and the online / lab study
• Password strength – John the Ripper, entropy
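The composition metrics and NIST entropy listed above can be computed per password. Below is a simplified sketch: the entropy estimate follows the NIST SP 800-63-1 Appendix A scheme but omits the dictionary-check bonus, and the function names are our own.

```python
def composition(password):
    """Per-password composition metrics as listed on the slide."""
    return {
        "length": len(password),
        "upper": sum(c.isupper() for c in password),
        "lower": sum(c.islower() for c in password),
        "digits": sum(c.isdigit() for c in password),
        "special": sum(not c.isalnum() for c in password),
    }

def nist_entropy(password, composition_bonus=True):
    """Simplified NIST SP 800-63-1 Appendix A entropy estimate (bits).

    Per-position bits: 4 for the first character, 2 each for
    positions 2-8, 1.5 each for positions 9-20, 1 thereafter.
    The dictionary-check bonus is deliberately omitted here.
    """
    bits = 0.0
    for i in range(len(password)):
        if i == 0:
            bits += 4
        elif i < 8:
            bits += 2
        elif i < 20:
            bits += 1.5
        else:
            bits += 1
    # Up to 6 bonus bits when both upper-case and non-alphabetic
    # characters are required by the composition policy.
    has_upper = any(c.isupper() for c in password)
    has_nonalpha = any(not c.isalpha() for c in password)
    if composition_bonus and has_upper and has_nonalpha:
        bits += 6
    return bits
```

For the study password “PwdIDM11.” this yields 4 + 7·2 + 1·1.5 + 6 = 25.5 bits, which illustrates how coarse the NIST estimate is compared with cracking-based strength measures such as John the Ripper.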
22. RESULTS
• Participants:
• 18.1 online accounts on average
• Medium IT expertise
• 6.5% no forgotten password
• 79.6% forgotten 2x
• 17.4% account abuse
• 63.2% use 2-3 passwords
• 14.9% different password
23. RESULTS
• Scoring Evaluation
Hypothesis: category Full participants would have the highest correlation of password composition values between their two password sets of all categories.
• Expected a weaker correlation for category Single and category System participants
• No correlation for category Null and Derogatory participants.
24. RESULTS
• Scoring Evaluation
• Found highly significant and strong correlations for participants in score category Full
and mostly significant correlations in categories Single and System.
• No correlation when the entire set of study passwords was analyzed as a whole.
• No correlation for the categories Null and Derogatory.
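The correlation analysis behind these results (per-metric correlations between study and real password sets, with a Bonferroni-corrected significance threshold, as described in the Editor's Notes) can be sketched in plain Python; the helper names are our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def bonferroni_alpha(alpha, k):
    """Bonferroni-corrected per-test threshold for k comparisons."""
    return alpha / k
```

With eight composition metrics tested per category, an overall alpha of 0.05 becomes 0.05 / 8 = 0.00625, matching the 0.0063 threshold reported in the notes (assuming k = 8, which the slides do not state explicitly).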
25. RESULTS
• Scoring Evaluation
Legitimate categories regardless of condition (online, lab, primed, non-primed):
Category Single, Full, and System participants behave more realistically in our study than category Null and Derogatory participants, with category Full participants showing the strongest correlation. 26.5% of our participants even used at least one of their real passwords in the study.
No difference between the conditions with respect to our categorization; the differences in password behavior can be compared based solely on category, irrespective of condition.
Scoring consistent: participants classified by our scoring system as behaving consistently between real and study passwords did compose their passwords consistently. Those classified as behaving inconsistently produced independent sets of passwords.
27. EVALUATION
• Online vs. Lab Study
In the lab study, more participants fell into the helpful categories Single, Full, and System compared to our online study (Table 3).
Priming – the null hypothesis that there is no difference in behavior could not be rejected (p = 0.4698).
28. RESULTS
Self-reported values in predicting inconsistent study behavior
Asked participants if they behaved differently:
• Different behavior = fewer counts in Full, Single, and System; higher counts in Null and Derogatory
• Participants who changed their usual behavior for the study obtained significantly fewer ratings in categories Full, System, and Single, and more in Null and Derogatory, than participants who did not self-report a change
• Participants who said that they use individual passwords for each account also scored significantly more frequently in categories Null and Derogatory when participating online
Some reasons for deviation include distrust, policy, and laziness.
30. REVIEWS
• Because of the content and the impacts mentioned above, the topic of the paper presents a novelty, important new knowledge, and fits the requirements of the call for papers of this conference (see http://cups.cs.cmu.edu/soups/2013/cfp.html). Additionally, the paper treats the impact of organizational policy or procurement decisions and touches the same topics as failed usable-security experiments, with a focus on the lessons learned from them. Furthermore, the must-have criterion, that the work should relate to usability or human factors and either privacy or security, is fulfilled. The length of the paper does not violate the rules.
• positive aspects:
- comparison of lab to online and real-world behavior delivers wide coverage.
- compact, very informative evaluation display across multiple aspects of password
studies, including some interesting results (realistic passwords in lab environment).
• negative aspects:
- Password conditions were pretty strict, requiring relatively safe passwords from the get-go.
- Perhaps unbalanced set of data between online and offline study; however, this is a general problem of the two types of studies.
Editor's Notes
Ecological Validity is the degree to which the behaviors observed and recorded in a study reflect the behaviors that actually occur in natural settings.
In most studies there is a trade-off between experimental control and ecological validity. The more control over subjects in a study, the less ecological validity and thus, the less they may be able to generalize.
high ecological validity, then you can generalize the findings of your research study to real-life settings.
low ecological validity, you cannot generalize your findings to real-life situations.
What impact do user study setups have on the ecological validity of these studies?
Informed consent for the study and permission to compare fake passwords to real account passwords.
After two days, our participants received a personalized email requesting their participation in the previously announced second part of the study. After clicking a link contained in the email, each participant was asked to log into the same four services as before, using the password they had created two days ago. After three tries, participants could choose to continue to the next service without successfully logging in, in order to not unnecessarily frustrate our subjects. The system recorded whether or not participants succeeded and how many tries each participant failed. Finally, participants completed a second questionnaire asking how they had managed the study passwords.
Dictionary attack.
We conducted a correlation test within the categories, comparing study password sets with the respective real password sets. We applied the Bonferroni correction, which gave us an alpha value of 0.0063. As expected, we found highly significant correlations in category Full, some significant correlations in categories Single and System, and rather random correlation behavior in categories Derogatory and Null. This strongly supports our scoring procedure, while also pointing to the limits of assuming the correlation of the above metrics to be very strong between studies and real passwords.
very useful for studying password behavior.
System – 5.1% (33), still representing partially valuable password samples.
passwords that showed abnormal and derogatory behavior.