Junyan Wu Healthcare information security control on insider threat proposal
Center for Business Intelligence and Analytics
Leidos Graduate Fellow in Advanced Information Systems – Junyan Wu

Proposal: Healthcare information security control on insider threat

Background and Hypothesis:
Concern about healthcare information security is growing. The adoption of digital patient records, the increasing use of mobile devices, provider consolidation, and rising demand for fast information exchange among patients, providers, and payers all point to an urgent need for better information security.

Insiders have been shown to be more dangerous than outside attackers because of their intimate knowledge of the organization’s information systems and the access to data they have during their routine work [1,2,3,4,5]. According to Symantec and Ponemon (2009) [6], 59% of ex-employees admit to having stolen confidential data, such as customer contact lists, from their former company. The CSI Computer Crime & Security Survey [7] shows that 44% of respondents reported internal abuse of computer systems, making it the second most frequent form of security breach, only slightly behind virus incidents but well above the 29% of respondents who reported unauthorized access from external sources. According to the 2014 Breach Level Index report, malicious insiders stole more records than outsiders did (Fig. 1).

Figure 1. Source: InfoSec Institute (2015).

The DTI/PwC (2004) survey reports that insider incidents occurred more frequently in large companies than in small organizations (Fig. 2).

Figure 2. Source: PwC (2004).
Malicious insiders may be angry, disgruntled, or rogue employees. They are either on their way out or have already been fired but can still log in legitimately. These attackers are extremely dangerous because they already know their way around the network and can access large amounts of information with little effort.
In my previous research, I investigated company evaluations posted by employees on the Glassdoor website, which reflect employee attitudes toward their company. I found that the occurrence of data theft was correlated with low employee ratings of the company.
The University of Pittsburgh Medical Center (UPMC) is a global nonprofit health enterprise and is considered a leading American healthcare provider. In November 2013, malicious insiders breached UPMC data; as a result, 1.6 million taxpayers were affected by identity theft. Comparing this event with UPMC’s ratings on Glassdoor, I found that the breach occurred when employee ratings were close to a local minimum (Fig. 3).

Figure 3. Data theft time and rating trends of UPMC from Glassdoor.
Acxiom holds a strong position in healthcare marketing. In September 2014, malicious insiders breached Acxiom data. The rating trends on Glassdoor show that the breach occurred close to a local minimum of employee ratings (Fig. 4).

Figure 4. Data theft time and rating trends of Acxiom Company from Glassdoor.
This preliminary research suggests that insider data theft may be correlated with company ratings, which represent employees’ reviews of their company.
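The rating-trend comparison in Figs. 3 and 4 can be made repeatable with a simple rule: find local minima in a firm’s monthly average rating series and check whether the breach month falls within a small window of one. The sketch below assumes ratings have already been aggregated by month into a list; the `window` tolerance and the rating values are made up for illustration and are not data from this study.

```python
from typing import List


def local_minima(series: List[float]) -> List[int]:
    """Indices i where series[i] is strictly lower than both neighbours."""
    return [
        i for i in range(1, len(series) - 1)
        if series[i] < series[i - 1] and series[i] < series[i + 1]
    ]


def breach_near_minimum(series: List[float], breach_idx: int, window: int = 1) -> bool:
    """True if the breach month lies within `window` months of a local minimum.

    `window` is a hypothetical tolerance, not a value from the proposal.
    """
    return any(abs(breach_idx - m) <= window for m in local_minima(series))


# Made-up monthly average ratings (not real Glassdoor data).
ratings = [3.4, 3.3, 3.1, 2.9, 3.0, 3.2, 3.3]
print(local_minima(ratings))            # [3]
print(breach_near_minimum(ratings, 4))  # True
print(breach_near_minimum(ratings, 0))  # False
```

A single coincidence like this is not evidence of correlation by itself; across many firms, the resulting flags would still need a formal statistical test.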
One of my research topics will focus on the relationship between insider data breach events and employees’ reviews, including satisfaction and disgruntlement, on social media. Based on the observations above, my first hypothesis is that employee disgruntlement increases the incidence of insider security breaches and data theft.
To control the threat from insiders, monitoring employees’ behavior is becoming increasingly important. Puhakainen and Siponen [8] provided direct evidence that top management actions supporting the established information security policy, as observed by employees, changed employees’ attitudes and resulted in higher levels of compliance as well as discussion of new information security initiatives among employees.
Employees can pose a severe threat to the confidentiality, integrity, or availability of the information system (IS) through deliberate activities (e.g., a disgruntled employee or espionage). In addition, they may introduce risks through passive noncompliance with security policies, laziness, sloppiness, or poor training. They may also lack the motivation to protect the sensitive information of the organization and its partners, clients, and customers. This has been termed the ‘endpoint security problem’ [9].
Email and other electronic communication tools are ubiquitous in today’s workplace. To protect information security, many employee-monitoring tools have been built to prevent harmful activities. The output of network auditing appliances that monitor email, instant messaging, social media, and web traffic can be used to reveal psychosocial factors that suggest increased insider threat risk.
Many studies show that word-use frequency reveals an individual’s personality [10-14] and that those personality factors may be used to infer psychosocial indicators of potential insider abuse [15-21]. The five-factor personality traits (agreeableness, conscientiousness, neuroticism, extraversion, and openness) represent a widely accepted model for measuring personality [22]. Christopher R. Brown et al. (2013) used personality factors detected in employees’ email to predict insider threat. They used a word dictionary containing 27 categories representing the five personality factors, together with statistical tests, to find correlations between word use and insider threat.
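Dictionary-based feature extraction of this kind can be sketched as computing, for each category, the share of a message’s tokens that belong to that category. The 27-category dictionary used by Brown et al. is not reproduced here; the three-category `CATEGORIES` mapping below is a hypothetical stand-in, and the tokenizer is a simple lowercase regex.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical mini-dictionary, NOT the 27-category dictionary from the
# original study; real work would use a validated psychological lexicon.
CATEGORIES: Dict[str, Set[str]] = {
    "negative_emotion": {"angry", "hate", "unfair", "annoyed"},
    "work": {"project", "deadline", "meeting", "manager"},
    "first_person": {"i", "me", "my", "mine"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and extract word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_frequencies(text: str) -> Dict[str, float]:
    """Share of tokens in `text` that belong to each dictionary category."""
    tokens = tokenize(text)
    if not tokens:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter(tokens)
    return {cat: sum(counts[w] for w in words) / len(tokens)
            for cat, words in CATEGORIES.items()}


print(category_frequencies("I hate this unfair deadline from my manager"))
# {'negative_emotion': 0.25, 'work': 0.25, 'first_person': 0.25}
```

Per-category frequencies like these can then be tested statistically, as in the original study, or passed to a classifier.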
However, this method does not predict malicious insiders precisely. Here I propose adding machine learning and bag-of-words methods to predict malicious insiders. My second hypothesis is that, with machine-learning training and bag-of-words construction, insider threat prediction from personality factors will be more accurate than with statistical tests.
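One possible instance of the proposed machine learning plus bag-of-words approach is a multinomial Naive Bayes classifier over word counts, sketched below. Naive Bayes is an assumption standing in for the unspecified probabilistic model; Laplace smoothing (`alpha = 1`) and the four training sentences are illustrative choices, not data from this study.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase bag-of-words tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[str], labels: List[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        class_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[c][w] + self.alpha
                score += math.log(num / (self.total[c] + self.alpha * v))
            scores[c] = score
        return max(scores, key=scores.get)


# Made-up training sentences for illustration only.
train = [
    ("my manager is unfair and I hate this job", "threat"),
    ("they will regret firing me", "threat"),
    ("the quarterly report is attached", "benign"),
    ("see you at the team meeting tomorrow", "benign"),
]
model = MultinomialNB().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("I hate my unfair manager"))  # threat
```

Working in log space avoids numerical underflow when multiplying many small word probabilities.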
Relying only on personality factors to predict insider threats will not be precise: these methods are not sufficient to identify the person who may be a malicious insider and is likely to breach security. Especially for cyber security, the disgruntlement of healthcare industry employees and their technical actions also need to be considered. For further systematic monitoring, these three factors are very important (Fig. 5).

Figure 5. Three factors to be considered in employee monitoring: employees’ personality (email messages from Enron Co.; features extracted by a psychological dictionary), employees’ disgruntlement (training on Cons reviews posted by employees from the healthcare industry), and technical action (training on hacker community language).

Disgruntled employees are frequently mentioned as a potential insider threat [23-24]. Disgruntled employees may say negative things about their company in email or other online communication tools. Carolyn Holton (2009) used text scraped from intra-company discussion groups, such as those on Vault.com and Yahoo!, to predict disgruntled employees. To focus on the healthcare industry, I will scrape all the negative reviews of healthcare companies from the Glassdoor website. To predict complaining sentences, I will use a probabilistic machine-learning model, which has been reported to perform well in natural language classification. My third hypothesis is that a probabilistic machine-learning model will be more accurate than SVM in predicting complaining sentences in the healthcare industry.

Technical action is another important factor for predicting insider threat. Hacking skills, such as how to break into a company database or crack a password, are likely to appear in malicious employees’ email or other online communication. Employers can use such information to identify malicious insiders. To detect such hacker language, Victor Benjamin (2015) used an unsupervised neural network to find hacker language patterns. I will use a probabilistic machine-learning model to predict hacker language. My fourth hypothesis is that a probabilistic machine-learning model will be more accurate than an ANN in predicting hacker language.
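Hypotheses three and four each compare two classifiers on the same data. One common way to test such a claim, assumed here rather than specified in the proposal, is McNemar’s test on paired predictions: only the examples where exactly one model is correct carry information. The sketch below computes the exact two-sided (binomial) version; the label lists are made up.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test for two classifiers on the same examples.

    Returns (b, c, p_value): b counts examples only model A classifies
    correctly, c counts examples only model B classifies correctly.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)
    c = sum(1 for t, a, p in triples if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under H0 each discordant example favours either model with
    # probability 0.5, so b ~ Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up labels: model A (e.g. Naive Bayes) vs model B (e.g. SVM or ANN).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
pred_b = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(mcnemar_exact(y_true, pred_a, pred_b))  # (6, 0, 0.03125)
```

A small p-value indicates that the two accuracies differ; whether b or c is larger shows which model is better.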
Technical approach

Data:
Firm employees’ reviews will be gathered from Glassdoor and MedZilla (Fig. 6).

Figure 6. Pfizer employee’s review on Glassdoor.

Security breach records will be gathered from news reports and databases such as BreachAlarm (Fig. 7), Privacy Rights Clearinghouse (Fig. 8), Breach Level Index (Fig. 9), and the U.S. Department of Health and Human Services Office for Civil Rights (Fig. 10).

Figure 7. Data breach resources from BreachAlarm.

Figure 8. Data breach records from Privacy Rights Clearinghouse.

Figure 9. Data breach records from Breach Level Index.

Figure 10. Breach reports from the U.S. Department of Health and Human Services Office for Civil Rights.
E-mail messages from about 150 senior executives at Enron Corporation were made public by the Federal Energy Regulatory Commission as part of its investigation into alleged energy price manipulation by the firm (http://www.cs.cmu.edu/~enron/). The emails will be divided into insider-threat samples and no-threat samples.
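The CMU release stores each message as a raw text file (headers, a blank line, then the body) in a per-user maildir tree. A minimal sketch of reading one message and keeping the fields needed for labeling, using only Python’s standard `email` package, is shown below. The `label` argument is a hypothetical placeholder for the manual threat/no-threat split, which the corpus itself does not provide, and the sample message is invented.

```python
from email import message_from_string
from email.message import Message
from typing import Dict


def parse_enron_message(raw: str, label: str) -> Dict[str, str]:
    """Extract sender, subject and plain-text body from one raw email.

    `label` is a hypothetical manual annotation ("threat" / "no_threat").
    """
    msg: Message = message_from_string(raw)
    if msg.is_multipart():
        # Keep only the text/plain parts of a multipart message.
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        body = "\n".join(str(p) for p in parts)
    else:
        body = msg.get_payload()
    return {
        "from": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "body": body.strip(),
        "label": label,
    }


# Invented example in the same header/body layout as the corpus files.
raw = """Message-ID: <1.example@localhost>
From: alice@example.com
To: bob@example.com
Subject: Re: schedule

Please send the updated numbers by Friday.
"""
record = parse_enron_message(raw, "no_threat")
print(record["subject"], "|", record["body"])
```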
Hacker language will be scraped from HackFive.com (Fig. 11).

Figure 11. An example of a message posted on HackFive.com. Source: Victor Benjamin (2015).
Identify disgruntlement:
First, I will prepare a sample from the data source. Second, disgruntled and non-disgruntled sentences from part of the employee reviews will be manually labeled. Third, I will build a classification model that differentiates the two sample sets, using machine learning on bag-of-words features. Fourth, the remaining employee reviews will be classified by this model.
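The first two steps, splitting reviews into sentences and drawing a subset for manual labeling, might look like the sketch below. The regex sentence splitter, the 20% annotation fraction, and the sample reviews are assumptions for illustration; a production pipeline would likely use a trained sentence segmenter.

```python
import random
import re
from typing import List, Tuple


def split_sentences(review: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def annotation_split(reviews: List[str], fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (to_annotate, to_predict) sentence lists.

    Sampled sentences go to human annotators; the rest are later
    labeled by the trained classifier.
    """
    sentences = [s for r in reviews for s in split_sentences(r)]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    rng.shuffle(sentences)
    k = max(1, round(fraction * len(sentences)))
    return sentences[:k], sentences[k:]


# Invented review text for illustration.
reviews = [
    "Management ignores staff. Pay is low! Benefits are fine.",
    "Great colleagues. Long shifts though?",
]
to_annotate, to_predict = annotation_split(reviews)
print(len(to_annotate), len(to_predict))  # 1 4
```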
Statistical test:
I will test the relationship between data breach frequency and both the goal factor from privacy policy and employee disgruntlement, using a t-test or a Wilcoxon test.
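The Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test) makes no normality assumption, which suits skewed counts such as breaches per firm. The sketch below compares breach counts between two hypothetical groups of firms (high vs. low disgruntlement) using the large-sample normal approximation, average ranks for ties, and no tie or continuity correction; for small samples an exact test (for example `scipy.stats.mannwhitneyu`) would be preferable. The counts are made up.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Made-up breach counts per firm, split by a disgruntlement threshold.
high_disgruntlement = [5, 7, 9, 6, 8]
low_disgruntlement = [1, 2, 4, 3, 2]
u, p = mann_whitney_u(high_disgruntlement, low_disgruntlement)
print(u, round(p, 3))
```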
I will also try to build a regression model using logistic regression or LASSO. PCA will be used for dimensionality reduction if necessary.
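The two options can be combined: logistic regression with a LASSO (L1) penalty models breach occurrence as a binary outcome and shrinks uninformative predictors to exactly zero. The sketch below fits it by proximal gradient descent in plain Python. The penalty `lam`, learning rate, iteration count, and toy data are illustrative assumptions, and the intercept is left unpenalized.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(w: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l1_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                lam: float = 0.05, lr: float = 0.1,
                iters: int = 2000) -> Tuple[float, List[float]]:
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # plain gradient step: intercept is not penalized
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
    return b, w


# Made-up data: column 0 = disgruntlement score, column 1 = pure noise.
X = [[0.1, 0.5], [0.2, -0.5], [0.3, 0.0], [0.8, 0.5], [0.9, -0.5], [1.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]
b, w = l1_logistic(X, y)
print(w)  # the noise weight w[1] stays exactly 0.0; w[0] is positive
```

The soft-thresholding step is what produces exact zeros, which is why LASSO doubles as feature selection, unlike PCA, which only builds new combined features.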
Classification:
The email messages will be indexed by the psychological dictionary. Then I will use a Bayes or HMM model to classify insider-threat samples and non-threat samples. Disgruntled sentences and email samples will be indexed in the same way, and a Bayes or HMM model will be built on those samples to differentiate the two sample sets. Hacker language detection will also be conducted in the same way as for the disgruntled samples. PCA (for dimensionality reduction) or Random Forest (for feature selection) may be used if necessary. Unigrams and bigrams will be built after indexing.
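Building unigram and bigram features after tokenization can be sketched as below. The regex tokenizer and the space-joined bigram keys are illustrative choices, not requirements of the method.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_features(text: str) -> Counter:
    """Count unigrams and adjacent-word bigrams in one document."""
    tokens = tokenize(text)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


feats = ngram_features("crack the password then crack the database")
print(feats["crack the"], feats["password"])  # 2 1
```

Bigrams such as "crack the" capture short phrases that single words miss, at the cost of a much larger and sparser vocabulary.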
Estimate of Cost
For one PhD student’s work for 1 year (including hardware, software, data, graduate assistantship, and tuition waiver): $30,400.
Preliminary Schedule
3 months: collecting text from online social media.
3 months: manually annotating the text.
3 months: natural language processing.
2 months: performing machine learning.
1 month: running statistical tests.
Affiliation and Qualifications
Junyan Wu, PhD student, Computer Science Department, Virginia Tech. I have published papers in the bioinformatics and life science areas in the last 2 years. I have experience in machine learning and data mining, and I am confident that I can conduct the proposed research successfully.
References:
1. Herath, T., & Rao, H. R. 2009. Encouraging information security behaviors in organizations: Role of penalties, pressures and perceived effectiveness. Decision Support Systems, 47(2), 154–165.
2. Herath, T., & Rao, H. R. 2009. Protection motivation and deterrence: A framework for security policy compliance in organisations. European Journal of Information Systems, 18(2), 106–125.
3. Bulgurcu, B., Cavusoglu, H., & Benbasat, I. 2010. Information security policy compliance: An empirical study of rationality-based beliefs and information security awareness. MIS Quarterly, 34(3), 523–548.
4. Johnston, A. C., & Warkentin, M. 2010. Fear appeals and information security behaviors: An empirical study. MIS Quarterly, 34(3), 549–566.
5. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
6. Symantec, & Ponemon. 2009. More than half of ex-employees admit to stealing company data according to new study. Press release by Symantec Corporation and Ponemon Institute. Retrieved from http://www.symantec.com/about/news/release/article.jsp?prid=20090223_01.
7. Richardson, R. 2008. CSI computer crime and security survey. Retrieved from http://www.cse.msstate.edu/~cse6243/readings/CSIsurvey2008.pdf.
8. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
9. Warkentin, M., Davis, K., & Bekkering, E. 2004. Introducing the check-off password system (COPS): An advancement in user authentication methods and information security. Journal of Organizational and End User Computing, 16(3), 41–58.
10. DeWall, C. N., Buffardi, L. E., Bonser, I., & Campbell, W. K. 2011. Narcissism and implicit attention seeking: Evidence from linguistic analyses of social networking and online presentation. Personality and Individual Differences, 57–62.
11. Hirsh, J. B., & Peterson, J. B. 2009. Personality and language use in self-narratives. Journal of Research in Personality, 43, 524–527.
12. Holtgraves, T. 2011. Text messaging, personality, and the social context. Journal of Research in Personality, 45, 92–99.
13. Tausczik, Y. R., & Pennebaker, J. W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
14. Yarkoni, T. 2010. Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44, 363–373.
15. Bartley, C. E., & Roesch, S. C. 2011. Coping with daily stress: The role of conscientiousness. Personality and Individual Differences, 50, 79–83.
16. Bono, J. E., Boles, T. L., Judge, T. A., & Lauver, K. J. 2002. The role of personality in task and relationship conflict. Journal of Personality, 70(3), 311–344.
17. Burton, L. A., Hafetz, J., & Henninger, D. 2007. Gender differences in relational and physical aggression. Social Behavior and Personality, 35(1), 41–50.
18. Corry, N., Merritt, R. D., Mrug, S., & Pamp, B. 2008. The factor structure of the Narcissistic Personality Inventory. Journal of Personality Assessment, 90(6), 593–600.
19. Ebstrup, J. F., Eplov, L. F., Pisinger, C., & Jorgensen, T. 2011. Association between the Five Factor personality traits and perceived stress: Is the effect mediated by general self-efficacy? Anxiety, Stress, & Coping, 24(4), 407–419.
20. Egan, V., & Lewis, M. 2011. Neuroticism and agreeableness differentiate emotional and narcissistic expressions of aggression. Personality and Individual Differences, 50, 845–850.
21. Mondak, J. J., Hibbing, M. V., Canache, D., Seligson, M. A., & Anderson, M. R. 2011. Personality and civic engagement: An integrative framework for the study of trait effects on political behavior. American Political Science Review, 104(1), 85–110.
22. McCrae, R. R. 2010. The place of the FFM in personality psychology. Psychological Inquiry, 21, 57–64.
23. Maloof, M. A., & Stephens, G. D. 2007. ELICIT: A system for detecting insiders who violate need-to-know. In Recent Advances in Intrusion Detection (pp. 146–166). Springer.
24. Greitzer, F. L., Kangas, L. J., Noonan, C. F., Dalton, A. C., & Hohimer, R. E. 2012. Identifying at-risk employees: Modeling psychosocial precursors of potential insider threats. In 2012 45th Hawaii International Conference on System Science (HICSS) (pp. 2392–2401). IEEE.
Contact Information:
Junyan Wu, PhD student
Department of Computer Science, Virginia Tech
Student ID: 905927469
E-mail: wujy128@vt.edu