Data Security Guidelines

May 2010 – Version 1.0

Gary Waldrom

Candidate Selection

Entity Class
A logical model of an identifiable party
• Each instance of an entity defined within the system should be identified and marked for drill down investigation

Domains
A logical structure of attributes represented within a single entity
• Each instance of a domain structure as listed within the spreadsheet (slides 5-8) and being contained within an identified Entity should be marked for further drill down investigation

Attributes
Individual data fields under data type constraints and associated business and integrity rules
• Each attribute type as listed within the spreadsheet (slides 5-8) and being contained within an identified domain is a candidate for data obfuscation based on the data obfuscation rules

Data Sensitivity

Level 1
• Sensitivity level 1 is a unique identifier by which a party can be identified without further reference to other sensitive information (High Cardinality). All instances should be obfuscated or masked.

Level 2
• Sensitivity level 2 is information which collectively, i.e. when more than one instance is combined, may form a positive identification of a party. In isolation this data, although deemed sensitive, does not directly and uniquely identify the party; however, the more attributes are supplied, the closer the combination comes to a sensitivity level 1 identifier without a level 1 attribute being involved (Normal Cardinality). All combined instances must be obfuscated.

Level 3
• Sensitivity level 3 is data with a Low Cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.


Risk of Identification of Parties

Unique Identifier – Sensitivity Level 1
• ∞
• High risk, these identifiers will uniquely identify a party and are traceable through various public domain based systems

Composite Identifiers – Sensitivity Level 2
• n exponent
• Becomes an identifier as multiple instances increase cardinality, exponent based on cardinality

Low Cardinality Identifiers – Sensitivity Level 3
• n + Composite
• Multiple composites increase identification, cardinality increases as instances are added


Attribute Identification
Entity       | Domain                    | Attribute                                   | Data Type (Generic)     | Classification
Client       | Name                      | Firstname(s)                                | Character               | 2
             |                           | Surnames / Family Name                      | Character               | 2
             |                           | Title / Prefix                              | Denormalised: Character | 3
             |                           | Suffix                                      | Denormalised: Character | 3
             |                           | Salutation                                  | Denormalised: Character | 2
             | Address                   | House Number/Name                           | Character               | 2
             |                           | Address Line 1                              | Character               | 2
             |                           | Address Line 2                              | Character               | 2
             |                           | State / County / Canton / Region etc        | Denormalised: Character | 3
             |                           | Zip / Post Code                             | Character               | 2
             |                           | Country                                     | Denormalised: Character | 3
             | Contact                   | Home Telephone Number                       | Character               | 1
             |                           | Work Telephone Number                       | Character               | 1
             |                           | Cell/Mobile Number                          | Character               | 1
             |                           | Additional Telephone Numbers                | Character               | 1
             |                           | Email1                                      | Character               | 1
             |                           | Email2                                      | Character               | 1
             |                           | Additional Email Accts                      | Character               | 1

Client       | Personal Details          | Date of Birth                               | Date                    | 3
             |                           | Gender                                      | Denormalised: Character | 3
             |                           | Political Persuasion                        | Denormalised: Character | 3
             |                           | Religious or Philosophical Beliefs          | Denormalised: Character | 3
             |                           | Sexual Persuasion                           | Denormalised: Character | 3
             |                           | Race or Ethnic Origin                       | Denormalised: Character | 3
             |                           | Accusations or Suspicions                   | Denormalised: Character | 3
             |                           | Convictions / Judgements / Criminal Records | Denormalised: Character | 3
             |                           | Notes                                       | Long Character (Free text could hold sensitive details) | 1
             |                           | Internet usage & web tracking information   | Character / W3C Logs    | 2
             |                           | Physical and/or Mental Health               | Character               | 3
             |                           | Source of Wealth                            | Long Character (Free text could hold sensitive details) | 1
             |                           | Nationality                                 | Denormalised: Character | 3
             |                           | Domicile                                    | Denormalised: Character | 3
             |                           | Spouse                                      | Name Domain             | 2
             |                           | Children                                    | Name Domain             | 2

             | Natural Keys              | SSN / Tax ID / NI Number                    | Character               | 1
             |                           | Passport Number                             | Character               | 1
             |                           | Login IDs & Passwords                       | Character               | 1
             |                           | Union / Club / Society Membership           | Character               | 1
             |                           | Bank Account Number(s)                      | Number                  | 1
             |                           | Sort Code(s)                                | Number                  | 2
             |                           | Client Account Name(s)                      | Character               | 1
             |                           | Residential Address                         | Address Domain          | 2
             | Linked Data               | Beneficiary                                 | Beneficiary Entity      | 1
             |                           | IFA                                         | IFA Entity              | 2
             |                           | Intermediary                                | Intermediary Entity     | 2
             |                           | Sub Account                                 | Sub Account Entity      | 1
             |                           | Accountant                                  | Accountant Entity       | 2

Beneficiary  | All Client Entity Domains |                                             |                         | 2
IFA          | All Client Entity Domains |                                             |                         | 3
Intermediary | All Client Entity Domains |                                             |                         | 3
Sub Account  | All Client Entity Domains |                                             |                         | 1


Classification Key

Sensitivity Level 1: a unique identifier by which a party can be identified without further reference to other sensitive information (High Cardinality). All instances should be obfuscated.

Sensitivity Level 2: information which collectively, i.e. when more than one instance is combined, may form a positive identification of a party. In isolation this data, although deemed sensitive, does not directly and uniquely identify the party; however, the more attributes are supplied, the closer the combination comes to a sensitivity level 1 identifier without a level 1 attribute being involved (Normal Cardinality). All combined instances must be obfuscated.

Sensitivity Level 3: data with a Low Cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.

Note: For normalised data types, the obfuscation layer is applied at the reference table level.
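
The classification key can be read as a simple selection rule. The following is a minimal sketch only, assuming a hypothetical `SENSITIVITY` lookup that holds a small subset of the attributes listed above, and assuming that "combined" means more than one level 2 or level 3 attribute is present in the same extract; neither the names nor that simplification come from the original slides.

```python
# Hypothetical subset of the attribute classification table (levels as listed in the table).
SENSITIVITY = {
    "Firstname(s)": 2,
    "Surnames / Family Name": 2,
    "Zip / Post Code": 2,
    "Country": 3,
    "Date of Birth": 3,
    "Email1": 1,
    "Passport Number": 1,
}


def obfuscation_candidates(attributes):
    """Return the attributes that should be obfuscated under the classification key."""
    known = [a for a in attributes if a in SENSITIVITY]
    level1 = [a for a in known if SENSITIVITY[a] == 1]
    composite = [a for a in known if SENSITIVITY[a] in (2, 3)]
    # Level 1: every instance. Levels 2 and 3: only when combined with other attributes
    # (simplified here to "more than one of them is present").
    return level1 + (composite if len(composite) > 1 else [])
```

A single level 3 attribute such as Country is left alone, while Country together with Date of Birth is selected as a combination.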


Use-Case Example of Composite Identifiers (Sensitivity Level 2)
Data is purely for reference

Positive identification increases with the accumulation of sensitivity level 2 attributes held within the same domain:

First Name   • Cardinality >= 1,000,000
Surname      • Cardinality >= 100,000
Country      • Cardinality >= 10,000
Region       • Cardinality >= 100
Post Code    • Cardinality >= 5 (Obfuscation point)
House Number • Cardinality <= 2 (Point of probability)
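
The narrowing effect in this example can be sketched in a few lines. This is an illustration only: the `STEPS` list reuses the reference figures from the slide as if they were exact counts, and the function name and threshold parameter are hypothetical.

```python
# Reference figures from the slide: candidate parties remaining after each attribute is added.
STEPS = [
    ("First Name", 1_000_000),
    ("Surname", 100_000),
    ("Country", 10_000),
    ("Region", 100),
    ("Post Code", 5),
    ("House Number", 2),
]


def first_at_or_below(steps, threshold):
    """Return the first attribute at which the remaining candidates fall to the threshold."""
    for name, candidates in steps:
        if candidates <= threshold:
            return name
    return None
```

With a threshold of 5 the function returns "Post Code" (the obfuscation point on the slide), and with a threshold of 2 it returns "House Number" (the point of probability).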


Use-Case Example of Composite Identifiers (Sensitivity Level 3)
Data is purely for reference

There is little increase in positive identification from the accumulation of sensitivity level 3 attributes until a sensitivity level 2 attribute is added:

Gender        • Cardinality >= 100,000,000
Country       • Cardinality >= 10,000,000
Region        • Cardinality >= 1,000,000
Date of Birth • Cardinality >= 3,000
Surname       • Cardinality >= 5 (Obfuscation point)
Post Code     • Cardinality <= 2 (Point of probability)


Numeric Obfuscation

Numbers used in aggregate functions and checked for accuracy (i.e. holdings, values, transactions) should not be obfuscated if all other attributes within the domain/entity structure have been obfuscated and there is no method of reversing the obfuscation layer to identify sensitive data against the values. Where that is not the case:

• Integers should be obfuscated to a length equal to or less than the length of the original number, but still conform to any specific business rules
• Fixed point numbers should be obfuscated to a precision equal to or less than the original precision and retain the original scale, but still conform to any specific business rules
• Floating point numbers should be obfuscated to a precision and scale equal to or less than the original, but still conform to any specific business rules
• Currency/percentage formatting over numeric values should be retained for verification purposes
• Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two character format
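
As an illustration of the integer and fixed point rules, here is a minimal sketch. The function names are hypothetical, business rules are not modelled, and a seedable `random.Random` is used only so the example is repeatable.

```python
import random
from decimal import Decimal


def obfuscate_integer(value, rng):
    """Replace an integer with random digits of the same length, keeping the sign."""
    digits = str(abs(value))
    first = rng.choice("123456789")  # no leading zero, so the original length is retained
    rest = "".join(rng.choice("0123456789") for _ in digits[1:])
    result = int(first + rest)
    return -result if value < 0 else result


def obfuscate_fixed_point(value, rng):
    """Replace a finite Decimal with random digits, keeping the sign and the scale."""
    sign, digits, exponent = value.as_tuple()
    new_digits = tuple(rng.randrange(10) for _ in digits)
    # Leading zeros may shorten the precision, which the rule above allows
    # ("equal to or less than the original precision"); the scale never changes.
    return Decimal((sign, new_digits, exponent))


rng = random.Random(1)
masked_holding = obfuscate_integer(-48213, rng)
masked_price = obfuscate_fixed_point(Decimal("1234.50"), rng)
```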


Alpha Obfuscation

Alphabetic and alphanumeric data types should be obfuscated while retaining the original structure of the underlying data; however, certain exceptions exist for search/view criteria:

• SGML/XML/HTML/XHTML/RSS data formats must retain XML reserved characters in order for them to be used in native views, DTD, XLS, Web based formats etc.
• Embedded Java Code must be retained but underlying attributes obfuscated
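
A structure-retaining alpha obfuscation that leaves markup intact might look like the sketch below. It is an assumption-laden illustration: the function names and the regular expression are hypothetical, tags (including the attribute values inside them) and character entities are passed through unchanged, and only the text between them is obfuscated.

```python
import random
import re
import string

# Tags such as <name id="1"> and entities such as &apos; are treated as reserved markup.
_MARKUP = re.compile(r"(<[^>]*>|&[#\w]+;)")


def obfuscate_alpha(text, rng):
    """Replace ASCII letters and digits with random ones of the same kind and case."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # spaces and punctuation keep the original structure
    return "".join(out)


def obfuscate_markup(document, rng):
    """Obfuscate the text content of a document while retaining tags and entities."""
    parts = _MARKUP.split(document)
    # With one capturing group, odd-indexed parts are the matched markup.
    return "".join(p if i % 2 else obfuscate_alpha(p, rng) for i, p in enumerate(parts))
```

Because the length and the markup are untouched, the result can still be rendered or validated by the same views as the original.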


Key Obfuscation

Obfuscation of keys gives rise to the challenge of failure of Declarative Referential Integrity when presented to certain applications that rely upon them, thus:

• Natural keys that are identified as sensitive data can only be anonymised/masked
• Natural keys that are identified as non-sensitive are out of scope and may be retained
• Surrogate keys are out of scope and should be retained
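
The three key rules can be sketched as a per-record policy. This is a hypothetical illustration: the field names, the `apply_key_policy` helper and the choice of masking (rather than anonymising to NULL) are assumptions.

```python
def apply_key_policy(record, sensitive_natural_keys, mask_char="*"):
    """Mask sensitive natural keys; retain surrogate and non-sensitive natural keys."""
    return {
        field: (mask_char * len(str(value)) if field in sensitive_natural_keys else value)
        for field, value in record.items()
    }


# Hypothetical record: client_id is a surrogate key, product_code a non-sensitive natural key.
record = {"client_id": 1042, "passport_number": "X1234567", "product_code": "ISA"}
masked = apply_key_policy(record, sensitive_natural_keys={"passport_number"})
```

Joins on `client_id` continue to work after masking, while any join that relied on the masked natural key would fail, which is the referential integrity risk described above.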


Date Obfuscation 1

Dates should retain the original date format of
the National Character set of the underlying
data
• Day names should be obfuscated as per the alpha data element, however the length of the day must be changed to a length between 6 and 9 but not the same length as the original day
• Day numbers should be obfuscated but retain the 1-31 format
• Day of the week numbers should be obfuscated but retain the 0-6 or 1-7 formatting dependent on platform
• Month numbers should be obfuscated but retain the 1-12 format
• Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element retaining the same two character format
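
A minimal sketch of the day name and month number rules follows. The function names are hypothetical and the random source is seedable only to make the example repeatable.

```python
import random
import string


def obfuscate_day_name(day_name, rng):
    """Random alphabetic replacement, 6 to 9 letters long, never the original length."""
    allowed_lengths = [n for n in range(6, 10) if n != len(day_name)]
    new_length = rng.choice(allowed_lengths)
    letters = [rng.choice(string.ascii_lowercase) for _ in range(new_length)]
    return "".join(letters).capitalize()


def obfuscate_month_number(rng):
    """Random month number that keeps the 1-12 format."""
    return rng.randint(1, 12)
```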


Date Obfuscation 2

Dates should retain the original date format of
the National Character set of the underlying
data
• Month names should be obfuscated as per the alpha data element, however the length of the month must be changed to a length between 3 and 12 but not the same length as the original month. Abbreviated month names should be obfuscated retaining the 3-character format
• Year numbers should always retain the century 4-number format, in the range (current year - any validation criteria) to (current year - 1) for years in the past, and (current year + 1) to (current year + any validation criteria) for projected ranges. (This potentially could cause problems with date verification functions, and any function code which performs these verifications must utilise the same seed value as the date value and must fully enclose within the same block all other dates)
• Decision support systems relying on "roll-forward"/"roll-back" date scenarios and date range queries must retain the requested period change between two dates
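
One way to retain the period between two dates is to shift every date in a block by the same seeded offset. The sketch below assumes that approach; `shift_dates`, `seed` and `max_shift_days` are hypothetical names, and the year range validation described above is not modelled.

```python
import random
from datetime import date, timedelta


def shift_dates(dates, seed, max_shift_days=365):
    """Shift all dates in a block back by one seeded offset, preserving their intervals."""
    rng = random.Random(seed)  # same seed, same offset for every date in the block
    offset = timedelta(days=rng.randint(1, max_shift_days))
    return [d - offset for d in dates]


opened, closed = date(2009, 3, 1), date(2010, 5, 15)
new_opened, new_closed = shift_dates([opened, closed], seed=7)
```

Since both dates move by the same offset, `new_closed - new_opened` equals `closed - opened`, so roll-forward and roll-back queries over the block return the same period.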


Granularity of Access to Sensitive Data
Production (Business Users only)
• Production environments must be fully obfuscated to all Development, Support, and Non-Authorised users
• Business users may see sensitive data based on their individual levels of authorisation
• Access to data by Support users should be disallowed if possible
• If access is allowed for "fix-on-fail" functionality this must be keystroke logged through an auditing application

UAT (Business Users, Development & Support)
• UAT environments must be fully obfuscated to all Development, Support, and Non-Authorised users
• Business users may see sensitive data based on their individual levels of authorisation
• Access to data by Support users should be disallowed if possible
• If access is allowed for "fix-on-fail" functionality this must be keystroke logged through an auditing application

Development (Development & Support only)
• Development environments must be fully obfuscated at the data level (not obfuscated views) as developers usually hold higher privileges in these environments
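
The access rules above can be summarised as a small decision function. This is a sketch under stated assumptions: the role and environment labels are hypothetical, and individual levels of authorisation are reduced to a single boolean.

```python
def data_view(environment, role, authorised=False, fix_on_fail=False):
    """Return how sensitive data is presented to a user in a given environment."""
    if environment == "DEV":
        # Development data is obfuscated at the data level for everyone.
        return "obfuscated"
    if environment in ("UAT", "PROD"):
        if role == "business" and authorised:
            return "clear"
        if role == "support" and fix_on_fail:
            # Support access is the exception and must be audited.
            return "clear, keystroke-logged"
    return "obfuscated"
```

For example, `data_view("PROD", "developer")` yields "obfuscated", while `data_view("UAT", "support", fix_on_fail=True)` yields "clear, keystroke-logged".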


Deployment Methods

Data Security Guideline Policy

Full Environment Access Control
• Prod, UAT, SIT & Dev environments are fully segregated by user type, or privilege level.
• Data obfuscation/anonymisation/masking is performed through ETL tools from one environment to the next

Shared Environment Access
• Prod, UAT, SIT & Dev environment may share different user types i.e. business, developers, support. The level of granularity must be defined on a per-user type or privilege level basis.
• Data is obfuscated/anonymised/masked based on the authority level of the user type or privilege level

Hybrid Environments
• Prod, UAT, & SIT environments may be obfuscated at a user type level but transfers of data into Dev environments may be performed through ETL utilities
• Data is obfuscated to the same rules but the deployment method uses both technical methods


Benefits & Drawbacks of Deployment Methods

Full Environment Access Control
Benefits
• Leverage existing tools capabilities and vendor support
• Guaranteed obfuscation contained within the environment
• User access managed at different layer to data access
• Access to environment determines visibility
Drawbacks
• ETL tool license/platform costs
• Load window issues
• Metadata & cipher security concerns

Shared Environment Access Control
Benefits
• Higher level of access granularity, greater flexibility
• Define the level of encryption to conform to national regulatory controls
• No load window issues, all users share same data instance
Drawbacks
• Development costs
• Requires clear delineation of user roles and role management
• Proprietary technology solutions

Hybrid Environments
Benefits
• All prior mentioned
• Greater flexibility in defining a solution which fits with a current "modus operandi"
Drawbacks
• All prior mentioned
• Potential support complexity issues


Data Obfuscation Methodology

Full Environmental Access Control
• No data obfuscation, non-authorised users have no access

Hybrid Environment
• No access to PROD, obfuscation in UAT based on roles and rules, ETL obfuscation into DEV

Shared Environment
• Data obfuscation based on roles and rules of sensitivity


Environmental Control (Access Method)
PROD (Instance 1) -> Informatica ETL (Apply Obfuscation Rules) -> UAT (Instance 2, Obfuscated) -> Informatica ETL (Apply Obfuscation Rules) -> DEV (Instance 3, Obfuscated)

Users: Business Users; Development & Support Users


Environmental Control (Hybrid Method)
PROD (Instance 1) -> Periodic Refresh or Duplex Feed -> UAT (Instance 1 or 2) -> Informatica ETL (Apply Obfuscation Rules) -> DEV (Instance 3, Totally Obfuscated)

Obfuscation Layer

Users: Business Users; Development & Support Users


Appendix

Terms of Reference
• Lingual Reference
• Risk Impact/Probability
• Non-Deterministic Obfuscation
• Monte Carlo Method
• Dynamic Obfuscation Function Methods


Lingual Reference

Anonymous/Anonymised
To remain unidentified, nameless, i.e. NULL. A field that is anonymous would therefore not show any data at all, and you could not verify the structure of the data.

Obfuscate/Obfuscated
To confuse, scramble, i.e. encrypt. You could therefore verify that a date was a date, albeit the wrong one, that a number is a number, albeit the wrong one, and that alpha is alpha in the same structure; you would see the structure but the sensitive data would be indecipherable.

Mask/Masked
To cover, hide. This would normally be used in password protection, where an asterisk is displayed for each character typed.

Both "anonymous" and "obfuscate" have parallels in literature: an anonymous writer is unknown, whereas writing under a nom de plume is obfuscated.
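
The difference between the three terms is easiest to see side by side. The functions below are a hypothetical sketch of each definition, not an implementation taken from the guidelines.

```python
import random
import string


def anonymise(value):
    """Anonymised: no data at all is shown, so the structure cannot be verified."""
    return None


def mask(value, mask_char="*"):
    """Masked: every character is covered, as in a password field."""
    return mask_char * len(value)


def obfuscate(value, rng):
    """Obfuscated: the structure is retained, but letters and digits are scrambled."""
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)
```

Applied to a post code such as "AB12 3cd", `anonymise` returns nothing, `mask` returns eight asterisks, and `obfuscate` returns a different value in the same letter, digit and space pattern.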


Risk impact/Probability

Probability - A risk is an event that "may"
occur. The probability of it occurring can
range anywhere from just above 0% to just
below 100%. (Note: It can't be exactly
100%, because then it would be a certainty,
not a risk. And it can't be exactly 0%, or it
wouldn't be a risk.)

Impact - A risk, by its very nature, always
has a negative impact. However, the size of
the impact varies in terms of cost and
impact on some other critical factor.

We apply these rules to determine when to
obfuscate data and when not to


Non-Deterministic Obfuscation

A variety of factors can cause an algorithm to behave in a way which is not deterministic, or non-deterministic:
• If it uses external state other than the input, such as user input, a global variable, a hardware timer value, a random value, or stored disk data.
• If it operates in a way that is timing-sensitive, for example if it has multiple processors writing to the same data at the same time. In this case, the precise order in which each processor writes its data will affect the result.
• If a hardware error causes its state to change in an unexpected way.

A major problem with deterministic algorithms is that sometimes, we don't want the results to be predictable. For example, if you are playing an on-line game of blackjack that shuffles its deck using a pseudorandom number generator, a clever gambler might guess precisely the numbers the generator will choose and so determine the entire contents of the deck ahead of time, allowing him to cheat. Similar problems arise in cryptography, where private keys are often generated using such a generator. This sort of problem is generally avoided using a cryptographically secure pseudo-random number generator.
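
The blackjack example can be reproduced with the Python standard library. In this sketch the function names are hypothetical; `random.Random` is the seedable, predictable generator and `secrets.SystemRandom` draws from the operating system's cryptographically secure source.

```python
import random
import secrets


def seeded_shuffle(deck, seed):
    """Deterministic: anyone who knows the seed can reproduce the whole deck order."""
    cards = list(deck)
    random.Random(seed).shuffle(cards)
    return cards


def secure_shuffle(deck):
    """Non-deterministic: the OS entropy source cannot be seeded or replayed."""
    cards = list(deck)
    secrets.SystemRandom().shuffle(cards)
    return cards
```

Two calls to `seeded_shuffle(range(52), 42)` always return the same order, which is exactly what the clever gambler exploits; `secure_shuffle` offers no seed to guess.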


The Monte Carlo Methods

Monte Carlo methods are computational algorithms that rely on
repeated random sampling to compute their results. One such method is a
stochastic function used to create an obfuscation layer.

Stochastic programming is a framework for modelling optimization
problems that involve uncertainty.

Because they rely on repeated computation of random or
pseudo-random numbers, these methods are best suited to, and tend to
be used in, cases where it is infeasible or impossible to compute an exact
result with a deterministic algorithm, which is the property relied on for data obfuscation.

These are the building blocks to secure obfuscation of highly
sensitive data within the banking environment and will satisfy an
external audit
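
To make "repeated random sampling" concrete, the sketch below estimates how often a randomly drawn record is the only one with its combination of attribute values. Everything in it is an illustrative assumption: the synthetic population, the column layout and the function names are hypothetical, and for a population this small the exact fraction could simply be counted; sampling is shown only to demonstrate the method.

```python
import random
from collections import Counter


def build_population(size, rng):
    """Synthetic records: (gender, region, birth_year, surname_id). Purely illustrative."""
    return [
        (rng.choice("MF"), rng.randrange(20), rng.randrange(1940, 2000), rng.randrange(500))
        for _ in range(size)
    ]


def estimate_unique_fraction(population, columns, trials, rng):
    """Monte Carlo estimate of how often a sampled record is unique on `columns`."""
    def key(record):
        return tuple(record[c] for c in columns)

    counts = Counter(key(record) for record in population)
    hits = sum(1 for _ in range(trials) if counts[key(rng.choice(population))] == 1)
    return hits / trials


population = build_population(5000, random.Random(0))
few_attributes = estimate_unique_fraction(population, (0, 1), 1000, random.Random(1))
all_attributes = estimate_unique_fraction(population, (0, 1, 2, 3), 1000, random.Random(1))
```

With the same sampled records, adding attributes can only raise the estimate, which mirrors the composite identifier examples earlier in the deck.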


Dynamic Obfuscation Function Methods
This is an example of a high level data obfuscation function in which a
decision is made, based on the previous criteria, of when to obfuscate,
followed by the process of obfuscation for an alpha data type (simplest form).

At the decision point, data is obfuscated using the underlying technology's
info-gap, non-probabilistic theory methods of random number generation, which
create seed data for the ASCII conversion of the real data.
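
A minimal sketch of such a function is given below, assuming a decision rule drawn from the sensitivity levels and a seeded generator that supplies replacement ASCII codes. The function name, its parameters and the use of `random.Random` are hypothetical; a production version would be expected to use a cryptographically secure generator.

```python
import random
import string


def obfuscate_if_required(value, sensitivity_level, combined, seed):
    """Decide whether to obfuscate, then replace each ASCII letter using seeded random data."""
    # Decision point: level 1 always; levels 2 and 3 only when combined with other attributes.
    required = sensitivity_level == 1 or (sensitivity_level in (2, 3) and combined)
    if not required:
        return value
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch in string.ascii_letters:
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + rng.randrange(26)))  # random ASCII code in the same case range
        else:
            out.append(ch)  # non-alpha characters keep the structure intact
    return "".join(out)
```

A level 3 value on its own is returned unchanged, while a level 1 value comes back with the same length and letter case but different characters.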

