Data Security Guidelines

Data Security Guidelines is a paper I wrote when asked to come up with a solution for an audit issue for a financial institution.

Transcript

  • 1. Data Security Guidelines, May 2010 – Version 1.0, Gary Waldrom
  • 2. Candidate Selection
    • Entity Class: a logical model of an identifiable party.
      • Each instance of an entity defined within the system should be identified and marked for drill-down investigation.
    • Domains: a logical structure of attributes represented within a single entity.
      • Each instance of a domain structure listed in the spreadsheet (slides 5-8) and contained within an identified entity should be marked for further drill-down investigation.
    • Attributes: individual data fields under data type constraints and associated business and integrity rules.
      • Each attribute type listed in the spreadsheet (slides 5-8) and contained within an identified domain is a candidate for data obfuscation based on the data obfuscation rules.
    Data Security Guidelines 2010 · Page 2
  • 3. Data Sensitivity
    • Level 1: a unique identifier by which a party can be identified without further reference to other sensitive information (high cardinality). All instances should be obfuscated or masked.
    • Level 2: information which collectively, i.e. with more than one instance, may form a positive identification of a party. In isolation this data, although deemed sensitive, does not directly and uniquely identify the party; however, the more attributes are supplied, the closer they come to forming a sensitivity level 1 without a level 1 being involved (normal cardinality). All combined instances must be obfuscated.
    • Level 3: data with a low cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.
  • 4. Risk of Identification of Parties
    • Unique Identifier – Sensitivity Level 1
      • ∞
      • High risk: these identifiers will uniquely identify a party and are traceable through various public-domain-based systems.
    • Composite Identifiers – Sensitivity Level 2
      • n + composite
      • Multiple composites increase identification; cardinality increases as instances are added.
    • Low Cardinality Identifiers – Sensitivity Level 3
      • n exponent
      • Becomes an identifier as multiple instances increase cardinality; the exponent is based on cardinality.
  • 5. Attribute Identification (Entity: Client)
    Each row gives Attribute: Data Type (Generic), Classification.
    • Domain: Name
      • Firstname(s): Character, 2
      • Surnames / Family Name: Character, 2
      • Title / Prefix: Denormalised: Character, 3
      • Suffix: Denormalised: Character, 3
      • Salutation: Denormalised: Character, 2
    • Domain: Address
      • House Number/Name: Character, 2
      • Address Line 1: Character, 2
      • Address Line 2: Character, 2
      • Address Line 3: Character, 2
      • Address Line 4: Character, 2
      • State / County / Canton / Region etc: Denormalised: Character, 3
      • Zip / Post Code: Character, 2
      • Country: Denormalised: Character, 3
    • Domain: Contact
      • Home Telephone Number: Character, 1
      • Work Telephone Number: Character, 1
      • Cell/Mobile Number: Character, 1
      • Additional Telephone Numbers: Character, 1
      • Email1: Character, 1
      • Email2: Character, 1
      • Additional Email Accts: Character, 1
  • 6. Attribute Identification (Entity: Client)
    Each row gives Attribute: Data Type (Generic), Classification.
    • Domain: Personal Details
      • Date of Birth: Date, 3
      • Gender: Denormalised: Character, 3
      • Political Persuasion: Denormalised: Character, 3
      • Religious or Philosophical Beliefs: Denormalised: Character, 3
      • Sexual Persuasion: Denormalised: Character, 3
      • Race or Ethnic Origin: Denormalised: Character, 3
      • Accusations or Suspicions: Denormalised: Character, 3
      • Convictions / Judgements / Criminal Records: Denormalised: Character, 3
      • Notes: Long Character (free text could hold sensitive details), 1
      • Internet usage & web tracking information: Character / W3C Logs, 2
      • Physical and/or Mental Health: Character, 3
      • Source of Wealth: Long Character (free text could hold sensitive details), 1
      • Nationality: Denormalised: Character, 3
      • Domicile: Denormalised: Character, 3
      • Spouse: Name Domain, 2
      • Children: Name Domain, 2
  • 7. Attribute Identification (Entity: Client)
    Each row gives Attribute: Data Type (Generic), Classification.
    • Domain: Natural Keys
      • SSN / Tax ID / NI Number: Character, 1
      • Passport Number: Character, 1
      • Login IDs & Passwords: Character, 1
      • Union / Club / Society Membership: Character, 1
      • Bank Account Number(s): Number, 1
      • Sort Code(s): Number, 2
      • Account Name(s): Character, 1
    • Domain: Linked Data
      • Residential Address: Address Domain, 2
      • Beneficiary: Beneficiary Entity, 1
      • IFA: IFA Entity, 2
      • Intermediary: Intermediary Entity, 2
      • Sub Account: Sub Account Entity, 1
      • Accountant: Accountant Entity, 2
  • 8. Attribute Identification
    Each row gives Entity: Domain, Classification.
    • Beneficiary: all Client entity domains, 2
    • IFA: all Client entity domains, 3
    • Intermediary: all Client entity domains, 3
    • Sub Account: all Client entity domains, 1
    Classification Key
    • Sensitivity Level 1: a unique identifier by which a party can be identified without further reference to other sensitive information (high cardinality). All instances should be obfuscated.
    • Sensitivity Level 2: information which collectively, i.e. with more than one instance, may form a positive identification of a party. In isolation this data, although deemed sensitive, does not directly and uniquely identify the party; however, the more attributes are supplied, the closer they come to forming a sensitivity level 1 without a level 1 being involved (normal cardinality). All combined instances must be obfuscated.
    • Sensitivity Level 3: data with a low cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.
    Note: for normalised data types, the obfuscation layer is applied at the reference table level.
  • 9. Use-Case Example of Composite Identifiers (Sensitivity Level 2)
    Data is purely for reference. Positive identification increases with the accumulation of sensitivity level 2 attributes held within the same domain.
    • First Name: cardinality >= 1,000,000
    • Surname: cardinality >= 100,000
    • Country: cardinality >= 10,000
    • Region: cardinality >= 100
    • Post Code: cardinality >= 5 (obfuscation point)
    • House Number: cardinality <= 2 (point of probability)
  • 10. Use-Case Example of Composite Identifiers (Sensitivity Level 3)
    Data is purely for reference. Accumulating sensitivity level 3 attributes adds little to positive identification until a sensitivity level 2 attribute is added.
    • Gender: cardinality >= 100,000,000
    • Country: cardinality >= 10,000,000
    • Region: cardinality >= 1,000,000
    • Date of Birth: cardinality >= 3,000
    • Surname: cardinality >= 5 (obfuscation point)
    • Post Code: cardinality <= 2 (point of probability)
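The two use-case examples above can be reproduced on a small scale by counting how many records still match a party as attributes are added one by one. The sketch below is only an illustration: the function names, the sample records, and the idea of passing the obfuscation threshold as a parameter are assumptions, not part of the original paper.

```python
def matching_counts(records, target, attributes):
    """For each cumulative prefix of `attributes`, count the records that
    share the target party's values for every attribute in that prefix."""
    results = []
    for i in range(1, len(attributes) + 1):
        prefix = attributes[:i]
        count = sum(all(r[a] == target[a] for a in prefix) for r in records)
        results.append((prefix[-1], count))
    return results


def obfuscation_point(results, threshold):
    """Return the first attribute at which the matching count falls to
    `threshold` or below, or None if it never does."""
    for attribute, count in results:
        if count <= threshold:
            return attribute
    return None


# Hypothetical sample data, purely for reference.
records = [
    {"first": "Ann", "surname": "Lee", "country": "UK"},
    {"first": "Ann", "surname": "Lee", "country": "FR"},
    {"first": "Ann", "surname": "Roy", "country": "UK"},
    {"first": "Bob", "surname": "Lee", "country": "UK"},
]
attributes = ["first", "surname", "country"]
counts = matching_counts(records, records[0], attributes)
```

With a threshold of 5, as on the slides, the attribute returned by `obfuscation_point` would be the one at which the composite should start to be obfuscated.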
  • 11. Numeric Obfuscation
    Numbers used in aggregate functions and checked to provide accuracy, i.e. holdings, values, transactions, should not be obfuscated if all other attributes within the domain/entity structure have been obfuscated and there is no method of reversing the obfuscation layer to identify sensitive data against the values. Barring that:
    • Fixed point numbers should be obfuscated to a precision equal to or less than the original precision, retaining the original scale, but still conform to any specific business rules.
    • Floating point numbers should be obfuscated to a precision and scale equal to or less than the original, but still conform to any specific business rules.
    • Integers should be obfuscated to a length equal to or less than the length of the original number, but still conform to any specific business rules.
    • Currency/percentage formatting over numeric values should be retained for verification purposes.
    • Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two-character format.
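A minimal sketch of the integer and fixed point rules above, assuming Python's `random` and `decimal` modules. The function names are hypothetical, business-rule checks are left out, and only finite values are handled.

```python
import random
from decimal import Decimal


def obfuscate_integer(value: int, rng: random.Random) -> int:
    """Return a random integer with the same sign as `value` and no more digits."""
    digits = len(str(abs(value)))
    result = rng.randrange(10 ** digits)  # 0 .. 10**digits - 1
    return -result if value < 0 else result


def obfuscate_fixed_point(value: Decimal, rng: random.Random) -> Decimal:
    """Return a random Decimal with the same scale and no greater precision."""
    sign, digits, exponent = value.as_tuple()
    unscaled = rng.randrange(10 ** len(digits))
    new_digits = tuple(int(d) for d in str(unscaled))
    # Rebuild from (sign, digits, exponent) so the scale is kept exactly.
    return Decimal((sign, new_digits, exponent))
```

For production use, `rng` would more likely be `secrets.SystemRandom()` than a seeded `random.Random`, so that the output cannot be reproduced from a known seed.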
  • 12. Alpha Obfuscation
    Alphabetic and alphanumeric data types should be obfuscated retaining the original structure of the underlying data; however, certain exceptions exist for search/view criteria:
    • SGML/XML/HTML/XHTML/RSS data formats must retain reserved characters in order for them to be used in native views, DTD, XLS, Web based formats etc.
    • XML Embedded Java Code must be retained but underlying attributes obfuscated.
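One way to read the alpha rules above is to replace each letter or digit with a random character of the same class and, for XML, to touch only text and attribute values so the markup survives. The sketch below assumes Python's standard `xml.etree.ElementTree`; the function names are hypothetical.

```python
import random
import string
import xml.etree.ElementTree as ET


def obfuscate_alpha(text: str, rng: random.Random) -> str:
    """Replace ASCII letters and digits with random ones of the same class."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # spaces, punctuation and non-ASCII pass through unchanged
    return "".join(out)


def obfuscate_xml(document: str, rng: random.Random) -> str:
    """Obfuscate element text and attribute values while retaining the markup.

    Tail text (text following a child element) is not handled in this sketch.
    """
    root = ET.fromstring(document)
    for element in root.iter():
        if element.text and element.text.strip():
            element.text = obfuscate_alpha(element.text, rng)
        for name, value in list(element.attrib.items()):
            element.set(name, obfuscate_alpha(value, rng))
    return ET.tostring(root, encoding="unicode")
```

Parsing the document first, rather than scrambling the raw string, is what keeps tag names and reserved characters intact. Non-ASCII letters are left as they are here, which a real implementation would need to address.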
  • 13. Key Obfuscation
    Obfuscation of keys gives rise to the challenge of failure of Declarative Referential Integrity when presented to certain applications that rely upon them, thus:
    • Natural keys that are identified as sensitive data can only be anonymised/masked.
    • Natural keys that are identified as non-sensitive are out of scope and may be retained.
    • Surrogate keys are out of scope and should be retained.
  • 14. Date Obfuscation 1
    Dates should retain the original date format of the National Character set of the underlying data.
    • Day names should be obfuscated as per the alpha data element; however, the length of the day must be changed to a length between 6 and 9 but not the same length as the original day.
    • Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two-character format.
    • Day numbers should be obfuscated but retain the 1-31 format.
    • Day of the week numbers should be obfuscated but retain the 0-6 or 1-7 formatting, dependent on platform.
    • Month numbers should be obfuscated but retain the 1-12 format.
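The day name and day number rules above could be sketched as follows. The function names are hypothetical, and the sketch does not check that an obfuscated day number is valid for a given month.

```python
import random
import string


def obfuscate_day_name(day: str, rng: random.Random) -> str:
    """Return a capitalised random word of 6-9 letters whose length differs from `day`."""
    lengths = [n for n in range(6, 10) if n != len(day)]
    length = rng.choice(lengths)
    letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return letters.capitalize()


def obfuscate_day_number(rng: random.Random) -> int:
    """Return a random day-of-month number that stays in the 1-31 range."""
    return rng.randint(1, 31)
```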
  • 15. Date Obfuscation 2
    Dates should retain the original date format of the National Character set of the underlying data.
    • Month names should be obfuscated as per the alpha data element; however, the length of the month must be changed to a length between 3 and 12 but not the same length as the original month. Abbreviated month names should be obfuscated retaining the 3-character format.
    • Decision support systems relying on "roll-forward"/"roll-back" date scenarios and date range queries must retain the requested period change between two dates.
    • Year numbers should always retain the century 4-number format, in the range (current year - any validation criteria) to (current year - 1) for years in the past, and (current year + 1) to (current year + any validation criteria) for projected ranges. (This could potentially cause problems with date verification functions; any function code which performs these verifications must utilise the same seed value as the date value and must fully enclose all other dates within the same block.)
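One way to retain the period between two dates, as the decision support rule above requires, is to shift every date in a block by the same seeded offset. This is a sketch under that assumption: `shift_dates`, `seed` and `max_days` are hypothetical names, and the year-range validation criteria are not enforced here.

```python
import random
from datetime import date, timedelta


def shift_dates(dates, seed, max_days=365):
    """Shift every date in the block back by the same seeded offset,
    so the interval between any two dates is retained."""
    rng = random.Random(seed)
    offset = timedelta(days=rng.randint(1, max_days))
    return [d - offset for d in dates]
```

Because the offset depends only on the seed, any verification code that uses the same seed for the whole block sees consistent dates, which is the concern raised in the year-number rule.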
  • 16. Granularity of Access to Sensitive Data
    • Production (Business Users only)
      • Production environments must be fully obfuscated to all Development, Support, and Non-Authorised users.
      • Business users may see sensitive data based on their individual levels of authorisation.
      • Access to data by Support users should be disallowed if possible.
      • If access is allowed for "fix-on-fail" functionality, this must be keystroke logged through an auditing application.
    • UAT (Business Users, Development & Support)
      • UAT environments must be fully obfuscated to all Development, Support, and Non-Authorised users.
      • Business users may see sensitive data based on their individual levels of authorisation.
      • Access to data by Support users should be disallowed if possible.
      • If access is allowed for "fix-on-fail" functionality, this must be keystroke logged through an auditing application.
    • Development (Development & Support only)
      • Development environments must be fully obfuscated at the data level (not obfuscated views), as developers usually hold higher privileges in these environments.
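The access rules above can be read as a small decision function. The sketch below is one possible interpretation: the environment and role strings, the return shape, and the choice to raise an error for Support access outside "fix-on-fail" are all assumptions made for illustration.

```python
def access_decision(environment, role, authorised=False, fix_on_fail=False):
    """Return (sees_sensitive_data, keystroke_logged) for a user session."""
    if environment not in ("PROD", "UAT", "DEV"):
        raise ValueError(f"unknown environment: {environment}")
    if environment == "DEV":
        # The data itself is obfuscated in DEV, whatever the role.
        return (False, False)
    if role == "support":
        if not fix_on_fail:
            raise PermissionError("support access is disallowed outside fix-on-fail")
        # Still an obfuscated view, and the session must be keystroke logged.
        return (False, True)
    if role == "business" and authorised:
        return (True, False)
    # Development and non-authorised users only ever see obfuscated data.
    return (False, False)
```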
  • 17. Deployment Methods (Data Security Guideline Policy)
    • Full Environment Access Control
      • Prod, UAT, SIT & Dev environments are fully segregated by user type or privilege level.
      • Data obfuscation/anonymisation/masking is performed through ETL tools from one environment to the next.
    • Shared Environments Access
      • Prod, UAT, SIT & Dev environments may share different user types, i.e. business, developers, support. The level of granularity must be defined on a per-user-type or privilege-level basis.
      • Data is obfuscated/anonymised/masked based on the authority level of the user type or privilege level.
    • Hybrid Environment
      • Prod, UAT, & SIT environments may be obfuscated at a user type level, but transfers of data into Dev environments may be performed through ETL utilities.
      • Data is obfuscated to the same rules, but the deployment method uses both technical methods.
  • 18. Benefits & Drawbacks of Deployment Methods
    • Full Environment Access Control
      • Benefits
        • Leverage existing tools capabilities and vendor support
        • Guaranteed obfuscation contained within the environment
        • User access managed at a different layer to data access
        • Access to environment determines visibility
      • Drawbacks
        • ETL tool license/platform costs
        • Load window issues
        • Metadata & cipher security concerns
    • Shared Environment Access Control
      • Benefits
        • Higher level of access granularity, greater flexibility
        • Define the level of encryption to conform to national regulatory controls
        • No load window issues; all users share the same data instance
      • Drawbacks
        • Development costs
        • Requires clear delineation of user roles and role management
        • Proprietary technology solutions
    • Hybrid Environments
      • Benefits
        • All prior mentioned
        • Greater flexibility in defining a solution which fits with a current "modus operandi"
      • Drawbacks
        • All prior mentioned
        • Potential support complexity issues
  • 19. Data Obfuscation Methodology
    • Full Environmental Access Control
      • No data obfuscation; non-authorised users have no access.
    • Shared Environment
      • Data obfuscation based on roles and rules of sensitivity.
    • Hybrid Environment
      • No access to PROD, obfuscation in UAT based on roles and rules, ETL obfuscation into DEV.
  • 20. Environmental Control (Access Method)
    • Data flow: PROD (Instance 1) -> Informatica ETL (apply obfuscation rules) -> UAT (Instance 2, obfuscated) -> Informatica ETL (apply obfuscation rules) -> DEV (Instance 3, obfuscated)
    • User groups shown: Business Users; Development & Support Users
  • 21. Environmental Control (Hybrid Method)
    • Data flow: PROD (Instance 1) -> periodic refresh or duplex feed -> UAT (Instance 1 or 2) -> Informatica ETL (apply obfuscation rules) -> DEV (Instance 3, totally obfuscated)
    • Other diagram elements: Obfuscation Layer; Business Users; Development & Support Users
  • 22. Appendix: Terms of Reference
    • Lingual Reference
    • Risk Impact/Probability
    • Non-Deterministic Obfuscation
    • Monte Carlo Method
    • Dynamic Obfuscation Function Methods
  • 23. Lingual Reference
    • Anonymous/Anonymised: to remain unidentified, nameless, i.e. NULL. A field that is anonymous would therefore not show any data at all, and you could not verify the structure of the data.
    • Obfuscate/Obfuscated: to confuse, scramble, i.e. encrypt. You could therefore verify that a date was a date, albeit the wrong one, that a number is a number, albeit the wrong one, and that alpha is alpha in the same structure, so you would see the structure but the sensitive data would be indecipherable.
    • Mask/Masked: to cover, hide. This would normally be used in password protection, where the asterisk is displayed as typed.
    Anonymous and Obfuscate are used in literature: an anonymous writer is unknown, whereas writing under a nom de plume is obfuscated.
  • 24. Risk Impact/Probability
    • Probability: a risk is an event that "may" occur. The probability of it occurring can range anywhere from just above 0% to just below 100%. (Note: it can't be exactly 100%, because then it would be a certainty, not a risk. And it can't be exactly 0%, or it wouldn't be a risk.)
    • Impact: a risk, by its very nature, always has a negative impact. However, the size of the impact varies in terms of cost and impact on some other critical factor.
    We apply these rules to determine when to obfuscate data and when not to.
  • 25. Non-Deterministic Obfuscation
    A variety of factors can cause an algorithm to behave in a way which is not deterministic, or non-deterministic:
    • If it uses external state other than the input, such as user input, a global variable, a hardware timer value, a random value, or stored disk data.
    • If it operates in a way that is timing-sensitive, for example if it has multiple processors writing to the same data at the same time. In this case, the precise order in which each processor writes its data will affect the result.
    • If a hardware error causes its state to change in an unexpected way.
    A major problem with deterministic algorithms is that sometimes we don't want the results to be predictable. For example, if you are playing an on-line game of blackjack that shuffles its deck using a pseudorandom number generator, a clever gambler might guess precisely the numbers the generator will choose and so determine the entire contents of the deck ahead of time, allowing him to cheat. Similar problems arise in cryptography, where private keys are often generated using such a generator. This sort of problem is generally avoided using a cryptographically secure pseudo-random number generator.
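The blackjack example above can be illustrated with Python's standard library: a seeded `random.Random` gives the same shuffle to anyone who knows the seed, while `secrets.SystemRandom` draws from the operating system's cryptographically secure source. The function names are hypothetical.

```python
import random
import secrets


def deterministic_shuffle(deck, seed):
    """Shuffle with a seeded PRNG: the same seed always gives the same order."""
    rng = random.Random(seed)
    shuffled = list(deck)
    rng.shuffle(shuffled)
    return shuffled


def secure_shuffle(deck):
    """Shuffle with the OS-backed CSPRNG: there is no seed to guess or replay."""
    rng = secrets.SystemRandom()
    shuffled = list(deck)
    rng.shuffle(shuffled)
    return shuffled
```

The same distinction applies when generating obfuscation values: a predictable generator would let someone who recovers the seed reproduce the obfuscation layer.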
  • 26. The Monte Carlo Methods
    Monte Carlo methods are computational algorithms that rely on repeated random sampling to compute their results, one of which is a stochastic function to create an obfuscation layer.
    Stochastic programming is a framework for modelling optimization problems that involve uncertainty.
    Because of their reliance on repeated computation of random or pseudo-random numbers, these methods are most suited to, and tend to be used in, cases where it is unfeasible or impossible to compute an exact result with a deterministic algorithm, thus ensuring data obfuscation.
    These are the building blocks for secure obfuscation of highly sensitive data within the banking environment and will satisfy an external audit.
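As a small illustration of repeated random sampling (not the stochastic obfuscation function the slide refers to), the sketch below estimates the chance that at least two parties in a group share a day of birth. The function name and the simplification to 365 equally likely days are assumptions.

```python
import random


def estimate_shared_birthday(parties: int, trials: int, seed: int = 0) -> float:
    """Estimate by sampling the probability that at least two of `parties`
    share the same day of the year."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        days = [rng.randrange(365) for _ in range(parties)]
        if len(set(days)) < parties:  # a repeated day means a shared birthday
            hits += 1
    return hits / trials
```

The estimate converges on the known analytical value (a little over one half for 23 parties) as `trials` grows, which is the general pattern of a Monte Carlo method: accuracy is bought with more samples rather than with an exact formula.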
  • 27. Dynamic Obfuscation Function Methods
    This is an example of a high-level data obfuscation function in which a decision is made, based on the previous criteria, of when to obfuscate, followed by the process of obfuscation for an alpha data type (simplest form).
    Data is obfuscated at the decision point, based on the underlying technology's info-gap (non-probabilistic) theory methods of random number generation, which create seed data for ASCII conversion of real data.
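The slide's diagram did not survive in the transcript, so the following is only a guess at its shape: a decision step driven by the sensitivity levels defined earlier, followed by a random ASCII shift of each letter. All names are hypothetical, and the info-gap method mentioned above is replaced here by the operating system's secure random source.

```python
import secrets


def should_obfuscate(level: int, attributes_in_result: int) -> bool:
    """Level 1 is always obfuscated; levels 2 and 3 only when combined
    with other attributes in the same result."""
    if level == 1:
        return True
    return attributes_in_result > 1


def obfuscate_ascii(text: str) -> str:
    """Shift each ASCII letter by a random 1-25 places within its own case."""
    rng = secrets.SystemRandom()
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)  # non-letters keep the structure of the value
            continue
        out.append(chr(base + (ord(ch) - base + rng.randrange(1, 26)) % 26))
    return "".join(out)


def dynamic_obfuscate(value: str, level: int, attributes_in_result: int) -> str:
    """Apply the decision point, then obfuscate only if required."""
    if should_obfuscate(level, attributes_in_result):
        return obfuscate_ascii(value)
    return value
```

Because the shift is never zero, every letter changes; a real implementation might allow a letter to map to itself so that the output reveals nothing about which characters were ruled out.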