Karen Lopez's (@datachick, InfoAdvisors) 90-minute presentation on data security, data privacy, and compliance, and how data modelers should discover, assess, and monitor these important data management responsibilities.
Top 10 Mistakes When Migrating From Oracle to PostgreSQL by Jim Mlodgenski
As more and more people move to PostgreSQL from Oracle, a pattern of mistakes is emerging. They can be caused by the tools being used, or simply by not understanding how PostgreSQL differs from Oracle. In this talk we will discuss the top mistakes people make when moving to PostgreSQL from Oracle and what the correct course of action is.
Think of big data as all data, no matter the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution, and why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you understand not only each product but how they all fit together, so you can be the hero who builds your company’s big data solution.
The Intuit Data Ecosystem supports unique consumer and small-business assets at scale and handles petabytes of customer data. We have 8M active small-business customers and 16M paid workers who use Intuit QuickBooks and QuickBooks Payroll products. A huge customer base and large volumes of data constantly challenge the data teams on the freshness and correctness of data. This presentation covers the problems we faced at Intuit, along with the data observability model we follow to detect, cure, and prevent data issues. We would like to provide deep insights into the implementations and the impact of some of the great work done by Intuit in this direction.
This is the presentation for Dimitar Mitov's lecture "Data Analytics with Dremio" (in Bulgarian), part of OpenFest 2022: https://www.openfest.org/2022/bg/full-schedule-bg/
Comprehensive Identity and Access Governance for Rapid, Actionable Compliance
The industry’s most comprehensive identity governance solution delivers user administration, privileged account management, and identity intelligence, powered by rich analytics and actionable insight.
Accelerate Your ML Pipeline with AutoML and MLflow by Databricks
Building ML models is a time-consuming endeavor that requires a thorough understanding of feature engineering, selecting useful features, choosing an appropriate algorithm, and performing hyper-parameter tuning. Extensive experimentation is required to arrive at a robust and performant model. Additionally, keeping track of the models that have been developed and deployed may be complex. Solving these challenges is key to successfully implementing end-to-end ML pipelines at scale.
In this talk, we will present a seamless integration of automated machine learning within a Databricks notebook, thus providing a truly unified analytics lifecycle for data scientists and business users with improved speed and efficiency. Specifically, we will show an app that generates and executes a Databricks notebook to train an ML model with H2O’s Driverless AI automatically. The resulting model will be automatically tracked and managed with MLflow. Furthermore, we will show several deployment options to score new data on a Databricks cluster or with an external REST server, all within the app.
Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics — but runs into challenges from siloed teams, proprietary technologies and unreliable data.
That’s why enterprises are turning to the lakehouse because it offers a single platform to unify all your data, analytics and AI workloads.
Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache Spark™, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos.
Here’s what you’ll learn in this 2-hour session:
How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse by Databricks
The Delta Architecture pattern has made the lives of data engineers much simpler, but what about improving query performance for data analysts? What are some common places to look for tuning query performance? In this session we will cover some common techniques to apply to our Delta tables to make them perform better for data analysts’ queries. We will look at a few examples of how you can analyze a query and determine what to focus on to deliver better performance results.
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDC Oslo 2020 by Lace Lofranco
Talk Description:
The Modern Data Warehouse architecture is a response to the emergence of Big Data, Machine Learning and Advanced Analytics. DevOps is a key aspect of successfully operationalising a multi-source Modern Data Warehouse.
While there are many examples of how to build CI/CD pipelines for traditional applications, applying these concepts to big data analytical pipelines is a relatively new and emerging area. In this demo-heavy session, we will see how to apply DevOps principles to an end-to-end data pipeline built on the Microsoft Azure Data Platform with technologies such as Data Factory, Databricks, Data Lake Gen2, Azure Synapse, and Azure DevOps.
Resources: https://aka.ms/mdw-dataops
Introduction to DataOps and AIOps (or MLOps) by Adrien Blind
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational & tech aspects, and provides hints to start your data journey.
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ram Dhakne (HostedbyConfluent)
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ram Dhakne | Current 2022
A well-architected data lakehouse provides an open data platform that combines streaming with data warehousing, data engineering, data science and ML. This opens a world beyond streaming to solving business problems in real-time with analytics and AI. See how companies like Albertsons have used Databricks and Confluent together to combine Kafka streaming with Databricks for their digital transformation.
In this talk, you will learn:
- The built-in streaming capabilities of a lakehouse
- Best practices for integrating Kafka with Spark Structured Streaming
- How Albertsons architected their data platform for real-time data processing and real-time analytics
What is data literacy? Which organizations, and which workers in those organizations, need to be data-literate? There are seemingly hundreds of definitions of data literacy, along with almost as many opinions about how to achieve it.
In a broader perspective, companies must consider whether data literacy is an isolated goal or one component of a broader learning strategy to address skill deficits. How does data literacy compare to other types of skills or “literacy” such as business acumen?
This session will position data literacy in the context of other worker skills as a framework for understanding how and where it fits and how to advocate for its importance.
DataOps: An Agile Method for Data-Driven Organizations by Ellen Friedman
DataOps expands the DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration to deliver flexibility, fast time to value, and an agile workflow for data-intensive applications, including machine learning pipelines. (Strata Data San Jose, March 2018)
Delivering Data Democratization in the Cloud with Snowflake by Kent Graziano
This is a brief introduction to the Snowflake Cloud Data Platform and our revolutionary architecture. It includes a discussion of some of our unique features, along with some real-world metrics from our global customer base.
The financial industry operates on a variety of different data and computing platforms. Integrating these different sources into a centralized data lake is crucial to support reporting and analytics tools.
Apache Spark is becoming the tool of choice for big data integration and analytics due to its scalable nature and because it supports processing data from a variety of data sources and formats such as JSON, Parquet, Kafka, etc. However, one of the most common platforms in the financial industry is the mainframe, which does not provide easy interoperability with other platforms.
COBOL is the most used language in the mainframe environment. It was designed in 1959 and evolved in parallel with other programming languages, thus having its own constructs and primitives. Furthermore, data produced by COBOL is EBCDIC-encoded and has a different binary representation of numeric data types.
We have developed Cobrix, a library that extends the Spark SQL API to allow direct reading of binary files generated by mainframes.
While projects like Sqoop focus on transferring relational data by providing direct connectors to a mainframe, Cobrix can be used to parse and load hierarchical data (from IMS, for instance) after it is transferred from a mainframe by dumping records to a binary file. The schema should be provided as a COBOL copybook, which can contain nested structures and arrays. We present how the schema mapping between COBOL and Spark was done and how it was used in the implementation of the Spark COBOL data source. We also present use cases of simple and multi-segment files to illustrate how we use the library to load data from mainframes into our Hadoop data lake.
Designing and Building Next Generation Data Pipelines at Scale with Structured Streaming by Databricks
Lambda architectures, data warehouses, data lakes, on-premise Hadoop deployments, elastic Cloud architecture… We’ve had to deal with most of these at one point or another in our lives when working with data. At Databricks, we have built data pipelines, which leverage these architectures. We work with hundreds of customers who also build similar pipelines. We observed some common pain points along the way: the HiveMetaStore can easily become a bottleneck, S3’s eventual consistency is annoying, file listing anywhere becomes a bottleneck once tables exceed a certain scale, there’s not an easy way to guarantee atomicity – garbage data can make it into the system along the way. The list goes on and on.
Fueled with the knowledge of all these pain points, we set out to make Structured Streaming the engine to ETL and analyze data. In this talk, we will discuss how we built robust, scalable, and performant multi-cloud data pipelines leveraging Structured Streaming, Databricks Delta, and other specialized features available in Databricks Runtime such as file notification based streaming sources and optimizations around Databricks Delta leveraging data skipping and Z-Order clustering.
You will walk away with the essence of what to consider when designing scalable data pipelines with the recent innovations in Structured Streaming and Databricks Runtime.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service) that is a tool for curating and processing massive amounts of data and developing, training and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Data Security and Protection in DevOps by Karen Lopez
Presentation to the London #WinOps event, Sept 2019, focusing on data security, privacy, and protection in DevOps efforts. Includes data masking, dev and test data, Always Encrypted, and more.
Bridging the Gap: Analyzing Data in and Below the Cloud by Inside Analysis
The Briefing Room with Dean Abbott and Tableau Software
Live Webcast July 23, 2013
http://www.insideanalysis.com
Today’s desire for analytics extends well beyond the traditional domain of Business Intelligence. That’s partly because business users are realizing the value of mixing and matching all kinds of data, from all kinds of sources. One emerging market driver is Cloud-based data, and the desire companies have to analyze this data cohesively with their on-premise data sets.
Register for this episode of The Briefing Room to learn from Analyst Dean Abbott, who will explain how the ability to access data in the cloud can play a critical role for generating business value from analytics. He’ll be briefed by Ellie Fields of Tableau Software who will tout Tableau’s latest release, which includes native connectors to cloud-based applications like Salesforce.com, Amazon Redshift, Google Analytics and BigQuery. She’ll also demonstrate how Tableau can combine cloud data with other data sources, including spreadsheets, databases, cubes and even Big Data.
Regulatory compliance mandates have historically focused on IT & endpoint security as the primary means to protect data. However, as our digital economy has increasingly become software dependent, standards bodies have dutifully added requirements as they relate to development and deployment practices. Enterprise applications and cloud-based services constantly store and transmit data; yet, they are often difficult to understand and assess for compliance.
This webcast will present a practical approach towards mapping application security practices to common compliance frameworks. It will discuss how to define and enact a secure, repeatable software development lifecycle (SDLC) and highlight activities that can be leveraged across multiple compliance controls. Topics include:
* Consolidating security and compliance controls
* Creating application security standards for development and operations teams
* Identifying and remediating gaps between current practices and industry-accepted "best practices"
Your database holds your company's most sensitive and important assets: your data. All those customers' personal details, credit card numbers, and social security numbers; you can't afford to leave them vulnerable to any breach, outside or inside.
Applying Auto-Data Classification Techniques for Large Data Sets by Priyanka Aash
In the current data security landscape, large volumes of data are being created across the enterprise. Manual techniques to inventory and classify data make it a tedious and expensive activity. To create a time- and cost-effective implementation of security and access controls, it becomes key to automate the data classification process.
(Source: RSA USA 2016-San Francisco)
Designing for Data Security by Karen Lopez
As security and compliance become more important for organizations, especially in the age of GDPR and other data-breach legislation, Karen covers the types of features data architects and designers should consider when building modern, protected, and defensive systems.
Building a Data Driven Culture and AI Revolution With Gregory Little | Current 2022 (HostedbyConfluent)
Building a Data Driven Culture and AI Revolution With Gregory Little | Current 2022
Transforming business or mission through AI/ML doesn't start with technology but with culture…and an audit. At least as much is true for the US Department of Defense (DoD), which presents significant modernization challenges because of its mission scope, expansive global footprint, and massive size: with over 2.8 million people, it is the largest employer in the world. Greg Little discusses how establishing the DoD’s annual audit became a surprising accelerator for the department’s data and analytics journey. It revealed the foundational needs for data management to run an enterprise with $3 trillion in assets, and its successful implementation required breaking through deeply entrenched cultural and organizational resistance across DoD.
In this session, Greg will discuss what it will take to guide the evolution of technology and culture in parallel: leadership, technology that enables rapid scale and a complete & reliable data flow, and a data-driven culture.
Automating Machine Learning, Artificial Intelligence, and Data Science Processes by Ali Alkan
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi | Automating Machine Learning, Artificial Intelligence, and Data Science | Guided Analytics
Data Profiling: The First Step to Big Data Quality by Precisely
Big data offers the promise of a data-driven business model generating new revenue and competitive advantage fueled by new business insights, AI, and machine learning. Yet without high quality data that provides trust, confidence, and understanding, business leaders continue to rely on gut instinct to drive business decisions.
The critical foundation and first step to deliver high quality data in support of a data-driven view that truly leverages the value of big data is data profiling - a proven capability to analyze the actual data content and help you understand what's really there.
View this webinar on-demand to learn five core concepts to effectively apply data profiling to your big data, assess and communicate the quality issues, and take the first step to big data quality and a data-driven business.
Challenges of Operationalising Data Science in Production by iguazio
The presentation topic for this meet-up was covered in two sections, without any breaks in between.
Section 1: Business Aspects (20 mins)
Speaker: Rasmi Mohapatra, Product Owner, Experian
https://www.linkedin.com/in/rasmi-m-428b3a46/
Once your data science application is in production, there are many typical data science operational challenges experienced today across business domains; we will cover a few of these challenges with example scenarios.
Section 2: Tech Aspects (40 mins, slides & demo, Q&A)
Speaker: Santanu Dey, Solution Architect, Iguazio
https://www.linkedin.com/in/santanu/
In this part of the talk, we will cover how these operational challenges can be overcome, e.g. automating data collection & preparation, making ML models portable & deploying them in production, monitoring and scaling, etc., with relevant demos.
Machine learning is increasingly being used by companies as a disruptor or to provide a USP. This means that machine learning models need to cope with being a critical part of solutions, and if those solutions handle PCI-DSS or PII data then the models must be highly secure.
In addition, if a machine learning model is part of your USP then you will want to protect it. Also, the EU AI Regulation and the UK AI Strategy mean that AI is becoming increasingly regulated. This means you need to be able to prove which model made a prediction, and why it made it, by providing auditability and explainability.
In this talk we go over these issues and how to address them, including using AWS and how to implement development best practices.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the Enterprise by DATAVERSITY
Many data scientists are well grounded in delivering results in the enterprise, but many come from outside it: from academia, from PhD programs, and from research. They have the necessary technical skills, but those skills don’t count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
The data services marketplace is enabled by a data abstraction layer that supports rapid development of operational applications and single-data-view portals. In this presentation you will learn about:
- Reference architecture for enterprise data services marketplace
- Modality and latency of data access
- Customer use cases and demo
This presentation is part of the Denodo Educational Seminar, and you can watch the video here: goo.gl/vycYmZ.
Modernizing, Migrating & Mitigating - Moving to Modern Cloud & API Web Apps W... by Security Innovation
This talk will help you, as a decision maker or architect, to understand the risks of migrating a thick client or traditional web application to the modern web. In this talk I’ll give you tools and techniques to make the migration to the modern web painless and secure so you can mitigate common pitfalls without having to make the mistakes first. I’ll be doing demos, and telling lots of stories throughout.
Making some good architectural decisions up front can help you:
- Minimize the risk of data breach
- Protect your user’s privacy
- Make secure choices the easy default for your developers
- Understand the cloud security model
- Create defaults, policies, wrappers, and guidance for developers
- Detect when developers have bypassed security controls
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for... by Precisely
The advanced analytics and AI that run today’s businesses rely on a larger volume, and greater variety, of data. This data needs to be of the highest quality to ensure the best possible outcomes, but traditional data quality tools weren’t designed for today’s modern data environments.
That’s why we’ve developed Trillium DQ for Big Data -- an integrated product that delivers industry-leading data profiling and data quality at scale, in the cloud or on premises.
In this on-demand webcast, you will learn how Trillium DQ:
• Empowers data analysts to easily profile large, diverse data sources to discover new insights, uncover issues, and report on their findings – all without involving IT.
• Delivers best-in-class entity resolution to support mission-critical applications such as Customer 360, fraud detection, AML, and predictive analytics.
• Supports Cloud and hybrid architectures by providing consistent high-performance processing within critical time windows on all platforms.
• Keeps enterprise data lakes validated, clean, and trusted with the highest quality data – without technical expertise in big data or distributed architectures.
• Enables data quality monitoring based on targeted business rules for data governance and business insight.
Qiagram is a collaborative visual data exploration environment that enables investigator-initiated, hypothesis-driven data exploration, allowing business users as well as IT professionals to easily ask complex questions against complex data sets.
Similar to Data Modeling for Security, Privacy and Data Protection
Slide deck for the DGIQ SIG on AI Ethics.
Are you concerned about data and AI ethics? Do you worry about how to make sure the algorithms and systems that affect our lives are fair, honest, responsible, and respectful of our rights and values? Do you have opinions about how to build an organizational culture that cares about these topics?
Join us for what will surely be a lively and interesting session where you are the speakers.
Special interest group (SIG) discussions are group conversations on topics that are new, or specific to an audience segment. The format is casual and without any formal presentation. The objective is to engage all participants in an exchange of ideas, questions, and advice, so please come with a willingness to participate in the conversation.
A Designer's Favourite Security and Privacy Features in SQL Server and Azure... by Karen Lopez
SQL Server includes multiple features that focus on data security, privacy, and developer productivity. In this session, we will review the best features from a database designer’s and developer’s point of view.
– Always Encrypted
– Dynamic Data Masking
– Row Level Security
– Data Classification
– Assessments
– Defender for SQL Server
– Ledger Tables
…and more
We’ll look at new and older features, why you should consider them, where they work, where they don’t, who needs to be involved in using them, and what changes, if any, need to be made to applications or tools that you use with SQL Server.
You will learn:
– The pros and cons of implementing each feature
– How implementing these new features may impact existing applications
– 10 tips for enhancing SQL Server security and privacy protections
Designer's Favorite New Features in SQL Server by Karen Lopez
A database designer's favourite features in SQL Server... with a bit of Azure SQL DB, too.
Always Encrypted
Row Level Security
Microsoft Purview
Azure Enabled SQL Server
Azure Defender for SQL
Azure Defender for Cloud
Dynamic Data Masking
Ledger Database and Tables
Data Privacy
Data Governance
Karen's Presentation to DAMA Chicago and other DAMA Chapters on 15 February 2023.
This presentation is less about data lakes than it is about data quality and how data professionals should think about designing and architecting systems that best meet the needs of how data works in the real world.
Expert Cloud Data Backup and Recovery Best Practices by Karen Lopez
We’ve been deploying backup solutions since the beginning of computing, and the foundations of backup and recovery have stayed the same: make sure backups run consistently and set recovery objectives. Yet systems in 2022 don’t work or act the same way they did decades ago. Cloud data backups have helped us meet the need for offsite backups, as well as changed how we budget for them. Ransomware has impacted how we store them. The laws of physics might be more of an issue than when we had tapes stored in a safe down the hall. Cost models have changed, too.
In this session, Karen Lopez covers best practices for modern data recovery…and she will share stories of worst practices just to keep it real.
Manage Your Time So It Doesn't Manage You by Karen Lopez
NASA Space Apps NYC Pre-Hackathon Symposium presentation by Karen Lopez, InfoAdvisors and NASA Datanaut. Karen presents on how to successfully manage your time and deliverables in the NASA Space Apps Challenge no matter where you are participating.
This one-hour presentation covers the tools and techniques for migrating SQL Server databases and data to Azure SQL DB or SQL Server on VM. Includes SSMA, DMA, DMS, and more.
Blockchain for the DBA and Data Professional by Karen Lopez
An overview of blockchain fundamentals, including examples of Oracle 20c Blockchain Tables. Includes concepts of trust, immutability, hashes, distributed nodes, and cryptography.
Blockchain for the DBA and Data Professional by Karen Lopez
With all the hype around blockchain, why should a DBA or other data professional care? In this session, we will cover the basics of blockchain as it applies to data and database processes:
Immutability
Verification
Distribution
Cryptography
Transactions
Trust
We will look at current offerings for blockchain features in Azure and in database and data stores. Finally, we'll help you identify the types of business requirements that need blockchain technologies.
You will learn:
The valid uses of blockchain approaches in databases
How current technologies support blockchain approaches
The costs, benefits, and risks of blockchain
There are many data modeling and database design terms and jargon that use the word "key." Do you know the difference between a surrogate key and a primary key? A super key and a candidate key? Could you explain them to a technical audience? To a business user or an auditor?
In this presentation, Karen Lopez covers the concepts of primary keys, foreign keys, candidate key, surrogate keys, and more.
How to Survive as a Data Architect in a Polyglot Database World by Karen Lopez
Karen Lopez talks to data architects and data modelers about how they can best deliver value on modern data-driven projects beyond relational database technologies. She covers NoSQL databases and datastores, which scenarios they best fit, and which ones they don't. She ends with 10 tips for adding more value to polyglot database solutions.
Karen's Favourite Features of SQL Server 2016 by Karen Lopez
Slides from a one-hour webinar on Karen Lopez's favorite features from a database designer's point of view. Topics include Always Encrypted, Data Masking, Row Level Security, Foreign Keys, JSON, and more.
Notice an error? Let me know. I welcome this sort of feedback.
In the spirit of the book Seven Databases in Seven Weeks, Lara Rubbelke and Karen Lopez cover ~seven databases and datastores in the SQL and NoSQL world, when to use them, and how they are SQL-like.
From SQLBits XV
Notice an error? Let me know. I welcome this sort of feedback.
10 Physical Data Modeling Blunders by Karen Lopez
Karen Lopez's presentation about 10 Physical Data Modeling/Database Design blunders, based on her work in helping organizations get the most value out of their models and data.
Notice an error? Let me know. I welcome this sort of feedback.
NoSQL and Data Modeling for Data Modelers by Karen Lopez
Karen Lopez's presentation for data modelers and data architects. Why data modeling is still relevant for big data and NoSQL projects.
Plus 10 tips for data modelers working on NoSQL projects.
Opendatabay - Open Data Marketplace by Opendatabay
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT... by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Data Modeling for Security, Privacy and Data Protection
1. Data Modeling for Security and Privacy
Karen Lopez, Data Evangelist, InfoAdvisors
www.datamodel.com
2. Abstract
Modern database systems have introduced more support for security, privacy, and compliance over the last few years. We expect this to increase as compliance issues such as GDPR and other data compliance challenges arise. In this session, Karen will be discussing the newer features from a data modeler's/database designer's point of view, including:
Data Masking
End-to-End Encryption
Row Level Security
New Data Types
Data Categorization and Classification
We'll look at the new features, why you should consider them, where they work, and where they don't. We will also discuss how to negotiate on behalf of data protection in a world of Agile, MVP, Lean and DevOps. This session is hands-on with demos and labs, so bring your own laptop to participate.
3. Karen Lopez
• Karen has 20+ years of data and information architecture experience on large, multi-project programs.
• She is a frequent speaker on data modeling, data-driven methodologies and pattern data models.
• She wants you to love your data.
8. About this session
• Mostly transactional discussions
• Variety of skills & experience in teams
• Time limits
• Inspire you to learn
• Our style
• “At another company”
• Giving you tools & approaches
• Some checklist items
• Mostly analytical and practical learning
• Tools are for examples
10. Ready for 25 May?
Callers asked me:
• How can we get started?
• Can you help us get certified?
• Do you have software for this?
• Do you have a couple of weeks to help us get this done?
11. Karen’s Governance Position
Security at the data level
Models capture security & privacy requirements
Management reports of reviews
Measurement
In other words, Governance
12. Data Models
• Karen’s Preference
• Track all kinds of metadata
• Advanced Compare features
• Support DevOps and iterative development
• Support Conceptual, Logical and Physical design
20. Data Curation
Related to Data Stewardship
Covers more than Data Categorization
Important part of Data Governance
New-ish term going into GDPR and other protection concepts
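One way to make categorization actionable in the database itself is SQL Server's sensitivity classification DDL. A minimal sketch, assuming SQL Server 2019+ or Azure SQL Database; the table, column, and labels here are hypothetical examples:

```sql
-- Sketch: tagging a column with sensitivity metadata (SQL Server 2019+ /
-- Azure SQL Database). The table name and labels are hypothetical.
ADD SENSITIVITY CLASSIFICATION TO dbo.Customer.Email
    WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');

-- Review everything classified so far in this database.
SELECT * FROM sys.sensitivity_classifications;
```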
21. One more time…
Every Design Decision must be based on Cost, Benefit and Risk
www.datamodel.com
23. Catalog Data Assets
Every compliance effort starts with inventory
Capture the hard work of every project
Build incrementally
Start with what exists physically
24. Azure Data Catalog
Azure Data Catalog is a fully managed cloud service whose users can discover the data sources they need and understand the data sources they find. At the same time, Data Catalog helps organizations get more value from their existing investments.
29. Data Objects/Assets
• A metadata representation in Data Catalog of a real-world data object. Examples include tables, views, files, reports, and so on.
37. Issues
• Data Scientists spend 80% of their time sourcing, prepping and cleansing data
• Likely everyone else has these issues
• We are lousy at documenting data and metadata
• This makes Karen sad
38. Lab 1 Discussion
• When would you be “done” discovering?
• How would you know you were done?
• Would you be able to do all the datasets?
• How would you prioritize the work?
• What skills would you need?
• What went right? Wrong?
• What would make this easier?
45. Dynamic Data Masking
• Column level
• Data in the database, at rest, is not masked
• Meant to complement other methods
• Performed at the end of a database query, right before data is returned
• Performance impact is small
47. DDM Functions
Function | Mask | Example
Default | Based on datatype: String – xxxx; Numbers – 0; Date & Times – 01.01.2000 00:00:00.0000000; Binary – a single byte 0 | xxxx / 0 / 01.01.2000 00:00:00.0000000 / 0
Email | First character of the email, then Xs, then .com (always .com) | Kxxx@xxxx.com
Custom | First and last values, with Xs in the middle | kxxxn
Random | For numeric types, with a range | 12
48. Dynamic Data Masking
• Data in the database is not changed
• Ad-hoc queries *can* expose data
• Does not aim to prevent users from exposing pieces of sensitive data
49. Dynamic Data Masking
• Cannot mask an encrypted column (Always Encrypted)
• Cannot be configured on a computed column
• But if a computed column depends on a masked column, the mask is returned
• Using SELECT INTO or INSERT INTO results in masked data being inserted into the target (also for import/export)
50. Why would a DB Designer love it?
• Allows central, reusable design for standard masking
• Offers more reliable and more usable masking
• Applies across applications
• Removes whining about “we can do that later”
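To make this concrete, here is a minimal T-SQL sketch of Dynamic Data Masking using the built-in mask functions from the table above; the dbo.Customer table and the AnalystRole role are hypothetical examples:

```sql
-- Sketch: the four built-in Dynamic Data Masking functions on a
-- hypothetical table. Data at rest is unchanged; only query results
-- are masked for principals without the UNMASK permission.
CREATE TABLE dbo.Customer
(
    CustomerID  int IDENTITY PRIMARY KEY,
    FullName    nvarchar(100) MASKED WITH (FUNCTION = 'partial(1,"xxx",1)'),
    Email       nvarchar(200) MASKED WITH (FUNCTION = 'email()'),
    CreditLimit money         MASKED WITH (FUNCTION = 'default()'),
    LoyaltyCode smallint      MASKED WITH (FUNCTION = 'random(1, 100)')
);

-- Grant UNMASK only to principals who genuinely need the real values.
GRANT UNMASK TO AnalystRole;
```

Because the mask lives in the column definition, every application that queries the table gets the same protection, which is exactly the central, reusable design point above.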
52. Security – Row Level Security
• Filtering result sets (predicate-based access)
• Predicates applied when reading data
• Can be used to block write access
• User-defined policies tied to inline table functions
53. Row Level Security
• No indication that results have been filtered
• If all rows are filtered, then a NULL set is returned
• For block predicates, an error is returned
• Works even if you are dbo or in the db_owner role
54. Why would a DB Designer love it?
• Allows a designer to do this sort of data protection IN THE DATABASE, not just rely on code.
• Many, many pieces of code
• Applies across applications
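A minimal T-SQL sketch of the inline-table-function pattern described above, assuming a hypothetical dbo.Orders table whose SalesRep column holds database user names (the schema, function, and policy names are illustrative):

```sql
-- Sketch: predicate-based Row Level Security on a hypothetical table.
CREATE SCHEMA Security;
GO
-- Inline table-valued function: returns a row only when the predicate holds.
CREATE FUNCTION Security.fn_OrderFilter (@SalesRep sysname)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_result WHERE @SalesRep = USER_NAME();
GO
-- FILTER hides rows on read; BLOCK prevents inserts that violate the predicate.
CREATE SECURITY POLICY Security.OrderPolicy
    ADD FILTER PREDICATE Security.fn_OrderFilter(SalesRep) ON dbo.Orders,
    ADD BLOCK PREDICATE Security.fn_OrderFilter(SalesRep) ON dbo.Orders AFTER INSERT
    WITH (STATE = ON);
```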
60. Why would a DB Designer love it?
• Always Encrypted, yeah.
• Allows designers to not only specify which columns need to be protected, but how.
• Parameters are encrypted as well
• Built in to the engine, easier for Devs
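As a hedged illustration of specifying not only which columns are protected but how, here is a sketch of Always Encrypted column DDL, assuming a column encryption key named CEK_Auto1 has already been provisioned; the table and columns are hypothetical:

```sql
-- Sketch: Always Encrypted column definitions. Deterministic encryption
-- supports equality lookups and requires a BIN2 collation on strings;
-- randomized encryption is stronger but not searchable.
CREATE TABLE dbo.Patient
(
    PatientID int IDENTITY PRIMARY KEY,
    SSN char(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK_Auto1,
                        ENCRYPTION_TYPE = DETERMINISTIC,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'),
    BirthDate date
        ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = CEK_Auto1,
                        ENCRYPTION_TYPE = RANDOMIZED,
                        ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256')
);
```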
61. What should we STOP doing?
Nobody ever talks about this….
62. SQL Injection
• WE ARE STILL DOING THIS!
• IT’S STILL THE #1 (but unsecured storage is getting more popular)
• TEST. TEST SOME MORE
• Automated Testing
• Governance is important
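The core defense remains parameterization. A minimal sketch follows (the table and column names are hypothetical); building the same statement by string concatenation would be the injectable anti-pattern:

```sql
-- Sketch: parameterized dynamic SQL. The user-supplied value travels as a
-- typed parameter, never as part of the SQL text, so it cannot inject.
DECLARE @Search nvarchar(100) = N'O''Brien; DROP TABLE dbo.Customer; --';

DECLARE @sql nvarchar(max) =
    N'SELECT CustomerID, FullName FROM dbo.Customer WHERE FullName = @Name;';

EXEC sp_executesql @sql, N'@Name nvarchar(100)', @Name = @Search;
```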
66. Test Data
• Restoring Production to Development
• Restoring Production, with Masking
• Restoring Production, with Randomizing
• Restoring Production…anywhere
• Design Test Data
• Lorem Ipsum for Data
• Really, Design Test Data
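If production data must be restored to development at all, the randomizing step might look like this hedged sketch (table, columns, and value patterns are hypothetical):

```sql
-- Sketch: scrambling sensitive columns in a restored dev copy so test data
-- keeps a realistic shape without exposing real customer values.
UPDATE dbo.Customer
SET FullName  = CONCAT('Test Customer ', CustomerID),
    Email     = CONCAT('user', CustomerID, '@example.com'),
    BirthDate = DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 20000, '1950-01-01');
```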
67. What Skills Do Data Professionals Need for Data Protection?
No one ever talks about this….
68. Big Data and Analytics
Level: Literacy and Hands On
Why: These new technologies and techniques are becoming mainstream in most shops, whether they are installed or software as a service. Plus, we need to use them on our own data.
Who: All IT roles, especially data stewarding ones.
69. Literacy with Deep Learning, AI, Machine Learning
Level: Literacy +++
• How are they used?
• What are the real-life uses today?
• Future uses
• Privacy and Security requirements
• Compliance trade-offs
• Employee Monitoring
70. Data Quality & Reliability
Level: Active Skills
• Is the data right?
• Is it current?
• Should it be there at all?
• Do we know where it came from?
• Do we know it was calculated correctly?
• Are there any known anomalies?
71. How can we do all this?
• Cloud services are a fantastic way to learn and get hands-on skills.
• Online tutorials are often free and self-guided
• Learn from experts & case studies
• Deprioritize tasks that are really just being done for tradition
• Hire help
• Automate away some tasks to make more time