The possibilities of DATPROF Subset and DATPROF Privacy: subsetting and masking databases to improve software testing and to comply with data privacy regulations.
6. Agile Development
• Building the right product
• Room for change
• Every 2-4 weeks working increments of the software
• Progress in development
7. How to test all these iterations?
And… what data to use?
8. Team 1, Team 2, Team 3
Environment   Databases
Production    6 TB, 500 GB, 10 GB
Test          6 TB, 500 GB, 10 GB
Development   6 TB, 500 GB, 10 GB
Total: 19.53 TB
9. Team 1, Team 2, Team 3
Environment   Databases             Used by
Production    6 TB, 500 GB, 10 GB
Test          6 TB, 500 GB, 10 GB   Team 1, Team 2, Team 3
Development   6 TB, 500 GB, 10 GB   Team 1, Team 2, Team 3
Total: 19.53 TB
10. Team 1, Team 2, Team 3
Environment   Copies             Databases per copy
Production    1                  6 TB, 500 GB, 10 GB
Development   3 (one per team)   6 TB, 500 GB, 10 GB
Test          3 (one per team)   6 TB, 500 GB, 10 GB
Total: 45.57 TB
11. Team 1, Team 2, Team 3
Environment   Copies             Databases per copy
Production    1                  6 TB, 500 GB, 10 GB
Development   3 (one per team)   600 GB, 50 GB, 1 GB (10% subset)
Test          3 (one per team)   600 GB, 50 GB, 1 GB (10% subset)
Total: 10.4 TB
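The totals on the storage slides can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and assumes that every non-production environment is a copy of all three production databases; the function name `total_storage_tb` is illustrative only.

```python
# Production database sizes in TB, taken from the slides (6 TB, 500 GB, 10 GB).
# Assumption: decimal units, so 500 GB = 0.5 TB and 10 GB = 0.01 TB.
PRODUCTION_TB = [6.0, 0.5, 0.01]


def total_storage_tb(copies, fraction=1.0):
    """Return production storage plus `copies` non-production copies.

    Each copy holds `fraction` of the production data (1.0 = full copy).
    """
    production = sum(PRODUCTION_TB)
    return production + copies * fraction * production


shared = total_storage_tb(copies=2)                 # one shared Test + one shared Development
per_team = total_storage_tb(copies=6)               # Test + Development for each of 3 teams
subset = total_storage_tb(copies=6, fraction=0.10)  # same, but 10% subsets

print(f"{shared:.2f} TB, {per_team:.2f} TB, {subset:.2f} TB")
```

Under these assumptions the three scenarios come out at roughly 19.53 TB, 45.57 TB and 10.4 TB.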
14. Subsetting: advantages of subsetting data
• Minimize data usage
• Save on hardware & infra
• Reduce throughput times
• Efficient data management
Anonymizing: advantages of scrambling & masking data
• Protect customer information
• Comply with legislation
• Prevent brand damage
• Maintain competitive advantages
20. Data model classification
Subset – Process data
Example: Customers, Orders, Contracts, Invoices, Transactions
Full – Master data
Example: Application data, configuration, master tables
Empty – Logging, non-relevant history
Example: Logging tables, temp tables
Determine data to be subsetted
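The classification above can be written down as a simple mapping from table to strategy. The following is a minimal sketch, not DATPROF Subset's actual configuration format; the table names and the helper `rows_to_copy` are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    SUBSET = "subset"  # process data: copy only the rows selected by the filter
    FULL = "full"      # master data: copy every row
    EMPTY = "empty"    # logging / non-relevant history: copy no rows


# Hypothetical table names, classified as on the slide.
CLASSIFICATION = {
    "customers": Strategy.SUBSET,
    "orders": Strategy.SUBSET,
    "configuration": Strategy.FULL,
    "audit_log": Strategy.EMPTY,
}


def rows_to_copy(table, rows, keep):
    """Return the rows of `table` that end up in the test database."""
    strategy = CLASSIFICATION[table]
    if strategy is Strategy.FULL:
        return list(rows)
    if strategy is Strategy.EMPTY:
        return []
    return [row for row in rows if keep(row)]


sample = [{"id": 1, "country": "NL"}, {"id": 2, "country": "DE"}]


def is_dutch(row):
    return row["country"] == "NL"


print(rows_to_copy("customers", sample, is_dutch))
print(rows_to_copy("configuration", sample, is_dutch))
print(rows_to_copy("audit_log", sample, is_dutch))
```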
22. Chain of systems
Method for deriving consistent subsets from multiple systems
Production → Test/Development
Start filter: All customers from the Netherlands
Start filter: All orders from customers in the previous subset.
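The chained start filters can be sketched as two steps, where the second filter depends on the result of the first. The data, field names and country codes below are made up for illustration; in practice the two tables may live in different systems.

```python
# Hypothetical data from two systems in the chain.
customers = [
    {"id": 1, "country": "NL"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "NL"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 1},
]

# Start filter 1: all customers from the Netherlands.
customer_subset = [c for c in customers if c["country"] == "NL"]

# Start filter 2: all orders from customers in the previous subset.
subset_ids = {c["id"] for c in customer_subset}
order_subset = [o for o in orders if o["customer_id"] in subset_ids]

print([c["id"] for c in customer_subset])
print([o["order_id"] for o in order_subset])
```

Because the second filter reuses the customer keys from the first, every order in the subset refers to a customer that is also in the subset, which is what keeps the subsets consistent across systems.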
25. Personal data
Identifying
- Name
- Date of birth
- Email
- Bank account number
- Social security number
- Address
- Insurance number
- Cellphone number
- Et cetera…
Characteristics
- Bank account balance
- Debt
- Medication
- Illness
- Religion
- Political preference
- Salary
- Phone history
- Et cetera…
“Any information relating to an identified or identifiable natural person ("data subject")”
Source: Data Protection Directive - Directive 95/46/EC
27. Shuffle: shuffle values within the same column
Conditional: manipulate specified rows

First name  Last name  Type
John        Clark      Customer
Max         Smith      Customer
Joe         Williams   Customer
            DATPROF    Company
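A conditional shuffle can be sketched as follows: only the rows that match a condition take part, and their values are redistributed within one column. This is an illustrative sketch, not DATPROF Privacy's implementation; `shuffle_column` and its parameters are hypothetical.

```python
import random

rows = [
    {"first_name": "John", "last_name": "Clark", "type": "Customer"},
    {"first_name": "Max", "last_name": "Smith", "type": "Customer"},
    {"first_name": "Joe", "last_name": "Williams", "type": "Customer"},
    {"first_name": "", "last_name": "DATPROF", "type": "Company"},
]


def shuffle_column(rows, column, condition, seed=None):
    """Shuffle the values of `column` among the rows that satisfy `condition`."""
    rng = random.Random(seed)
    selected = [i for i, row in enumerate(rows) if condition(row)]
    values = [rows[i][column] for i in selected]
    rng.shuffle(values)
    result = [dict(row) for row in rows]  # leave the input untouched
    for i, value in zip(selected, values):
        result[i][column] = value
    return result


def is_customer(row):
    return row["type"] == "Customer"


shuffled = shuffle_column(rows, "last_name", is_customer, seed=1)
for row in shuffled:
    print(row)
```

The set of last names stays the same, so the data remains realistic, while the company row is left alone because it does not meet the condition.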
28. Blank: delete values from columns
Scramble: replace existing characters

First name  Last name  Type      E-Mail
John        Smith      Customer  j.clark@live.com
Max         Williams   Customer  Smith_max@mail.com
Joe         Clark      Customer  i_am@JoeWilliams.de
            DATPROF    Company

Comment values: “Brother of J. Clark”, “Has debt”
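Blank and scramble can be sketched as two small functions. The scramble rule used here (lowercase letters become `x`, uppercase letters become `X`, digits become `0`, everything else is kept) is an assumption chosen for illustration; the product's actual scramble rules may differ.

```python
def blank(value):
    """Delete the value, for example a free-text comment."""
    return ""


def scramble(value):
    """Replace letters and digits while keeping length and punctuation."""
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)  # keep separators such as '.', '_' and '@'
    return "".join(out)


print(repr(blank("Brother of J. Clark")))
print(scramble("j.clark@live.com"))
print(scramble("Smith_max@mail.com"))
```

Free-text comments are blanked because they can contain identifying details that no pattern-based rule would catch.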
29. First day: change dates to the first day within the same month and year

Nr.  First name  Last name  Type      Co..  E-mail           Date of Birth
123  John        Smith      Customer        x.xxxxx@xxxx...  01-02-1954
321  Max         Williams   Customer        Xxxxx_xxx@xx...  01-11-1984
789  Joe         Clark      Customer        x_xx@XxxXxxx...  01-03-1974
456              DATPROF    Company

Share of people uniquely identifiable by postal code combined with:
Date of birth   1st day of month   1st day of year
87%             3.7%               0.04%
Source: anonymity research by Prof. Dr. Latanya Sweeney (Harvard University)
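The first-day function is easy to sketch with Python's standard `datetime` module. The input date below is hypothetical, since the slides only show the masked values, and the function names are illustrative.

```python
from datetime import date


def first_day_of_month(d):
    """Keep month and year, set the day to 1."""
    return d.replace(day=1)


def first_day_of_year(d):
    """Keep only the year, set the date to 1 January."""
    return d.replace(month=1, day=1)


born = date(1954, 2, 17)  # hypothetical original date of birth
print(first_day_of_month(born).strftime("%d-%m-%Y"))
print(first_day_of_year(born).strftime("%d-%m-%Y"))
```

Coarsening the date keeps age-based test cases usable while making the value far less identifying.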
30. Look-up: replace values with values from a lookup table

Reference data (First names): Chris, Thomas, James, Ruben, Adrian, Michael, David
Replacement first names: James, Adrian, Thomas

Nr.  First name  Last name  Type      ..  E-mail           Date of birth
123  John        Smith      Customer      x.xxxxx@xxxx...  01-02-1954
321  Max         Williams   Customer      Xxxxx_xxx@xx...  01-11-1984
789  Joe         Clark      Customer      x_xx@XxxXxxx...  01-03-1974
                 DATPROF    Company
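A look-up replacement can be sketched as drawing a value from the reference list for every original value. This sketch assumes a plain random choice, which may pick the same name more than once; `lookup_replace` is a hypothetical helper, not the product's API.

```python
import random

# Reference data as shown on the slide.
FIRST_NAMES = ["Chris", "Thomas", "James", "Ruben", "Adrian", "Michael", "David"]


def lookup_replace(values, reference, seed=None):
    """Replace every value with a randomly chosen value from `reference`."""
    rng = random.Random(seed)
    return [rng.choice(reference) for _ in values]


original = ["John", "Max", "Joe"]
masked = lookup_replace(original, FIRST_NAMES, seed=42)
print(masked)
```

Since the replacements are real first names, the masked data still looks natural to testers.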
31. Expression: use custom-made functions

Nr.  First name  Last name  Type      Comment    E-mail                  Date of birth
123  Thomas      Smith      Customer  Scrambled  T.Smith@datprof.com     01-02-1954
321  James       Williams   Customer  Scrambled  J.Williams@datprof.com  01-11-1984
789  Adrian      Clark      Customer  Scrambled  A.Clark@datprof.com     01-03-1974
456              DATPROF    Company
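The e-mail addresses on this slide look like they are derived from the already masked first and last name. A minimal sketch of such an expression, assuming the pattern "first initial, dot, last name, at domain" (the function name and the default domain parameter are illustrative):

```python
def email_from_name(first_name, last_name, domain="datprof.com"):
    """Build an e-mail address from the masked name columns."""
    return f"{first_name[0]}.{last_name}@{domain}"


print(email_from_name("Thomas", "Smith"))
print(email_from_name("Adrian", "Clark"))
```

Deriving the address from the masked name keeps the columns consistent with each other, which a plain scramble cannot do.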
Because the team itself decides how much work from the backlog it can handle and commits to it, and because the rest of the organization sees the team's progress after each sprint, the result is an extremely motivated and effective team.
Building software is highly subject to change. Users often do not know exactly what they want until they see it in front of them or can work with it. That is why prototype development and the ability to adjust course after a sprint are extremely important.
Claiming to do Scrum, but not actually doing it… Explain which mistakes