How do you design the architecture and processes for an application that processes protected and personal data? This presentation is based on a real-life project implemented at Xebia. Presented at AWS Community Day NL in Utrecht, NL, on 20.09.2023.
2. xebia.com
Krzysztof Kąkol
Chief of Data Engineering and Solutions Architect at Xebia Poland
AWS Community Builder & AWS Ambassador
https://www.linkedin.com/in/krzysztofkakol/
Other stuff:
Classical and jazz pianist
PhD in AI-driven sound processing
3. xebia.com
HR is about privacy
HRM systems store users' personal data and a lot of sensitive data, such as salaries.
Clients want to be secure
Clients want to know where the data is stored, who has access to it, and what we do to prevent it from being compromised.
Business requirements
Clients have their own checklists with requirements.
Vendor’s risk
Strong vendor responsibility comes with a great risk that must be handled.
Data Privacy by Design in HRM systems
5. xebia.com
Data Privacy by Design
Privacy by Design is a concept developed by Ann Cavoukian in the 1990s. It advances the view that the future of privacy cannot be assured solely by compliance with regulatory frameworks; rather, privacy assurance must ideally become an organization’s default mode of operation.
6. xebia.com
Data Privacy by Design
Proactive not Reactive
PbD anticipates and prevents privacy-invasive events before they happen.
Privacy as the Default Setting
No action is required on the part of the individual to protect their privacy: it is built into the system, by default.
Privacy Embedded into Design
The result is that privacy becomes an essential component of the core functionality being delivered.
Full Functionality – Positive-Sum
Privacy by Design seeks to accommodate all legitimate interests and objectives in a positive-sum “win-win” manner, without trade-offs.
End-to-End Security – Full Lifecycle Protection
Privacy by Design ensures cradle-to-grave, secure lifecycle management of information, end to end.
Visibility and Transparency – Keep it Open
Its component parts and operations remain visible and transparent, to users and providers alike.
Respect for User Privacy – Keep it User-Centric
Keeping the interests of the individual uppermost by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly options.
7. xebia.com
Data Protection by Design and by Default
The Norwegian Data Protection Authority has developed these guidelines to help organisations understand and comply with the requirement of data protection by design and by default in Article 25 of the General Data Protection Regulation.
8. xebia.com
Data Protection by Design and by Default
Training
Planning specific training for management and employees.
Requirements
Setting requirements for data protection and information security for the final product.
Design
Ensuring that the requirements for data protection and information security are reflected in the design.
Coding
Writing secure code that implements the requirements for data protection and security.
Testing
Checking that the requirements for data protection and information security have been implemented as planned.
Release
A comprehensive, final security review should be done before the software is released.
Maintenance
Handling incidents by following the incident response plan prepared during the release activity.
10. xebia.com
Prerequisites – Risk analysis
STRIDE is a model for identifying computer security threats, developed by Praerit Garg and Loren Kohnfelder at Microsoft.
It provides a mnemonic for security threats in six categories:
• Spoofing
• Tampering
• Repudiation
• Information Disclosure
• Denial of Service
• Elevation of Privilege
https://www.eccouncil.org/threat-modeling/
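Below is a minimal sketch of a STRIDE-based threat register in Python. Each category is mapped to the security property it violates, following the common STRIDE convention. The components and threats are hypothetical examples for an HRM system, not the project's actual register. The most useful check is which categories have not yet been analysed for a given component.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    """STRIDE categories, each mapped to the security property it violates."""
    SPOOFING = "Authentication"
    TAMPERING = "Integrity"
    REPUDIATION = "Non-repudiation"
    INFORMATION_DISCLOSURE = "Confidentiality"
    DENIAL_OF_SERVICE = "Availability"
    ELEVATION_OF_PRIVILEGE = "Authorization"


@dataclass(frozen=True)
class Threat:
    component: str      # hypothetical component name, e.g. "salary API"
    category: Stride
    description: str


def threats_by_property(threats):
    """Group identified threats by the security property they endanger."""
    grouped = {}
    for threat in threats:
        grouped.setdefault(threat.category.value, []).append(threat)
    return grouped


def uncovered_categories(threats, component):
    """STRIDE categories that have not yet been analysed for a component."""
    analysed = {t.category for t in threats if t.component == component}
    return [category for category in Stride if category not in analysed]


# Hypothetical threat register for an HRM system.
register = [
    Threat("salary API", Stride.INFORMATION_DISCLOSURE,
           "Salary returned to a user without the HR role"),
    Threat("salary API", Stride.TAMPERING,
           "Salary changed without an audit trail record"),
    Threat("login", Stride.SPOOFING,
           "Credential stuffing against the login form"),
]

if __name__ == "__main__":
    for category in uncovered_categories(register, "salary API"):
        print(f"salary API: {category.name} ({category.value}) not analysed yet")
```

Keeping the register as data rather than prose makes gaps visible. Here, spoofing, repudiation, denial of service and elevation of privilege are still open for the salary API.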
11. xebia.com
Prerequisites – Workload review
A general review of the infrastructure and workload (including the software architecture used) should be done.
It should be a high-level review, not a detailed project review:
• General architecture description
• Security setup (firewalls, network ACLs etc.)
• High-level recommendations
12. xebia.com
Prerequisites – Well-Architected Review
A Well-Architected Review (WAR) should be done periodically. It can be performed using the AWS Well-Architected Tool.
The AWS Well-Architected Framework, on which the review is based, describes key concepts, design principles, and best practices for designing and running workloads in the cloud.
From a data privacy perspective, the most important pillars are Security and Reliability.
13. xebia.com
Prerequisites – Well-Architected Review
Security Pillar
• Securely operating workloads
• Managing permissions
• Protecting infrastructure
• Protecting data at rest and in transit
• Testing against security issues
• Responding to security incidents
Reliability Pillar
• Managing workloads
• Designing highly available, resilient and self-healing infrastructure
• Reliable deployment processes
• Business continuity
• Managing failures and disaster recovery
14. xebia.com
Prerequisites – Summary
Draw the borders of your "world of concerns" first – model your threats with simple yet comprehensive tools like STRIDE. Once you have done that, you know what you need to focus on!
Use the available tools and sources to learn as much as possible about security and reliability. One of the best tools (if not the best) in this area is the AWS Well-Architected Tool.
17. xebia.com
Confidentiality
Confidentiality refers to protecting sensitive data from unauthorized access.
Principle of least privilege
• Limited access to production infrastructure and databases
• Using IaC to manage infrastructure
• Managed access to the codebase
Authentication policies
• Secrets stored in a vault (e.g. Secrets Manager)
• Good password policy
• 2FA used and enforced wherever possible
• Access restriction policies (like bastion hosts, VPNs etc.)
Encryption
• Resource encryption (EBS, RDS, S3 etc.)
• Content encryption
• Communication encryption
• Control of the keys: KMS CMKs
Well-designed features
• The permission model must be flexible enough
• Special handling of sensitive data
• Elevating privileges should be impossible
• Tested permission model
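The least-privilege and encryption points can be illustrated with an IAM identity policy, sketched here as a Python dict. The bucket name, prefix and KMS key ARN are hypothetical placeholders. The wildcard check is a deliberately simple heuristic and does not replace AWS tooling such as IAM Access Analyzer policy validation.

```python
import json


def payroll_reader_policy(bucket: str, prefix: str, kms_key_arn: str) -> dict:
    """Read-only access to one S3 prefix, decrypt with one KMS key, TLS required."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadPayrollDocuments",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                "Sid": "DecryptWithPayrollKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            },
            {
                # Communication encryption: deny any S3 call made without TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


def _as_list(value):
    return value if isinstance(value, list) else [value]


def least_privilege_findings(policy: dict) -> list:
    """Flag Allow statements that use wildcard actions or resources."""
    findings = []
    for i, stmt in enumerate(policy["Statement"]):
        if stmt["Effect"] != "Allow":
            continue  # wildcards in Deny statements only narrow access
        sid = stmt.get("Sid", f"statement {i}")
        for action in _as_list(stmt["Action"]):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action}")
        for resource in _as_list(stmt["Resource"]):
            if resource == "*":
                findings.append(f"{sid}: wildcard resource")
    return findings


if __name__ == "__main__":
    policy = payroll_reader_policy(
        "hrm-documents", "payroll",
        "arn:aws:kms:eu-west-1:111122223333:key/example-key-id")
    print(json.dumps(policy, indent=2))
    print(least_privilege_findings(policy))
```

The generated policy yields no findings. A statement allowing `s3:*` on `*` would be reported twice: once for the wildcard action and once for the wildcard resource.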
18. xebia.com
Integrity
Integrity means that data is protected against unwanted alteration, destruction, or loss.
S3 object versioning
• Versioning is a default setting
• Used for all files, including contracts, internal documents, agreements etc.
IAM roles
• No long-lived AWS credentials
• Only roles applied to resources
• Semantic roles
Data validation
• Validation of all incoming data
• Defined allowed values for most fields (e.g. min/max for numeric fields)
Monitoring and logging
• CloudTrail for API activity
• Logs for S3, database and workload, sent to CloudWatch
• Audit trail in the application
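Below is a minimal sketch of rule-based validation for incoming data. It assumes a hypothetical employee record schema; the field names and limits are illustrative only. Fields outside the schema are also rejected, so unexpected data is not stored silently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    required: bool = True
    min_value: Optional[float] = None      # inclusive lower bound (numeric fields)
    max_value: Optional[float] = None      # inclusive upper bound (numeric fields)
    allowed: Optional[frozenset] = None    # closed set of allowed values


# Hypothetical rules for an incoming employee record; the limits are illustrative.
EMPLOYEE_RULES = {
    "employee_id": FieldRule(),
    "monthly_salary": FieldRule(min_value=0, max_value=100_000),
    "working_hours": FieldRule(min_value=0, max_value=60),
    "contract_type": FieldRule(allowed=frozenset({"permanent", "fixed_term", "b2b"})),
    "manager_id": FieldRule(required=False),
}


def validate(record: dict, rules: dict) -> list:
    """Return a list of validation errors; an empty list means the record is accepted."""
    errors = []
    # Reject fields that are not part of the schema instead of silently storing them.
    for name in sorted(record.keys() - rules.keys()):
        errors.append(f"{name}: unexpected field")
    for name, rule in rules.items():
        value = record.get(name)
        if value is None:
            if rule.required:
                errors.append(f"{name}: missing")
            continue
        if rule.allowed is not None and value not in rule.allowed:
            errors.append(f"{name}: {value!r} is not an allowed value")
        if rule.min_value is None and rule.max_value is None:
            continue
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: expected a number")
            continue
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{name}: {value} is below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"{name}: {value} is above maximum {rule.max_value}")
    return errors
```

Returning every error at once, instead of failing on the first one, lets the caller reject the whole record and report all problems together.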
19. xebia.com
Accessibility
Personal data must be available to authorised personnel who require it for their work.
Access to infrastructure
• Using landing zones or IAM Identity Center
• Managing access through IaC
• Limited and defined access to production resources
Permission model
• Flexible permission model
• Using RBAC, ABAC, ACLs and access levels
Availability of data
• Data and infrastructure redundancy
• Access to data during contingency plans
• Data lifecycle policies (e.g. in S3)
Incident management
• Documentation of the incident management process
• Well-described scenarios for potential incidents (STRIDE)
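Here is one possible way to combine RBAC and ABAC, sketched under assumptions. Roles (the hypothetical `employee`, `manager` and `hr`) grant an action together with a scope. The scope is then resolved by comparing attributes of the user and the record. Anything not explicitly granted is denied. ACLs and access levels are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset
    department: str


@dataclass(frozen=True)
class EmployeeRecord:
    employee_id: str    # assumed to share the identifier space with User.user_id
    department: str


# RBAC part: each (hypothetical) role grants actions together with a scope.
ROLE_GRANTS = {
    "employee": {("read_profile", "own"), ("read_salary", "own")},
    "manager": {("read_profile", "department")},
    "hr": {("read_profile", "all"), ("read_salary", "all"), ("update_salary", "all")},
}


def scope_matches(scope: str, user: User, record: EmployeeRecord) -> bool:
    """ABAC part: compare attributes of the user and the record."""
    if scope == "all":
        return True
    if scope == "department":
        return user.department == record.department
    if scope == "own":
        return user.user_id == record.employee_id
    return False  # unknown scopes never grant access


def can(user: User, action: str, record: EmployeeRecord) -> bool:
    """Default deny: allowed only if some role grants the action in a matching scope."""
    return any(
        granted_action == action and scope_matches(scope, user, record)
        for role in user.roles
        for granted_action, scope in ROLE_GRANTS.get(role, set())
    )
```

For example, a manager can read the profiles of people in their own department but not their salaries. Salary data stays limited to the employee themself and the HR role.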
20. xebia.com
Resilience
Software that processes personal data must be able to resist vulnerabilities, attacks, and accidents.
High availability
• Workload in multiple AZs
• Self-healing infrastructure
• All components highly available: ALB, Multi-AZ RDS, EKS nodes in Auto Scaling groups, replicated NAT gateways
• Using managed services
Disaster recovery
• Well-chosen DR strategy (backup & restore, pilot light, warm standby, multi-site)
• Preparing a DR plan (pilot light)
• Testing the DR plan
Backup strategy
• Database backups: automatic snapshots, PITR, manual snapshots
• Object replication in S3 (multi-region)
• Backup retention
Resilience to attacks
• DDoS protection (CDN)
• Resource firewall setup using semantic security groups
• Application firewalls (WAF)
• Workload in private subnets
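Backup retention rules can be expressed as a small function. This sketch assumes a simple scheme chosen for illustration:

- Keep every snapshot from the last N days.
- Keep the newest snapshot of each of the last M calendar months.
- Always keep at least the newest backup.

In AWS, the same effect is usually configured declaratively (for example with AWS Backup lifecycle rules or the RDS automated backup retention period) rather than written by hand.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_keep(snapshots, now, daily_days=7, monthly_months=12, min_keep=1):
    """Return the set of snapshot timestamps that the retention rule keeps."""
    ordered = sorted(snapshots, reverse=True)  # newest first
    keep = set(ordered[:min_keep])  # never delete the most recent backup(s)
    # Keep every snapshot from the last `daily_days` days.
    daily_cutoff = now - timedelta(days=daily_days)
    keep.update(s for s in ordered if s >= daily_cutoff)
    # Keep the newest snapshot of each of the last `monthly_months` calendar months.
    kept_months = set()
    for s in ordered:
        months_ago = (now.year - s.year) * 12 + (now.month - s.month)
        if months_ago < monthly_months and (s.year, s.month) not in kept_months:
            kept_months.add((s.year, s.month))
            keep.add(s)
    return keep


if __name__ == "__main__":
    now = datetime(2023, 9, 20, tzinfo=timezone.utc)
    daily = [now - timedelta(days=d) for d in range(60)]
    keep = snapshots_to_keep(daily, now)
    expired = sorted(set(daily) - keep)
    print(f"keep {len(keep)}, delete {len(expired)}")  # keep 10, delete 50
```

Take daily snapshots covering the 60 days up to 20 September 2023, with the default settings. The function keeps 10 of them: 8 from the last week plus the newest ones from August and July. The other 50 are marked for deletion.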
21. xebia.com
Traceability
Traceability is the documentation of changes made to the software, the infrastructure, and personal data.
Log everything
• VPC flow logs, S3 logs, ALB logs etc.
• Meaningful application logs
• Request tracking (X-Ray)
Analyse & notify
• Analysing logs with CloudWatch Logs Insights
• Notifications when something is detected (SNS)
Audit trail
• History of inserts, updates and deletes
• Every trail record contains the "before" and "after" state
• Who made the change and when
Slowly Changing Dimensions
• Some data has historical changes and planned changes (e.g. employees)
• Implementing SCD is difficult, but it builds a history of the record
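The audit trail and Slowly Changing Dimensions points fit together. Below is a minimal in-memory sketch of an SCD Type 2 history for employee records. Every change also writes an audit entry recording the before and after state, who made the change, and when. All names are hypothetical. The sketch assumes changes are applied in effective-date order. It models a delete as closing the record's validity, so the history is preserved. In a real system this data would live in database tables, and the audit entries could also come from database triggers or change data capture.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class EmployeeVersion:
    """One SCD Type 2 row: a version of an employee record and its validity period."""
    employee_id: str
    attributes: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the open (latest) version


@dataclass(frozen=True)
class AuditEntry:
    operation: str             # "INSERT", "UPDATE" or "DELETE"
    employee_id: str
    before: Optional[dict]     # state before the change (None for INSERT)
    after: Optional[dict]      # state after the change (None for DELETE)
    changed_by: str
    changed_at: datetime


class EmployeeHistory:
    def __init__(self):
        self.versions = []
        self.audit = []

    def as_of(self, employee_id, day):
        """Version valid on `day`; works for past and for planned (future) changes."""
        for v in self.versions:
            if (v.employee_id == employee_id and v.valid_from <= day
                    and (v.valid_to is None or day < v.valid_to)):
                return v
        return None

    def _open_version(self, employee_id):
        return next((v for v in self.versions
                     if v.employee_id == employee_id and v.valid_to is None), None)

    def _log(self, operation, employee_id, before, after, changed_by):
        self.audit.append(AuditEntry(operation, employee_id, before, after,
                                     changed_by, datetime.now(timezone.utc)))

    def apply_change(self, employee_id, attributes, effective, changed_by):
        """Close the open version at `effective` and open a new one (SCD Type 2)."""
        current = self._open_version(employee_id)
        if current is not None and effective <= current.valid_from:
            raise ValueError("changes must be applied in effective-date order")
        before = dict(current.attributes) if current is not None else None
        if current is not None:
            current.valid_to = effective
        self.versions.append(EmployeeVersion(employee_id, dict(attributes), effective))
        operation = "UPDATE" if current is not None else "INSERT"
        self._log(operation, employee_id, before, dict(attributes), changed_by)

    def close(self, employee_id, effective, changed_by):
        """End the record's validity (e.g. employment ends) without deleting history."""
        current = self._open_version(employee_id)
        if current is None or effective <= current.valid_from:
            raise ValueError("no open version to close at this date")
        current.valid_to = effective
        self._log("DELETE", employee_id, dict(current.attributes), None, changed_by)
```

A planned raise can be recorded today with a future effective date. `as_of` then returns the old salary until that date and the new one afterwards, while the audit trail shows who entered the change and when.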
23. xebia.com
Documentation
Security documentation
• OWASP Top 10 review
• Personal Data Access: a document describing the processes for accessing personal/private data, who has access to it, where it is stored etc.
• Personal Data Management rules, based on the requirements from the Norwegian Data Protection Authority (NDPA) checklist: generally speaking, the reason for storing personal data, the legal basis for storing it, the preventive measures implemented to secure the data, how the data should be accessed, the requirements for data backup etc.
24. xebia.com
Documentation
Implementation practices documentation
• Coding practices: describing the current coding practices in the project, including the libraries used, vulnerability scanning, the code review process, the branching model and all other components mentioned in the NDPA checklist.
• Testing practices: describing the testing currently implemented in the project, i.e. testing frameworks and unit, integration, end-to-end, system and performance testing practices etc. (NDPA checklist).
25. xebia.com
Documentation
Maintenance practices documentation
• Disaster Recovery plan (with test)
• Backup strategy
• Incident management plan: what to do after an incident occurs, in terms of both formal and implementation practices
• Release management: the rules of the release process, who is responsible for it and how the process is generally handled, since it usually involves accessing personal data directly or indirectly
• Maintenance process: the rules for accessing the production environment in case of a failure or hotfix
27. xebia.com
Key takeaways
• Prepare well – define your world of concerns
• Analyse the risks
• Recognize best practices
• Document everything
• Evolution over revolution