Techniques to Handle PII Data in
Data Engineering Workflows to
Ensure Compliance to Data
Protection Laws
Sri Lanka
2023
Nuzhi Meyen
Importance of Compliance
Introduction
• The Personal Data Protection Act No. 9 of 2022
(PDPA) in Sri Lanka
• Digital Personal Data Protection (DPDP) Act in
India (2023)
• GDPR in the EU (2018)
• CCPA in California, USA (2018)
• COPPA (1998)
• HIPAA (1996)
• PCI DSS (2004)
Source - Data Privacy Vocabulary - W3C Data Privacy
Vocabularies and Controls CG (DPVCG)
Penalties for non-compliance:
• DPDP: maximum of 2.5 billion INR and minimum of 500 million INR
• GDPR: up to 10 million euros or 2% of preceding fiscal year turnover*
• PDPA: up to a maximum of 10 million LKR
What is PII Data?
PII stands for Personally Identifiable Information. It is any data that
could potentially identify a specific individual.
Sensitive Data
Sometimes referred to as “Public” data, sensitive data is any
information that can be found in public records like newspapers,
telephone books, or social media sites.
Confidential Data
Confidential (or “private”) data is information that an individual
would prefer not be made public. This can include information such
as:
• Physical home address
• Telephone number (mobile, business, and personal numbers)
• Date or location of their birth
High-Risk Data
Sometimes labeled “Restricted” data, high-risk data is the highly
confidential information that supports cyber-crime activities and
typically can’t be found through legal means of inquiry. This can
include data such as:
• Credit card information
• Medical records
• Social Security or TIN (Tax Identification Number)
Sources - dataprivacymanager.net & digitalguardian.com
General Principles for Handling PII
Data Minimization
The principle of data minimization encourages organizations to
only collect the data that is absolutely necessary for the specific
purpose it will serve.
Purpose Limitation
This principle states that data should only be used for the purpose
for which it was initially collected.
Storage Limitation
This principle advocates for the deletion of personal data once it is
no longer necessary for the purpose it was collected for.
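A minimal sketch of how data minimization and storage limitation might look inside a pipeline step, assuming records are plain dictionaries. The field names, the `REQUIRED_FIELDS` set, and the 365-day retention window are hypothetical choices for illustration, not values taken from any of the laws above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: the only fields this pipeline's stated purpose requires
REQUIRED_FIELDS = {"customer_id", "country", "collected_at"}
# Hypothetical retention window
RETENTION = timedelta(days=365)


def minimize(record):
    """Data minimization: keep only the fields needed for the purpose."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}


def within_retention(record, now):
    """Storage limitation: True while the record is inside the retention window."""
    return now - record["collected_at"] <= RETENTION


now = datetime(2023, 12, 1, tzinfo=timezone.utc)
records = [
    {"customer_id": 1, "country": "LK", "email": "a@example.com",
     "collected_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"customer_id": 2, "country": "IN", "email": "b@example.com",
     "collected_at": datetime(2021, 1, 1, tzinfo=timezone.utc)},
]

# Drop expired records, then strip fields the purpose does not need
kept = [minimize(r) for r in records if within_retention(r, now)]
print(kept)
```

Purpose limitation is harder to express in code; in practice it is often approximated by tagging each dataset with its collection purpose and checking that tag before any downstream use.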
Data Engineering Techniques
There are several data engineering techniques that can be
considered when handling PII data. A few of them are
given below.
Tokenization
Replace sensitive data with non-sensitive placeholders.
Encryption
• At rest: Encrypt data when it is stored.
• In transit: Use SSL/TLS encryption during data transfer.
Masking
Conceal portions of the data to protect it.
Role-Based Access Control (RBAC)
Limit access to data based on roles within the organization.
Auditing & Monitoring
Track who accesses what data, when, and why.
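As a rough illustration of how role-based access control and auditing can work together, the sketch below checks a role's permitted fields before returning data and logs every attempt. The roles, field names, and the `read_fields` helper are all hypothetical; a production system would normally rely on the access controls and audit logs of the database or data platform itself.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("pii_audit")

# Hypothetical mapping of roles to the fields they may read
ROLE_PERMISSIONS = {
    "analyst": {"customer_id", "country"},
    "support": {"customer_id", "country", "phone"},
}


def read_fields(user, role, record, fields, reason):
    """Return the requested fields if the role allows them; audit every attempt."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    denied = set(fields) - allowed
    # Auditing: record who asked for what, when, and why
    audit_log.info(
        "user=%s role=%s fields=%s denied=%s reason=%s at=%s",
        user, role, sorted(fields), sorted(denied), reason,
        datetime.now(timezone.utc).isoformat(),
    )
    if denied:
        raise PermissionError(f"role {role!r} may not read {sorted(denied)}")
    return {f: record[f] for f in fields}


record = {"customer_id": 7, "country": "LK", "phone": "+94 77 123 4567"}
print(read_fields("alice", "support", record, ["phone"], "support ticket"))
```

Calling `read_fields` with the `analyst` role and the `phone` field raises `PermissionError`, and the denied attempt still appears in the audit log.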
Tokenization
Format-Preserving Encryption (FPE)
What it is: This algorithm keeps the format of the input
data. For example, if a 16-digit credit card number is
tokenized, the token will also be a 16-digit number.
Use Case: Useful in scenarios where the format of the
tokenized data needs to be similar to the original data,
such as in legacy systems.
Secure Hash Algorithm (SHA) Tokenization
What it is: Uses a one-way hash function to create a hash
of the original data combined with a random salt. The
salted hash is then stored as a token.
Use Case: Suited for situations where you don't
need to retrieve the original data but do need to verify the
integrity of the data.
Random Tokenization
What it is: Generates a completely random string as
a token and maps it to the original data in a secure
lookup table.
Use Case: Good for general-purpose tokenization where
format preservation is not necessary.
Cipher-Based Tokenization
What it is: Encrypts the original data with a symmetric
cipher and uses the resulting ciphertext as the token. The
original data is recovered by decrypting the token with
the same key.
Use Case: Good for general-purpose tokenization where
format preservation is not necessary.
Vault-Based Tokenization
What it is: Stores the original data in a highly secure data
vault. Each piece of stored data is mapped to a unique
token.
Use Case: Ideal for applications that require high
levels of security but also need to detokenize data
frequently.
Tokenization - FPE
# Install with: pip install pyffx (Format-preserving, Feistel-based Encryption - FFX)
import pyffx

# Demo key only; a real key should come from a secrets manager
key = b'secret-key'
credit_card = '1234567812345678'

# Create an FPE cipher object over the digit alphabet
e = pyffx.String(key, alphabet='0123456789', length=len(credit_card))

# Tokenize: the token is also a 16-digit string
token = e.encrypt(credit_card)
# Detokenize
original = e.decrypt(token)
print(f'Token: {token}, Original: {original}')
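Tokenization - SHA
import hashlib
import os

def sha_tokenization(data, salt=None):
    # Generate a random 16-byte salt unless a stored one is supplied
    if salt is None:
        salt = os.urandom(16)
    hash_obj = hashlib.sha256()
    hash_obj.update(data.encode('utf-8'))
    hash_obj.update(salt)
    # Return the salt as well: without it the token cannot be recomputed
    return salt, hash_obj.hexdigest()

original_data = "sensitive_information"
salt, token = sha_tokenization(original_data)
print(f'Token: {token}')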
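A salted hash can only be checked later if the salt is kept alongside the token. The sketch below is one possible way to do that with the standard library; the `make_token` and `verify` helpers are hypothetical names, and the salt is placed before the data here purely as a design choice.

```python
import hashlib
import hmac
import os


def make_token(value, salt=None):
    """Return (salt, token); pass a stored salt to recompute an existing token."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return salt, digest


def verify(value, salt, token):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = make_token(value, salt)
    return hmac.compare_digest(candidate, token)


salt, token = make_token("sensitive_information")
print(verify("sensitive_information", salt, token))  # True
print(verify("something_else", salt, token))         # False
```

For low-entropy inputs such as phone numbers, a plain salted hash can still be brute-forced by anyone who holds the salt, so a keyed construction such as HMAC with a secret key may be a better fit.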
Tokenization - Random
import uuid

def random_tokenization(data, token_map):
    # Use a random UUID as the token and remember which value it stands for
    token = str(uuid.uuid4())
    token_map[token] = data
    return token

# In-memory stand-in for a secure lookup table
token_map = {}
original_data = "sensitive_data"
token = random_tokenization(original_data, token_map)
print(f'Token: {token}, Original: {token_map[token]}')
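One practical wrinkle with random tokens is that tokenizing the same value twice yields two different tokens, which breaks joins on the tokenized column. The sketch below shows one way to keep tokens stable by adding a reverse lookup. The `TokenVault` class is a hypothetical in-memory stand-in; a real lookup table would live in a secured, access-controlled store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory stand-in for a secure token lookup table."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(16)  # 32 random hex characters
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        return self._token_to_value[token]


vault = TokenVault()
first = vault.tokenize("sensitive_data")
second = vault.tokenize("sensitive_data")
print(first == second, vault.detokenize(first))
```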
Tokenization - Cipher based
# Install with: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher_suite = Fernet(key)

# Tokenize
token = cipher_suite.encrypt(b"Sensitive Data")
# Detokenize
original = cipher_suite.decrypt(token)
print(f'Token: {token}, Original: {original.decode()}')
Tokenization - Vault based
# Install with: pip install hvac
import hvac

# Initialize Vault client
client = hvac.Client()

# Verify that Vault is initialized and unsealed
assert client.sys.is_initialized() is True
assert client.sys.is_sealed() is False

# Create a secret in the Vault (Tokenization)
write_response = client.secrets.kv.v2.create_or_update_secret(
    path='my-secret',
    secret=dict(sensitive_data="This is very secret information"),
)
# The returned `write_response` will contain metadata, not the token
# In Vault, the token is usually the path ('my-secret' in this case)

# Retrieve the secret from the Vault (Detokenization)
read_response = client.secrets.kv.v2.read_secret_version(
    path='my-secret',
)
sensitive_data = read_response['data']['data']['sensitive_data']
print(f"Sensitive Data Retrieved: {sensitive_data}")
Masking
Redaction
What it is: To completely hide the original data.
Use Case: Useful for fields where the actual information
is extremely sensitive and should not be exposed under
any circumstances, such as Social Security numbers or
passwords in a system log.
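A small sketch of redaction applied to a log line, assuming the sensitive values follow a known pattern. The regular expression below matches the common NNN-NN-NNNN Social Security number layout and is an illustrative assumption; real pipelines usually need several patterns plus field-level rules.

```python
import re

# Assumed pattern: US-style Social Security numbers written as NNN-NN-NNNN
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text, placeholder="[REDACTED]"):
    """Replace every matching value with a fixed placeholder."""
    return SSN_PATTERN.sub(placeholder, text)


log_line = "login failed for user 42, ssn=123-45-6789"
redacted = redact(log_line)
print(redacted)  # login failed for user 42, ssn=[REDACTED]
```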
Partial Masking
What it is: To conceal only part of the data, leaving some
characters visible.
Use Case: Commonly used for email addresses or phone
numbers in customer interfaces, where the full
visibility of the data is not necessary but some context is
useful. For instance, showing only the last four digits of a
credit card number.
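Partial masking can be sketched with plain string operations. The `mask_card` and `mask_email` helpers below are hypothetical; they assume the card number is long enough to mask and that the email address contains an `@`.

```python
def mask_card(number, visible=4, mask_char="*"):
    """Hide all but the last `visible` digits of a card number."""
    digits = number.replace(" ", "").replace("-", "")
    return mask_char * (len(digits) - visible) + digits[-visible:]


def mask_email(email, mask_char="*"):
    """Keep the first character of the local part and the full domain."""
    local, _, domain = email.partition("@")
    return local[:1] + mask_char * (len(local) - 1) + "@" + domain


print(mask_card("1234 5678 1234 5678"))  # ************5678
print(mask_email("alice@example.com"))   # a****@example.com
```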
Shuffling
What it is: To randomize the order of the characters in the
data.
Use Case: Suitable for textual data where the format needs
to be preserved but the data should not be recognizable.
It's not ideal for numerical data or data with a specific
pattern.
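A minimal sketch of character shuffling using the standard library. The `shuffle_chars` helper is a hypothetical name. Because shuffling keeps every original character, short or distinctive values can sometimes be guessed back, so it is best treated as obfuscation, not strong protection.

```python
import random


def shuffle_chars(value, rng=None):
    """Return the characters of `value` in a random order."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


shuffled = shuffle_chars("Jane Doe")
print(shuffled)
```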
Substitution
What it is: To replace each character or substring with
another character or substring based on a
mapping.
Use Case: Useful when you need a reversible masking
process. For example, during software testing, you might
want to mask sensitive data but will need to revert it back
to its original form for verification.
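Reversible substitution can be sketched with a pair of translation tables. The fixed rotation used below (digits by 3, lowercase letters by 13) is purely an illustrative assumption; a real mapping would be generated randomly and kept as secret as a key, since anyone who has it can reverse the masking.

```python
import string

# Illustrative fixed mapping: rotate digits by 3 and lowercase letters by 13
SOURCE = string.digits + string.ascii_lowercase
TARGET = (string.digits[3:] + string.digits[:3]
          + string.ascii_lowercase[13:] + string.ascii_lowercase[:13])

FORWARD = str.maketrans(SOURCE, TARGET)
REVERSE = str.maketrans(TARGET, SOURCE)


def substitute(value):
    """Mask a value by mapping each character through the forward table."""
    return value.translate(FORWARD)


def restore(value):
    """Undo the masking with the reverse table."""
    return value.translate(REVERSE)


masked = substitute("user42@example.com")
print(masked, restore(masked))
```

Characters outside the mapping, such as `@` and `.`, pass through unchanged, which keeps the overall format of the value intact.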
Number Variance
What it is: To add a random variance to numerical data.
Use Case: Useful for datasets involving numbers where the
exact number is sensitive but the general range is not. For
instance, in a dataset used for salary analysis, you might add
variance to the actual salaries to protect individual privacy
while maintaining the overall distribution for analytical
purposes.
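The salary example might be sketched as follows. The ±10% bound and the `add_variance` helper are hypothetical choices; the right amount of noise depends on how much accuracy the analysis can afford to lose.

```python
import random


def add_variance(value, pct=0.10, rng=None):
    """Scale `value` by a random factor drawn from [1 - pct, 1 + pct]."""
    if rng is None:
        rng = random.Random()
    factor = 1 + rng.uniform(-pct, pct)
    return round(value * factor, 2)


salaries = [50000, 72000, 98000]
rng = random.Random(42)  # seeded only to make the example repeatable
noisy = [add_variance(s, rng=rng) for s in salaries]
print(noisy)
```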
Real World
Implications
10%
Points to consider ...
• What will privacy look like in a post-quantum
encryption timeline? NIST has already developed
standards for quantum-safe cryptographic
algorithms.
• How will generative AI technology, such as LLMs in
the form of ChatGPT, affect how developers
write secure code without privacy
leaks in the context of code generation? e.g.
CodexLeaks: Privacy Leaks from Code Generation
Language Models in GitHub Copilot
PyData Sri Lanka 2023 Presentation - Nuzhi Meyen-V2.pptx

  • 1. Techniques to Handle PII Data in Data Engineering Workflows to Ensure Compliance to Data Protection Laws Sri Lanka 2023 Nuzhi Meyen
  • 2. Importance of Compliance Introduction • The Personal Data Protection Act No. 9 of 2022 (PDPA) in Sri Lanka. • Digital Personal Data Protection (DPDP) Act in India (2023) • GDPR in the EU (201 8) • CCPA in California, USA (201 8) • COPPA (1 998) • HIPPA (1 996) • PCI DSS (2004) Source - Data Privacy Vocabulary - W3C Data Privacy Vocabularies and Controls CG (DPVCG) Maximum of 2.5 Billion INR and Minimum of 500 Million INR DPDP Upto 1 0 Million Euros or 2% of preceding fiscal year turnover* GDPR Upto a maximum of 1 0 Million LKR PDPA Sri Lanka 2023
  • 3. What is PII Data ? Sensitive Data Confidential Data PII stands for Personally Identifable Infformation. It is any data that could potentially identify a specific individual. Sometimes referred to as “Public” data, sensitive data is any information that can be found in public records like newspapers, telephone books, or social media sites Confidential (or “private”) Data is information that an individual would prefer not be made public. This can include information such as: • Physical home address • Telephone number (mobile, business, and personal numbers) • Date or location of their birth High-Risk Data Sometimes labeled “Restricted” data, high-risk data is the highly confidential information that supports cyber-crime activities and typically can’t be found through legal means of inquiry. This can include data such as: • Credit card information • Medical records • Social Security or TIN (Tax Identification Number) Sources - dataprivacymanager.net & digitalguardian.com Sri Lanka 2023
  • 4. Data Minimization Purpose Limitation Storage Limitation General Principles for Handling PII The principle of data minimization encourages organizations to only collect the data that is absolutely necessary for the specific purpose it will serve. This principle states that data should only be used for the purpose for which it was initially collected. This principle advocates for the deletion of personal data once it is no longer necessary for the purpose it was collected for. Sri Lanka 2023
  • 5. Data Engineering Techniques Tokenization Replace sensitive data with non-sensitive placeholders. There are several data engineering techniques which can be considered in the context of handling PII data. A few of them are given below. Encryption • At-rest: Encrypte data when it's stored • In- transit: Use SSL/TLS encryption during data transfer. Masking Conceal portions of the data to protect it. Role-Based Access Control (RBAC) Limit access to data based on roles within the organization Auditing & Monitoring Track who access what data, when and why. Sri Lanka 2023
  • 6. What it is : This algorithm keeps the format of the input data. For example, if a 1 6-digit credit card number is tokenized, the token will also be a 1 6-digit number. Use Case: Useful in scenarios where the format of the tokenized data needs to be similar to the original data, such as in legacy systems. Format-Preserving Encryption (FPE) Tokenization Secure Hash Algorithm (SHA) Tokenization What it is : Uses a one-way hash function to create a hash of the original data. A random salt is then added to the hash. The salted hash is then stored as a token. Use Case: Suited for situations where you don't need to retrieve the original data but do need to verify the integrity of the data.. Random Tokenization What it is : Generates a completely random string as a token and maps it to the original data in a secure lookup table. Use Case: Good for general- purpose tokenization where format preservation is not necessary. Cipher-Based Tokenization What it is : Generates a completely random string as a token and maps it to the original data in a secure lookup table. Use Case: Good for general- purpose tokenization where format preservation is not necessary. Vault-Based Tokenization What it is : Stores the original data in a highly secure data vault. Each piece of stored data is mapped to a unique token. Use Case: Ideal for applications that require high levels of security but also need to detokenize data frequently. Sri Lanka 2023
  • 7. Tokenization - FPE
      # Install with: pip install pyffx (Format-preserving, Feistel-based Encryption - FFX)
      import pyffx

      key = b'secret-key'
      credit_card = '1234567812345678'

      # Create an FPE cipher object over the digit alphabet
      e = pyffx.String(key, alphabet='0123456789', length=len(credit_card))

      # Tokenize
      token = e.encrypt(credit_card)

      # Detokenize
      original = e.decrypt(token)

      print(f'Token: {token}, Original: {original}')
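  • 8. Tokenization - SHA
      import hashlib
      import os

      def sha_tokenization(data, salt=None):
          # Generate a fresh random salt unless one is supplied for re-verification
          if salt is None:
              salt = os.urandom(16)
          hash_obj = hashlib.sha256()
          hash_obj.update(data.encode('utf-8'))
          hash_obj.update(salt)
          # Return the salt too: without it the token cannot be recomputed or verified
          return hash_obj.hexdigest(), salt

      original_data = "sensitive_information"
      token, salt = sha_tokenization(original_data)
      print(f'Token: {token}')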
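The verification use case mentioned for SHA tokenization only works if the salt is stored next to the token. A minimal sketch of that round trip follows; the helper names `make_token` and `verify_token` are illustrative and not part of the original slides, and the hash input order (data, then salt) simply mirrors the slide above.

```python
import hashlib
import hmac
import os


def make_token(data: str) -> tuple[str, bytes]:
    """Hash the data with a fresh random salt; return the token and the salt."""
    salt = os.urandom(16)
    digest = hashlib.sha256(data.encode("utf-8") + salt).hexdigest()
    return digest, salt


def verify_token(candidate: str, token: str, salt: bytes) -> bool:
    """Recompute the salted hash for a candidate value and compare in constant time."""
    digest = hashlib.sha256(candidate.encode("utf-8") + salt).hexdigest()
    return hmac.compare_digest(digest, token)


token, salt = make_token("sensitive_information")
print(verify_token("sensitive_information", token, salt))
print(verify_token("something_else", token, salt))
```

Because each value gets its own random salt, two records holding the same value produce different tokens. That is good for privacy, but it means these tokens cannot be used as join keys across tables.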
  • 9. Tokenization - Random
      import uuid

      def random_tokenization(data, token_map):
          # Generate a random token and record the token -> original mapping
          token = str(uuid.uuid4())
          token_map[token] = data
          return token

      token_map = {}  # stands in for a secure lookup table
      original_data = "sensitive_data"
      token = random_tokenization(original_data, token_map)
      print(f'Token: {token}, Original: {token_map[token]}')
  • 10. Tokenization - Cipher based
      # Install with: pip install cryptography
      from cryptography.fernet import Fernet

      key = Fernet.generate_key()
      cipher_suite = Fernet(key)

      # Tokenize
      token = cipher_suite.encrypt(b"Sensitive Data")

      # Detokenize
      original = cipher_suite.decrypt(token)

      print(f'Token: {token}, Original: {original.decode()}')
  • 11. Tokenization - Vault based
      import hvac

      # Initialize Vault client
      client = hvac.Client()

      # Verify that Vault is initialized and unsealed
      assert client.sys.is_initialized() is True
      assert client.sys.is_sealed() is False

      # Create a secret in the Vault (Tokenization)
      write_response = client.secrets.kv.v2.create_or_update_secret(
          path='my-secret',
          secret=dict(sensitive_data="This is very secret information"),
      )
      # The returned `write_response` will contain metadata, not the token
      # In Vault, the token is usually the path ('my-secret' in this case)

      # Retrieve the secret from the Vault (Detokenization)
      read_response = client.secrets.kv.v2.read_secret_version(
          path='my-secret',
      )
      sensitive_data = read_response['data']['data']['sensitive_data']
      print(f"Sensitive Data Retrieved: {sensitive_data}")
  • 12. Masking
    • Redaction
      What it is: Completely hides the original data.
      Use case: Useful for fields where the actual information is extremely sensitive and should not be exposed under any circumstances, such as Social Security numbers or passwords in a system log.
    • Partial Masking
      What it is: Conceals only part of the data, leaving some characters visible.
      Use case: Commonly used for email addresses or phone numbers in customer interfaces, where full visibility of the data is not necessary but some context is useful. For instance, showing only the last four digits of a credit card number.
    • Shuffling
      What it is: Randomizes the order of the characters in the data.
      Use case: Suitable for textual data where the format needs to be preserved but the data should not be recognizable. It is not ideal for numerical data or data with a specific pattern.
    • Substitution
      What it is: Replaces each character or substring with another character or substring based on a mapping.
      Use case: Useful when you need a reversible masking process. For example, during software testing, you might want to mask sensitive data but need to revert it to its original form for verification.
    • Number Variance
      What it is: Adds a random variance to numerical data.
      Use case: Useful for datasets involving numbers where the exact number is sensitive but the general range is not. For instance, in a dataset used for salary analysis, you might add variance to the actual salaries to protect individual privacy while maintaining the overall distribution for analytical purposes.
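The five masking techniques above can each be sketched in a few lines of standard-library Python. All function names, default parameters, and sample values here are illustrative assumptions rather than code from the original deck, and the `random` module is used only for demonstration; it is not a cryptographically secure source of randomness.

```python
import random


def redact(value, mask_char="*"):
    """Redaction: hide every character of the value."""
    return mask_char * len(value)


def partial_mask(value, visible=4, mask_char="*"):
    """Partial masking: keep only the last `visible` characters readable."""
    visible = max(0, min(visible, len(value)))
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


def shuffle_chars(value, rng):
    """Shuffling: randomize the order of the characters."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def substitute(value, mapping):
    """Substitution: replace each character using a mapping (others pass through)."""
    return "".join(mapping.get(ch, ch) for ch in value)


def add_variance(amount, pct, rng):
    """Number variance: perturb a number by up to +/- pct of its value."""
    return amount * (1 + rng.uniform(-pct, pct))


rng = random.Random()

# Build a reversible digit substitution: a random permutation and its inverse
digits = "0123456789"
forward = dict(zip(digits, rng.sample(digits, len(digits))))
inverse = {masked: plain for plain, masked in forward.items()}

card = "1234567812345678"
print(redact("123-45-6789"))
print(partial_mask(card))
print(shuffle_chars("Colombo", rng))
print(substitute(substitute(card, forward), inverse) == card)
print(round(add_variance(150000.0, 0.05, rng), 2))
```

Substitution is the only one of these that is designed to be reversed, which is why the sketch keeps the inverse mapping; the mapping itself then becomes sensitive and needs the same protection as a key.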
  • 14. Points to consider ...
    • What will privacy look like in a post-quantum encryption timeline? NIST has already developed standards for quantum-safe cryptographic algorithms.
    • How will Generative AI technology, such as LLMs in the form of ChatGPT, affect how developers write secure code without privacy leaks in the context of code generation? e.g. CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot
