TEST DATA ANONYMIZATION
Prateek Gupta
TRANSFORMING REAL DATA INTO REALISTIC TEST DATA
TEST DATA ANONYMIZATION
Test data anonymization is a critical practice in data privacy and software
testing. It transforms sensitive information in a dataset used for testing so
that individuals' privacy is protected and data protection regulations are
met. Personal or sensitive data is then not exposed during testing, while
organizations can still evaluate the functionality, performance, and security
of their software or systems.
THE NEED FOR TEST DATA ANONYMIZATION
Organizations today collect vast amounts of data for various purposes, including software development and testing.
This data often contains personally identifiable information (PII) and other sensitive information.
Data protection regulations like GDPR and HIPAA require organizations to protect this data.
Test data anonymization is necessary to fulfill these obligations and protect individuals' privacy.
COMMON TECHNIQUES FOR TEST DATA ANONYMIZATION
Tokenization: Replace sensitive data with tokens that can be mapped back to the original values only through a secure database.
Synthetic Data Generation: Generate fictional data that mirrors the characteristics of the original data.
Data Masking: Replace sensitive data with fake or pseudonymous data.
Data Encryption: Convert sensitive data into a scrambled format that can be read only with the decryption key.
Data Subset Selection: Use a subset of non-sensitive data for testing.
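Two of the techniques above, tokenization and data masking, can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's code: `TokenVault`, `mask_email`, and `pseudonymize` are hypothetical names, and the in-memory dictionaries stand in for what would be a secured token database in practice.

```python
import hashlib
import hmac
import secrets


class TokenVault:
    """In-memory stand-in for a secure token database (hypothetical)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same input always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._token_to_value[token]


def mask_email(email: str) -> str:
    """Data masking: hide the local part of an address, keep the domain."""
    local, _, domain = email.partition("@")
    return "x" * len(local) + "@" + domain


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash: the same value and key always give the same pseudonym."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("4111-1111-1111-1111")
    print(token != "4111-1111-1111-1111")  # True
    print(vault.detokenize(token))         # 4111-1111-1111-1111
    print(mask_email("jane.doe@example.com"))  # xxxxxxxx@example.com
```

The token carries no information about the original value, so test systems can store it freely, while the keyed hash is useful when the same person must map to the same pseudonym across several tables.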
10/19/2023
BENEFITS OF TEST DATA ANONYMIZATION
Risk Mitigation: Minimizes the risk of exposing sensitive data during testing.
Effective Testing: Allows thorough testing without compromising data privacy.
Privacy Protection: Safeguards individuals' sensitive information.
SOLUTION FOR DATA MASKING
A Python script was created to mask data in an acceptable form. The script
takes an input CSV file with column headers and prompts the user to choose the
output file type: CSV, XML, Excel, JSON, or SQL. Based on the data type the
user chooses for each column, the script generates mock data and writes the
output in the selected file type to the output folder. The output file is
named <"input_CSV_file_name" + "mock_data" + {timestamp}>.
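The flow described above (a data type per column, generated rows, a timestamped output name) might look roughly like the following. This is a sketch under stated assumptions rather than the presenter's script: the generators are standard-library stand-ins for the Faker calls the real script would use, the underscore separators and the timestamp format in `output_name` are guesses, and only the CSV and JSON outputs are shown.

```python
import csv
import json
import random
import string
from datetime import datetime
from pathlib import Path

# Stand-in generators (assumption); the slide deck says the script uses Faker.
GENERATORS = {
    "name": lambda rng: "User" + "".join(rng.choices(string.ascii_uppercase, k=5)),
    "email": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.com",
    "integer": lambda rng: rng.randint(0, 9999),
}


def output_name(input_csv: str, extension: str, now: datetime) -> str:
    """Build '<input file name>_mock_data_<timestamp>.<extension>'."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # timestamp format is an assumption
    return f"{Path(input_csv).stem}_mock_data_{stamp}.{extension}"


def mock_rows(column_types: dict, count: int, seed: int = 0) -> list:
    """Generate `count` rows with one mock value per column."""
    rng = random.Random(seed)
    return [
        {col: GENERATORS[kind](rng) for col, kind in column_types.items()}
        for _ in range(count)
    ]


def write_output(rows: list, path: Path) -> None:
    """Write rows as CSV or JSON, chosen by the file extension."""
    if path.suffix == ".json":
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    elif path.suffix == ".csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"unsupported output type: {path.suffix}")
```

A call such as `write_output(mock_rows({"full_name": "name", "contact": "email"}, 100), Path("output") / output_name("customers.csv", "csv", datetime.now()))` would then produce one mock file per run, assuming the `output` folder exists.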
UTILITY ARCHITECTURE
TECHNOLOGY USED AND ADVANTAGES
• Faker library: A Python library for generating fake data with various customizable data types.
• YAML configuration file: A human-readable data serialization format used to specify the script's input and output file locations.
• pandas: A Python library used for data manipulation and analysis.
• ElementTree: A Python standard-library module for working with XML documents, which are a popular format for storing structured data.
The solution can be used across various environments, such as Load Testing,
Performance Testing, User-acceptance Testing, Pre-production, and Production.
It can also generate output data in multiple formats: CSV, XML, Excel, JSON,
and SQL.
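As an illustration of the XML output path, the snippet below serializes generated rows with the standard library's `xml.etree.ElementTree`. It is a sketch, not the presenter's code: the function name `rows_to_xml` and the `records` and `record` tag names are assumptions, and column names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows: list, root_tag: str = "records", row_tag: str = "record") -> str:
    """Serialize a list of dict rows into an XML string."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # One child element per column; values are stored as text.
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_xml([{"name": "Mock User", "age": 30}]))
```

Each row becomes one `record` element, so the output can be parsed back with `ET.fromstring` for a quick check before the file is handed to a test suite.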
THANK YOU FOR DIVING INTO TEST DATA ANONYMIZATION
PRESENTED BY: PRATEEK.GUPTA@THEPSI.COM
