This document summarizes a webcast about anonymizing health data. It discusses methods and standards for anonymizing data, including masking identifiable information, de-identifying data while preserving utility, and a risk-based methodology. It also presents two case studies, one of which involves anonymizing data from a research registry of births for release to a researcher and calculating the risk of re-identification under various plausible attacks. The goal is to balance privacy with appropriate data sharing and use.
Data Privacy: Anonymization & Re-Identification - Mike Nowakowski
With the rise of the Internet of Things, Big Data and Open Data, data privacy is increasingly important to organizations. Data de-identification is a process to remove identifying information from a data set. This presentation will provide a gentle introduction to data de-identification, anonymization and the reverse process of re-identification.
This document discusses data anonymization, which is the process of modifying data to prevent individuals from being identified. It describes how data can be anonymized by choosing which attributes like name, address, and phone number to anonymize or whitelist. The document also mentions challenges like dealing with foreign key and unique constraints and provides alternatives to data anonymization like k-anonymity, l-diversity, and t-closeness models as well as other anonymization tools.
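The k-anonymity model mentioned here can be illustrated with a short sketch: mask direct identifiers, generalize quasi-identifiers, and check that every quasi-identifier combination occurs at least k times. The records and generalization rules below (ZIP truncation, age bucketing) are illustrative assumptions, not taken from the document:

```python
from collections import Counter

# Hypothetical records: "name" is a direct identifier; "zip" and "age" are
# quasi-identifiers; "diagnosis" is the sensitive attribute kept for utility.
records = [
    {"name": "Alice", "zip": "47677", "age": 29, "diagnosis": "flu"},
    {"name": "Bob",   "zip": "47602", "age": 22, "diagnosis": "cold"},
    {"name": "Carol", "zip": "47678", "age": 34, "diagnosis": "flu"},
    {"name": "Dan",   "zip": "47605", "age": 37, "diagnosis": "cold"},
]

def anonymize(record):
    """Drop the direct identifier and generalize the quasi-identifiers."""
    return {
        "zip": record["zip"][:3] + "**",        # truncate ZIP to a region
        "age": f"{record['age'] // 10 * 10}s",  # bucket age into decades
        "diagnosis": record["diagnosis"],
    }

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())

anonymized = [anonymize(r) for r in records]
print(is_k_anonymous(anonymized, ["zip", "age"], k=2))  # True
```

Real tools apply the same idea with generalization hierarchies chosen to minimize information loss rather than fixed rules like these.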
The document outlines an agenda for an information security essentials workshop. It discusses key topics like the principles of information security around confidentiality, integrity and availability. It also covers security governance structures, roles and responsibilities, risk management, information system controls and auditing information security. The objectives are to provide an overview of information security, describe approaches to auditing it, and discuss current trends.
Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differentia... - Alan McSweeney
This paper describes how technologies such as data pseudonymisation and differential privacy enable access to sensitive data and unlock data opportunities and value while ensuring compliance with data privacy legislation and regulations.
Cyber Security Awareness introduction. Why is cyber security important? What do I have to do to protect myself from cyber attacks? How do I create an IT Security Awareness Plan?
This document discusses cyber security. It defines cyber security as technologies and processes designed to protect computers, networks, and data from unauthorized access and attacks over the internet. The three core principles of cyber security are confidentiality, integrity, and availability. Several types of cyber attacks are described such as malware, phishing, and denial of service attacks. Major historical cyber attacks are outlined including the Morris Worm in 1988 and the Anthem hack in 2015 that breached 80 million records. Common attack patterns and measures to prevent cyber attacks like using complex passwords and encryption are also summarized.
The document provides an in-depth analysis of India's newly introduced Digital Personal Data Protection Act, 2023. It highlights the Act's key provisions, including the scope of applicability, lawful grounds for processing personal data, consent and notice requirements, obligations of data fiduciaries and significant data fiduciaries, and more. The analysis compares the Act to its previous iterations and other data protection laws. It also provides a compliance roadmap to help organizations adhere to the Act's mandates.
This document summarizes a seminar on data leakage detection. It introduces the topics of data leakage, the objectives of detecting and identifying the source of leaked data, existing watermarking techniques, and a proposed system using perturbation to detect leakage and assess guilt of agents. The proposed system aims to distribute data across agents in a way that improves the ability to identify the source of any leaks. The document also discusses types of employees that increase leakage risk and the impact of leaks on organizations.
Most investigators turn to Google and common social media platforms such as Facebook and Twitter to conduct research for their investigations. However, much of the Internet is inaccessible through simple searches, and criminals are increasingly turning to the dark web to conduct illicit business.
The dark web is anonymous and requires a special browser to access and some knowledge of how to navigate it safely. However, used properly, it can be a valuable source of information for investigators. It’s worthwhile for every investigator to develop the skills and knowledge to mine this treasure trove of dark data.
Join Chad Los Schumacher, investigator and researcher at iThreat Cyber Group, as he leads participants on an exploration of investigations in the dark web.
Webinar attendees will learn:
What the dark web is and how it fits into the rest of the World Wide Web
What can be found on the dark web
How to get to the dark web using Tor and other browsers
How to locate common hubs and resources on the dark web and explore what they have to offer
How to bring leads from the dark web to the surface in an investigation
Presented at: 2nd Annual Gulf Cooperation Council e-Participation & e-Governance Forum – Organised by: Abu Dhabi University Knowledge Group and UAE Telecommunications Regulatory Authority.
9 – 11 September 2013 | Dusit Thani Hotel | Abu Dhabi | UAE.
The document provides an overview of threat landscapes, common threat actors, and tools used in cyber attacks against corporations. It discusses how threat landscapes change over time due to new vulnerabilities, software/hardware, and global events. Common threat actors described include white hat, gray hat, and black hat hackers. A variety of penetration testing and hacking tools are outlined that threat actors use, such as password crackers, wireless hacking tools, network scanners, packet sniffers, and vulnerability exploitation tools. Different types of attacks like eavesdropping, data modification, and IP spoofing are also summarized.
OSINT is defined by both the U.S. Director of National Intelligence and the U.S. Department of Defense (DoD) as intelligence "produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement."
Source: https://en.wikipedia.org/wiki/Open-source_intelligence
This document proposes a system to detect data leakage from agents by improving data allocation strategies and injecting fake records. The objective is to identify guilty agents that leak data by giving them enough evidence. It describes how current systems can detect hackers but lack sufficient evidence. The proposed system addresses this by using algorithms to intelligently distribute data and add fake objects, allowing accurate tracing of leakers. It outlines the modules for data allocation, fake objects, optimization, and distribution. The goal is to satisfy agent requests while enabling detection of any agents that leak portions of distributed data.
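The fake-object idea described above can be sketched in a few lines: each agent receives its requested records plus a unique planted fake, so any fake record found in a leak points back to the agent it was given to. The allocation scheme, record layout, and agent names below are hypothetical simplifications of the proposed system:

```python
# Hypothetical pool of real records shared with agents.
REAL = [{"id": i, "value": f"record-{i}"} for i in range(6)]

def allocate(agents):
    """Give each agent its requested records plus one unique, traceable fake."""
    allocations, fakes = {}, {}
    for n, (agent, wanted) in enumerate(agents.items()):
        fake = {"id": 1000 + n, "value": f"fake-{agent}"}  # unique per agent
        fakes[fake["id"]] = agent
        allocations[agent] = [REAL[i] for i in wanted] + [fake]
    return allocations, fakes

def trace_leak(leaked, fakes):
    """Return the agents implicated by planted fakes found in a leaked set."""
    return {fakes[r["id"]] for r in leaked if r["id"] in fakes}

agents = {"A": [0, 1, 2], "B": [1, 2, 3], "C": [3, 4, 5]}
allocations, fakes = allocate(agents)
leaked = allocations["B"]          # suppose agent B's copy surfaces publicly
print(trace_leak(leaked, fakes))   # {'B'}
```

The actual proposal also reasons probabilistically about overlapping real records (the "guilt" of each agent); this sketch covers only the deterministic fake-object trace.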
Cyber threat intelligence (CTI) involves collecting, evaluating, and analyzing cyber threat information using expertise and all-source information to provide insight and understanding of complex cyber situations. CTI can include tactical, operational, and strategic intelligence about security events, indicators of compromise, malware behavior, threat actors, and mapping online threats to geopolitical events over short, medium, and long timeframes. Implementing CTI enables organizations to prepare for and respond to existing and unknown threats through evidence-based knowledge and actionable advice beyond just reactive defense measures.
Social engineering relies on human interaction and manipulation to gain access or information through psychological attacks. It works by exploiting human weaknesses like insecurity, authority, urgency or familiarity. There are physical techniques like impersonation or tailgating and digital methods like phishing emails, spam, hoaxes or hijacking common website misspellings to trick users into giving away access or sensitive information. Social engineering is an ongoing tactic that requires constant vigilance against evolving methods of deception targeting human vulnerabilities.
This document discusses data explosion and the lack of privacy protections for personally identifiable information. It provides examples of how much user data companies collect and store. It then discusses how anonymizing data by removing personally identifying information like names is not enough to protect privacy, as external data can be linked to the anonymized data using quasi-identifiers like zip codes, dates of birth, and other attributes. Methods for achieving k-anonymity are discussed, but these are shown to still leak private information through various attacks. The document examines different privacy models and their limitations in protecting user privacy.
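The linkage attack described here, joining a "de-identified" release to an external identified dataset on shared quasi-identifiers, can be demonstrated directly. The tables and attribute values below are invented for illustration:

```python
# Released health data: names removed, but quasi-identifiers left intact.
released = [
    {"zip": "02138", "dob": "1945-07-22", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1962-03-01", "sex": "M", "diagnosis": "asthma"},
]
# External identified data, e.g. a public voter roll with the same attributes.
public = [
    {"name": "J. Smith", "zip": "02138", "dob": "1945-07-22", "sex": "F"},
    {"name": "K. Jones", "zip": "02139", "dob": "1962-03-01", "sex": "M"},
]

QIDS = ("zip", "dob", "sex")

def link(released, public):
    """Re-identify released rows whose quasi-identifiers match exactly one person."""
    matches = []
    for row in released:
        key = tuple(row[q] for q in QIDS)
        hits = [p for p in public if tuple(p[q] for q in QIDS) == key]
        if len(hits) == 1:  # unique match => re-identification succeeds
            matches.append((hits[0]["name"], row["diagnosis"]))
    return matches

print(link(released, public))  # [('J. Smith', 'hypertension'), ('K. Jones', 'asthma')]
```

This is the failure mode k-anonymity targets: generalizing the quasi-identifiers until no combination is unique breaks the one-to-one join.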
Data Privacy: What you need to know about privacy, from compliance to ethics - AT Internet
Today, balancing business opportunity with customers' data protection has become a difficult challenge. As technology, data sources and targeting abilities grow, so does the crucial need to respect user privacy and ensure good data protection. But with laws, practices and definitions constantly evolving around the world, it can all seem a bit confusing.
Not sure where to start? Wondering how you can better align with privacy law? Then this webinar is for you.
The document discusses various concepts related to information security including:
- Information is obtained by processing raw data and facts;
- Information security aims to protect information and systems from unauthorized access, use, disclosure, disruption or destruction;
- Basic principles of information security are confidentiality, integrity and availability.
This document discusses cyber security. It begins by defining cyber security as the body of technologies, processes, and practices designed to protect networks, devices, programs, and data from attacks, damage, or unauthorized access. It notes that cyber security is important because organizations collect, store, and process unprecedented amounts of data that needs protection. Some common cyber threats discussed include cyberterrorism, cyberwarfare, cyberespionage, and attacks targeting critical infrastructure, networks, applications, cloud systems, and internet of things devices. The document also examines cyber attack life cycles and common prevention methods.
With 1.2 billion monthly active users on Facebook alone, it’s not surprising that social media networks can be a rich source of information for investigators. And because Americans spend more time on social media than any other major Internet activity, including email, social media information and evidence is plentiful. You just need to know how to get it.
Finding, preserving and collecting social media evidence often requires some forensic skills, as well as an understanding of the laws that govern its collection and use. It’s important for investigators to be aware of both the possibilities and limitations of social media forensics.
The presentation explains Data Security as an industry concept. It covers Data Loss Prevention in detail: what it is, how it is approached, best practices, and the common mistakes people make. The presentation concludes by highlighting Happiest Minds' expertise in the domain.
Learn more about Happiest Minds Data Security Service Offerings
http://www.happiestminds.com/IT-security-services/data-security-services/
Cyber Threat Intelligence and Incident Response by Sandeep Singh - OWASP Delhi
The broad list of topics includes (but is not limited to):
- What is Threat Intelligence?
- Types of Threat Intelligence
- Intelligence Lifecycle
- Threat Intelligence - Classification & Vendor Landscape
- Threat Intelligence Standards (STIX, TAXII, etc.)
- Open Source Threat Intel Tools
- Incident Response
- Role of Threat Intel in Incident Response
- Bonus Agenda
This document provides an overview of digital forensics. It defines digital forensics and forensic science. Digital forensics involves the preservation, collection, analysis and presentation of digital evidence. There are different branches of digital forensics related to different devices. Examples of digital evidence include emails, photos, transaction logs, documents and computer memory contents. Characteristics of good digital evidence are that it is admissible, authentic, fragile, accurate and convincing. Several digital forensic models are described that involve multiple phases of an investigation. The benefits of digital forensics include protecting against theft, fraud, hacking and viruses. Skills required for digital forensics include technical experience, strong analysis and evidence handling skills.
Décryptage de l'Internet des objets au travers des 4 axes majeurs de la transformation digitale (Data, Cloud, Mobile, Empowerment). Présentation de l'AWT dans le cadre du Café Numérique spécial "Internet des objets" à Louvain-la-Neuve, le 20 octobre 2014
Social engineering is manipulating people into taking actions that may not be in their best interest. Hackers use social engineering because it meets less resistance than technical attacks and succeeds more often. Common goals are money, ego, revenge, or knowledge. Typical attacks involve posing as customer service, deliveries, tech support, etc. Social engineering methods include dumpster diving, shoulder surfing, baiting, vishing, phishing, and whaling. Dumpster diving involves searching trash for useful information. Shoulder surfing means watching people enter passwords. Baiting uses enticing offers to infect systems with malware. Phishing fools people via email into revealing sensitive info or installing malware. Vishing and smishing are the phone- and text-message-based variants of phishing, respectively.
A short overview of the Capital One Data Breach. This presentation was a component of an online webinar for the Caribbean Developer Month 2019 hosted by the Caribbean Developers Group.
Big Data Meets Privacy: De-identification Maturity Model for Benchmarking and ... - Khaled El Emam
The document discusses de-identification and the De-identification Maturity Model (DMM). The DMM is a framework that evaluates an organization's maturity in de-identifying data based on their people, processes, technologies, and measurement practices. It assesses an organization across three dimensions: practice, implementation, and automation. Higher levels of maturity indicate more robust de-identification processes that better balance privacy and data utility. The document provides examples of how the DMM could be used to evaluate different organizations' de-identification practices.
This document provides an overview of various methods for anonymizing data. It discusses three main types of anonymization: 1) masking identifiers in unstructured data, 2) privacy-preserving data analysis for interactive scenarios, and 3) transforming structured data for non-interactive scenarios. For each type, it provides examples of relevant methods and implementations. It also discusses important considerations like balancing data utility and privacy, and the relationships between different aspects of anonymization like use cases, data types, privacy models, transformation models, and more.
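The first of these types, masking identifiers in unstructured data, is commonly implemented as pattern-based redaction. A minimal sketch, assuming illustrative regex patterns rather than any specific tool's ruleset (a production system would need a far more complete set of identifier patterns):

```python
import re

# Illustrative patterns only: SSN-like, email-like, and phone-like strings.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\(?\d{3}\)?[ -]?\d{3}-\d{4}\b"), "[PHONE]"),
]

def mask(text):
    """Replace each matched identifier with a category placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

note = "Patient reachable at 555-123-4567 or jane.doe@example.org; SSN 123-45-6789."
print(mask(note))
# Patient reachable at [PHONE] or [EMAIL]; SSN [SSN].
```

Keeping the category in the placeholder preserves some utility of the masked text, one instance of the utility/privacy balance the document discusses.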
1. The document discusses privacy challenges in the era of big data. It defines big data as extremely large data sets that are difficult to store, manage, and process using traditional methods due to the volume of data and processing speed/costs.
2. While big data provides benefits from insights discovered through analysis, it also challenges core privacy principles. Data collected and analyzed at large scale may not be truly anonymous, and re-identification is possible using additional data sources. Existing privacy laws may not cover analysis of non-personal data.
3. To address privacy risks, the document recommends expanding definitions of personal data and consent under privacy laws. Organizations collecting and processing big data should also implement privacy impact assessments.
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data - arx-deidentifier
Website with further information: http://arx.deidentifier.org
Description of this talk:
Collaboration and data sharing have become core elements of biomedical research. Especially when sensitive data from distributed sources are linked, privacy threats have to be considered. Statistical disclosure control allows the protection of sensitive data by introducing fuzziness. Reduction of data quality, however, needs to be balanced against gains in protection. Therefore, tools are needed which provide a good overview of the anonymization process to those responsible for data sharing. These tools require graphical interfaces and the use of intuitive and replicable methods. In addition, extensive testing, documentation and openness to reviews by the community are important. Existing publicly available software is limited in functionality, and often active support is lacking.

We present the data anonymization tool ARX, which has been developed in close cooperation between the Chair for Biomedical Informatics, the Chair for IT Security and the Chair for Database Systems at Technische Universität München (TUM), Germany. ARX enables the de-identification of structured data (i.e., tabular data) and implements a wide variety of privacy methods in a highly efficient manner. It is extensible, well documented and actively supported. ARX provides an intuitive cross-platform graphical interface and offers a public API for integration with other software systems.
Engineering data privacy - The ARX data anonymization toolarx-deidentifier
Website with further information: http://arx.deidentifier.org
Description of this talk:
While a plethora of methods has been proposed for dealing with many aspects of de-identifying clinical data, only a few (prototypical) implementations are available. Indeed, the complexity of implementing privacy technologies is an often-overlooked challenge.
In this talk we will present the open source data de-identification tool ARX, which has been carefully engineered to support multiple privacy technologies for relational datasets. Our tool bridges the gap between different scientific disciplines by integrating methods developed and used by the statistics community with data anonymization techniques developed by computer scientists.
ARX has been designed from the ground up to ensure scalability and it is able to process very large datasets on commodity hardware. The software implements a large set of
privacy models: (1) syntactic privacy models, such as k-anonymity, l-diversity, t-closeness and δ-presence, (2) statistical models for re-identification risks, and (3) differential privacy. In the talk, we will focus on measures to reduce the uniqueness of records. ARX also supports more than ten different methods for evaluating data utility, including loss, precision, non-uniform entropy and KL divergence.
In ARX, de-identification of data can be performed automatically, semi-automatically and manually using a complex method that integrates global recoding, local recoding, categorization, generalization, suppression, microaggregation and top/bottom-coding. All methods are accessible via a comprehensive cross-platform graphical user interface.
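As an illustration of the simplest of these syntactic models, a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The following is a minimal plain-Python sketch, not ARX code; the `k_anonymity` helper and the sample rows are hypothetical:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k of a dataset: the size of the smallest
    equivalence class over the chosen quasi-identifiers."""
    classes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(classes.values())

rows = [
    {"age": "30-39", "zip": "537**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "537**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "537**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "537**", "diagnosis": "diabetes"},
]

# Each (age, zip) combination appears twice, so the data is 2-anonymous.
print(k_anonymity(rows, ["age", "zip"]))  # 2
```

Tools like ARX search over many possible generalizations of the data to find one that satisfies a model like this while losing as little utility as possible.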
O'Reilly Webcast: Anonymizing Health DataLuk Arbuckle
Authors: Khaled El Emam, Luk Arbuckle
How can health data be released to analysts and app developers who desperately want it? Under current legislation, the use and disclosure of health data for secondary purposes is limited—patients must either consent to have their data used, which is often difficult to get and can lead to bias, or the data needs to be de-identified (there are some exceptions, but we won't address them in this webinar.)
To ensure that end users get data that is anonymized and highly useful, we focus on the HIPAA Privacy Rule De-identification Standard. We've built our risk-based methodology for anonymizing data around the foundation created by HIPAA's Statistical Method. In this webcast we'll share several of the case studies that we've described in our O'Reilly book Anonymizing Health Data, which is devoted to examples of how we anonymized real-world data sets. In almost every case in which we've anonymized data, there have been new and interesting challenges to overcome.
Medical identity theft is a growing problem, with over 275,000 cases reported last year in the US. Thieves are stealing patients' personal and medical information to impersonate them and charge medical procedures and services without the patients' consent. This can lead to thousands of dollars in fraudulent bills and damage credit reports. Experts warn that medical identity theft may worsen as more health records are digitized, as digital files can be easily stolen in large quantities. Victims face financial costs resolving the identity theft and correcting erroneous medical files, with the average fraud totaling over $12,000.
DEF CON 23 - CHRIS ROCK - i will kill you how to get away with muFelipe Prado
This document provides information about how to falsify birth and death records to create fake identities or harm others. It discusses how to register deaths without proper credentials by taking advantage of vulnerabilities in online registration systems. It also describes how to generate "shelf babies" and identities that have built up legitimate financial histories that could then be used to commit crimes like money laundering or identity theft. The document warns that anyone with this knowledge could potentially kill people on paper for revenge or profit due to weaknesses in how death records are registered and verified globally.
This chapter discusses several challenges organizations face regarding employee privacy and influence over personal lives. It covers issues like drug testing of employees, monitoring their communications and activities, and collecting health and personal information. The chapter also examines organizational policies on work conditions, maternity leave, childcare, and efforts to redesign work to improve employee satisfaction and engagement.
Can I share this? Curating sensitive dataGraham Smith
IDCC20 slides - curation of sensitive data.
Since 2016, the publisher Springer Nature has supported data sharing at its journals through the implementation of standardised research data policies. Authors are encouraged to share their data openly, and preferably in a suitable data repository, assisted by curators from the Springer Nature data publishing team where needed.
In providing assistance to authors in sharing their data openly, sensitive data has emerged as a common concern for authors, particularly in cases where their study involved human participants. Although excellent guidance exists to aid authors in decision-making around sensitive data sharing, we continue to encounter authors who are unsure whether their data can be “safely” shared.
In this lightning talk we will describe our approach to supporting authors in sharing sensitive data, with reference to existing guidance for sensitive data preparation based on an assessment of direct and indirect identifiers within the dataset. Direct identifiers are those which uniquely identify a participant within a study, for example their full name, date of birth or facial photograph. Indirect identifiers are often more difficult to recognise, as they do not individually identify a participant but may do in combination with other indirect identifiers, for example rare diseases, unusual job titles, or city of birth. Using a sample dataset we will give a visual demonstration of how appropriate processing can be used to ensure that individual study participants cannot be identified within a dataset, while retaining reuse value of the data. We will also show example data that are deemed identifiable, data that are suitably de-identified for sharing and the difficult cases that fall somewhere in between.
Our support for researchers in sharing their data is fundamentally collaborative. Researchers are experts on their own data, while publishers’ editorial expertise, standards and tools provide a rigorous framework of checks for a dataset to pass through before publication. This process seeks to protect the confidentiality of study participants, whilst ensuring maximal reuse of research data. Key judgements from both the data creator and the curator balance the removal of identifying information with retaining the value and usefulness of the data.
Learn what to do if you paid someone you suspect of being a scammer, gave them personal information, or granted them access to your phone or computer.
Scammers are incredibly convincing. They contact, email, and text us in an attempt to obtain our money or sensitive personal information, such as our Social Security numbers or account numbers. They're also really good at what they do. Here's what to do if you paid or offered your personal information to someone you suspect is a fraudster.
This document provides information and guidance for law enforcement on how to appropriately detect and respond to situations involving individuals with autism spectrum disorders or other developmental disabilities. It notes that such individuals are more likely to encounter law enforcement due to behaviors related to their conditions. The document outlines key facts about autism and developmental disabilities, relevant Illinois laws requiring police training, and approaches that can help reduce risks when interacting with these individuals, such as using a calm clinical approach instead of confrontation. It emphasizes that the highest risk period is initial uninformed contact.
This document provides an overview of HIPAA privacy and confidentiality training. It discusses what HIPAA is, how it protects patient privacy and confidentiality, and outlines medical professionals' duties to maintain privacy and keep health information secure. Failure to comply with HIPAA privacy rules can result in criminal penalties such as fines up to $250,000 and imprisonment up to 10 years. The goal of the training is to educate medical staff on patient privacy rights and the legal requirements to keep health information confidential.
This document discusses the hidden problem of elder financial abuse. It notes that 83% of financial institutions suspect elderly client financial exploitation, most often by relatives. Common types of exploitation include forgery, misappropriation of funds, and power of attorney abuse. The document provides clues for identifying financial exploitation and outlines steps victims and witnesses can take to report abuse, including contacting adult protective services or an elder law attorney. Overall, the document aims to raise awareness of elder financial abuse and provide resources for victims.
Adult protection and safeguarding presentationJulian Dodd
This document discusses safeguarding vulnerable adults from abuse. It defines key terms like abuse, vulnerable adults, and the legal framework around safeguarding. It provides statistics on abuse including most common types of abuse, locations it occurs, demographics of victims and abusers. It also outlines how to recognize, report and respond to abuse, including enabling disclosure, understanding indicators of distress, and issues around confidentiality and consent.
Company names mentioned herein are the property of, and may be trademarks of, their respective owners and are for educational purposes only.
17 U.S. Code § 107 - Limitations on exclusive rights: Fair use
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.
This document provides guidelines for notifying individuals of a data breach under HIPAA and HITECH regulations. It outlines three steps: 1) determine the level of harm caused by assessing what data was exposed, 2) document the data exposure and harm level, and 3) notify affected individuals and regulatory agencies as required, consulting legal counsel. Notification requirements depend on the type of data lost and level of harm, such as reputational, financial, or other harm.
This training module provides an overview of HIPAA privacy rules for nursing students and faculty at South Arkansas Community College. It discusses the purpose and key aspects of HIPAA, including the different titles that make up the law and what protected health information is. It emphasizes the importance of protecting patient privacy and outlines specific ways to ensure privacy in clinical settings, such as not discussing patients in public places and following proper procedures for handling protected health information.
Data Breach Notifications Laws - Time for a Pimp Slap Presented by Steve Werb...Steve Werby
Data breach notification laws have proliferated worldwide, beginning with California’s law, which was enacted nearly a decade ago. As a result, citizens are being bombarded by breach notifications and media coverage of data exposures has skyrocketed. But are these increasingly onerous laws leading to stronger information security and better decisions by citizens or are they backfiring? I’ll compare existing laws, analyze data breach notifications and explore the effects of these laws, including feedback from citizens and information security professionals. By comparing data exposure disclosure to other negative events that don't require disclosure and sharing alternate disclosure models, I'll leave the audience questioning whether there's a better way.
Scammers will stop at nothing to get what they want, and seniors and the disabled are common prey for scam artists. Here are few tips and helpful resources to prevent scams and fraud.
This document provides a summary of the Health Insurance Portability and Accountability Act (HIPAA) for nursing students. It discusses the purpose and key aspects of HIPAA such as protecting patient privacy and confidentiality. It outlines the rules for use and disclosure of protected health information, and the consequences of violating HIPAA regulations, which can include civil penalties, criminal charges, and dismissal from nursing programs. Students are instructed to only access the minimum health information needed for their roles and to protect patient data.
The document discusses Oregon laws pertaining to HIV/AIDS testing and confidentiality. It outlines that informed consent is required for HIV testing but exceptions exist for situations like occupational exposure or court orders. Test results must be kept confidential except for certain disclosures like to public health authorities. The duty to warn potential contacts exists if an infected person refuses to inform partners about their status and continues high-risk behaviors. Medical organizations recommend counseling the individual first before warning contacts if necessary.
Three California hospitals were fined by state health officials for HIPAA violations involving the medical records of a celebrity patient. Nearly two dozen medical workers at one 218-bed facility illegally accessed the records of the woman who gave birth to octuplets. The hospital was fined $250,000 and over 15 employees were fired or resigned. New HIPAA rules expanded enforcement and increased penalties for privacy violations.
Canadian AI 2014 Conference Keynote - Deploying SMC in PracticeKhaled El Emam
This document discusses methods for facilitating data sharing and secondary use of health data while preserving privacy, including anonymization and secure multi-party computation. It provides examples of how these methods allow useful analyses like disease surveillance and anonymous record linkage to be performed without direct access to identifying information. Secure computation techniques allow computations on encrypted data, while anonymization aims to prevent re-identification of data through techniques like de-identification. Critical factors in applying these methods include managing risks, engaging relevant stakeholders, and protecting intellectual property.
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...Khaled El Emam
Over the last several months a staggering series of revelations have been reported about the wide-reaching efforts of the United States National Security Agency (NSA) to intercept digital communications. Though not surprising to learn the NSA—an intelligence organization—is spying on global targets, the apparent scale and sophistication of their capabilities have been turning heads internationally.
Last September, troubling allegations emerged suggesting the NSA influenced the National Institute of Standards and Technology (NIST) into standardizing a cryptographic primitive with a secret backdoor. If true, the backdoor would provide the NSA with a major advantage in its efforts to snoop communications through something known as the Dual Elliptic Curve Deterministic Random Bit Generator (Dual_EC_DRBG). Although the ensuing backlash has seen the offending code yanked from most major security products, surprising details about the program continue to emerge.
In this talk we will explain why random bits are crucial to online privacy, and what you could potentially do to people whose "random" bits you can predict. We will talk about Dual_EC_DRBG, and explain how the backdoor works in general terms. Finally, we will discuss some of the implications of state-level adversaries to health privacy and offer some high-level directions for healthcare providers to pursue.
Facilitating Analytics while Protecting PrivacyKhaled El Emam
This document summarizes Khaled El Emam's presentation on facilitating analytics while protecting individual privacy using data de-identification. It discusses two case studies where health data was shared after analyzing privacy risks. The first was a project with the Louisiana Department of Health providing de-identified Medicaid claims data for a coding competition. The second was sharing data from Mount Sinai School of Medicine's World Trade Center Disaster Registry. The presentation outlines the methodology used to de-identify the data, including removing direct identifiers, generalizing quasi-identifiers, and techniques like date shifting to prevent re-identification.
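The date-shifting technique mentioned above can be sketched as follows: all dates belonging to one patient are moved by a single random offset, so intervals between events are preserved while the true dates are hidden. This is a minimal illustration only, assuming one offset per patient; `shift_dates` is a hypothetical helper, not the presenters' tool:

```python
import random
from datetime import date, timedelta

def shift_dates(patient_dates, max_days=30, seed=None):
    """Shift all of one patient's dates by a single random offset,
    preserving intervals between events while hiding actual dates."""
    rng = random.Random(seed)
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return [d + offset for d in patient_dates]

admissions = [date(2013, 5, 1), date(2013, 5, 9)]
shifted = shift_dates(admissions, seed=42)

# The 8-day gap between admission and discharge survives the shift.
print((shifted[1] - shifted[0]).days)  # 8
```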
The document discusses de-identification of health research data. It provides motivations for de-identification such as obtaining consent not being practical for large databases and complying with regulations. Examples are given of alleged re-identification attacks like AOL search data and Netflix movie ratings, but these are argued to not actually demonstrate that properly de-identified data using standards like HIPAA can be re-identified. The document emphasizes the importance of avoiding attribute disclosure even if identity disclosure is prevented, and defines what constitutes de-identified data according to common standards.
Risk Based De-identification for Sharing Health DataKhaled El Emam
This presentation describes a methodology, tools, and experiences for the de-identification of health information. The objective is to support data sharing for the purpose of research and public health.
The Adoption of Personal Health Records by ConsumersKhaled El Emam
Standalone personal health records (PHRs) allow consumers to manually enter and track their health information. However, data quality can be unreliable since consumers often inaccurately self-report medical details. While some consumers are willing to pay subscription fees for PHR access, retention rates are low and standalone PHRs see high attrition over time. Tethered PHRs link to electronic medical records and allow consumers to view provider information, but few offer secure online communication with doctors currently. Interconnected and shared PHR models that link multiple data sources could provide more complete health histories but also pose greater privacy and security risks.
The Use of EDC in Canadian Clinical TrialsKhaled El Emam
Presentation at CHEO Research Rounds on a study to estimate the proportion of Canadian clinical trials that are using an Electronic Data Capture system during the period 2006-2007.
2. Anonymizing Health Data
Part 1 of Webcast: Intro and Methodology
Part 2 of Webcast: A Look at Our Case Studies
Part 3 of Webcast: Questions and Answers
Khaled El Emam & Luk Arbuckle
6. Anonymizing Health Data
To Anonymize or not to Anonymize
Consent needs to be informed.
Not all health care providers are willing to share their patients' PHI.
Anonymization allows for the sharing of health information.
Privacy-protective behaviors by patients.
Compelling financial case: a breach costs ~$200 per patient.
Khaled El Emam & Luk Arbuckle
14. Anonymizing Health Data
Masking Standards
Removing a whole field.
Creating pseudonyms.
Replacing actual values with random ones.
First name, last name, SSN.
Distortion of data—no analytics.
Khaled El Emam & Luk Arbuckle
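Pseudonym creation, one of the masking techniques above, can be sketched with a keyed hash: the same input always maps to the same pseudonym, so records can still be linked, but without the key the mapping cannot be recomputed. This is a minimal sketch, not the authors' tooling; the key and field names are hypothetical:

```python
import hashlib
import hmac

SECRET_KEY = b"keep-this-key-away-from-data-recipients"  # hypothetical key

def pseudonym(value, key=SECRET_KEY):
    """Replace a direct identifier with a keyed, irreversible pseudonym."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "ssn": "123-45-6789", "diagnosis": "asthma"}
masked = {
    "name": pseudonym(record["name"]),
    "ssn": pseudonym(record["ssn"]),
    "diagnosis": record["diagnosis"],  # analytic field left untouched
}
print(masked["name"] != record["name"])  # True
```

Note that masking only addresses direct identifiers; as the webcast stresses, the masked data is distorted and not intended for analytics on those fields.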
22. Anonymizing Health Data
What's "Actual Knowledge"?
Info, alone or in combo, that could identify an individual.
Has to be specific to the data set—not theoretical.
Occupation: Mayor of Gotham.
Khaled El Emam & Luk Arbuckle
25. Anonymizing Health Data
De-identification Standards
Heuristics, or rules of thumb.
Statistical method in HIPAA Privacy Rule.
Minimal distortion of data—for analytics.
Age, sex, race, address, income.
Safe Harbor in HIPAA Privacy Rule.
Khaled El Emam & Luk Arbuckle
29. Anonymizing Health Data
De-identification Myths
Myth: It's possible to re-identify most, if not all, data.
Using robust methods, evidence suggests risk can be very small.
Myth: Genomic sequences are not identifiable, or are easy to re-identify.
In some cases they can be re-identified, and they are difficult to de-identify using our methods.
Khaled El Emam & Luk Arbuckle
33. Anonymizing Health Data
A Risk-based De-identification Methodology
The risk of re-identification can be quantified.
The Goldilocks principle: balancing privacy with data utility.
The re-identification risk needs to be very small.
De-identification involves a mix of technical, contractual, and other measures.
Khaled El Emam & Luk Arbuckle
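The quantification this methodology relies on can be illustrated from first principles: under a matching attack, the probability of re-identifying a record is at most one over the size of its equivalence class on the quasi-identifiers. The following plain-Python sketch uses a hypothetical helper and sample data; the 0.33 threshold is an example value, not a recommendation:

```python
from collections import Counter

def reidentification_risks(records, quasi_identifiers):
    """Return (maximum, average) re-identification risk, where a
    record's risk is 1 / (size of its equivalence class)."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    sizes = Counter(key(r) for r in records)
    risks = [1.0 / sizes[key(r)] for r in records]
    return max(risks), sum(risks) / len(risks)

rows = [
    {"sex": "F", "yob": "1970-1979"},
    {"sex": "F", "yob": "1970-1979"},
    {"sex": "M", "yob": "1970-1979"},
    {"sex": "M", "yob": "1960-1969"},
]
max_risk, avg_risk = reidentification_risks(rows, ["sex", "yob"])
threshold = 0.33  # hypothetical maximum acceptable risk
print(max_risk, max_risk <= threshold)  # 1.0 False: needs more de-identification
```

Whether the maximum or the average risk is compared against the threshold depends on which attacks are considered plausible, which is exactly what Steps 2 and 3 of the methodology decide.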
38. Anonymizing Health Data
Steps in the De-identification Methodology
Step 1: Select Direct and Indirect Identifiers
Step 2: Setting the Threshold
Step 3: Examining Plausible Attacks
Step 4: De-identifying the Data
Step 5: Documenting the Process
Khaled El Emam & Luk
40. Anonymizing Health Data
Step 1: Select Direct and Indirect Identifiers
Direct identifiers: name, telephone number, health insurance card number, medical record number.
Indirect identifiers, or quasi-identifiers: sex, date of birth, ethnicity, locations, event dates, medical codes.
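This split drives the rest of the methodology: direct identifiers are masked, while quasi-identifiers are de-identified with the techniques in Step 4. A minimal sketch of the triage, with illustrative field names (not an actual registry schema):

```python
# Hypothetical field classification; the field names are illustrative.
DIRECT_IDENTIFIERS = {"name", "telephone", "health_card_number",
                      "medical_record_number"}
QUASI_IDENTIFIERS = {"sex", "date_of_birth", "ethnicity", "postal_code",
                     "event_date", "medical_code"}

def classify(field: str) -> str:
    """Decide how a field is treated during de-identification."""
    if field in DIRECT_IDENTIFIERS:
        return "mask"          # remove or pseudonymize outright
    if field in QUASI_IDENTIFIERS:
        return "de-identify"   # generalize, suppress, or sub-sample
    return "keep"              # non-identifying clinical payload

print(classify("name"))           # mask
print(classify("date_of_birth"))  # de-identify
print(classify("lab_result"))     # keep
```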
44. Anonymizing Health Data
Step 2: Setting the Threshold
The maximum acceptable risk for sharing the data.
Needs to be quantitative and defensible.
Is the data going to be in the public domain?
What is the extent of the invasion of privacy when the data is shared?
48. Anonymizing Health Data
Step 3: Examining Plausible Attacks
Recipient deliberately attempts to re-identify the data.
Recipient inadvertently re-identifies the data (“Holy smokes, I know her!”).
Data breach at the recipient’s site (“data gone wild”).
Adversary launches a demonstration attack on the data.
53. Anonymizing Health Data
Step 4: De-identifying the Data
Generalization: reducing the precision of a field.
Dates converted to month/year, or year only.
Suppression: replacing a cell with NULL.
E.g., a unique 55-year-old female in the birth registry.
Sub-sampling: releasing a simple random sample.
E.g., 50% of the data set instead of all of it.
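The three techniques can be sketched in a few lines; this illustrates the ideas only, not the authors' tooling:

```python
import random
from datetime import date

def generalize_date(d: date, level: str = "month") -> str:
    """Generalization: reduce a full date to month/year, or year only."""
    return f"{d.year}-{d.month:02d}" if level == "month" else str(d.year)

def suppress(record: dict, field: str) -> dict:
    """Suppression: replace a revealing cell with None (NULL)."""
    return {**record, field: None}

def subsample(records: list, fraction: float = 0.5, seed: int = 1) -> list:
    """Sub-sampling: release a simple random sample of the records."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

rec = {"mother_age": 55, "birth_date": date(2009, 3, 14)}
print(generalize_date(rec["birth_date"]))          # 2009-03
print(generalize_date(rec["birth_date"], "year"))  # 2009
print(suppress(rec, "mother_age")["mother_age"])   # None
```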
57. Anonymizing Health Data
Step 5: Documenting the Process
Process documentation—a methodology text.
Results documentation—data set, risk thresholds, assumptions, evidence of low risk.
60. Anonymizing Health Data
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
Pr(re-id, attempt) = Pr(attempt) × Pr(re-id | attempt)
T2: Inadvertent Attempt (“Holy smokes, I know her!”)
Pr(re-id, acquaintance) = Pr(acquaintance) × Pr(re-id | acquaintance)
T3: Data Breach (“data gone wild”)
Pr(re-id, breach) = Pr(breach) × Pr(re-id | breach)
T4: Public Data (demonstration attack)
Pr(re-id), based on the data set only.
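Each attack's overall risk is the product of the probability the attack happens and the probability of re-identification given the attack; for a public release (T4) the attack is assumed certain. A sketch with illustrative numbers (only 0.87 and 0.27 come from the case study later in the deck; the rest are assumptions for the example):

```python
def attack_risk(pr_attack: float, pr_reid_given_attack: float) -> float:
    """Pr(re-id, T) = Pr(T) * Pr(re-id | T) for a plausible attack T."""
    return pr_attack * pr_reid_given_attack

THRESHOLD = 0.1  # maximum acceptable risk, chosen in Step 2

pr_reid = 0.05   # illustrative re-identification probability of the data set
risks = {
    "T1 deliberate":   attack_risk(0.3, pr_reid),   # Pr(attempt) assumed
    "T2 acquaintance": attack_risk(0.87, pr_reid),  # Dunbar-based estimate
    "T3 breach":       attack_risk(0.27, pr_reid),  # historical breach rate
    "T4 public":       pr_reid,                     # Pr(attack) = 1
}
# The data is acceptable if every plausible attack stays under the threshold.
print(all(r <= THRESHOLD for r in risks.values()))  # True
```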
66. Anonymizing Health Data
Choosing Thresholds
Many precedents going back multiple decades.
Recommended by regulators.
All based on max risk, though.
71. Anonymizing Health Data
Cross-Sectional Data: Research Registries
Better Outcomes Registry & Network (BORN) of Ontario.
140,000 births per year.
Cross-sectional—mothers not traced over time.
The process of getting de-identified data from a research registry.
80. Anonymizing Health Data
Choosing Thresholds
Average risk of 0.1 for Researcher Ronnie (and the data he specifically requested).
0.05 if there were highly sensitive variables (congenital anomalies, mental health problems).
84. Anonymizing Health Data
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
Low motives and capacity; low mitigating controls.
T2: Inadvertent Attempt (“Holy smokes, I know her!”)
119,785 births out of 4,478,500 women (= 0.027).
Pr(acquaintance) = 1 − (1 − 0.027)^(150/2) = 0.87
T3: Data Breach (“data gone wild”)
Based on historical data: Pr(breach) = 0.27
T4: Public Data (demonstration attack)
Overall risk for each attack T:
Pr(re-id, T) = Pr(T) × Pr(re-id | T) ≤ 0.1
For the inadvertent attempt:
Pr(re-id, acquaintance) = 0.87 × Pr(re-id | acquaintance) ≤ 0.1
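The 0.87 follows from the Dunbar number (about 150 friends on average, half assumed to be women) and the chance that a random woman appears in one year of the registry. A quick check of the arithmetic:

```python
# Probability that a data user has an acquaintance in the registry.
p_in_registry = 119_785 / 4_478_500   # one year of births / women in population
friends = 150                         # Dunbar number; half assumed female
pr_acquaintance = 1 - (1 - p_in_registry) ** (friends / 2)
print(round(pr_acquaintance, 2))      # 0.87

# To keep the overall risk at or under 0.1, the data itself must satisfy
# Pr(re-id | acquaintance) <= 0.1 / 0.87, roughly 0.115.
print(round(0.1 / pr_acquaintance, 3))  # 0.115
```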
97. Anonymizing Health Data
De-identifying the Data Set
Candidate generalization schemes (MDOB: mother’s date of birth; BDOB: baby’s date of birth; MPC: maternal postal code):
MDOB in 1-yy; BDOB in wk/yy; MPC of 1 char.
MDOB in 10-yy; BDOB in qtr/yy; MPC of 3 chars.
MDOB in 10-yy; BDOB in mm/yy; MPC of 3 chars.
100. Anonymizing Health Data
Year on Year: Re-using Risk Analyses
In 2006 Researcher Ronnie asks for the 2005 data.
In 2007 he asks for the 2006 data (the 2005 extract is deleted).
In 2008 he asks for the 2007 data (2006 deleted).
In 2009 he asks for the 2008 data (2007 deleted).
In 2010 he asks for the 2009 data (2008 deleted).
Can we use the same de-identification scheme every year?
108. Anonymizing Health Data
Year on Year: Re-using Risk Analyses
BORN data pertains to very stable populations.
No dramatic changes in the number or characteristics of births from 2005-2010.
Revisit the de-identification scheme every 18 to 24 months.
Revisit if any quasi-identifiers are added or changed.
113. Anonymizing Health Data
Longitudinal Discharge Abstract Data: State Inpatient Databases
Linking a patient’s records over time.
Longitudinal data needs to be de-identified differently.
127. Anonymizing Health Data
Measuring Risk Under Plausible Attacks
T1: Deliberate Attempt
T2: Inadvertent Attempt (“Holy smokes, I know her!”)
T3: Data Breach (“data gone wild”)
T4: Public Data (demonstration attack)
Pr(re-id) ≤ 0.09 (maximum risk)
129. Anonymizing Health Data
De-identifying the Data Set
BirthYear in 5-yy (cut at 1910-);
AdmissionYear unchanged;
DaysSinceLastService in 28-dd (cut at 7-, 182+);
LengthOfStay same as DaysSinceLastService.
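These generalizations are fixed-width bands with bottom and top coding (values pooled below or above a cut). A sketch of the idea, not the authors' implementation:

```python
def band(value: int, width: int, low=None, high=None) -> str:
    """Generalize a number into fixed-width bands; values below `low`
    are bottom-coded and values at or above `high` are top-coded."""
    if low is not None and value < low:
        return f"<{low}"
    if high is not None and value >= high:
        return f"{high}+"
    start = (value // width) * width
    return f"{start}-{start + width - 1}"

print(band(1987, 5, low=1910))         # 1985-1989 (BirthYear in 5-yy)
print(band(45, 28, low=7, high=182))   # 28-55 (DaysSinceLastService in 28-dd)
print(band(3, 28, low=7, high=182))    # <7
print(band(200, 28, low=7, high=182))  # 182+
```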
134. Anonymizing Health Data
Connected Variables
QI connected to QI: if the quasi-identifiers are similar, apply the same generalization and suppression.
QI connected to non-QI: if the non-QI is revealing, apply the same suppression so both are removed.
137. Anonymizing Health Data
Other Issues Regarding Longitudinal Data
Date shifting—maintaining the order of records.
Long tails—truncation of records.
Adversary power—assumption of knowledge.
142. Anonymizing Health Data
Other Concerns to Think About
Free-form text—anonymization.
Geospatial information—aggregation and geoproxy risk.
Medical codes—generalization, suppression, shuffling (yes, as in cards).
Secure linking—linking data through encryption before anonymization.
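One common way to link securely (a sketch of the general idea; the authors may use a different protocol) is to replace the linking identifier with a keyed hash before anonymization, so two sites can join records without ever exchanging the raw identifier:

```python
import hmac
import hashlib

def pseudonym(identifier: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a direct identifier; the same key at
    both sites yields matching pseudonyms without revealing the value."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()

key = b"shared-secret"  # in practice, held by a trusted third party
site_a = {pseudonym("HC-1234-567", key): {"diagnosis": "O80"}}
site_b = {pseudonym("HC-1234-567", key): {"length_of_stay": 2}}

# Join on the pseudonym, never on the health card number itself.
linked = {p: {**site_a[p], **site_b[p]}
          for p in site_a.keys() & site_b.keys()}
print(len(linked))  # 1
```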
147. Anonymizing Health Data
More comments or questions? Contact us!
Khaled El Emam: kelemam@privacyanalytics.ca
Luk Arbuckle: larbuckle@privacyanalytics.ca
Editor's Notes
A risk-based methodology is consistent with contemporary standards from regulators and governments, and is the approach we present in our book.
This is where things get heavy. We’ll start with some basic principles.
The Goldilocks Principle: the trade-off between perfect data and perfect privacy.
We use masking for direct identifiers, and de-identification for indirect identifiers.
Masking
De-identification
Yahoo!
From a regulatory perspective, it’s important to document the process that was used to de-identify the data set, as well as the results of enacting that process.
The probability of an attack will depend on the controls in place to manage the data (mitigating controls).
On average people tend to have 150 friends. This is called the Dunbar number.
Based on recent credible evidence, we know that approximately 27% of providers that are supposed to follow the HIPAA Security Rule have a reportable breach every year.
We assume that there is an adversary who has background information that can be used to launch an attack.
So we can measure risk under plausible attacks, but how do we set an overall risk threshold?
Max risk is based on the record that has the highest probability of re-identification; average risk applies when the adversary is trying to re-identify someone they know, or everyone in the data set.
To set the threshold, we can look at the sensitivity of the data and the consent mechanism that was in place (invasion of privacy).