The Rise of Data Ethics and Security - AIDI Webinar
Eryk B. Pratama
IT Advisory & Cyber Security Consultant at Global Consulting Firm
Asosiasi Ilmuwan Data Indonesia (AIDI)
Komunitas Data Privacy & Protection Indonesia
29 July 2020
2. About Me
❑ Global IT Advisory & Cyber Security Professional
❑ Asosiasi Ilmuwan Data Indonesia (AIDI)
❑ Komunitas Data Privacy & Protection Indonesia
❑ International Association of Privacy Professionals (IAPP)
❑ Information Systems Audit and Control Association (ISACA)
❑ Community Enthusiast
❑ Blogger / Writer
❑ Knowledge Hunter
❑ https://medium.com/@proferyk
❑ https://www.slideshare.net/proferyk
❑ https://www.linkedin.com/in/erykbudipratama/
You can subscribe to my Telegram channels:
▪ IT Advisory & Risk (t.me/itadvindonesia)
▪ Data Privacy & Protection (t.me/dataprivid)
▪ Komunitas Data Privacy & Protection (t.me/dataprotectionid)
6. Data/Information Lifecycle
Introduction
Source: ISACA – Getting Started with Data Governance with COBIT 5
It is important to plan the life cycle of data, along with its placement within the governance structure. As practices
operate, the data supporting or underlying them move through the various stages of their natural life cycle: data is planned,
designed, acquired, used, monitored and disposed of.
Critical information security control points:
▪ Store: Data at Rest
▪ Share: Data in Motion
▪ Use: Data in Use
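The lifecycle stages and the three data states can be captured as a small model, for example to check that every state a data set passes through has controls assigned. This is an illustrative sketch only: the stage order follows the ISACA excerpt above, while the example controls per state are hypothetical placeholders.

```python
from enum import Enum
from typing import List, Optional

class Stage(Enum):
    """Lifecycle stages in the order listed in the ISACA excerpt."""
    PLAN = 1
    DESIGN = 2
    ACQUIRE = 3
    USE = 4
    MONITOR = 5
    DISPOSE = 6

class DataState(Enum):
    """Where the data sits when a control is applied (Store / Share / Use)."""
    AT_REST = "store"
    IN_MOTION = "share"
    IN_USE = "use"

# Hypothetical example controls per data state; replace with your own framework.
CONTROLS = {
    DataState.AT_REST: ["encryption at rest", "access control", "backup"],
    DataState.IN_MOTION: ["TLS in transit", "data loss prevention"],
    DataState.IN_USE: ["least-privilege access", "masking", "activity logging"],
}

def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following lifecycle stage, or None once data is disposed of."""
    stages: List[Stage] = list(Stage)
    idx = stages.index(stage)
    return stages[idx + 1] if idx + 1 < len(stages) else None

def controls_for(states) -> List[str]:
    """Collect the example controls for every state a data set passes through."""
    return [control for state in states for control in CONTROLS[state]]
```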
7. A growing reliance on data and analytics triggers the rise of four anchors that make analytics more trusted
▪ Does it perform as intended?
▪ Are the inputs and the development process high quality?
▪ Is its use considered acceptable?
▪ Is its long-term operation optimised?
[Figure: Percentage of respondents who reported being very confident in their D&A insights]
Source: https://home.kpmg/pl/en/home/insights/2018/01/report-building-trust-in-analytics.html
8. Data sourcing is the key trust stage of the analytics lifecycle
Source: https://home.kpmg/pl/en/home/insights/2018/01/report-building-trust-in-analytics.html
10. Ethics in Data Processing
Data Ethics
In the context of personal data, data describe the characteristics of individuals and can later be used to make
decisions that affect their lives. Health data and medical records are one example. What is the impact if a medical
record is leaked? Unauthorized and irresponsible parties can exploit it for financial gain, for example by selling
medical records to companies that want the data.
▪ Impact on People
▪ Abuse Potential
▪ The Economic Value of Data
Misuse of data can harm individuals. For example, after we apply for a credit card at a mall, we often start receiving
offers from other credit card providers or other advertisers, and we wonder where, or from whom, the salesperson
obtained our phone number. Another example is the leak of the permanent voter list, which the KPU (Indonesia's General
Elections Commission) stated was in fact open to the public. What can be done with such data? It can be sold to certain
parties, and criminals can use it to commit fraud.
Proper data processing creates economic value. The ethics of the data owner can determine how this value is obtained
and who may benefit economically from the data.
11. Implementation of Data Ethics
Vision
The vision determines the direction and goals of the organization. Here, the organization needs to define what ethical
data usage means for it. The vision can be derived from the data ethics principles chosen by management.
Strategy
Strategies are designed to achieve the vision. Organizations need to develop strategies so that data ethics is applied
consistently as part of the organization's culture.
Governance
To "force" stakeholders to carry out data ethics practices, organizations need to develop effective
policies and procedures and ensure that each related party has clearly defined responsibilities.
Infrastructure & Architecture
Managing complex data (especially in large organizations) is easier and more integrated when the organization has
visibility of all its data, documents it in an architecture (for example, an Enterprise Architecture), and supports it
with qualified and reliable systems and infrastructure.
Data Insight
Insights are needed to make sure data results are clear and accurate. Tools such as dashboards can help organizations
monitor data use and give early warning of potential data ethics violations.
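As an illustration of such early warnings, the sketch below scans hypothetical access-log events and flags two simple patterns: data used for a purpose that was not approved for that dataset, and unusually large bulk reads. The dataset names, purposes and the 1,000-row threshold are all assumptions; a real dashboard would derive its rules from the organization's own data ethics policies.

```python
from dataclasses import dataclass

@dataclass
class AccessEvent:
    user: str
    dataset: str
    purpose: str
    rows: int

# Hypothetical policy: approved purposes per dataset and a bulk-read threshold.
APPROVED_PURPOSES = {
    "medical_records": {"treatment", "billing"},
    "customer_contacts": {"support"},
}
ROW_LIMIT = 1000

def early_warnings(events):
    """Return human-readable alerts for potentially unethical data use."""
    alerts = []
    for e in events:
        allowed = APPROVED_PURPOSES.get(e.dataset, set())
        if e.purpose not in allowed:
            alerts.append(f"{e.user}: purpose '{e.purpose}' not approved for {e.dataset}")
        if e.rows > ROW_LIMIT:
            alerts.append(f"{e.user}: bulk access of {e.rows} rows from {e.dataset}")
    return alerts
```

For example, an event where a user reads 5,000 customer contacts for "marketing" produces two alerts, one for the unapproved purpose and one for the bulk read.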
Training & Development
People are the main factor in data ethics. Organizations need to conduct training on ethics in the use (and misuse)
of data. This can be delivered alongside awareness sessions or training on Data Privacy and Personal Data Protection,
because data ethics is closely tied to both.
13. Regulation: RUU Perlindungan Data Pribadi (Personal Data Protection Bill)
Key Highlights
▪ Explicit consent from the data owner is required for
personal data processing.
▪ Response timelines for data subject rights are
specified separately in the RUU PDP.
▪ The data controller must notify the data owner and the Minister
within 3 days of a data breach.
▪ Penalties for non-compliance may range from Rp 20 billion
to Rp 70 billion, or imprisonment of 2 to 7 years.
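The 3-day breach notification window is simple date arithmetic, but the starting point matters. The sketch below assumes the clock starts when the controller becomes aware of the breach and treats 3 days as 72 hours; both are assumptions, since the highlight above does not specify them, and the final law and its implementing regulations should be checked before relying on this.

```python
from datetime import datetime, timedelta

# Assumption: 3 days = 72 hours, counted from the moment the breach is detected.
NOTIFICATION_WINDOW = timedelta(days=3)

def notification_deadline(breach_detected_at: datetime) -> datetime:
    """Latest time by which the data owner and the Minister should be notified."""
    return breach_detected_at + NOTIFICATION_WINDOW

def is_notification_late(breach_detected_at: datetime, notified_at: datetime) -> bool:
    """True if the notification was sent after the deadline."""
    return notified_at > notification_deadline(breach_detected_at)
```

For a breach detected on 29 July 2020 at 10:00, the deadline under these assumptions is 1 August 2020 at 10:00.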
Key roles: Data Owner, Data Controller, Data Processor, Data Protection Officer
14. Sample RUU PDP Article: Visual Processing Tools
15. Impact of Privacy Regulation on Data Scientists
Data scientists working with user data face several challenges:
1. Making data both protected and accessible (for when lawful disclosure is required)
2. Creating ways of sharing and processing data that not only preserve privacy but also allow information to be
retracted if needed
3. Maintaining enough flexibility and interpretability to provide sufficient transparency of processes (and to
future-proof the technology)
4. Learning to work with limited data, where its usage is restricted or regulated by law
5. For projects spanning multiple countries: ensuring compliance with differing regional laws on data privacy
and security
▪ User Profiling
▪ Consent Management
▪ Data Decrement
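Consent management, and the ability to retract information (challenge 2), can be illustrated with a minimal in-memory consent registry: processing jobs keep only records whose data subject has consented to the stated purpose, so a withdrawn consent takes effect the next time the job runs. The class and purpose names are hypothetical, and the sketch does not handle data already copied into derived datasets or trained models.

```python
from collections import defaultdict

class ConsentRegistry:
    """Minimal in-memory consent store (hypothetical design, not a product API)."""

    def __init__(self):
        # user_id -> set of purposes the user currently consents to
        self._consents = defaultdict(set)

    def grant(self, user_id: str, purpose: str) -> None:
        self._consents[user_id].add(purpose)

    def withdraw(self, user_id: str, purpose: str) -> None:
        self._consents[user_id].discard(purpose)

    def allows(self, user_id: str, purpose: str) -> bool:
        # .get() avoids creating empty entries for unknown users
        return purpose in self._consents.get(user_id, set())

def filter_by_consent(records, registry, purpose):
    """Keep only records whose data subject consented to this purpose."""
    return [r for r in records if registry.allows(r["user_id"], purpose)]
```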
17. Data Masking - Tokenization
Source: https://blog.thalesesecurity.com/2015/02/05/token-gesture-vormetric-unveils-new-tokenization-solution/
No sensitive data is stored in the production database
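Vault-based tokenization, as in the figure, can be sketched as follows: the production database stores only random tokens, while the mapping back to the real values lives in a separate vault. Here the vault is just two dictionaries and the `tok_` prefix is an arbitrary choice; a real deployment keeps the vault in a separately secured system with its own access controls.

```python
import secrets

class TokenVault:
    """Toy vault-based tokenization; the vault here is two in-memory dicts."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so joins on the tokenized column still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
        while token in self._token_to_value:   # guard against an unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers authorized to use the vault should reach this method.
        return self._token_to_value[token]
```

Because tokens are random rather than computed from the card number or ID, a leaked production table reveals nothing without access to the vault.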
18. Privacy Control in ETL Process
Source: Big Data Privacy: a Technological Perspective and Review
Big data architecture and testing call for new paradigms of privacy conformance testing in the four areas of the ETL
(Extract, Transform, and Load) process.
19. Privacy Control in ETL Process
Source: Big Data Privacy: a Technological Perspective and Review
The four areas are described below.
1. Pre-Hadoop process validation. This step covers the data loading process. Here, privacy specifications identify
the sensitive pieces of data that can uniquely identify a user or an entity. Privacy terms can also indicate which
pieces of data may be stored and for how long. Schema restrictions can be applied at this step as well.
2. MapReduce process validation. This process transforms big data assets so that they can respond efficiently to a
query. Privacy terms can specify the minimum number of returned records required to mask individual values, as well
as constraints on data sharing between processes.
3. ETL process validation. As in step 2, the warehousing logic should be verified at this step for compliance with
privacy terms. Some data values may be aggregated anonymously, or excluded from the warehouse, if they indicate a
high probability of identifying individuals.
4. Reports testing. Reports are another form of query, potentially with higher visibility and a wider audience.
Privacy terms that define 'purpose' are essential for checking that sensitive data is not reported except for
specified uses.
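The checks in steps 1 to 3 can be sketched as plain functions over records. The privacy specification (allowed fields, a 365-day retention period) and the minimum group size of 5 are hypothetical values chosen for illustration, not taken from the source paper; step 4 (purpose checks on reports) is not covered here.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Optional

# Hypothetical privacy specification for one data feed (step 1):
# which fields may be stored, and for how many days.
PRIVACY_SPEC = {
    "allowed_fields": {"city", "age_band", "diagnosis", "visit_date"},
    "retention_days": 365,
}
# Hypothetical minimum group size for query and warehouse outputs (steps 2 and 3).
MIN_GROUP_SIZE = 5

def validate_load(record: dict, today: date) -> Optional[dict]:
    """Step 1: reject records past retention and drop fields the spec does not allow."""
    age = today - record["visit_date"]
    if age > timedelta(days=PRIVACY_SPEC["retention_days"]):
        return None
    return {k: v for k, v in record.items() if k in PRIVACY_SPEC["allowed_fields"]}

def aggregate_with_threshold(records: list, key: str) -> dict:
    """Steps 2-3: count records per group and suppress groups smaller than
    MIN_GROUP_SIZE, since small groups make individuals easier to identify."""
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}
```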
21. Data Governance: Common Area
Big Data Security
Source: https://www.pinterest.com/pin/838584393089888744/
Data Security is one of the
foundational and most important
areas of Data Governance
22. Big Data: Big Risks
Big Data carries significant security, privacy, and transfer risks that are real and will continue to escalate. As
organizations seek to optimize their Big Data programs, combining data from a multitude of sources can create new
data, so it is important that companies consider the risks related to:
▪ Identification
▪ Re-Identification
▪ Predictive Analytics
▪ Indiscriminate collection of data
▪ Increased risk of data breach
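Re-identification through combining sources can be shown with a toy linkage: a data set with names removed is joined with a second, identified data set on quasi-identifiers (postal code, birth year, sex). All records and field names below are made up for illustration.

```python
# Toy "de-identified" health data: names removed, quasi-identifiers kept.
health = [
    {"zip": "40115", "birth_year": 1985, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "40115", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"zip": "16111", "birth_year": 1985, "sex": "F", "diagnosis": "flu"},
]

# Toy identified data from another source (e.g. a leaked or public list).
public = [
    {"name": "Siti", "zip": "40115", "birth_year": 1985, "sex": "F"},
    {"name": "Andi", "zip": "40115", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(deidentified, identified, keys=QUASI_IDENTIFIERS):
    """Join two datasets on quasi-identifiers.
    A key that matches exactly one named person re-identifies that row."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:
            matches.append((names[0], row["diagnosis"]))
    return matches

if __name__ == "__main__":
    print(link(health, public))
```

Running it prints `[('Siti', 'diabetes'), ('Andi', 'asthma')]`: two of the three "anonymous" health records are re-identified without using any name from the health data.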
23. Challenges to Big Data Security & Privacy
• Protecting Transaction Logs and Data
• Validation and Filtration of End-Point Inputs
• Securing Distributed Framework Calculations and Other Processes
• Securing and Protecting Data in Real Time
• Protecting Access Control Methods, Communication, and Encryption
• Data Provenance
• Granular Auditing
• Granular Access Control
• Privacy Protection for Non-Relational Data Stores
• Big Data governance
• Re-identification risk
• Third-party risk
• Interpreting current regulations and anticipating future regulations
• Maintaining privacy and security requirements
24. Approach to Building a Big Data Security and Privacy Program
Source: KPMG – Navigating Big Data Privacy and Security Challenges
Data Governance
A data governance program must be established that provides clear direction on how
the organization handles and protects data.
Compliance
Organizations must identify and understand the security and privacy regulations
that apply to the data they store, process, and transmit.
Data use cases and data feed approval
A key consideration in adopting any new data feed is that the potential risk of
re-identification increases when existing data feeds are combined with new ones.
Consent Management
Customer consent management is critical to the successful implementation of any Big
Data governance program. Customer consent requires transparency, consistency, and
granularity.
Access management
Organizations must effectively control who within the organization has access to the
data sets.
Anonymization
Anonymization means removing all Personally Identifiable Information (PII) from a
data set and permanently turning it into non-identifying data.
Data sharing/third-party management
Organizations maintain a responsibility to their customers as they share data with
third parties.
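A minimal sketch of the anonymization step, assuming hypothetical column names: direct identifiers are dropped, quasi-identifiers are generalized, and the result is checked for k-anonymity (every combination of quasi-identifiers is shared by at least k records). k-anonymity is one common test, swapped in here for illustration; on its own it does not guarantee that data is permanently non-identifying, for example when everyone in a group has the same diagnosis.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "nik", "phone"}  # hypothetical column names

def generalize(record: dict) -> dict:
    """Drop direct identifiers and coarsen the quasi-identifiers zip and age."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip"] = record["zip"][:3] + "**"      # keep only the zip prefix
    decade = record["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"      # age band instead of exact age
    return out

def is_k_anonymous(records, quasi_identifiers, k: int) -> bool:
    """True if every quasi-identifier combination appears at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())
```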
25. Differential Privacy (DP) Mechanism
Data Transformation
Differential Privacy (DP) was conceived to prevent unwanted re-identification and other privacy threats to individuals
whose personal information is present in large datasets, while still providing useful access to the data. Under the DP
model, personal information in a large database is not modified and released for analysts to use directly; analysts
instead work with outputs to which calibrated random noise has been added.
[Figure: Original Data → Transform → Coefficients → Noise → Noisy Coefficients → Invert → Private Data]
General Idea
▪ Apply transform of data
▪ Add noise in the transformed space (based on sensitivity)
▪ Publish noisy coefficients, or invert transform (post-processing)
Goal
▪ Pick a transform that preserves good properties of the data
▪ ...and that has low sensitivity, so the added noise does not corrupt the result
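The noise "based on sensitivity" in the general idea above is usually calibrated with the standard Laplace mechanism from the differential privacy literature. For a query f that maps a dataset D to k numbers, where neighbouring datasets D and D' differ in one individual's record:

```latex
\Delta f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1
\qquad
\mathcal{M}(D) = f(D) + (Y_1, \dots, Y_k), \quad
Y_i \overset{\text{iid}}{\sim} \operatorname{Lap}\!\left(\tfrac{\Delta f}{\varepsilon}\right),
\qquad
p_{\operatorname{Lap}(b)}(y) = \frac{1}{2b}\, e^{-|y|/b}
```

The mechanism M satisfies ε-differential privacy. A smaller ε (stronger privacy) or a larger sensitivity Δf gives a larger noise scale b = Δf/ε, which is why the goal is a transform with low sensitivity.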
[Sample] Laplace noise scaled by sensitivity
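A minimal Python sketch of the Laplace mechanism applied to a histogram of counts. For simplicity the transform is the identity, and it assumes each person contributes to exactly one bin, so adding or removing one person changes the counts by at most 1 in L1 norm (sensitivity 1). The ε value and the counts are illustrative only, and Python's `random` module is used for demonstration, not as a production-grade noise source.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def laplace_mechanism(counts: dict, sensitivity: float, epsilon: float,
                      rng: random.Random) -> dict:
    """Add Laplace noise with scale = sensitivity / epsilon to every count."""
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}

def post_process(noisy: dict) -> dict:
    """Round and clamp to non-negative integers.
    Post-processing does not weaken the differential privacy guarantee."""
    return {key: max(0, round(value)) for key, value in noisy.items()}

if __name__ == "__main__":
    # Demo only: production DP needs a secure RNG and care with floating-point noise.
    rng = random.Random()
    true_counts = {"flu": 120, "asthma": 40, "diabetes": 3}
    released = post_process(laplace_mechanism(true_counts, sensitivity=1.0,
                                              epsilon=0.5, rng=rng))
    print(released)
```

With ε = 0.5 the noise scale is 2, so large counts such as 120 stay useful while a small count such as 3 is heavily blurred, which is the intended protection for rare, identifying values.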
26. Differential Privacy (DP) Implementation - Example
Uber uses DP as part of its data analysis pipeline and other development workflows. A novel aspect of its
implementation is the use of Elastic Sensitivity, a technique for computing the sensitivity of a query that meets
Uber's demanding performance and scalability requirements.
Source: https://medium.com/uber-security-privacy/differential-privacy-open-source-7892c82c42b6
28. Case Study: Big Data IT Audit & Penetration Testing
Case Study
The client is planning to launch the XYZ Big Data platform once development is complete. It is important for the client
to ensure that the XYZ Big Data application and its infrastructure systems are properly protected and secured.
Scope: XYZ Big Data Platform, ABC Cloud-based Machine Learning, and supporting infrastructure
Top Findings / Issues
Penetration Testing
▪ Default login password leads to root access
▪ Unrestricted access to the administration web page
▪ Unrestricted access to a shared folder directory leads
to sensitive information disclosure (e.g., KTP identity
cards, invoices)
▪ User information disclosure via Insecure Direct
Object Reference (IDOR)
IT Audit
▪ Shared user ID: a single shared user ID/admin
account is used at both the database and application levels
▪ Access Administration: administrator access to
the application can be granted and authorized by
users themselves
▪ Activity Log: logs of administrative user
activities could not be reviewed
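Two of these findings map to simple code-level controls. The sketch below, with a hypothetical in-memory invoice store and user model, shows an object-level authorization check against IDOR (the caller must own the object or be an admin, whatever ID is requested) and a segregation-of-duties check that stops users from granting themselves administrator access. It is illustrative only; a real platform would enforce these rules in its access management layer.

```python
from dataclasses import dataclass, field

class Forbidden(Exception):
    """Raised when the caller is not authorized for the requested action."""

@dataclass
class User:
    user_id: str
    roles: set = field(default_factory=set)

# Hypothetical in-memory store: invoice_id -> owner and content.
INVOICES = {
    "inv-1001": {"owner": "u1", "amount": 250_000},
    "inv-1002": {"owner": "u2", "amount": 990_000},
}

def get_invoice(current_user: User, invoice_id: str) -> dict:
    """Object-level authorization: never trust the ID in the request on its own.
    Missing and unauthorized invoices raise the same error, so the response
    does not reveal which invoice IDs exist."""
    invoice = INVOICES.get(invoice_id)
    is_owner = invoice is not None and invoice["owner"] == current_user.user_id
    if invoice is None or not (is_owner or "admin" in current_user.roles):
        raise Forbidden(f"access to {invoice_id} denied")
    return invoice

def grant_admin(approver: User, target: User) -> None:
    """Segregation of duties: only an existing admin may grant admin access,
    and never to themselves."""
    if "admin" not in approver.roles:
        raise Forbidden("only an existing admin can grant admin access")
    if approver.user_id == target.user_id:
        raise Forbidden("users cannot approve their own admin access")
    target.roles.add("admin")
```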