Data Privacy and Security Course Overview
1. Data Privacy and Security (DPS)
Course Code: DPS3204
Asst. Prof.: Bagesh Kumar
M.Tech.-Ph.D.: IIIT Allahabad
2. SYLLABUS
Introduction to Data Privacy, types of privacy attacks, Data linking and profiling, access control
models, role based access control, privacy policies, their specifications, privacy policy languages,
privacy in different domains-medical, financial, etc. Mathematical model for comparing real-world
data sharing Practices, computing privacy and risk measurements. Demographics and Uniqueness.
Protection Models-Null-map, k-map, Wrong map. Survey of techniques-Protection models
(null-map, k-map, wrong map), Disclosure control, Inferring entity identities, entry specific
databases. Computation systems for protecting delimited data-Min Gen, Datafly, Mu-Argus,
k-Similar. Introduction to Security: The OSI Security Architecture, Security Attacks, Services and
Mechanisms, Model for Network Security, Number theory, Cryptographic Hash Functions, Digital
Signatures, System Security, Symmetric Encryption and Message Confidentiality, Substitution
ciphers, Stream ciphers, Public-key cryptography and Message Authentication, Key Distribution
and Authentication, Transport Layer Security, Wireless Network Security, E-mail Security, IP
Security, Security Management Systems, Need for IT Security, Intrusion Prevention and Detection
Systems, Cyber Security. Security metrics: Design, Data sources, Analysis of security metrics data,
Measuring security cost and value, Different Context for security process management. Acquisition
and Duplication: Sterilizing Evidence Media, Acquiring Forensics Images, Acquiring Live Volatile
Data, Data Analysis, Metadata Extraction, and File System Analysis.
3. Course Plan
Lecture by Professor
A professor will teach and elaborate on the concepts in this lecture.
Lecture by Industry Expert
An industry expert will share their learnings and practical experiences in this lecture.
4. Aseem Shrey
Security Engineer at Rippling
● Placed at Publicis.Sapient
● Summer Intern: Innovaccer
● Google Bug Bounty
● Digilocker Bug Bounty
● 1st Place (India), 7th Place (World), nullcon CTF 2018
● 20th Place, DRDO 60 Cyber Challenge - CTF 2018
● 1st Place, Terminal Tragedy - CTF, organised by NIT Trichy, 2017
● 5th Place, CSAW Finals (India Region), 2017
5. OP VYAS
● Alumnus of Technical University of Kaiserslautern (Germany).
● Master of Technology (Computer Sc.) from Indian Institute of Technology-Kharagpur.
● Ph.D. jointly with TU Kaiserslautern & IIT-Kharagpur as DAAD Fellow (1995-97).
● Worked at Fraunhofer Institute for Software Engineering (Kaiserslautern, Germany) and Fraunhofer AIS, St. Augustin, Germany; Visiting Professor at IIM-Rohtak for one semester.
● Currently Dean (Technology Development) and Professor (IT) at IIIT Allahabad (India); formerly Dean (Academic) and Dean (Research).
● Research Area: Data Science, Business Informatics and Cyber Security.
● Worked as Visiting Professor in 5 different countries, with collaborative projects with Germany, France, Norway, Italy and Japan in Data Science, Machine Learning & IoT. Completed 06 funded projects, including one Indo-Norway project, one Indo-French project, one Indo-Italy project and two Indo-German projects.
● Guided 16 Ph.D. scholars, ~50 Master's theses and ~75 B.Tech. students' theses. More than 160 publications and 04 books.
6. Research Scholars
Ayush Sinha, Kumar Saurabh
● Data Science for Cyber Physical Security.
● Deep Learning Techniques for Time Series data
● Intrusion Detection using Machine Learning
● Cyber Security in Industrial IoT.
Harshit Gupta
● Federated Learning in Industrial IoT.
7. SIDDHART SIMHARAJU
❖ Case Studies:
➢ Building Hacker Resume to get 16,000 daily users in 3 months
➢ Redesigning website to get 5 Million users in 2 years
8. Luv Kumar
Backend Engineer at Directi | Teacher | Youtuber
● Platform Engineer @Directi.
● Ex Goldman Sachs
● Former Coordinator, Competitive-Coding wing, IIITA.
● Teaching Assistant - Data structures and algorithms, IIIT Allahabad.
9. Abhinav Khare
Backend Engineer at Directi | Teacher | Youtuber
● Software @Directi.
● Ex Goldman Sachs
● Former Coordinator, Competitive-Coding wing, GeekHaven.
● Teaching Assistant - Data structures and algorithms, IIIT Allahabad.
● Youtuber with 150k subscribers
11. Santosh Gautham
Founder at Polynomial Protocol
● ML Engineer in Berlin, Germany
● Interested in ML and AI
● Former Software Intern @Headout
● Research Intern @University of Messina, Italy
● Co-Founder of Hack in the North and Nybles for Students articles.
13. Table of contents
1 Introduction
2 Data Privacy
3 Types of Privacy Attacks
4 Data Linking and Profiling
5 Summary
14. Introduction
• What is Data?
• Importance of Data: Data is the modern-day currency. If an online service is free, its users pay with their data.
• What is Information?
• Database Management and DBMS
• Data Privacy vs Data Security vs Data Protection
Data Privacy and
Security
2023
15. Data Privacy
Data Privacy deals with the
proper handling of data with
focus on compliance with data
protection regulations.
16. Basic Components of Security
● CIA
○ Confidentiality: Who is authorized to use the data?
○ Integrity: Is the data accurate and unaltered?
○ Availability: Can authorized users access the data whenever they need it?
● CIAAAN
○ Authentication
○ Authorization
○ Non-repudiation
17. Critical Infrastructure Areas
● Telecommunications
● Electrical power systems
● Water supply systems
● Gas and oil pipelines
● Transportation
● Government services
● Emergency services
● Banking and finance
19. What is Data Privacy?
• Data privacy refers to the proper use and processing of personal data in a way that returns control over that data to the individuals it describes.
• Data privacy enables individuals to decide and limit access to the use and sharing
of their personal data.
• Data privacy is centered around how data should be collected, stored, managed,
and shared with any third parties, as well as compliance with the applicable privacy
laws.
• Data protection laws around the world aim to give individuals back control over their data.
• These laws empower individuals to know how their data is being used, by whom and why, and give them control over how their personal data is processed and used.
20. Three Key Elements of Data Privacy
• Right of an individual to be left alone and to have control over their personal data.
• Procedures for proper handling, processing, collecting and sharing of personal data.
• Compliance with data protection laws.
21. Importance of Data Privacy
When data that should have been private gets into the wrong hands, bad things can happen.
• A data breach at a government agency can, for example, put top-secret information in the hands of an enemy state.
• A breach at a corporation can put proprietary data in the hands of a competitor.
• A breach at a school could put students' PII (personally identifiable information) in the hands of criminals who could commit identity theft.
• A breach at a hospital or doctor's office can put PHI (protected health information) in the hands of those who might misuse it.
24. Malware Attack
•“Malware” refers to malicious software, including viruses, worms, spyware, ransomware, adware, and trojans.
1. A trojan disguises itself as legitimate software.
2. Ransomware blocks access to the network's key components, typically until a ransom is paid.
3. Spyware is software that steals your confidential data without your knowledge.
4. Adware is software that displays advertising content on a user's screen.
•Malware breaches a network through a vulnerability, for example when a user clicks a dangerous link, downloads an email attachment, or uses an infected pen drive.
25. Phishing Attack
•Phishing attacks are one of the most widespread types of cyber attacks.
•It is a type of social engineering attack wherein an attacker impersonates a trusted contact and sends the victim fake emails.
•Unaware of this, the victim opens the mail and clicks on the malicious
link or opens the email’s attachment.
•Attackers gain access to confidential information and account credentials.
•They can also install malware through a phishing attack.
26. Password Attack
•It is a form of attack wherein a hacker cracks your password with various programs and password-cracking tools such as Aircrack, Cain & Abel, John the Ripper, Hashcat, etc.
•There are different types of password attacks like brute force attacks,
dictionary attacks, and keylogger attacks.
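As a rough illustration of why weak passwords fall to a dictionary attack, the sketch below checks a small word list against a stored hash. The word list, the sample password, and the use of unsalted SHA-256 are all assumptions made for this example; real systems should store passwords with a salted, deliberately slow hash.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a password with plain, unsalted SHA-256 (assumed here only for illustration)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, wordlist):
    """Return the first candidate whose hash matches the target, or None."""
    for candidate in wordlist:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical stored hash and a tiny list of common passwords.
stored_hash = sha256_hex("sunshine")
common_passwords = ["123456", "password", "sunshine", "qwerty"]

print(dictionary_attack(stored_hash, common_passwords))
```

A brute force attack follows the same loop but generates every possible candidate instead of reading them from a list, which is why long, uncommon passwords take far longer to crack.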
27. Man-in-the-Middle Attack
•In this attack, an attacker comes in between a two-party communication,
i.e., the attacker hijacks the session between a client and host. By doing
so, hackers steal and manipulate data.
•Client-server communication is cut off, and the communication line goes
through the hacker.
28. SQL Injection Attack
•A Structured Query Language (SQL) injection attack occurs on a
database-driven website when the hacker manipulates a standard
SQL query.
•It is carried out by injecting malicious code into a vulnerable
website search box, thereby making the server reveal crucial
information.
•This results in the attacker being able to view, edit, and delete
tables in the databases. Attackers can also get administrative rights
through this.
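The effect of an injected search term can be reproduced with a small in-memory database. This is a minimal sketch using Python's built-in sqlite3 module; the `users` table, its columns, and the two search functions are hypothetical names chosen for the example. The first function builds the query by string concatenation, the second uses a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def search_vulnerable(term):
    # User input is pasted straight into the SQL text, so it can change the query.
    query = "SELECT name, secret FROM users WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


def search_safe(term):
    # The placeholder keeps the input as data; it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (term,)
    ).fetchall()


payload = "x' OR '1'='1"
print(search_vulnerable(payload))  # every row is returned
print(search_safe(payload))        # no row has this literal name
```

With the concatenated query the WHERE clause becomes `name = 'x' OR '1'='1'`, which is true for every row, so the whole table is revealed.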
29. Denial-of-Service Attack
•A Denial-of-Service Attack is a significant threat to companies. Here,
attackers target systems, servers, or networks and flood them with traffic
to exhaust their resources and bandwidth.
•When this happens, catering to the incoming requests becomes
overwhelming for the servers, resulting in the website it hosts either
shutting down or slowing down. This leaves legitimate service requests
unattended.
•It is known as a DDoS (Distributed Denial-of-Service) attack when attackers use multiple compromised systems to launch it.
30. Insider Threat
•An insider threat does not involve a third party but an insider: an individual from within the organization who knows the organization well.
•Insider threats have the potential to cause tremendous damage.
•Insider threats are rampant in small businesses, as the staff there
hold access to multiple accounts with data.
•There are many reasons for this form of attack: greed, malice, or even carelessness. Insider threats are hard to predict and hence tricky to defend against.
31. Cryptojacking
•Cryptojacking takes place when attackers access someone else’s computer
for mining cryptocurrency.
•The access is gained by infecting a website or manipulating the victim to
click on a malicious link.
•They also use online ads with JavaScript code for this. Victims are
unaware of this as the Crypto mining code works in the background; a
delay in the execution is the only sign they might witness.
32. Zero-Day Exploit
•A Zero-Day Exploit happens after a network vulnerability becomes known but before a fix exists for it. The vendor announces the vulnerability so that users are aware of it; however, this news also reaches the attackers.
•Depending on the vulnerability, the vendor or the developer could take
any amount of time to fix the issue. Meanwhile, the attackers target the
disclosed vulnerability. They make sure to exploit the vulnerability even
before a patch or solution is implemented for it.
33. Watering Hole Attack
•The victim here is a particular group of an organization, region, etc.
•The attacker targets websites that are frequently used by the targeted
group. Websites are identified either by closely monitoring the group or
by guessing.
•Attackers infect these websites with malware, which infects the victims'
systems. The malware in such an attack targets the user's personal
information.
•It is also possible for the hacker to gain remote access to the infected computer.
34. Data Linking And Profiling
The process of combining multiple sources of data about a person in order to create a detailed profile of that person's characteristics, preferences, and behavior.
35. What is Data Linking?
•Data linking is the process of collating information from different sources
in order to create a more valuable and helpful data set.
•The linking of information about the same person or entity from disparate
sources allows, among other things, the construction of a chronological
sequence of events. This information is of immense value at the policy
level to derive meaningful decisions.
•For example, information about children in a local community can help
decide on the volume of early childhood programs required and school
locations.
36. Procedure for Data Linking
•The advisor takes information from the different custodians and
extracts the data needed for this particular process.
•The data is run through advanced computer software and the
individual records are linked across all the required data sets.
•Once the data is linked, each individual is given a unique code
which is called the ‘linkage key,’ and the individual is de-identified.
•The customer (Person who needs linked data) will use each
linkage key later to connect the data of separate individuals across
all these data sets.
•For Example: A researcher wants to work for welfare of society.
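The procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: the two data sets, the field names (`name`, `dob`), and the `LK0001`-style key format are hypothetical, and matching is done on a normalized name plus date of birth rather than with dedicated linkage software.

```python
def link_and_deidentify(*datasets, identifiers=("name", "dob")):
    """Assign one linkage key per person and strip the identifying fields."""
    keys = {}  # normalized identity -> linkage key

    def key_for(record):
        identity = tuple(str(record[f]).strip().lower() for f in identifiers)
        if identity not in keys:
            keys[identity] = f"LK{len(keys) + 1:04d}"
        return keys[identity]

    def strip(record):
        out = {k: v for k, v in record.items() if k not in identifiers}
        out["linkage_key"] = key_for(record)
        return out

    return [[strip(r) for r in ds] for ds in datasets]


# Hypothetical extracts from two data custodians.
health = [
    {"name": "Asha Rao", "dob": "2015-03-02", "vaccinated": True},
    {"name": "Ravi Das", "dob": "2014-11-20", "vaccinated": False},
]
education = [
    {"name": " ravi das", "dob": "2014-11-20", "school": "North Primary"},
    {"name": "Meera Nair", "dob": "2016-01-09", "school": "East Primary"},
]

health_out, education_out = link_and_deidentify(health, education)

# The researcher joins the de-identified data sets on the linkage key only.
schools = {r["linkage_key"]: r["school"] for r in education_out}
for r in health_out:
    print(r["linkage_key"], r["vaccinated"], schools.get(r["linkage_key"]))
```

The researcher never sees names or dates of birth; only the custodian-side mapping in `keys` could reverse the de-identification, so it would have to be kept separate from the released data.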
37. Ways to Link Data
•Unique Identifier: A unique identifier is available on each data set and establishes the links between these data sets. It is also called deterministic or exact linking because the unique identifiers either match completely or do not match at all, so there is no uncertainty.
•Linkage Key: The linkage key works as a substitute for the unique identifier. It is created using information such as name and address that is available on both data sets. Linkage keys maintain the privacy of the person or entity because the key is used in place of the name and address.
•Probabilistic Linking: It is based on the probability that a pair of records, one taken from each data set, refers to the same entity or person.
•Statistical Linking: This technique combines records of similar entities that are not necessarily the same person or organization. It may not give the most accurate results but does provide a pattern or trend from the given information or statistics.
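The first three linking methods can be contrasted in a short sketch. Everything named here is an assumption for illustration: the `person_id`, `name`, and `address` fields, the truncated SHA-256 linkage key, and the 0.6/0.4 weights and 0.85 threshold. In particular, a simple string-similarity score stands in for a real match probability, and a production linkage key would normally use a keyed hash rather than a plain one.

```python
import hashlib
from difflib import SequenceMatcher


def exact_link(left, right, id_field="person_id"):
    """Deterministic linking: identifiers either match completely or not at all."""
    index = {rec[id_field]: rec for rec in right}
    return [(rec, index[rec[id_field]]) for rec in left if rec[id_field] in index]


def linkage_key(name, address):
    """Derive a key from name and address so the raw values need not be shared."""
    normalized = f"{name.strip().lower()}|{address.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def match_score(a, b):
    """Similarity in [0, 1]; an assumed stand-in for a match probability."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    addr_sim = SequenceMatcher(None, a["address"].lower(), b["address"].lower()).ratio()
    return 0.6 * name_sim + 0.4 * addr_sim


def probabilistic_link(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record above the threshold."""
    pairs = []
    for a in left:
        best = max(right, key=lambda b: match_score(a, b))
        if match_score(a, best) >= threshold:
            pairs.append((a, best))
    return pairs


clinic = [{"person_id": 7, "name": "Jon Smith", "address": "12 MG Road"}]
census = [
    {"person_id": 7, "name": "John Smith", "address": "12 MG Road"},
    {"person_id": 9, "name": "Meera Nair", "address": "4 Park Lane"},
]

print(len(exact_link(clinic, census)))
print(linkage_key("Jon Smith", "12 MG Road") == linkage_key("John Smith", "12 MG Road"))
print(len(probabilistic_link(clinic, census)))
```

The spelling difference between "Jon" and "John" breaks the linkage key but not the similarity-based match, which is the trade-off between exact and probabilistic linking.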
38. Scope of Data Linking
•Life Sciences
•Government
•Healthcare
•Libraries, Archives and Universities
•Social Media
•Business Uses
39. Benefits and Challenges of Data Linking
Benefits
•Helps in Research and Policymaking
•Integral Tool for Business Research
•Time Saving
Challenges
•Lack of Common Entity Identifiers
•Long Delays in Approvals
•Inconsistent or Incomplete Data
40. What is Data Profiling?
•Data Profiling can be defined as the process of examining and analyzing data to create valuable summaries of it. The process yields a high-level overview that aids in
i. discovering data quality issues,
ii. risks, and
iii. overall trends.
•Data Profiling can eliminate costly errors that are common in databases. These errors include incorrect or missing values, values outside the range, unexpected patterns in data, etc.
41. What is Data Profiling? (Contd.)
•For example, reporting or analysis, data warehousing, or business intelligence projects may necessitate gathering data from numerous distinct systems or databases. Before moving on with these projects, data profiling can help detect potential flaws and required corrections in extract, transform, and load (ETL) activities and other data integration procedures.
•Usually, it is combined with an ETL process. If performed correctly, Data
Profiling and ETL can together be leveraged to cleanse, enrich, and load
quality data into a target location.
42. What is Data Profiling? (Contd.)
It involves the following processes:
•Collecting Descriptive Statistics such as minimum and maximum values,
count of values, etc.
•Performing data quality assessment.
•Identifying data types, recurring patterns, etc.
•Tagging data with descriptions and keywords.
•Grouping data into categories.
•Identifying the metadata and its accuracy.
•Performing inter-table analysis.
•Identifying functional dependencies, embedded value dependencies,
distributions, key candidates, foreign-key candidates, etc.
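A few of the steps listed above (descriptive statistics, a basic quality check, and data type identification) can be sketched for a single column. The function name, the summary fields, and the sample `ages` column are assumptions made for this illustration.

```python
def profile_column(values):
    """Summarize one column: counts, missing values, types, and numeric range."""
    present = [v for v in values if v not in (None, "")]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": sorted({type(v).__name__ for v in present}),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


# Hypothetical column with a missing value, a stray string, and an outlier.
ages = [34, 27, None, 27, "n/a", 212]
print(profile_column(ages))
```

Even this small summary surfaces three typical findings: a missing value, mixed data types in one column, and a maximum that is implausible for an age.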
43. Types of Data Profiling
•Structure Discovery: This type of profiling involves performing mathematical checks on the data such as sum, minimum, maximum, etc., along with other descriptive statistics.
•Content Discovery: Content Discovery profiling involves looking into individual data records to identify errors. It identifies which rows in a given dataset contain problems, as well as any systemic issues occurring in the data.
•Relationship Discovery: Relationship Discovery involves identifying how parts of the data are related to each other, for example, identifying key relationships between tables in a database.
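Content discovery, in particular, amounts to checking individual rows against rules. The sketch below is a hedged example: the `email` and `age` fields, the simple email pattern, and the 0 to 120 age range are all assumed rules, not part of any standard.

```python
import re

# Deliberately simple pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_problem_rows(rows):
    """Return (row index, list of issues) for every row that breaks a rule."""
    problems = []
    for i, row in enumerate(rows):
        issues = []
        email = row.get("email")
        if not email or not EMAIL_RE.match(email):
            issues.append("invalid email")
        age = row.get("age")
        if not isinstance(age, int) or not 0 <= age <= 120:
            issues.append("age out of range")
        if issues:
            problems.append((i, issues))
    return problems


records = [
    {"email": "a@x.org", "age": 30},
    {"email": "bad-email", "age": 41},
    {"email": "c@x.org", "age": 212},
    {"email": None, "age": None},
]
for index, issues in find_problem_rows(records):
    print(index, issues)
```

If the same issue shows up in a large share of rows, that points to a systemic problem in how the data is captured rather than to isolated bad records.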
44. Data Profiling Methods
•Column Profiling: In this method, the number of times every value appears within each column of a table is counted. This method helps to uncover patterns within the data.
•Cross-column Profiling: This method looks across columns to perform Key and Dependency Analysis. Key Analysis scans the collections of values in a table to identify a potential Primary Key. Dependency Analysis determines the dependent relationships within data sets. These analyses can be leveraged to determine the relationships and dependencies within a table.
•Cross-table Profiling: In this method, users look across tables to identify all potential Foreign Keys. It also attempts to identify similarities and differences among data types and syntax between tables to determine which data can be mapped together and which might be redundant.
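The three methods can be tried on toy tables. This is a minimal sketch, assuming tables are lists of dictionaries and using hypothetical `customers` and `orders` data; it treats a column as a primary key candidate when its values are complete and unique, and as a foreign key candidate when all of its values appear in a candidate key of the other table.

```python
from collections import Counter


def column_profile(rows, column):
    """Column profiling: how often each value appears in one column."""
    return Counter(row[column] for row in rows)


def primary_key_candidates(rows):
    """Key analysis: columns whose values are all present and all distinct."""
    candidates = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(rows):
            candidates.append(col)
    return candidates


def foreign_key_candidates(child, parent):
    """Cross-table profiling: child columns fully contained in a parent key."""
    pairs = []
    for child_col in child[0]:
        child_vals = {r[child_col] for r in child if r[child_col] is not None}
        for parent_col in primary_key_candidates(parent):
            parent_vals = {r[parent_col] for r in parent}
            if child_vals and child_vals <= parent_vals:
                pairs.append((child_col, parent_col))
    return pairs


customers = [
    {"cust_id": 1, "city": "Pune"},
    {"cust_id": 2, "city": "Delhi"},
    {"cust_id": 3, "city": "Pune"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 250},
    {"order_id": 11, "cust_id": 3, "amount": 250},
    {"order_id": 12, "cust_id": 1, "amount": 90},
]

print(column_profile(customers, "city"))
print(primary_key_candidates(customers))
print(foreign_key_candidates(orders, customers))
```

On small samples these checks only suggest candidates; a column can look unique by chance, so the results still need to be confirmed against the schema or with the data owners.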
45. Benefits of Data Profiling
Improved Data Quality and Credibility
Proactive Crisis Management
Predictive Decision-making
Organized Sorting
46. Data Profiling Challenges
• The actual work required is quite complicated, with various processes occurring from data acquisition to data warehousing.
• Complexity is one of the obstacles that businesses face when attempting to create and operate a successful data profiling program.
• Another problem is the sheer volume of data generated by a typical firm, as well as the
variety of data sources, which range from cloud-based systems to endpoint devices
deployed as part of an internet-of-things ecosystem.
• The rapidity with which data enters an organization adds to the difficulty of
implementing a successful data profiling program.
• These data preparation issues are magnified in organizations that have not yet adopted
current data profiling tools and still rely on manual processes for the majority of their
data preparation.
• Similarly, organizations that lack necessary resources, such as Trained Data Experts,
Tools, and Financing, will find it more difficult to overcome these obstacles.
47. BEST DATA PROFILING TOOLS
•IBM InfoSphere Information Analyzer
•SAP Business Objects Data Services (BODS)
•Informatica Data Explorer
•Melissa Data Profiler