Data Privacy and Security(DPS)
Asst. Prof.: Bagesh Kumar Course Code- DPS3204
M.Tech.-Ph.D.: IIIT Allahabad
SYLLABUS
Introduction to Data Privacy, types of privacy attacks, Data linking and profiling, access control
models, role based access control, privacy policies, their specifications, privacy policy languages,
privacy in different domains-medical, financial, etc. Mathematical model for comparing real-world
data sharing Practices, computing privacy and risk measurements. Demographics and Uniqueness.
Protection Models-Null-map, k-map, Wrong map. Survey of techniques-Protection models
(null-map, k-map, wrong map), Disclosure control, Inferring entity identities, entry specific
databases. Computation systems for protecting delimited data-MinGen, Datafly, Mu-Argus,
k-Similar. Introduction to Security: The OSI Security Architecture, Security Attacks, Services and
Mechanisms, Model for Network Security, Number theory, Cryptographic Hash Functions, Digital
Signatures, System Security, Symmetric Encryption and Message Confidentiality, Substitution
ciphers, Stream ciphers, Public-key cryptography and Message Authentication, Key Distribution
and Authentication, Transport Layer Security, Wireless Network Security, E-mail Security, IP
Security, Security Management Systems, Need for IT Security, Intrusion Prevention and Detection
Systems, Cyber Security. Security metrics: Design, Data sources, Analysis of security metrics data,
Measuring security cost and value, Different Context for security process management. Acquisition
and Duplication: Sterilizing Evidence Media, Acquiring Forensic Images, Acquiring Live Volatile
Data, Data Analysis, Metadata Extraction, and File System Analysis.
Course Plan
Lecture by Professor
A professor will teach and elaborate the
concepts in this lecture
Lecture by Industry
Expert
An Industry Expert will share his
learnings and practical experiences in this
lecture
Aseem Shrey
● Placed at Publicis.Sapient
● Summer Intern : Innovaccer
● Google Bug Bounty
● Digilocker Bug Bounty
● 1st Place (India), 7th Place (World) nullcon
CTF 2018
● 20th Place , DRDO 60 Cyber Challenge -
CTF 2018
● 1st Place , Terminal Tragedy - CTF,
organised by NIT Trichy 2017
● 5th Place , CSAW Finals (India Region)
2017
Security Engineer
at Rippling
OP VYAS
●Alumnus of the Technical University of Kaiserslautern
(Germany).
●Master of Technology (Computer Sc.) from Indian Institute of
Technology-Kharagpur.
●Ph.D. jointly with TU Kaiserslautern & IIT-Kharagpur as DAAD
Fellow (1995-97)
Worked at ‘Fraunhofer Institute for Software Engineering’ (Kaiserslautern – Germany), Visiting Professor in IIM-Rohtak
for one Semester, Fraunhofer AIS, St. Augustin, Germany
●Currently working as Dean (Technology Development) and Professor (IT) at IIIT Allahabad (India); formerly worked as Dean
(Academic) and Dean (Research).
Research Area: Data Science, Business Informatics and
Cyber Security
●Worked as Visiting Professor in 5 different Countries with Collaborative projects with Germany, France,
Norway, Italy and Japan in Data Science, Machine Learning & IoT. Completed 06 Funded projects with one
Indo-Norway project, one Indo-French Project, One Indo-Italy and two Indo-German Projects.
● Guided 16 Ph.D. scholars, ~50 Master's theses, and ~75 B.Tech. students' theses. More than 160 publications
and 04 books.
Research Scholars
Ayush
Sinha
Kumar
Saurabh
●Data Science for Cyber
Physical Security.
●Deep Learning Techniques
for Time Series data
●Intrusion Detection using
Machine Learning
●Cyber Security in
Industrial IoT.
Harshit
Gupta
●Federated Learning in
Industrial IoT.
SIDDHART
SIMHARAJU
❖ Case Studies:
➢ Building Hacker Resume to get 16,000
daily users in 3 months
➢ Redesigning website to get 5 Million
users in 2 years
Luv Kumar
● Platform Engineer @Directi.
● Ex Goldman Sachs
● Former Coordinator
Competitive-Coding wing, IIITA.
● Teaching Assistant - Data
structures and algorithms, IIIT
Allahabad.
Backend Engineer at
Directi | Teacher |
Youtuber
Abhinav Khare
● Software @Directi.
● Ex Goldman Sachs
● Former Coordinator Competitive-Coding
wing, GeekHaven.
● Teaching Assistant - Data structures and
algorithms, IIIT Allahabad.
● Youtuber with 150k subscribers
Backend Engineer at
Directi | Teacher |
Youtuber
Santosh Gautham
● ML Engineer in Berlin, Germany
● He has interests in ML and AI
● Former Software Intern @Headout
● Research Intern @University of
Messina, Italy
● Co-Founder of Hack in the North
and Nybles for Students articles.
Founder at
Polynomial Protocol
Course Structure
Table of contents
1. Introduction
2. Data Privacy
3. Types of Privacy Attacks
4. Data Linking and Profiling
5. Summary
13 Presentation title 20XX
Introduction
• What is Data?
• Importance of Data: Data is the modern-day currency. If an online service is
free, it is likely that your data is the payment.
• What is Information?
• DataBase Management and DBMS
• Data Privacy vs Data Security vs Data Protection
Data Privacy and
Security
2023
Data Privacy
Data Privacy deals with the
proper handling of data with
focus on compliance with data
protection regulations.
Basic Components of Security
● CIA
○ Confidentiality: Who is authorized to use the data?
○ Integrity: Is the data accurate and unaltered?
○ Availability: Can the data be accessed whenever it is needed?
● CIAAAN
○ Authentication
○ Authorization
○ Non-repudiation
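As a minimal sketch of the integrity and authentication ideas listed above, the Python snippet below uses an HMAC tag to detect whether a message was altered. The key, the sample messages, and the helper names `sign` and `verify` are illustrative assumptions, not part of the course material.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real system would load this from a key store.
SECRET_KEY = b"demo-key-not-for-production"


def sign(message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means the data was altered."""
    return hmac.compare_digest(sign(message), tag)


original = b"transfer 100 to account 42"
tag = sign(original)

print(verify(original, tag))                       # unchanged message
print(verify(b"transfer 900 to account 42", tag))  # tampered message
```

Only a party holding the key can produce a valid tag, so the same check covers integrity and a basic form of message authentication. It does not provide non-repudiation, because both sides share the key.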
Critical Infrastructure Areas
● Telecommunications
● Electrical power systems
● Water supply systems
● Gas and oil pipelines
● Transportation
● Government services
● Emergency services
● Banking and finance
What is Data Privacy?
• Data privacy refers to the proper use and processing of personal data by restoring
control over their data to individuals.
• Data privacy enables individuals to decide and limit access to the use and sharing
of their personal data.
• Data privacy is centered around how data should be collected, stored, managed,
and shared with any third parties, as well as compliance with the applicable privacy
laws.
• Data protection laws around the world aim to give individuals back control over
their data.
• They empower individuals to know how their data is being used, by whom, and why,
giving them control over how their personal data is processed and used.
Three Key Elements of Data Privacy
Right of an
individual to
be left alone
and control
over their
personal data.
Procedures for
proper
handling,
processing,
collecting and
sharing of
personal data.
Compliance
with data
protection laws.
Importance
of Data
Privacy
When data that should’ve been private gets
into the wrong hands, bad things can
happen.
A data breach at a government agency can,
for example, put top-secret information in the
hands of an enemy state.
A breach at a corporation can put proprietary
data in the hands of a competitor.
A breach at a school could put students’ PII
in the hands of criminals who could commit
identity theft.
A breach at a hospital or doctor’s office
can put PHI in the hands of those who might
misuse it.
Types of Privacy
Attacks
Types of Cyber Attacks
Malware Attack
Phishing Attack
Password Attack
Man-in-the-Middle Attack
SQL Injection Attack
Denial-of-Service Attack
Insider Threat
Cryptojacking
Zero-Day Exploit
Watering Hole Attack
Malware Attack
•“Malware” refers to malicious software, including viruses, worms,
spyware, ransomware, adware, and trojans.
1. A trojan disguises itself as legitimate software.
2. Ransomware blocks access to the network's key components, typically until a ransom is paid.
3. Spyware is software that steals your confidential data without your knowledge.
4. Adware is software that displays advertising content on a user's screen.
•Malware breaches a network through a vulnerability, for example when the user clicks
a dangerous link, downloads an email attachment, or uses an infected
pen drive.
Phishing Attack
•Phishing attacks are one of the most prominent and widespread types of cyber
attacks.
•It is a type of social engineering attack wherein an attacker impersonates
a trusted contact and sends the victim fake emails.
•Unaware of this, the victim opens the mail and clicks on the malicious
link or opens the email’s attachment.
•Attackers gain access to confidential information and account credentials.
•They can also install malware through a phishing attack.
Password Attack
•It is a form of attack wherein a hacker cracks your password with various
programs and password-cracking tools like Aircrack-ng, Cain and Abel, John
the Ripper, Hashcat, etc.
•There are different types of password attacks like brute force attacks,
dictionary attacks, and keylogger attacks.
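To make the dictionary attack mentioned above concrete, here is a small sketch under simplifying assumptions: the stored password is an unsalted SHA-256 hash, and the word list is a tiny made-up one. The function names `sha256_hex` and `dictionary_attack` and all sample values are hypothetical; real tools such as Hashcat are far more elaborate.

```python
import hashlib
from typing import List, Optional


def sha256_hex(text: str) -> str:
    """Return the hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: List[str]) -> Optional[str]:
    """Hash each candidate word and return the one that matches target_hash."""
    for candidate in wordlist:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical leaked hash of a weak password, plus a toy word list.
leaked_hash = sha256_hex("sunshine")
common_passwords = ["123456", "password", "qwerty", "sunshine", "letmein"]

print(dictionary_attack(leaked_hash, common_passwords))
print(dictionary_attack(sha256_hex("T7#vq9!xLm"), common_passwords))
```

The first lookup succeeds because the password appears in the word list; the second fails because the random-looking password does not. This is the reason common passwords are considered weak regardless of how the hash is stored.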
Man-in-the-Middle Attack
•In this attack, an attacker comes in between a two-party communication,
i.e., the attacker hijacks the session between a client and host. By doing
so, the attacker can steal and manipulate data.
•The direct client-server connection is cut off, and the communication instead goes
through the attacker.
SQL Injection Attack
•A Structured Query Language (SQL) injection attack occurs on a
database-driven website when the hacker manipulates a standard
SQL query.
•It is carried out by injecting malicious code into a vulnerable
website search box, thereby making the server reveal crucial
information.
•This results in the attacker being able to view, edit, and delete
tables in the databases. Attackers can also get administrative rights
through this.
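The mechanism described above can be sketched with Python's built-in `sqlite3` module. The table, the sample rows, and the function names `search_vulnerable` and `search_safe` are assumptions made up for this illustration; the point is only the contrast between pasting user input into the query text and binding it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice-secret"), ("bob", "bob-secret")],
)


def search_vulnerable(term: str) -> list:
    # UNSAFE: user input is pasted directly into the SQL text.
    query = f"SELECT name, secret FROM users WHERE name = '{term}'"
    return conn.execute(query).fetchall()


def search_safe(term: str) -> list:
    # SAFE: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (term,)
    ).fetchall()


payload = "x' OR '1'='1"
print(search_vulnerable(payload))  # the injected OR clause matches every row
print(search_safe(payload))        # no user has this literal name
```

With the payload, the vulnerable query becomes `WHERE name = 'x' OR '1'='1'`, which is true for every row. The parameterized version treats the whole payload as a plain name and finds nothing.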
Denial-of-Service Attack
•A Denial-of-Service Attack is a significant threat to companies. Here,
attackers target systems, servers, or networks and flood them with traffic
to exhaust their resources and bandwidth.
•When this happens, catering to the incoming requests becomes
overwhelming for the servers, resulting in the website it hosts either
shutting down or slowing down. This leaves legitimate service requests
unattended.
•It is called a DDoS (Distributed Denial-of-Service) attack when
attackers use multiple compromised systems to launch this attack.
Insider Threat
•An insider threat does not involve a third party but an insider. In
such a case, it could be an individual from within the organization
who knows a great deal about the organization.
•Insider threats have the potential to cause tremendous damage.
•Insider threats are a particular risk in small businesses, as the staff there
often hold access to multiple accounts with data.
•Reasons for this form of attack are many; it can be greed, malice, or
even carelessness. Insider threats are hard to predict and hence
tricky to defend against.
Cryptojacking
•Cryptojacking takes place when attackers access someone else’s computer
for mining cryptocurrency.
•The access is gained by infecting a website or manipulating the victim to
click on a malicious link.
•They also use online ads with JavaScript code for this. Victims are
unaware of this as the crypto-mining code works in the background;
slower execution is the only sign they might notice.
Zero-Day Exploit
•A Zero-Day Exploit targets a vulnerability for which no patch or
solution is available yet. In some cases the vendor announces the
vulnerability so that the users are aware;
however, this news also reaches the attackers.
•Depending on the vulnerability, the vendor or the developer could take
any amount of time to fix the issue. Meanwhile, the attackers target the
disclosed vulnerability. They make sure to exploit the vulnerability even
before a patch or solution is implemented for it.
Watering Hole Attack
•The victim here is a particular group, such as the members of an organization or the people of a region.
•The attacker targets websites that are frequently used by the targeted
group. Websites are identified either by closely monitoring the group or
by guessing.
•Attackers infect these websites with malware, which infects the victims'
systems. The malware in such an attack targets the user's personal
information.
•It is also possible for the hacker to take remote access to the infected
computer.
Data Linking
And Profiling
The process of
combining multiple
sources of data about a
person in order to create
a detailed profile of that
person's characteristics,
preferences, and
behavior.
What is Data Linking?
•Data linking is the process of collating information from different sources
in order to create a more valuable and helpful data set.
•The linking of information about the same person or entity from disparate
sources allows, among other things, the construction of a chronological
sequence of events. This information is of immense value at the policy
level to derive meaningful decisions.
•For example, information about children in a local community can help
decide on the volume of early childhood programs required and school
locations.
Procedure for Data Linking
•The advisor takes information from the different custodians and
extracts the data needed for this particular process.
•The data is run through advanced computer software and the
individual records are linked across all the required data sets.
•Once the data is linked, each individual is given a unique code
which is called the ‘linkage key,’ and the individual is de-identified.
•The customer (the person who needs the linked data) later uses each
linkage key to connect each individual's data across
all these data sets.
•For example, the customer may be a researcher who wants to work for the welfare of society.
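The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linkage key is derived as a salted, truncated SHA-256 hash of the normalized name and address, and the custodians, field names, salt, and helper functions (`linkage_key`, `de_identify`) are all hypothetical. Hashing alone is not a complete de-identification method.

```python
import hashlib

# Hypothetical secret held only by the unit that performs the linking.
LINKAGE_SALT = "linking-unit-secret"


def linkage_key(name: str, address: str) -> str:
    """Derive a stable key from identifying fields (salted SHA-256, truncated)."""
    normalized = f"{name.strip().lower()}|{address.strip().lower()}"
    digest = hashlib.sha256((LINKAGE_SALT + normalized).encode("utf-8"))
    return digest.hexdigest()[:12]


def de_identify(records: list, payload_field: str) -> dict:
    """Replace name and address with the linkage key, keeping one payload field."""
    return {linkage_key(r["name"], r["address"]): r[payload_field] for r in records}


# Toy data from two hypothetical custodians.
health = [{"name": "Asha Rao", "address": "12 Lake Rd", "diagnosis": "asthma"}]
school = [{"name": " asha rao", "address": "12 LAKE RD ", "grade": "7"}]

health_by_key = de_identify(health, "diagnosis")
school_by_key = de_identify(school, "grade")

# The researcher joins on the key without ever seeing names or addresses.
linked = {
    key: {"diagnosis": health_by_key[key], "grade": school_by_key[key]}
    for key in health_by_key.keys() & school_by_key.keys()
}
print(linked)
```

Normalizing case and whitespace before hashing lets the two slightly different spellings of the same person map to the same key.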
Ways to Link Data
Unique Identifier
•A unique identifier is available on each data set and establishes the links between
these data sets. This is also called deterministic or exact linking because the unique
identifiers either match completely or do not match at all, so there is no
uncertainty.
Linkage Key
•The linkage key works like a substitute for the unique identifier.
•This key is created using information like name and address available on both data
sets. These linkage keys maintain the privacy of the person or entity as the key is
used in place of the name and address.
Probabilistic Linking
•It is based on the probability that a pair of records, one taken from each data set, refers
to the same entity or person.
Statistical Linking
•This technique combines records that belong to similar entities but not necessarily the same
person or organization.
•This kind of data linking may not give the most accurate results but does provide a
pattern or trend from the given information or statistics.
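The difference between deterministic and probabilistic linking can be illustrated with a short Python sketch. It is a simplification: instead of a full probabilistic model, it uses a weighted string-similarity score from the standard-library `difflib` module. The records, the weights, the threshold, and the function names are assumptions chosen for this example.

```python
from difflib import SequenceMatcher


def deterministic_match(a: dict, b: dict) -> bool:
    """Exact linking: the unique identifiers match completely or not at all."""
    return a["id"] == b["id"]


def similarity(x: str, y: str) -> float:
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match_score(a: dict, b: dict) -> float:
    """Weighted similarity over name and address, between 0 and 1."""
    return (0.6 * similarity(a["name"], b["name"])
            + 0.4 * similarity(a["address"], b["address"]))


THRESHOLD = 0.8  # hypothetical cut-off; tuned on labelled pairs in practice

rec_a = {"id": "A-17", "name": "Jonathan Smith", "address": "4 Hill Street"}
rec_b = {"id": "B-90", "name": "Jonathon Smith", "address": "4 Hill St"}
rec_c = {"id": "B-91", "name": "Maria Lopez", "address": "88 Park Avenue"}

print(deterministic_match(rec_a, rec_b))
for other in (rec_b, rec_c):
    score = match_score(rec_a, other)
    print(other["id"], round(score, 2), score >= THRESHOLD)
```

The two data sets use different identifiers, so exact linking finds no match. The similarity score still pairs the two spellings of the same person while keeping the unrelated record apart.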
Scope of Data Linking
•Life Sciences
•Government
•Healthcare
•Libraries, Archives and Universities
•Social Media
•Business Uses
Benefits and Challenges of Data Linking
Benefits
•Helps in Research and Policymaking
•Integral Tool for Business Research
•Time Saving
Challenges
•Lack of Common Entity Identifiers
•Long Delays in Approvals
•Inconsistent or Incomplete Data
What is Data Profiling?
•Data Profiling can be defined as the process of examining and analyzing
data to create valuable summaries of it. The process yields a high-level
overview that aids in
i. discovering data quality issues,
ii. risks, and
iii. overall trends.
•Data Profiling can help eliminate costly errors that are common in databases.
These errors include incorrect or missing values, values outside the range,
unexpected patterns in data, etc.
What is Data Profiling? (Contd.)
•For example, reporting or analysis, data warehousing, or business intelligence
projects may necessitate gathering data from numerous distinct systems or
databases. Before moving on with these projects, data profiling can help
detect potential flaws and needed corrections in extract, transform, and load
(ETL) activities and other data integration procedures.
•Usually, it is combined with an ETL process. If performed correctly, Data
Profiling and ETL can together be leveraged to cleanse, enrich, and load
quality data into a target location.
What is Data Profiling? (Contd.)
It involves the following processes:
•Collecting Descriptive Statistics such as minimum and maximum values,
count of values, etc.
•Performing data quality assessment.
•Identifying data types, recurring patterns, etc.
•Tagging data with descriptions and keywords.
•Grouping data into categories.
•Identifying the metadata and its accuracy.
•Performing inter-table analysis.
•Identifying functional dependencies, embedded value dependencies,
distributions, key candidates, foreign-key candidates, etc.
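A few of the processes listed above (descriptive statistics, data quality assessment, and identifying data types) can be sketched for a single column in plain Python. The sample values, the 0 to 120 validity rule, and the function name `profile_column` are assumptions made for illustration; dedicated profiling tools do much more.

```python
import re
from collections import Counter

# Toy column values as they might arrive from a source system (hypothetical).
ages = ["34", "41", "", "29", "abc", "41", "230"]


def profile_column(values: list) -> dict:
    """Collect basic descriptive statistics and quality flags for one column."""
    present = [v for v in values if v != ""]
    numeric = [int(v) for v in present if re.fullmatch(r"-?\d+", v)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "non_numeric": len(present) - len(numeric),
        "min": min(numeric),
        "max": max(numeric),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0],
        # Hypothetical validity rule: a plausible age lies between 0 and 120.
        "out_of_range": [n for n in numeric if not 0 <= n <= 120],
    }


print(profile_column(ages))
```

Even this small summary surfaces the kinds of errors mentioned earlier: one missing value, one non-numeric entry, and one value outside the expected range.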
Types of Data Profiling
Structure Discovery
This type of profiling
involves performing
mathematical checks on
the data such as sum,
minimum, maximum,
etc., along with other
Descriptive Statistics.
Content Discovery
Content Discovery
profiling involves
looking into individual
data records to identify
errors. Content
Discovery identifies
which rows in a given
dataset contain problems
or any systemic issues
occurring in the data.
Relationship
Discovery
Relationship
Discovery involves
identifying how parts of
the data are related to
each other. For example,
identifying key
relationships between
tables in a database etc.
Data Profiling Methods
Column Profiling
In this method, the
number of times every
value appears within
each column of a table is
counted. This method
helps to uncover patterns
within the data.
Cross-column
Profiling
We look across columns to
perform Key and Dependency
Analysis. Key Analysis is
implemented to scan the
collections of values in a table
to identify a potential Primary
Key. Dependency Analysis
determines the dependent
relationships within data sets.
These analyses can be
leveraged to determine the
relationships and dependencies
across tables.
Cross-table Profiling
In this method, users
look across tables to
identify all potential
Foreign Keys. It also
attempts to identify
similarities and
differences among data
types and syntax
between tables to
determine which data
can be mapped together
and which might be
redundant.
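The three methods above can be sketched on two toy tables in plain Python. The tables, column names, and helper functions are hypothetical, and the key analysis here covers only single-column keys; dependency analysis is not shown.

```python
from collections import Counter

# Two toy tables as lists of dicts (hypothetical data).
customers = [
    {"customer_id": 1, "city": "Pune"},
    {"customer_id": 2, "city": "Delhi"},
    {"customer_id": 3, "city": "Pune"},
]
orders = [
    {"order_id": 10, "cust": 1},
    {"order_id": 11, "cust": 3},
    {"order_id": 12, "cust": 3},
]


def column(table, name):
    """Return all values of one column."""
    return [row[name] for row in table]


def value_frequencies(table, name):
    """Column profiling: how often each value appears in one column."""
    return Counter(column(table, name))


def key_candidates(table):
    """Key analysis: columns whose values are all present and unique."""
    return [
        name for name in table[0]
        if None not in column(table, name)
        and len(set(column(table, name))) == len(table)
    ]


def foreign_key_candidates(child, parent):
    """Cross-table profiling: child columns whose values all occur in a parent key."""
    pairs = []
    for parent_col in key_candidates(parent):
        parent_values = set(column(parent, parent_col))
        for child_col in child[0]:
            if set(column(child, child_col)) <= parent_values:
                pairs.append((child_col, parent_col))
    return pairs


print(value_frequencies(customers, "city"))
print(key_candidates(customers))
print(foreign_key_candidates(orders, customers))
```

On this data, `customer_id` is the only key candidate in `customers`, and `orders.cust` is flagged as a possible foreign key to it because every value it contains also appears in that key column.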
Benefits of Data Profiling
Improved Data Quality and Credibility
Proactive Crisis Management
Predictive Decision-making
Organized Sorting
Data Profiling Challenges
• The actual work required is quite complicated, with various processes occurring from data
acquisition to data warehousing.
• One of the obstacles that businesses face when attempting to create and operate a
successful data profiling program is complexity.
• Another problem is the sheer volume of data generated by a typical firm, as well as the
variety of data sources, which range from cloud-based systems to endpoint devices
deployed as part of an internet-of-things ecosystem.
• The rapidity with which data enters an organization adds to the difficulty of
implementing a successful data profiling program.
• These data preparation issues are magnified in organizations that have not yet adopted
current data profiling tools and still rely on manual processes for the majority of their
data preparation.
• Similarly, organizations that lack necessary resources, such as Trained Data Experts,
Tools, and Financing, will find it more difficult to overcome these obstacles.
BEST DATA
PROFILING
TOOLS
IBM InfoSphere Information Analyzer
SAP Business Objects Data Services (BODS)
Informatica Data Explorer
Melissa Data Profiler
Melissa Data Profiling Pricing

Ravak dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Data Privacy and Security Course Overview

  • 1. Data Privacy and Security (DPS) ● Asst. Prof.: Bagesh Kumar ● Course Code: DPS3204 ● M.Tech.-Ph.D.: IIIT Allahabad
  • 2. SYLLABUS: Introduction to Data Privacy, types of privacy attacks, data linking and profiling, access control models, role-based access control, privacy policies, their specifications, privacy policy languages, privacy in different domains (medical, financial, etc.). Mathematical model for comparing real-world data sharing practices, computing privacy and risk measurements. Demographics and uniqueness. Protection models: null-map, k-map, wrong-map. Survey of techniques: protection models (null-map, k-map, wrong-map), disclosure control, inferring entity identities, entry specific databases. Computation systems for protecting delimited data: MinGen, Datafly, Mu-Argus, k-Similar. Introduction to Security: The OSI Security Architecture, Security Attacks, Services and Mechanisms, Model for Network Security, Number Theory, Cryptographic Hash Functions, Digital Signatures, System Security, Symmetric Encryption and Message Confidentiality, Substitution Ciphers, Stream Ciphers, Public-Key Cryptography and Message Authentication, Key Distribution and Authentication, Transport Layer Security, Wireless Network Security, E-mail Security, IP Security, Security Management Systems, Need for IT Security, Intrusion Prevention and Detection Systems, Cyber Security. Security metrics: design, data sources, analysis of security metrics data, measuring security cost and value, different contexts for security process management. Acquisition and Duplication: Sterilizing Evidence Media, Acquiring Forensics Images, Acquiring Live Volatile Data, Data Analysis, Metadata Extraction, and File System Analysis.
  • 3. Course Plan ● Lecture by Professor: A professor will teach and elaborate the concepts in this lecture. ● Lecture by Industry Expert: An industry expert will share learnings and practical experiences in this lecture.
  • 4. Aseem Shrey, Security Engineer at Rippling ● Placed at Publicis.Sapient ● Summer Intern: Innovaccer ● Google Bug Bounty ● Digilocker Bug Bounty ● 1st Place (India), 7th Place (World), nullcon CTF 2018 ● 20th Place, DRDO 60 Cyber Challenge - CTF 2018 ● 1st Place, Terminal Tragedy - CTF, organised by NIT Trichy 2017 ● 5th Place, CSAW Finals (India Region) 2017
  • 5. OP VYAS ● Alumnus of the Technical University of Kaiserslautern (Germany). ● Master of Technology (Computer Sc.) from Indian Institute of Technology-Kharagpur. ● Ph.D. jointly with TU Kaiserslautern & IIT-Kharagpur as DAAD Fellow (1995-97). Worked at the Fraunhofer Institute for Software Engineering (Kaiserslautern, Germany) and Fraunhofer AIS (St. Augustin, Germany); Visiting Professor at IIM-Rohtak for one semester. ● Currently Dean (Technology Development) and Professor (IT) at IIIT Allahabad (India); formerly Dean (Academic) and Dean (Research). Research areas: Data Science, Business Informatics and Cyber Security. ● Worked as Visiting Professor in 5 different countries, with collaborative projects with Germany, France, Norway, Italy and Japan in Data Science, Machine Learning & IoT. Completed 06 funded projects, including one Indo-Norway project, one Indo-French project, one Indo-Italy project and two Indo-German projects. ● Guided 16 Ph.D. scholars, ~50 Master's theses, ~75 B.Tech. students' theses. More than ~160 publications and 04 books.
  • 6. Research Scholars: Ayush Sinha, Kumar Saurabh ● Data Science for Cyber Physical Security. ● Deep Learning Techniques for Time Series Data. ● Intrusion Detection using Machine Learning. ● Cyber Security in Industrial IoT. Harshit Gupta ● Federated Learning in Industrial IoT.
  • 7. SIDDHART SIMHARAJU ❖ Case Studies: ➢ Building Hacker Resume to get 16,000 daily users in 3 months ➢ Redesigning website to get 5 Million users in 2 years
  • 8. Luv Kumar, Backend Engineer at Directi | Teacher | Youtuber ● Platform Engineer @Directi. ● Ex Goldman Sachs. ● Former Coordinator, Competitive-Coding wing, IIITA. ● Teaching Assistant, Data Structures and Algorithms, IIIT Allahabad.
  • 9. Abhinav Khare, Backend Engineer at Directi | Teacher | Youtuber ● Software @Directi. ● Ex Goldman Sachs. ● Former Coordinator, Competitive-Coding wing, GeekHaven. ● Teaching Assistant, Data Structures and Algorithms, IIIT Allahabad. ● Youtuber with 150k subscribers.
  • 11. Santosh Gautham, Founder at Polynomial Protocol ● ML Engineer in Berlin, Germany ● Has interests in ML and AI ● Former Software Intern @Headout ● Research Intern @University of Messina, Italy ● Co-Founder of Hack in the North and Nybles for Students articles.
  • 13. Table of contents: 1. Introduction 2. Data Privacy 3. Types of Privacy Attacks 4. Data Linking and Profiling 5. Summary
  • 14. Introduction • What is Data? • Importance of Data: Data is the modern-day currency; if an online service is free, it takes your data as its payment. • What is Information? • Database Management and DBMS • Data Privacy vs Data Security vs Data Protection. Data Privacy and Security 2023
  • 15. Data Privacy: Data privacy deals with the proper handling of data, with a focus on compliance with data protection regulations.
  • 16. Basic Components of Security ● CIA ○ Confidentiality: Who is authorized to use the data? ○ Integrity: Is the data accurate and unaltered? ○ Availability: Can the data be accessed whenever it is needed? ● CIAAAN (CIA plus) ○ Authentication ○ Authorization ○ Non-repudiation
  • 17. Critical Infrastructure Areas ● Telecommunications ● Electrical power systems ● Water supply systems ● Gas and oil pipelines ● Transportation ● Government services ● Emergency services ● Banking and finance
  • 19. What is Data Privacy? • Data privacy refers to the proper use and processing of personal data in a way that restores individuals' control over their data. • Data privacy enables individuals to decide on and limit access to, use of, and sharing of their personal data. • Data privacy is centered around how data should be collected, stored, managed, and shared with any third parties, as well as compliance with the applicable privacy laws. • Data protection laws around the world aim to give individuals back control over their data. • They empower individuals to know how their data is being used, by whom and why, and give them control over how their personal data is processed and used.
  • 20. Three Key Elements of Data Privacy: (1) The right of an individual to be left alone and to have control over their personal data. (2) Procedures for proper handling, processing, collecting and sharing of personal data. (3) Compliance with data protection laws.
  • 21. Importance of Data Privacy: When data that should have been private gets into the wrong hands, bad things can happen. A data breach at a government agency can, for example, put top-secret information in the hands of an enemy state. A breach at a corporation can put proprietary data in the hands of a competitor. A breach at a school could put students' PII (personally identifiable information) in the hands of criminals who could commit identity theft. A breach at a hospital or doctor's office can put PHI (protected health information) in the hands of those who might misuse it.
  • 23. Types of Cyber Attacks ● Malware Attack ● Phishing Attack ● Password Attack ● Man-in-the-Middle Attack ● SQL Injection Attack ● Denial-of-Service Attack ● Insider Threat ● Cryptojacking ● Zero-Day Exploit ● Watering Hole Attack
  • 24. Malware Attack • "Malware" refers to malicious software, including viruses, worms, spyware, ransomware, adware, and trojans. 1. A trojan disguises itself as legitimate software. 2. Ransomware blocks access to the network's key components. 3. Spyware is software that steals your confidential data without your knowledge. 4. Adware is software that displays advertising content on a user's screen. • Malware breaches a network through a vulnerability, for example when the user clicks a dangerous link, downloads a malicious email attachment, or uses an infected pen drive.
  • 25. Phishing Attack • Phishing attacks are one of the most widespread types of cyber attacks. • Phishing is a type of social engineering attack in which an attacker impersonates a trusted contact and sends the victim fake emails. • Unaware of this, the victim opens the mail and clicks on the malicious link or opens the email's attachment. • Attackers thereby gain access to confidential information and account credentials. • They can also install malware through a phishing attack.
  • 26. Password Attack • It is a form of attack in which a hacker cracks your password using password-cracking programs and tools such as Aircrack-ng, Cain & Abel, John the Ripper, Hashcat, etc. • There are different types of password attacks, such as brute force attacks, dictionary attacks, and keylogger attacks.
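To make the dictionary attack on the Password Attack slide concrete, here is a minimal sketch. It assumes, purely for illustration, that passwords were stored as unsalted SHA-256 hashes and that the attacker holds a small wordlist; the function names and the wordlist are hypothetical and are not taken from the slides or from any of the tools named there.

```python
import hashlib
from typing import List, Optional


def sha256_hex(password: str) -> str:
    """Hash a password the (weak) way this sketch assumes: unsalted SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: List[str]) -> Optional[str]:
    """Try every candidate in the wordlist and return the one whose hash matches."""
    for candidate in wordlist:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # the password was not in the wordlist


# Hypothetical wordlist of common passwords (illustrative only).
wordlist = ["123456", "password", "qwerty", "letmein"]

# A leaked hash of a weak password is recovered immediately.
leaked_hash = sha256_hex("letmein")
recovered = dictionary_attack(leaked_hash, wordlist)

# A password that is not in the wordlist survives this particular attack.
not_found = dictionary_attack(sha256_hex("Tr0ub4dor&3x"), wordlist)
```

The sketch suggests why unusual passwords and salted, deliberately slow password hashes matter: both raise the cost of every guess the attacker has to make.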
  • 27. Man-in-the-Middle Attack • In this attack, an attacker comes in between a two-party communication, i.e., the attacker hijacks the session between a client and a host. By doing so, the attacker can steal and manipulate data. • The direct client-server connection is cut off, and the communication instead passes through the attacker.
  • 28. SQL Injection Attack • A Structured Query Language (SQL) injection attack occurs on a database-driven website when the hacker manipulates a standard SQL query. • It is carried out by injecting malicious code into a vulnerable input such as a website search box, thereby making the server reveal crucial information. • As a result, the attacker may be able to view, edit, and delete tables in the database. Attackers can also obtain administrative rights this way.
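The SQL injection slide can be illustrated with a small sketch using Python's standard `sqlite3` module. The table, the column names and the two search functions are hypothetical; the point is only to contrast a query built by string concatenation with a parameterized query.

```python
import sqlite3

# In-memory database with a hypothetical "users" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def search_vulnerable(term: str):
    """Builds the query by string concatenation, so the input can change the SQL."""
    query = "SELECT name, secret FROM users WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


def search_safe(term: str):
    """Passes the input as a bound parameter, so it is treated as data only."""
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (term,)
    ).fetchall()


# Classic payload: closes the string literal and appends an always-true condition.
payload = "x' OR '1'='1"

leaked = search_vulnerable(payload)  # every row in the table is returned
blocked = search_safe(payload)       # no user is literally named like the payload
```

With the concatenated query, the payload turns the `WHERE` clause into a condition that is always true; with the bound parameter, the same text is just an unusual name that matches nothing.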
  • 29. Denial-of-Service Attack • A Denial-of-Service (DoS) attack is a significant threat to companies. Here, attackers target systems, servers, or networks and flood them with traffic to exhaust their resources and bandwidth. • When this happens, the servers are overwhelmed by the incoming requests, and the website they host either shuts down or slows down. This leaves legitimate service requests unattended. • When attackers use multiple compromised systems to launch the attack, it is called a DDoS (Distributed Denial-of-Service) attack.
  • 30. Insider Threat • An insider threat does not involve a third party but an insider: it could be an individual from within the organization who knows a great deal about how the organization works. • Insider threats have the potential to cause tremendous damage. • Insider threats are rampant in small businesses, as the staff there hold access to multiple accounts with data. • The reasons for this form of attack are many: greed, malice, or even carelessness. Insider threats are hard to predict and hence tricky to defend against.
  • 31. Cryptojacking • Cryptojacking takes place when attackers use someone else's computer to mine cryptocurrency. • Access is gained by infecting a website or manipulating the victim into clicking a malicious link. • Attackers also use online ads containing JavaScript code for this. Victims are usually unaware of it because the crypto-mining code runs in the background; slower execution is the only sign they might notice.
  • 32. Zero-Day Exploit • A zero-day exploit targets a vulnerability for which no patch or solution is available yet. When the vendor announces such a vulnerability so that users are aware of it, the news also reaches the attackers. • Depending on the vulnerability, the vendor or the developer could take any amount of time to fix the issue. Meanwhile, the attackers target the disclosed vulnerability and try to exploit it before a patch or solution is implemented.
  • 33. Watering Hole Attack • The victim here is a particular group, such as the members of an organization or the people of a region. • The attacker targets websites that are frequently used by the targeted group. These websites are identified either by closely monitoring the group or by guessing. • Attackers infect these websites with malware, which then infects the victims' systems. The malware in such an attack targets the user's personal information. • It is also possible for the hacker to gain remote access to the infected computer.
  • 34. Data Linking and Profiling: The process of combining multiple sources of data about a person in order to create a detailed profile of that person's characteristics, preferences, and behavior.
  • 35. What is Data Linking? • Data linking is the process of collating information from different sources in order to create a more valuable and helpful data set. • Linking information about the same person or entity from disparate sources allows, among other things, the construction of a chronological sequence of events. This information is of immense value at the policy level for making meaningful decisions. • For example, information about children in a local community can help decide on the volume of early childhood programs required and on school locations.
  • 36. Procedure for Data Linking • The advisor takes information from the different custodians and extracts the data needed for this particular process. • The data is run through advanced computer software, and the individual records are linked across all the required data sets. • Once the data is linked, each individual is given a unique code called the "linkage key", and the individual is de-identified. • The customer (the person who needs the linked data) later uses each linkage key to connect the data of separate individuals across all these data sets. • Example: a researcher who wants to work for the welfare of society.
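The linkage-key step in the procedure above can be sketched as follows. The slides do not say how the key is computed; this sketch assumes a keyed hash (HMAC-SHA-256) over a normalized name and address, held by the linking party. The secret, the record fields and the toy records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the party that performs the linking.
SECRET = b"demo-only-secret"


def linkage_key(name: str, address: str) -> str:
    """Derive a pseudonymous key from normalized identifying fields (assumed scheme)."""
    normalized = " ".join(name.lower().split()) + "|" + " ".join(address.lower().split())
    return hmac.new(SECRET, normalized.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(records):
    """Replace name and address with the linkage key; keep the remaining fields."""
    out = {}
    for record in records:
        key = linkage_key(record["name"], record["address"])
        out[key] = {k: v for k, v in record.items() if k not in ("name", "address")}
    return out


# Toy records from two hypothetical custodians (invented for illustration).
hospital = [{"name": "Asha Rao", "address": "12 MG Road", "diagnosis": "asthma"}]
school = [{"name": "asha  rao", "address": "12 mg road", "grade": 7}]

hospital_deid = deidentify(hospital)
school_deid = deidentify(school)

# The customer joins on the linkage key and never sees the name or address.
linked = {
    key: {**hospital_deid[key], **school_deid[key]}
    for key in hospital_deid.keys() & school_deid.keys()
}
```

Normalizing case and whitespace before hashing is what lets the two differently typed records map to the same key; real linkage units typically need far more careful cleaning than this.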
  • 37. Ways to Link Data ● Unique Identifier: A unique identifier is available on each data set and establishes the links between the data sets. This is also called deterministic or exact linking, because the unique identifiers either match completely or do not match at all, so there is no uncertainty. ● Linkage Key: The linkage key works as a substitute for the unique identifier. The key is created using information such as name and address that is available on both data sets. Linkage keys help maintain the privacy of the person or entity, as the key is used in place of the name and address. ● Probabilistic Linking: It is based on the probability that a pair of records, one taken from each data set, refers to the same entity or person. ● Statistical Linking: This technique combines records of similar entities that are not necessarily the same person or organization. This kind of data linking may not give the most accurate results, but it does reveal a pattern or trend in the given information or statistics.
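Probabilistic linking, as described above, can be approximated with a simple weighted similarity score. This is a simplified sketch and not a full probabilistic record-linkage model: the field names, the weights and the threshold are assumptions chosen for illustration, and string similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted evidence that two records describe the same person (assumed weights)."""
    name_sim = similarity(rec_a["name"], rec_b["name"])
    city_sim = similarity(rec_a["city"], rec_b["city"])
    year_match = 1.0 if rec_a["birth_year"] == rec_b["birth_year"] else 0.0
    return 0.5 * name_sim + 0.2 * city_sim + 0.3 * year_match


# Hypothetical cut-off above which a pair is treated as a link.
THRESHOLD = 0.85

# Toy records (invented): a and b differ only by a spelling variant of the name.
a = {"name": "Jonathan Smith", "city": "Pune", "birth_year": 1990}
b = {"name": "Jonathon Smith", "city": "Pune", "birth_year": 1990}
c = {"name": "Maria Gomez", "city": "Delhi", "birth_year": 1985}

same_person = match_score(a, b) >= THRESHOLD       # likely the same person
different_person = match_score(a, c) >= THRESHOLD  # clearly below the threshold
```

Unlike exact linking on a unique identifier, the outcome here depends on the chosen weights and threshold, which is where the uncertainty of probabilistic linking comes from.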
  • 38. Scope of Data Linking ● Life Sciences ● Government ● Healthcare ● Libraries, Archives and Universities ● Social Media ● Business Uses
  • 39. Benefits and Challenges of Data Linking ● Benefits: Helps in Research and Policymaking; Integral Tool for Business Research; Time Saving. ● Challenges: Lack of Common Entity Identifiers; Long Delays in Approvals; Inconsistent or Incomplete Data.
  • 40. What is Data Profiling? • Data profiling can be defined as the process of examining and analyzing data to create valuable summaries of it. The process yields a high-level overview that aids in discovering (i) data quality issues, (ii) risks, and (iii) overall trends. • Data profiling can eliminate costly errors that are common in databases. These errors include incorrect or missing values, values outside the expected range, unexpected patterns in data, etc.
  • 41. What is Data Profiling? (Contd.) • For example, reporting and analysis, data warehousing, or business intelligence projects may require gathering data from numerous distinct systems or databases. Before moving on with these projects, data profiling can help detect potential flaws and needed corrections in extract, transform, and load (ETL) activities and other data integration procedures. • Usually, it is combined with an ETL process. If performed correctly, data profiling and ETL can together be used to cleanse, enrich, and load quality data into a target location.
  • 42. What is Data Profiling? (Contd.) It involves the following processes: • Collecting descriptive statistics such as minimum and maximum values, count of values, etc. • Performing data quality assessment. • Identifying data types, recurring patterns, etc. • Tagging data with descriptions and keywords. • Grouping data into categories. • Identifying the metadata and its accuracy. • Performing inter-table analysis. • Identifying functional dependencies, embedded value dependencies, distributions, key candidates, foreign-key candidates, etc.
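The first few processes listed above (descriptive statistics, data types, missing values) can be sketched in plain Python. The table, the column names and the `profile_column` helper are hypothetical, and treating `None` and the empty string as "missing" is an assumption of this sketch.

```python
from collections import Counter

# Toy table (invented): one age is missing and one is implausible.
rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": "Pune"},
    {"id": 4, "age": 210, "city": ""},
]


def profile_column(rows, column):
    """Collect simple descriptive statistics for one column."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as missing.
    present = [v for v in values if v not in (None, "")]
    numeric = [v for v in present if isinstance(v, (int, float))]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(Counter(type(v).__name__ for v in present)),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
    }


age_profile = profile_column(rows, "age")
city_profile = profile_column(rows, "city")
```

Even this small summary surfaces the kind of issues the slides mention: the `age` column has a missing value, and its maximum of 210 is outside any plausible range.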
  • 43. Types of Data Profiling ● Structure Discovery: This type of profiling involves performing mathematical checks on the data such as sum, minimum, maximum, etc., along with other descriptive statistics. ● Content Discovery: Content discovery involves looking into individual data records to identify errors. It identifies which rows in a given dataset contain problems, and any systemic issues occurring in the data. ● Relationship Discovery: Relationship discovery involves identifying how parts of the data are related to each other, for example, identifying key relationships between tables in a database.
  • 44. Data Profiling Methods ● Column Profiling: In this method, the number of times every value appears within each column of a table is counted. This method helps to uncover patterns within the data. ● Cross-column Profiling: We look across columns to perform key and dependency analysis. Key analysis scans the collections of values in a table to identify a potential primary key. Dependency analysis determines the dependent relationships within data sets. These analyses can be used to determine the relationships and dependencies across tables. ● Cross-table Profiling: In this method, users look across tables to identify all potential foreign keys. It also attempts to identify similarities and differences among data types and syntax between tables to determine which data can be mapped together and which might be redundant.
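The three profiling methods above can be sketched on two small tables. The tables and helper names are hypothetical, and the rules used here (a key candidate is a column with no missing and no repeated values; a foreign-key candidate is a column whose values all appear in a key candidate of another table) are simplifying assumptions.

```python
from collections import Counter

# Toy tables (invented for illustration).
customers = [
    {"cust_id": 1, "city": "Pune"},
    {"cust_id": 2, "city": "Delhi"},
    {"cust_id": 3, "city": "Pune"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
    {"order_id": 12, "cust_id": 3},
]


def column_profile(table, column):
    """Column profiling: count how often each value appears in a column."""
    return Counter(row[column] for row in table)


def key_candidates(table):
    """Key analysis: columns with no missing values and no repeated values."""
    return [
        col
        for col in table[0].keys()
        if all(row[col] is not None for row in table)
        and len({row[col] for row in table}) == len(table)
    ]


def foreign_key_candidates(child, parent):
    """Cross-table profiling: child columns whose values all occur in a parent key."""
    found = []
    for child_col in child[0].keys():
        child_values = {row[child_col] for row in child}
        for parent_col in key_candidates(parent):
            if child_values <= {row[parent_col] for row in parent}:
                found.append((child_col, parent_col))
    return found


city_counts = column_profile(customers, "city")
customer_keys = key_candidates(customers)
fk_pairs = foreign_key_candidates(orders, customers)
```

On real data these checks only produce candidates: a column can look unique or contained by coincidence, so the results still need review by someone who knows the schema.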
  • 45. Benefits of Data Profiling ● Improved Data Quality and Credibility ● Proactive Crisis Management ● Predictive Decision-making ● Organized Sorting
  • 46. Data Profiling Challenges • The actual work required is quite complicated, with various processes occurring from data acquisition to data warehousing. • One of the obstacles that businesses face when attempting to create and operate a successful data profiling program is complexity. • Another problem is the sheer volume of data generated by a typical firm, as well as the variety of data sources, which range from cloud-based systems to endpoint devices deployed as part of an internet-of-things ecosystem. • The speed at which data enters an organization adds to the difficulty of implementing a successful data profiling program. • These data preparation issues are magnified in organizations that have not yet adopted current data profiling tools and still rely on manual processes for the majority of their data preparation. • Similarly, organizations that lack necessary resources, such as trained data experts, tools, and financing, will find it more difficult to overcome these obstacles.
  • 47. Best Data Profiling Tools ● IBM InfoSphere Information Analyzer ● SAP Business Objects Data Services (BODS) ● Informatica Data Explorer ● Melissa Data Profiler ● Melissa Data Profiling Pricing