2. What Is Privacy-Preserving Data Mining?
Privacy-preserving data mining refers to a set of methods that allow data to be analyzed without disclosing sensitive information about the organizations or individuals whose data is being examined. This can be achieved through a variety of techniques.
3. Key Techniques in Privacy-Preserving Data Mining
Anonymization:
K-Anonymity: Ensures that each record in a dataset is indistinguishable from at least k-1 other records with respect to its quasi-identifiers (attributes such as age or ZIP code that could be linked to external data). This is typically achieved by generalizing or suppressing certain attributes.
L-Diversity: Extends k-anonymity by ensuring that each group of indistinguishable records has at least l distinct values for a sensitive attribute.
T-Closeness: Focuses on the distribution of sensitive attributes within each anonymized group, requiring that it be within a distance t of that attribute's distribution in the whole dataset.
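The three criteria above can be checked mechanically. The Python sketch below is a minimal illustration, assuming records are dictionaries, the quasi-identifier columns have already been generalized, and t-closeness is measured with total variation distance (which equals the Earth Mover's Distance when all values of a categorical attribute are treated as equally far apart). The column names and the sample table are hypothetical.

```python
from collections import Counter, defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_identifiers)].append(rec)
    return list(groups.values())

def is_k_anonymous(records, quasi_identifiers, k):
    # Every equivalence class must contain at least k records.
    return all(len(g) >= k for g in equivalence_classes(records, quasi_identifiers))

def is_l_diverse(records, quasi_identifiers, sensitive, l):
    # Every equivalence class must contain at least l distinct sensitive values.
    return all(len({r[sensitive] for r in g}) >= l
               for g in equivalence_classes(records, quasi_identifiers))

def distribution(records, sensitive):
    counts = Counter(r[sensitive] for r in records)
    return {value: c / len(records) for value, c in counts.items()}

def is_t_close(records, quasi_identifiers, sensitive, t):
    # Total variation distance between each class's distribution and the overall one.
    overall = distribution(records, sensitive)
    for g in equivalence_classes(records, quasi_identifiers):
        local = distribution(g, sensitive)
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        if distance > t:
            return False
    return True

# Hypothetical, already-generalized table: (age, zip) are the quasi-identifiers.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "hepatitis"},
]
qi = ["age", "zip"]
print(is_k_anonymous(table, qi, k=2))           # True
print(is_l_diverse(table, qi, "disease", l=2))  # True
print(is_t_close(table, qi, "disease", t=0.3))  # True (each class is at distance 0.25)
```

Raising k to 3, l to 3, or lowering t below 0.25 makes the corresponding check fail for this table.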
4. Anonymization Techniques for Privacy-Preserving Data Publishing (Clinical Data)
Data anonymization, also known as data masking or desensitization, is used to hide or conceal sensitive data about an individual, thereby preventing re-identification of that individual.
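As a minimal sketch of how a clinical record might be anonymized, the Python function below applies suppression (removing direct identifiers) and generalization (coarsening age and ZIP code). The field names and the specific rules (10-year age bands, 3-digit ZIP prefixes) are illustrative assumptions, not a standard.

```python
def anonymize_record(record):
    """Return a de-identified copy of a clinical record (illustrative rules only)."""
    out = dict(record)
    # Suppression: drop direct identifiers entirely (hypothetical field names).
    for field in ("name", "ssn", "phone"):
        out.pop(field, None)
    # Generalization: replace the exact age with a 10-year range.
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    # Generalization: keep only the first three digits of the ZIP code.
    out["zip"] = record["zip"][:3] + "**"
    return out

patient = {"name": "Jane Doe", "ssn": "123-45-6789", "age": 34,
           "zip": "13053", "diagnosis": "asthma"}
print(anonymize_record(patient))
# {'age': '30-39', 'zip': '130**', 'diagnosis': 'asthma'}
```

Applied to a whole table, rules like these produce the generalized quasi-identifiers that k-anonymity, l-diversity, and t-closeness are then evaluated on.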
5. Differential Privacy:
Differential privacy mechanisms introduce noise or randomness into query responses in a way that protects individual privacy. The key idea is to ensure that including or excluding a single record does not significantly change the outcome of a query.
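One common mechanism is the Laplace mechanism. A randomized mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in a single record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. Adding Laplace noise with scale Δf/ε to a query whose answer changes by at most Δf (its sensitivity) satisfies this bound. The sketch below applies it to a counting query (Δf = 1); the function names and sample data are hypothetical, and Python's `random` module is used for brevity, whereas a real deployment would need a cryptographically secure noise source.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace(0, scale)-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query answered with the Laplace mechanism."""
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
patients = [{"diabetic": True}, {"diabetic": False}, {"diabetic": True}]
print(private_count(patients, lambda p: p["diabetic"], epsilon=0.5, rng=rng))
```

A smaller ε gives a larger noise scale and therefore stronger privacy but less accurate answers.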
6. Data Perturbation
This involves introducing controlled noise or random variations into individual data points, making it harder to identify specific individuals while still allowing valuable insights to be extracted. Transformations such as scaling, shearing, reflection, and rotation can also be used to perturb data points.
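The sketch below illustrates two of the perturbation styles mentioned above: additive noise on numeric values and a 2-D rotation. It assumes Gaussian noise (other distributions are also used) and shows why rotation is attractive: it preserves Euclidean distances, so distance-based mining such as clustering or k-nearest neighbours behaves the same on the perturbed data. Names and values are hypothetical.

```python
import math
import random

def additive_perturbation(values, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each value."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def rotation_perturbation(points, theta):
    """Rotate 2-D points by angle theta (radians) about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

rng = random.Random(7)
salaries = [52000.0, 61000.0, 58000.0, 75000.0]
print(additive_perturbation(salaries, sigma=1000.0, rng=rng))

points = [(1.0, 2.0), (4.0, 6.0)]
rotated = rotation_perturbation(points, theta=math.pi / 5)
# Pairwise distances survive the rotation.
print(math.dist(*points), math.dist(*rotated))  # both are 5.0, up to floating-point rounding
```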
7. Encryption:
Homomorphic encryption enables computations to be performed on encrypted data without decrypting it.
It allows data to be kept confidential even during processing, reducing the exposure of sensitive information.
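Paillier encryption is a standard example of additively homomorphic encryption: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The toy sketch below assumes tiny hard-coded primes and the common simplification g = n + 1; it only demonstrates the homomorphic property and must not be used for real data. It requires Python 3.9+ for `math.lcm`.

```python
import math
import random

# Toy key material: these primes are far too small for real use.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                     # simplified generator choice
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function lambda(N)
MU = pow(LAM, -1, N)          # valid decryption constant because G = N + 1

def encrypt(m, rng):
    # Pick a random r that is coprime to N.
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N

def add_encrypted(c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod N).
    return (c1 * c2) % N2

rng = random.Random(1)
c1, c2 = encrypt(120, rng), encrypt(45, rng)
print(decrypt(add_encrypted(c1, c2)))  # 165
```

The party adding the ciphertexts never sees 120, 45, or 165; only the holder of LAM and MU can decrypt the result.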
8. Searching over Encrypted Data
Searchable Encryption (SE) enables searches to be performed over encrypted data without decrypting it.
Data stays confidential even while the server processes queries, reducing the exposure of sensitive information.
The servers use either Boolean or ranked search.
9. SSE (Searchable Symmetric Encryption): the AES algorithm in either CTR or CBC mode
PEKS (Public Key Encryption with Keyword Search): RSA, ElGamal, or Elliptic Curve Cryptography (ECC)
10. Searchable Encryption Process
Because SSE approaches use the same secret key for encryption and decryption, they are computationally more efficient than asymmetric searchable encryption (ASE), which uses different keys for encryption and decryption. SSE approaches are primarily meant to give data access only to the data owners who hold the corresponding secret key.
To let other users access the data in SSE, data owners have to provide them with either the secret key or the encrypted queries.
PEKS approaches [23], on the other hand, are meant for sharing the stored data with multiple users and also support expressive queries, but they are computationally expensive.
11. Searchable Encryption Process
Encryption of indexes: Once the plaintext indexes are generated, they need to be encrypted using encryption schemes such as AES, RSA, Deterministic Encryption (DE), Functional Encryption (FE), Predicate Encryption, etc.
Encryption of plaintext documents: Data owners are then required to encrypt the plaintext documents using either a public key or secret key encryption scheme with a sufficiently large key.
Incorporation of ranking information: Searchable indexes should include ranking information determined using keyword weight measures such as Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). This ranking information also needs to be encrypted using schemes such as Order Preserving Encryption (OPE), Fully Homomorphic Encryption (FHE), and Paillier Encryption (PE).
Upload of encrypted documents and indexes to cloud servers: Data owners then upload the encrypted documents and their corresponding searchable indexes onto the cloud servers.
Issuance of trapdoors and retrieval of documents: Data owners and authorized data users issue trapdoors, i.e., queries in encrypted form, along with a value k to the cloud server, which returns the top-k documents with the help of the ranking information.
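The process above can be sketched end to end in Python. This is a toy illustration under stated assumptions: HMAC-SHA256 stands in for the keyword (index) encryption, relevance scores are left in plaintext for brevity (the process above would encrypt them, for example with OPE), and document encryption is omitted. The function names and sample data are hypothetical.

```python
import hashlib
import heapq
import hmac
import secrets

def keyword_token(key, keyword):
    # Deterministic keyed token (HMAC-SHA256) standing in for an encrypted keyword.
    return hmac.new(key, keyword.lower().encode(), hashlib.sha256).digest()

def build_encrypted_index(key, plain_index):
    """Data owner: {keyword: {doc_id: score}} -> {token: [(score, doc_id), ...]}."""
    return {keyword_token(key, kw): [(score, doc) for doc, score in postings.items()]
            for kw, postings in plain_index.items()}

def server_search(enc_index, trapdoor, k):
    """Cloud server: sees only the opaque trapdoor and k, returns the top-k doc IDs."""
    postings = enc_index.get(trapdoor, [])
    return [doc for _, doc in heapq.nlargest(k, postings)]

key = secrets.token_bytes(32)  # the data owner's secret key
plain_index = {"diabetes": {"doc1": 3, "doc2": 7, "doc3": 1},
               "insulin": {"doc2": 2}}
enc_index = build_encrypted_index(key, plain_index)  # uploaded to the cloud
trapdoor = keyword_token(key, "diabetes")            # issued by an authorized user
print(server_search(enc_index, trapdoor, k=2))       # ['doc2', 'doc1']
```

Only a holder of `key` can produce a trapdoor that matches an index entry, which is why SSE data owners must share either the key or ready-made trapdoors with other users.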
12. Generate plaintext indexes
other statistical methods to determine the relevance of each token within the document collection.
Constructing the index: Create a data structure, such as an inverted index, that maps each token to the documents in which it appears, along with the corresponding weights or relevance scores. This allows documents to be retrieved efficiently for search queries.
Encryption: Once the plaintext indexes are generated, they need to be encrypted with suitable encryption schemes, such as AES, RSA, or other cryptographic methods, to keep the index information confidential and private.
Generation of plaintext indexes: Data owners are required to generate plaintext indexes for all their documents. This can be achieved using different indexing schemes, such as forward indexing and inverted indexing, each of which can be represented by various data structures.
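The following Python sketch builds both a forward index and an inverted index with term-frequency weights for a small document collection. It assumes a very simple tokenizer (lowercase, alphanumeric runs); real systems typically add stop-word removal, stemming, and other weighting schemes. Names and documents are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase the text and keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def forward_index(docs):
    """doc_id -> {term: term frequency}"""
    return {doc_id: dict(Counter(tokenize(text))) for doc_id, text in docs.items()}

def inverted_index(docs):
    """term -> {doc_id: term frequency}"""
    inv = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            inv[term][doc_id] = tf
    return dict(inv)

docs = {
    "d1": "Patient reports flu symptoms; flu test positive.",
    "d2": "Asthma follow-up, no flu.",
}
print(forward_index(docs)["d1"])
# {'patient': 1, 'reports': 1, 'flu': 2, 'symptoms': 1, 'test': 1, 'positive': 1}
print(inverted_index(docs)["flu"])
# {'d1': 2, 'd2': 1}
```

The forward index answers "which terms does this document contain?", while the inverted index answers "which documents contain this term?", which is the lookup a keyword search needs.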
13. Constructing the index
Indexing is an information retrieval process that builds an index mapping terms or keywords to their occurrences in documents.
It makes it possible to locate terms quickly, enabling efficient retrieval of information.
Common index structures: forward indexing, inverted indexing, tree-based indexing, Bloom filters, and bucketization.
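Of the structures listed, Bloom filters are compact enough to show in a few lines: a per-document filter answers "is this keyword possibly in the document?" with no false negatives but some false positives. The sketch below is a minimal illustration assuming SHA-256-derived hash positions and arbitrary sizes; searchable-encryption schemes built on Bloom filters typically use keyed hash functions instead, so that the server cannot test arbitrary words.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array; each item sets num_hashes bit positions."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Derive each position from SHA-256 of "<i>:<item>".
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

doc_filter = BloomFilter()
for kw in ("flu", "fever", "cough"):
    doc_filter.add(kw)
print(doc_filter.might_contain("flu"))     # True
print(doc_filter.might_contain("asthma"))  # usually False; false positives are possible
```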
14. Encrypting the indices
Symmetric encryption, public key encryption, order-preserving encryption, and homomorphic encryption.
Generate Encryption Keys: Depending on the chosen encryption scheme, generate the necessary
encryption keys. For symmetric key encryption, a single key is used for both encryption and
decryption. In the case of public key encryption, a public key is used for encryption, and a
corresponding private key is used for decryption.
Apply Encryption to Indexes: Use the encryption scheme and keys to encrypt the plaintext
indexes. This involves applying cryptographic algorithms to transform the index data into
ciphertext, rendering it unintelligible to unauthorized parties.
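The two steps above (generating keys and applying encryption to the index) can be sketched as follows. This illustration rests on explicit assumptions: Python's standard library has no AES, so an HMAC-SHA256 keystream in counter mode stands in for a block cipher in CTR mode; keywords are turned into HMAC tokens; and the ciphertexts are not authenticated. A real system should use a vetted cryptographic library instead of this construction. All names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

NONCE_LEN = 16

def _keystream(key, nonce, length):
    # CTR-style keystream: HMAC-SHA256(key, nonce || counter), block by block.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt_postings(key, doc_ids):
    # A fresh random nonce makes equal posting lists encrypt differently.
    plaintext = json.dumps(doc_ids).encode()
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt_postings(key, blob):
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))

def encrypt_index(token_key, data_key, plain_index):
    # Keywords become HMAC tokens; posting lists become randomized ciphertexts.
    return {hmac.new(token_key, kw.encode(), hashlib.sha256).hexdigest():
                encrypt_postings(data_key, sorted(docs))
            for kw, docs in plain_index.items()}

# Step 1: generate the keys (separate keys for tokens and for posting lists).
token_key, data_key = secrets.token_bytes(32), secrets.token_bytes(32)
# Step 2: apply encryption to the plaintext index.
plain_index = {"flu": {"doc1", "doc4"}, "asthma": {"doc2"}}
enc_index = encrypt_index(token_key, data_key, plain_index)
token = hmac.new(token_key, b"flu", hashlib.sha256).hexdigest()
print(decrypt_postings(data_key, enc_index[token]))  # ['doc1', 'doc4']
```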
15. Assigning weights
Relevance Ranking:
- These measures help rank documents based on their relevance to a user's query.
- Users are often interested in obtaining the most relevant information first, and ranking helps prioritize results.
- Common measures: Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), Cosine Similarity, and Jaccard Coefficient.
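The sketch below computes TF-IDF weights, cosine similarity, and the Jaccard coefficient in Python, and uses cosine similarity to rank documents for a one-keyword query. It assumes the idf variant ln(N / df); several other variants are in common use. The documents and the query are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: {doc_id: [tokens]} -> {doc_id: {term: tf * idf}}, with idf = ln(N / df)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {doc_id: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for doc_id, tokens in docs.items()}

def cosine_similarity(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a_terms, b_terms):
    a, b = set(a_terms), set(b_terms)
    return len(a & b) / len(a | b) if a | b else 0.0

docs = {"d1": ["flu", "fever", "flu"], "d2": ["asthma", "inhaler"], "d3": ["flu", "asthma"]}
vecs = tf_idf_vectors(docs)
query = {"flu": 1.0}  # simple query vector with unit weight
ranking = sorted(vecs, key=lambda d: cosine_similarity(query, vecs[d]), reverse=True)
print(ranking)                       # ['d3', 'd1', 'd2']
print(jaccard(["flu"], docs["d1"]))  # 0.5
```

Here d3 outranks d1 even though d1 mentions "flu" twice, because cosine similarity normalizes by vector length and d1's rare term "fever" carries a large weight.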
16. Ranking information
Ranking information in indexes also needs to be encrypted, using Order Preserving Encryption (OPE) [34], Fully Homomorphic Encryption (FHE) [43], and Paillier Encryption (PE) [25] schemes.
These schemes are suitable for the ranking information because they support ranking based on the encrypted values [44], [45]. OPE does so by preserving the order of the plaintext ranking values in the ciphertexts; homomorphic schemes such as FHE and Paillier instead allow relevance scores to be computed over encrypted data.
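To illustrate why order preservation lets a server rank encrypted scores, the sketch below uses a toy order-preserving encoding: each possible plaintext score is mapped to a strictly increasing random value, with the random seed acting as the secret key. This is an assumption made for illustration and is not a secure OPE construction; by design, any OPE scheme reveals the order of the encrypted values.

```python
import random
from bisect import bisect_left

def make_ope_table(domain_size, seed, max_gap=1000):
    """Toy order-preserving encoding: plaintext i maps to table[i], strictly increasing.

    The seed plays the role of the secret key. Not a secure OPE scheme.
    """
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, max_gap)  # a positive gap keeps the order strict
        table.append(current)
    return table

def ope_encrypt(table, score):
    return table[score]

def ope_decrypt(table, ciphertext):
    i = bisect_left(table, ciphertext)
    if i == len(table) or table[i] != ciphertext:
        raise ValueError("not a valid ciphertext for this table")
    return i

table = make_ope_table(domain_size=100, seed=2024)
scores = {"doc1": 12, "doc2": 47, "doc3": 5}
enc_scores = {doc: ope_encrypt(table, s) for doc, s in scores.items()}
# The server ranks on ciphertexts alone and still obtains the plaintext order.
print(sorted(enc_scores, key=enc_scores.get, reverse=True))  # ['doc2', 'doc1', 'doc3']
```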
17. Types of Search
Boolean search:
- Single-keyword Boolean search
- Multi-keyword Boolean search
Ranked search:
- Single-keyword ranked search
- Multi-keyword ranked search
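The four search types can be illustrated on a plaintext inverted index (encryption is left out so the query logic stays visible). The sketch assumes that multi-keyword Boolean search means conjunctive (AND) search and that multi-keyword ranked search sums per-keyword relevance scores; other semantics are possible. The index contents are hypothetical.

```python
import heapq

# Hypothetical plaintext inverted index: term -> {doc_id: relevance score}.
INDEX = {
    "flu": {"d1": 0.9, "d3": 0.4},
    "fever": {"d1": 0.5, "d2": 0.7},
    "asthma": {"d2": 0.8, "d3": 0.6},
}

def boolean_search(index, keywords):
    """Single- or multi-keyword Boolean (AND) search: docs that contain every keyword."""
    doc_sets = [set(index.get(kw, {})) for kw in keywords]
    return set.intersection(*doc_sets) if doc_sets else set()

def ranked_search(index, keywords, k):
    """Single- or multi-keyword ranked search: top-k docs by summed relevance score."""
    totals = {}
    for kw in keywords:
        for doc, score in index.get(kw, {}).items():
            totals[doc] = totals.get(doc, 0.0) + score
    top = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
    return [doc for doc, _ in top]

print(sorted(boolean_search(INDEX, ["flu"])))           # ['d1', 'd3']
print(sorted(boolean_search(INDEX, ["flu", "fever"])))  # ['d1']
print(ranked_search(INDEX, ["flu"], k=1))               # ['d1']
print(ranked_search(INDEX, ["fever", "asthma"], k=2))   # ['d2', 'd3']
```

Boolean search returns an unordered set of matches, while ranked search returns only the k best matches in order, which is what the top-k trapdoor queries described earlier rely on.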