Wafaa Anani (MCDBA, MCSD)
Electrical & Computer Engineering – Software Engineering, UWO
wanani@uwo.ca
 Introduction
 Data Mining Roles
 Data Provider
 Data Collector
 Data Miner
 Decision Maker
 Game Theory
 Non-Technical Solutions
 Future Research Area
 Conclusion
 References
 Big Data
 A term that describes a large volume of data, both structured and unstructured.
 A term used for data sets so large or complex that they are difficult to process using traditional database and software techniques.
 Data Mining
 Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
 Data mining has been successfully applied to many domains, such as business intelligence, web search, scientific discovery, digital libraries, etc.
Data mining is also referred to as “Knowledge Discovery from Data” (KDD).
Useful knowledge is obtained from data through the following steps:
 Step 1: Data Preprocessing (data selection, cleaning, and integration)
 Step 2: Data Transformation (transform data into a form appropriate for the mining task)
 Step 3: Data Mining (extract data patterns)
 Step 4: Pattern Evaluation and Presentation (present the knowledge in an easy-to-understand manner)
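The four KDD steps can be sketched as a small pipeline. This is a minimal illustration only: the records, field names, and the choice of "pattern" (a simple frequency count) are assumptions made for the example, not part of the original slides.

```python
from collections import Counter

# Hypothetical raw records; the field names are assumptions for illustration.
raw = [
    {"id": 1, "age": "34", "item": "milk"},
    {"id": 2, "age": "", "item": "bread"},    # missing value
    {"id": 3, "age": "36", "item": "milk"},
    {"id": 3, "age": "36", "item": "milk"},   # duplicate record
    {"id": 4, "age": "41", "item": "bread"},
]

def preprocess(records):
    """Step 1: drop records with missing values and exact duplicates."""
    seen, clean = set(), []
    for r in records:
        key = (r["id"], r["age"], r["item"])
        if "" not in key and key not in seen:
            seen.add(key)
            clean.append(r)
    return clean

def transform(records):
    """Step 2: turn the age string into a decade bucket such as '30s'."""
    return [
        {"age_group": f"{int(r['age']) // 10 * 10}s", "item": r["item"]}
        for r in records
    ]

def mine(records):
    """Step 3: count how often each (age_group, item) pair occurs."""
    return Counter((r["age_group"], r["item"]) for r in records)

def present(patterns):
    """Step 4: render the patterns as readable lines, most frequent first."""
    return [f"{group} buy {item}: {n}" for (group, item), n in patterns.most_common()]

patterns = mine(transform(preprocess(raw)))
report = present(patterns)
```

Each function corresponds to one step, so a privacy-preserving variant could replace a single stage (for example, the transformation) without touching the others.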
 Data mining technologies pose a serious threat to the security of individuals’ sensitive information.
 The goal is to reduce the privacy risk brought by data mining operations.
 We need to modify the data in such a way that data mining algorithms can be performed effectively without compromising the security of sensitive information contained in the data.
 An individual’s privacy may be violated due to unauthorized access to personal data. Thus there is a conflict between data mining and privacy security.
 Privacy Preserving Data Mining (PPDM)
 PPDM deals with the privacy issues in data mining.
 The objective of PPDM is to safeguard sensitive information from unsolicited or unsanctioned disclosure and, meanwhile, preserve the utility of the data.
 PPDM has two considerations:
 1. Sensitive raw data (IDs, phone numbers, etc.) should not be used directly in data mining.
 2. Sensitive mining results whose disclosure would result in a privacy violation should be excluded.
[Figure: the four user roles in a data mining scenario. Roles: Data Provider, Data Collector, Data Miner, Decision Maker. Other labels: Data, Database, Extracted Info., Information Transmitter.]
 Data Provider: the user who owns some data that are desired by the data mining task.
 Data Collector: the user who collects data from data providers and then publishes it to the data miner.
 Data Miner: the user who performs data mining tasks on the data.
 Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.
 Privacy concerns of each role
 Approaches to privacy protection for each role: Data Provider, Data Collector, Data Miner, Decision Maker
Data Provider: the user who owns some data that are desired by the data mining task
 If the Data Provider reveals his data to the Data Collector, his privacy might be compromised by an unexpected data breach.
 The privacy concern of the Data Provider is whether he can control what kind of, and how much, information other people can obtain from his data.
 The Data Provider should be able to make his sensitive data inaccessible to the Data Collector. However, if the Data Provider has to provide some data, he should get enough compensation for the possible loss in privacy.
 Limit the Access
 Security tools developed for the Internet environment to protect data:
 Anti-tracking extensions (Do Not Track Me, Ghostery, etc.)
 Advertisement and script blockers (AdBlock Plus, NoScript, FlashBlock, etc.)
 Encryption tools (MailCloak, TorChat, etc.)
 Trade Privacy
 The Data Provider needs to make a trade-off between the loss of privacy and the benefit brought by participating in data mining.
 The Data Provider needs to know how to negotiate with the Data Collector, so that he gets enough compensation for any possible loss in privacy.
 The Data Provider may be willing to provide his sensitive data to a Data Collector who promises that the sensitive information will not be revealed.
 Provide False Data
 Using “sockpuppets” to hide one’s true activities
 Using a fake identity to create phony information
 Using security tools to mask one’s identity
Data Collector: the user who collects data from data providers and then publishes it to the data miner
 The original data collected from Data Providers usually contain sensitive information about individuals. If the Data Collector doesn’t take sufficient precautions before releasing the data to the public or to data miners, that sensitive information may be disclosed.
 It is necessary for the Data Collector to modify the original data before releasing it to others, so that sensitive information about the Data Providers cannot be found.
 The data should retain sufficient utility after the modifications.
1. Basics of PPDP
2. Privacy-Preserving Publishing of Social Media
3. Attack Model
4. Privacy-Preserving Publishing of Trajectory Data
BASICS OF PPDP
 The data modification process adopted by the Data Collector, with the goal of preserving privacy and utility simultaneously, is usually called Privacy-Preserving Data Publishing (PPDP).
 Basics of PPDP
 The original data is assumed to be a private table consisting of multiple records. Each record contains: Identifier (ID), Quasi-Identifier (QID), Sensitive Attribute (SA), and Non-Sensitive Attribute (NSA).
 The table should be anonymized before being published to others: IDs should be removed and QIDs should be modified.
 k-anonymity is the most widely used privacy model, among other privacy models.
BASIC OF PPDP Anonymization operations:
 Generalization : Replace some values with a parent value
 Suppression : Replace some values with a special value e.g. ‘*’
 Anatomization : De-associate the relationship between the QID and sensitive attribute
 Permutation: De-associate the relationship between the QID and the numerical Sensitive
attribute)
 Perturbation: Replace the original data value with synthetic data value, so the computation
would be still the same if it was to be done on the original data
 The Anonymization operation will reduce the utility of the data, there are various
metrics for measuring the information loss.
 A fundamental problem of PPDP is how to make a trade-off between privacy and utility
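Generalization, suppression, and the k-anonymity check can be illustrated on a toy table. This is a sketch under stated assumptions: the table contents, the column names, and the helper names (`generalize_age`, `suppress_zip`, `anonymize`, `is_k_anonymous`) are hypothetical and chosen for the example. A table is treated as k-anonymous when every combination of QID values is shared by at least k records.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Generalization: replace an exact age with its parent range, e.g. 23 -> '20-29'."""
    low = age // width * width
    return f"{low}-{low + width - 1}"

def suppress_zip(zip_code, keep=3):
    """Suppression: replace the trailing digits with the special value '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def anonymize(records):
    """Remove the ID (name), modify the QID (age, zip), keep the sensitive attribute."""
    return [
        {
            "age": generalize_age(r["age"]),
            "zip": suppress_zip(r["zip"]),
            "disease": r["disease"],
        }
        for r in records
    ]

def is_k_anonymous(records, qid, k):
    """True if every QID value combination occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return all(size >= k for size in groups.values())

# Hypothetical private table: name is the ID, (age, zip) the QID, disease the SA.
table = [
    {"name": "Alice", "age": 23, "zip": "47677", "disease": "flu"},
    {"name": "Bob", "age": 27, "zip": "47602", "disease": "asthma"},
    {"name": "Carol", "age": 35, "zip": "47905", "disease": "flu"},
    {"name": "Dave", "age": 38, "zip": "47909", "disease": "diabetes"},
]

published = anonymize(table)
qid = ("age", "zip")
```

Here the original table is not 2-anonymous on (age, zip), while the published table is. Widening the age ranges or suppressing more digits would raise k at the cost of more information loss, which is the privacy-utility trade-off mentioned above.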
PRIVACY-PRESERVING PUBLISHING OF SOCIAL MEDIA
 A social network is usually modeled as a graph, where a vertex represents an entity and an edge represents the relationship between two entities.
 PPDP in the context of social networks mainly deals with anonymizing graph data.
 This is more challenging than anonymizing relational table data.
 There are three challenges in social network data:
 Modeling the adversary’s background knowledge about the network is much harder.
 Measuring the information loss in anonymizing social network data is harder than for relational data.
 Devising anonymization methods for social network data is much harder than for relational data.
ATTACK MODEL Given the anonymized network data, adversaries usually rely on background knowledge to de-
anonymize individuals and learn relationships between de-anonymized individuals
 Attack Model is to find the social relationship between the de-anonymized individuals.
 Type of back ground knowledge:
 Attribute of vertices, vertex degrees, Link relationship, Neighborhoods, embedded subgraphs
and graph metrics
 A proposed algorithm called ‘Seed-and-Grow’ to identify uses from an anonymized social graph.
The algorithm identifies a seed sub-graph which is either planted by an attacker or divulged by
collusion of small group of users, then grows the seed larger based on the existing knowledge of t
user’s social relations. e.g. (Structural attack, Mutual friend attack, Friendship attack, degree
attack.)
ATTACK MODEL Privacy Model
 In order to protect the privacy of relationship from the mutual friend attack, a variant of k-
anonymity introduces k-NMF anonymity.
 If the Network satisfies k-NMF anonymity then each edge e, here will be at least k - 1 other
edges with the same number of mutual friends as e. It can be guaranteed that the probability of
an edge being identified is not greater than 1/k
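The k-NMF condition can be checked directly from its definition. The sketch below assumes an undirected simple graph given as a list of edges; the function names and the two small example graphs are hypothetical. It only tests whether a graph already satisfies k-NMF anonymity and does not attempt to anonymize it.

```python
from collections import Counter

def mutual_friend_counts(edges):
    """Map each edge (u, v) to the number of neighbors shared by u and v."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {(u, v): len(neighbors[u] & neighbors[v]) for u, v in edges}

def satisfies_k_nmf(edges, k):
    """True if every edge shares its mutual-friend count with at least k - 1 other edges."""
    counts = mutual_friend_counts(edges)
    group_sizes = Counter(counts.values())
    return all(group_sizes[c] >= k for c in counts.values())

# In a triangle every edge has exactly one mutual friend.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]

# Adding a pendant edge c-d creates one edge with zero mutual friends.
triangle_with_tail = triangle + [("c", "d")]
```

In the second graph the edge c-d is the only edge with zero mutual friends, so an adversary who knows that count can single it out. That is exactly the situation the 1/k bound is meant to rule out.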
ATTACK MODEL Data Utility
 In the context of network data anonymization, the implication of data utility is : whether and to
what extent properties of the graph are preserved.
 Most Existing K-anonymization algorithms for network data publishing perform edge insertion
and/or deletion operation, to reduce the utility loss.
PRIVACY-PRESERVING PUBLISHING OF TRAJECTORY DATA
 Location-Based Services (LBS) provide services by utilizing the location information of individuals.
 Examples: locating a restaurant, or monitoring congestion levels of traffic
 Use of private location information may raise privacy issues in LBS, in particular when publishing trajectory data of individuals.
 k-anonymity has been redefined for trajectories, leading to the proposed (k, δ)-anonymity model.
Data Miner: the user who performs data mining tasks on the data
 Privacy is threatened when personal information can be directly observed in the data and a data breach happens.
 Privacy is also threatened if the Data Miner is able to find out sensitive information underlying the data. (Sometimes data mining may reveal sensitive information about the data owners.)
 The Data Miner also faces the privacy-utility trade-off problem.
 The main concern of the Data Miner is how to prevent sensitive information from appearing in the mining results.
 To perform privacy-preserving data mining, the Data Miner usually needs to modify the data he got from the Data Collector.
 Based on the distribution of data, PPDM approaches can be classified as:
 Approaches for Centralized Data Mining
 Approaches for Distributed Data Mining
 Horizontally partitioned data
 Vertically partitioned data
 In distributed data mining, Secure Multi-party Computation (SMC) is widely used.
 The goal of SMC is to make sure that each participant can get the correct data mining result without revealing his data to others.
 Participants: P1, P2, P3, …, Pm
 Data: X1, X2, X3, …, Xm (participant Pi holds private data Xi)
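One classic building block of SMC is a secure sum based on additive secret sharing. The sketch below simulates all m participants inside one process, so it only illustrates the arithmetic, not a real networked protocol. The modulus, the function names, and the example values are assumptions, and the scheme as written assumes non-negative integers whose total is smaller than the modulus and participants who follow the protocol honestly.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total

def make_shares(secret_value, n_parties):
    """Split a value into n random shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret_value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Compute X1 + ... + Xm while each participant only ever sees random shares."""
    m = len(private_values)
    # distributed[i][j] is the share that participant Pi sends to participant Pj.
    distributed = [make_shares(x, m) for x in private_values]
    # Each Pj adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(m)) % MODULUS for j in range(m)
    ]
    # The published partial sums reveal the total but not the individual inputs.
    return sum(partial_sums) % MODULUS

total = secure_sum([12, 30, 7])
```

Every participant obtains the correct total, yet a single share is a uniformly random number and carries no information about the value it was derived from.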
 Privacy-Preserving Association Rule Mining
 Privacy-Preserving Classification
 Privacy-Preserving Clustering
PRIVACY-PRESERVING ASSOCIATION RULE MINING
 Association rule mining finds interesting associations and correlation relationships among large sets of data items (e.g. market basket analysis).
 Some of the rules are considered to be sensitive.
 To hide them, a sanitized data set is generated (rule hiding), using one of the following approaches:
 Heuristic distortion approaches
 Heuristic blocking approaches
 Probabilistic distortion approaches
 Reconstruction-based approaches
 Hybrid partial hiding (HPH)
 Inverse frequent set mining (IFM)
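A heuristic distortion approach can be sketched as follows: compute the confidence of a sensitive rule, then delete the consequent item from supporting transactions until the confidence drops below the mining threshold. This is a simplified illustration, not any specific published algorithm. The function names, the basket data, and the threshold are assumptions. `Fraction` is used so that support and confidence are exact.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return Fraction(sum(itemset <= t for t in transactions), len(transactions))

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (antecedent must occur at least once)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def hide_rule(transactions, antecedent, consequent, min_conf):
    """Return a sanitized copy in which the rule's confidence is below min_conf."""
    sanitized = [set(t) for t in transactions]  # leave the original data untouched
    rule_items = set(antecedent) | set(consequent)
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) < min_conf:
            break
        if rule_items <= t:
            t -= set(consequent)  # distortion: delete the consequent from this transaction
    return sanitized

# Hypothetical market baskets; the rule bread -> milk is treated as sensitive.
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
]

before = confidence(baskets, {"bread"}, {"milk"})
sanitized = hide_rule(baskets, {"bread"}, {"milk"}, min_conf=Fraction(1, 2))
after = confidence(sanitized, {"bread"}, {"milk"})
```

A miner using the same threshold on the sanitized data would no longer report the rule. The deletions also lower the support of milk overall, which is the side effect on utility that rule hiding methods try to minimize.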
PRIVACY-PRESERVING CLASSIFICATION
 Classification is a form of data analysis that extracts models describing important data classes.
 Data classification can be seen as a two-step process:
 Step 1: Learning step, in which a classification algorithm is employed to build a classifier (classification model).
 Step 2: Classification step, in which the classifier is used for classification.
 Classification models :
 Decision Tree
 Naïve Bayesian Classification
 Support Vector Machine
 Privacy-Preserving Clustering
 Clustering partitions data objects into groups (clusters) so that objects in the same group are similar to one another.
 The Data Miner can modify the original data via randomization, blocking, or reconstruction. The modification often has a negative effect on the utility of the data.
 The Data Miner needs to strike a balance between privacy and utility. The meaning of privacy and utility varies with the characteristics of the data and the purpose of the mining task.
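Randomization can be illustrated with additive noise: each value is perturbed with zero-mean random noise, so individual values are masked while an aggregate such as the mean stays close to the original. This is a minimal sketch. The synthetic ages, the noise level `sigma`, and the fixed seeds are assumptions chosen to make the example reproducible.

```python
import random
from statistics import mean

def randomize(values, sigma, seed=0):
    """Add independent zero-mean Gaussian noise with standard deviation sigma to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, sigma) for v in values]

# Synthetic "original" data: 10,000 hypothetical ages between 18 and 80.
source_rng = random.Random(42)
ages = [source_rng.randint(18, 80) for _ in range(10_000)]

perturbed = randomize(ages, sigma=5.0)

mean_shift = abs(mean(perturbed) - mean(ages))
changed = sum(abs(p - a) > 1 for p, a in zip(perturbed, ages))
```

A larger `sigma` hides individual values better but makes aggregates noisier, which is one concrete form of the balance between privacy and utility described above.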
Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.
 The privacy concerns of the Decision Maker are:
 How to prevent unwanted disclosure of sensitive mining results
 How to evaluate the credibility of the received mining results
 First issue:
 Legal measures
 Making a contract with the Data Miner to forbid the miner from disclosing the mining results to a third party.
 Second issue:
 The Decision Maker can utilize methodologies from data provenance, credibility analysis of web information, or other related research fields.
DATA PROVENANCE Data Provenance :
 The information that helps determine the derivation history of the data, starting from the original source
 Provenance, which describe Where the data come from, and How the data evolved over the time, can
help people to evaluate the credibility of the data.
 Provenance contains two kinds of information:
 Ancestral data from which current data evolved.
 Transformations applied to ancestral data that helped to produce the current data.
 However, in most cases provenance of the data mining results is not available
 The major approach to present the provenance information is adding annotations to data.
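The annotation approach can be sketched as a value that carries its own provenance: the ancestral sources it came from and the transformations applied along the way. The class name, the helper functions, and the file names below are hypothetical and serve only as an illustration of the two kinds of provenance information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotated:
    """A value annotated with its ancestral sources and transformation history."""
    value: object
    sources: tuple = ()
    transformations: tuple = ()

def from_source(value, source_name):
    """Wrap raw data and record where it came from."""
    return Annotated(value, sources=(source_name,))

def derive(name, func, *inputs):
    """Apply func to the input values and propagate their provenance to the result."""
    result = func(*(i.value for i in inputs))
    # Merge the sources of all inputs, keeping first-seen order and dropping repeats.
    sources = tuple(dict.fromkeys(s for i in inputs for s in i.sources))
    history = tuple(t for i in inputs for t in i.transformations) + (name,)
    return Annotated(result, sources, history)

# Hypothetical inputs from two different origins.
survey = from_source([3, 5, 8], "survey_2014.csv")
weblog = from_source([2, 4], "web_log.csv")

merged = derive("merge", lambda a, b: a + b, survey, weblog)
average = derive("average", lambda xs: sum(xs) / len(xs), merged)
```

A Decision Maker who receives `average` can inspect `average.sources` and `average.transformations` to judge how much to trust the number, instead of receiving a bare value with no derivation history.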
WEB INFORMATION CREDIBILITY
 Web Information Credibility
 Users can differentiate false information from the truth based on :
 Authority : the real author of false information is usually not clear
 Accuracy: false information does not contain accurate data
 Objectivity: false information is often prejudicial
 Currency: for false information, the data about its source, time, and place of its origin is
incomplete, out of date or missing
 Coverage : false information usually contains no effective links to other information online
 Game theory provides a formal approach to model situations where a group of
agents have to choose optimum actions considering the mutual effects of other
agents' decisions.
 The essential elements of a game are: players, actions, payoffs, and information.
 Players have actions that they can perform at designated times in the game. As a
result of the performed actions, players receive payoffs.
 PRIVATE DATA COLLECTION AND PUBLICATION
 In this data collection game, the level of privacy protection has significant influence on
each player's action and payoff.
 PRIVACY PRESERVING DISTRIBUTED DATA MINING
 SMC-based privacy-preserving distributed data mining
 Recommender systems
 Linear regression as a non-cooperative game
 DATA ANONYMIZATION
 Game Model :
 Define the elements of the game, namely the players, the actions and the payoffs
 Determine the type of the game: static or dynamic, complete information or incomplete
information
 Solve the game to find equilibria
 Analyze the equilibria to obtain some implications for practice
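The four modeling steps can be walked through on a tiny static game with complete information. The players follow the roles in this deck, but the actions and all payoff numbers are invented for illustration. The solver simply enumerates action pairs and keeps those where neither player can gain by deviating alone (pure-strategy Nash equilibria).

```python
from itertools import product

def pure_nash_equilibria(actions_a, actions_b, payoffs):
    """Return all (a, b) pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions_a, actions_b):
        payoff_a, payoff_b = payoffs[(a, b)]
        a_is_best = all(payoffs[(alt, b)][0] <= payoff_a for alt in actions_a)
        b_is_best = all(payoffs[(a, alt)][1] <= payoff_b for alt in actions_b)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria

# Step 1: players, actions, payoffs (all numbers are hypothetical).
collector_actions = ["high_protection", "low_protection"]
provider_actions = ["opt_in", "opt_out"]
payoffs = {  # (collector payoff, provider payoff)
    ("high_protection", "opt_in"): (3, 2),
    ("high_protection", "opt_out"): (-1, 0),
    ("low_protection", "opt_in"): (4, -1),
    ("low_protection", "opt_out"): (0, 0),
}

# Step 2: the game is treated as static with complete information.
# Step 3: solve the game.
equilibria = pure_nash_equilibria(collector_actions, provider_actions, payoffs)
```

With these made-up payoffs the only equilibrium is low protection combined with opt-out, even though high protection with opt-in would leave both players better off. Step 4 would read this as a hint that some additional mechanism is needed to make participation attractive.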
 The Data Collector wants Data Providers to participate in the data mining
activity, i.e. hand over their private data, but the Data Providers may choose to
opt-out because of the privacy concerns. In order to get useful data mining results,
the Data Collector needs to design mechanisms to encourage Data Providers to
opt-in.
 Mechanisms for Truthful Data Sharing
 A mechanism requires agents to report their preferences over the outcomes.
 Privacy Auctions
 Laws and regulations
 USA – Privacy Act of 1974
 European Commission – General Data Protection Regulation (proposed in 2012)
 Industry conventions
 Agreements between organizations on how to collect, analyze, and store personal data should help to create a privacy-safe environment.
 Enhance education to increase awareness of information security.
 Personalized Privacy Preserving
 Developing practical personalized anonymization methods.
 Introducing personalized privacy into other types of PPDP/PPDM.
 Data Customization
 A concept called “Reverse Data Management” (RDM), which is similar to inverse data mining, has been introduced. RDM covers many data problems: inversion mappings, provenance, data generation, view update, constraint-based repair, etc. (We may consider RDM to be a family of data customization methods.)
 Provenance for Data Mining
 New techniques and mechanisms that can support provenance in the data mining context should receive more attention.
 Each user role has its own privacy concerns and its own approaches to preserving privacy while maintaining data utility.
 Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren, “Information Security in Big Data: Privacy and Data Mining,” IEEE Access, 2014.
Thank you

idescitation
 

Information Security in Big Data: Privacy and Data Mining

  • 1. Wafaa Anani (MCDBA, MCSD) Electrical & Computer Engineering – Software Engineering, UWO wanani@uwo.ca
  • 2.
      • Introduction
      • Data Mining Roles
          • Data Provider
          • Data Collector
          • Data Miner
          • Decision Maker
      • Game Theory
      • Non-Technical Solutions
      • Future Research Areas
      • Conclusion
      • References
  • 3. Big Data
      • A term that describes large volumes of data, both structured and unstructured.
      • A term for data sets so large or complex that they are difficult to process using traditional database and software techniques.
    Data Mining
      • Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
      • Data mining has been successfully applied in many domains, such as business intelligence, web search, scientific discovery, and digital libraries.
  • 4.
  • 5. Data mining is also referred to as “Knowledge Discovery from Data” (KDD). Useful knowledge is obtained from data in the following steps:
      • Step 1: Data Preprocessing (data selection, cleaning, and integration)
      • Step 2: Data Transformation (transform the data into a form appropriate for the mining task)
      • Step 3: Data Mining (extract data patterns)
      • Step 4: Pattern Evaluation and Presentation (present the knowledge in an easy-to-understand form)
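The four KDD steps above can be sketched as a small pipeline. This is only an illustrative toy: the records, the function names, and the choice of counting co-occurring item pairs as the "mining" step are assumptions made for the example, not part of the original slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer, comma-separated items); some are dirty.
RAW = [
    ("c1", "milk, bread"),
    ("c2", "Milk,bread,eggs"),
    ("c3", ""),                  # empty basket, dropped in preprocessing
    ("c4", "bread, eggs"),
    ("c2", "Milk,bread,eggs"),   # exact duplicate, dropped in preprocessing
]

def preprocess(rows):
    """Step 1: select non-empty rows and remove exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if row[1].strip() and row not in seen:
            seen.add(row)
            clean.append(row)
    return clean

def transform(rows):
    """Step 2: turn each basket string into a set of normalized item names."""
    return [frozenset(i.strip().lower() for i in items.split(",")) for _, items in rows]

def mine(baskets):
    """Step 3: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(counts, min_count):
    """Step 4: keep only pairs seen at least min_count times, most frequent first."""
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

patterns = evaluate(mine(transform(preprocess(RAW))), min_count=2)
for pair, n in patterns:
    print(pair, n)
```

Each function corresponds to one step, so a different mining task would only replace `mine` and `evaluate`.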
  • 6.
      • Data mining technologies pose a serious threat to the security of individuals' sensitive information.
      • The goal is to reduce the privacy risk brought by data mining operations.
      • The data must be modified so that data mining algorithms can still run effectively without compromising the security of the sensitive information it contains.
  • 7.
      • An individual's privacy may be violated through unauthorized access to personal data, so there is a conflict between data mining and privacy.
      • Privacy-Preserving Data Mining (PPDM) deals with the privacy issues in data mining.
      • The objective of PPDM is to safeguard sensitive information from unsolicited or unsanctioned disclosure while preserving the utility of the data.
      • PPDM has two considerations:
          1. Sensitive raw data (IDs, phone numbers, etc.) should not be used in data mining.
          2. Sensitive mining results whose disclosure would result in a privacy violation should be excluded.
  • 8. [Figure: data mining roles. Labels: Data, Database, Data Provider, Data Collector, Data Miner, Extracted Info., Information Transmitter, Decision Maker]
      • Data Provider: the user who owns data desired by the data mining task.
      • Data Collector: the user who collects data from Data Providers and then publishes it to the Data Miner.
      • Data Miner: the user who performs data mining tasks on the data.
      • Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.
  • 9. For each role (Data Provider, Data Collector, Data Miner, Decision Maker):
      • Privacy concerns of the role
      • Approaches to privacy protection
  • 10. Data Provider: the user who owns data desired by the data mining task.
  • 11.
      • If the Data Provider reveals his data to the Data Collector, his privacy might be compromised by an unexpected data breach.
      • The Data Provider's privacy concern is whether he can control what kind of, and how much, information other people can obtain from his data.
      • The Data Provider should be able to make his sensitive data inaccessible to the Data Collector. However, the Data Provider has to provide some data, and should get enough compensation for the possible loss of privacy.
  • 12.
      • Limit the Access: security tools developed for the Internet environment to protect data:
          • Anti-tracking extensions (Do Not Track Me, Ghostery, etc.)
          • Advertisement and script blockers (AdBlock Plus, NoScript, FlashBlock, etc.)
          • Encryption tools (MailCloak, TorChat, etc.)
      • Trade Privacy
          • The Data Provider needs to make a trade-off between the loss of privacy and the benefit brought by participating in data mining.
          • The Data Provider needs to know how to negotiate with the Data Collector, so that he gets enough compensation for any possible loss of privacy.
          • The Data Provider may be willing to provide his sensitive data to a Data Collector who promises that his sensitive information will not be revealed.
      • Provide False Data
          • Using “sockpuppets” to hide one’s true activities
          • Using a fake identity to create phony information
          • Using security tools to mask one’s identity
  • 13. Data Collector: the user who collects data from Data Providers and then publishes it to the Data Miner.
  • 14. [Figure: data mining roles. Labels: Data, Database, Data Provider, Data Collector, Data Miner, Extracted Info., Information Transmitter, Decision Maker]
  • 15.
      • The original data collected from Data Providers usually contains sensitive information about individuals. If the Data Collector does not take sufficient precautions before releasing the data to the public or to Data Miners, that sensitive information may be disclosed.
      • The Data Collector therefore needs to modify the original data before releasing it to others, so that sensitive information about the Data Providers cannot be found.
      • The modified data should retain sufficient utility.
  • 16.
      1. Basics of PPDP
      2. Privacy-preserving publishing of social media
      3. Attack model
      4. Privacy-preserving publishing of trajectory data
  • 17. BASICS OF PPDP: The data modification process adopted by the Data Collector, with the goal of preserving privacy and utility simultaneously, is usually called Privacy-Preserving Data Publishing (PPDP).
      • The original data is assumed to be a private table consisting of multiple records. Each record consists of four types of attributes: Identifier (ID), Quasi-Identifier (QID), Sensitive Attribute (SA), and Non-sensitive Attribute (NSA).
      • The table should be anonymized before it is published to others: IDs should be removed and QIDs should be modified.
      • k-anonymity is the most widely used privacy model.
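As a minimal sketch of what k-anonymity requires, the check below groups a published table by its quasi-identifier values and verifies that every group contains at least k records. The table contents, the column names, and the function name `is_k_anonymous` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def is_k_anonymous(records, qid_columns, k):
    """Return True if every combination of QID values occurs in at least k records."""
    groups = Counter(tuple(rec[c] for c in qid_columns) for rec in records)
    return all(size >= k for size in groups.values())

# Hypothetical published table: identifiers removed, QIDs already generalized.
table = [
    {"age": "20-29", "zip": "N6A***", "disease": "flu"},
    {"age": "20-29", "zip": "N6A***", "disease": "asthma"},
    {"age": "30-39", "zip": "N6G***", "disease": "flu"},
    {"age": "30-39", "zip": "N6G***", "disease": "diabetes"},
]

print(is_k_anonymous(table, ["age", "zip"], k=2))  # True
print(is_k_anonymous(table, ["age", "zip"], k=3))  # False
```

With k = 2, each record is indistinguishable from at least one other record on its quasi-identifiers, so an adversary who knows a person's age range and ZIP prefix cannot single out one row.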
  • 18. BASICS OF PPDP: Anonymization operations:
      • Generalization: replace some values with a parent value
      • Suppression: replace some values with a special value, e.g. ‘*’
      • Anatomization: de-associate the relationship between the QID and the sensitive attribute
      • Permutation: de-associate the relationship between the QID and the numerical sensitive attribute
      • Perturbation: replace the original data values with synthetic data values, so that computations on the synthetic data give essentially the same results as on the original data
    Anonymization operations reduce the utility of the data; various metrics exist for measuring the information loss. A fundamental problem of PPDP is how to make a trade-off between privacy and utility.
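Two of the operations above, generalization and suppression, can be illustrated on a single record. This is a sketch under assumptions: the record layout, the ten-year age ranges, and the helper names are invented for the example and are not prescribed by the slides.

```python
def generalize_age(age, width=10):
    """Generalization: replace an exact age with its enclosing range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress_suffix(value, keep):
    """Suppression: keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * (len(value) - keep)

def anonymize(record):
    """Drop the identifier, generalize and suppress the QIDs, keep the sensitive value."""
    return {
        "age": generalize_age(record["age"]),
        "zip": suppress_suffix(record["zip"], keep=3),
        "disease": record["disease"],
    }

# Hypothetical raw record: name is the ID, age and zip are QIDs, disease is the SA.
raw = {"name": "Alice", "age": 27, "zip": "N6A5B9", "disease": "flu"}
print(anonymize(raw))
```

Wider age ranges or shorter ZIP prefixes give stronger anonymity but lose more information, which is the privacy-utility trade-off the slide describes.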
  • 19.
  • 20. PRIVACY-PRESERVING PUBLISHING OF SOCIAL MEDIA: A social network is usually modeled as a graph, where a vertex represents an entity and an edge represents the relationship between two entities.
      • PPDP in the context of social networks mainly deals with anonymizing graph data.
      • This is more challenging than anonymizing relational table data.
      • There are three challenges with social network data:
          • Modeling the adversary’s background knowledge about the network is much harder.
          • Measuring the information loss in anonymizing social network data is harder than for relational data.
          • Devising anonymization methods for social network data is much harder than for relational data.
  • 21. ATTACK MODEL: Given the anonymized network data, adversaries usually rely on background knowledge to de-anonymize individuals and learn the relationships between de-anonymized individuals.
      • The attack model aims to find the social relationships between the de-anonymized individuals.
      • Types of background knowledge: attributes of vertices, vertex degrees, link relationships, neighborhoods, embedded subgraphs, and graph metrics.
      • A proposed algorithm called ‘Seed-and-Grow’ identifies users from an anonymized social graph. The algorithm identifies a seed subgraph, which is either planted by an attacker or divulged by the collusion of a small group of users, and then grows the seed larger based on the existing knowledge of the users’ social relations.
      • Example attacks: structural attack, mutual friend attack, friendship attack, degree attack.
  • 22.
  • 23.
  • 24. ATTACK MODEL: Privacy Model
      • To protect the privacy of relationships from the mutual friend attack, a variant of k-anonymity called k-NMF anonymity has been introduced.
      • If a network satisfies k-NMF anonymity, then for each edge e there are at least k - 1 other edges with the same number of mutual friends as e. This guarantees that the probability of an edge being identified is not greater than 1/k.
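The k-NMF condition stated above can be checked directly on a small graph. The following is a sketch that assumes an undirected graph given as a list of distinct edges; the function names and the example graphs are hypothetical.

```python
from collections import Counter

def mutual_friend_counts(edges):
    """Map each undirected edge to the number of neighbours its endpoints share."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {(u, v): len(neighbours[u] & neighbours[v]) for u, v in edges}

def satisfies_k_nmf(edges, k):
    """True if every edge shares its mutual-friend count with at least k - 1 other edges."""
    counts = mutual_friend_counts(edges)
    histogram = Counter(counts.values())
    return all(histogram[n] >= k for n in counts.values())

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
with_pendant = triangle + [("c", "d")]

print(satisfies_k_nmf(triangle, 3))      # True: all three edges have 1 mutual friend
print(satisfies_k_nmf(with_pendant, 2))  # False: edge (c, d) is the only one with 0
```

In the second graph the edge (c, d) is unique in having no mutual friends, so an adversary who knows that fact can identify it with certainty.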
  • 25. ATTACK MODEL: Data Utility
      • In the context of network data anonymization, data utility means whether and to what extent properties of the graph are preserved.
      • Most existing k-anonymization algorithms for network data publishing perform edge insertion and/or deletion operations while trying to reduce the utility loss.
  • 26. PRIVACY-PRESERVING PUBLISHING OF TRAJECTORY DATA
      • Location-Based Services (LBS) work by utilizing the location information of individuals, e.g. to locate a restaurant or to monitor traffic congestion levels.
      • Using private location information may raise privacy issues in LBS, in particular when publishing the trajectory data of individuals.
      • k-anonymity has been redefined for trajectories as (k, δ)-anonymity.
  • 27. Data Miner: the user who performs data mining tasks on the data.
  • 28. [Figure: data mining roles. Labels: Data, Database, Data Provider, Data Collector, Data Miner, Extracted Info., Information Transmitter, Decision Maker]
  • 29.
      • Personal information can be directly observed in the data if a data breach happens.
      • The Data Miner may be able to find out information underlying the data (data mining can sometimes reveal sensitive information about the data owners).
      • The Data Miner also faces the privacy-utility trade-off problem.
      • The main concern of the Data Miner is how to prevent sensitive information from appearing in the mining results.
      • To perform privacy-preserving data mining, the Data Miner usually needs to modify the data he got from the Data Collector.
  • 30. Based on the distribution of the data, PPDM approaches can be classified into:
      • Approaches for centralized data mining
      • Approaches for distributed data mining
          • Horizontally partitioned data
          • Vertically partitioned data
  • 31.
      • In distributed data mining, Secure Multi-party Computation (SMC) is widely used.
      • The goal of SMC is to make sure that each participant gets the correct data mining result without revealing his data to the others.
      • Participants P1, P2, P3, …, Pm hold data X1, X2, X3, …, Xm respectively.
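One classic SMC building block is a secure sum, where the participants learn the total of their private values but not each other's inputs. The sketch below uses additive secret sharing and simulates all parties in one process; the modulus, the function names, and the assumption of honest-but-curious parties with non-negative integer inputs are choices made for this illustration, not details from the slides.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible sum

def make_shares(value, n_parties):
    """Split value into n_parties random shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Simulate the protocol: each party only ever sees random-looking shares."""
    n = len(private_values)
    # Party i sends its j-th share to party j; rows are senders, columns receivers.
    distributed = [make_shares(x, n) for x in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

print(secure_sum([12, 30, 7]))  # 49
```

Any single share, and any single published partial sum, is uniformly random on its own, so a party learns nothing about another party's input beyond what the final total reveals.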
  • 32.
      • Privacy-Preserving Association Rule Mining
      • Privacy-Preserving Classification
      • Privacy-Preserving Clustering
  • 33. PRIVACY-PRESERVING ASSOCIATION RULE MINING
      • Association rule mining finds interesting associations and correlation relationships among large sets of data items (e.g. basket analysis).
      • Some of the rules are considered sensitive.
      • Rule hiding generates a sanitized data set. Approaches:
          • Heuristic distortion approaches
          • Heuristic blocking approaches
          • Probabilistic distortion approaches
          • Reconstruction-based approaches
          • Hybrid partial hiding (HPH)
          • Inverse frequent set mining (IFM)
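To make rule hiding concrete, the sketch below computes the support and confidence of a rule and then sanitizes the data in the spirit of a heuristic distortion approach: it deletes the consequent from supporting transactions until the rule's confidence falls to a chosen threshold. This is a deliberately simplified illustration, not one of the published algorithms; the baskets, the threshold, and the function names are assumptions.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs at least once."""
    return count(transactions, antecedent | consequent) / count(transactions, antecedent)

def hide_rule(transactions, antecedent, consequent, max_conf):
    """Remove the consequent from supporting transactions until confidence <= max_conf."""
    sanitized = [set(t) for t in transactions]  # work on a copy
    for t in sanitized:
        if confidence(sanitized, antecedent, consequent) <= max_conf:
            break
        if antecedent | consequent <= t:
            t -= consequent  # in-place distortion of this transaction
    return sanitized

baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread"}, {"eggs"}]
sensitive = ({"bread"}, {"milk"})

print(confidence(baskets, *sensitive))                        # 0.75
print(confidence(hide_rule(baskets, *sensitive, 0.5), *sensitive))  # 0.5
```

Only one transaction had to be altered here, which shows why such heuristics try to touch as few transactions as possible: every deletion also distorts non-sensitive patterns.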
  • 34.
  • 35.
  • 36. PRIVACY-PRESERVING CLASSIFICATION
      • Classification is a form of data analysis that extracts models describing important data classes.
      • Data classification can be seen as a two-step process:
          • Step 1 (learning): a classification algorithm is employed to build a classifier (classification model).
          • Step 2: the classifier is used for classification.
      • Classification models:
          • Decision tree
          • Naïve Bayesian classification
          • Support vector machine
  • 37. Privacy-Preserving Clustering: clustering groups the data.
  • 38.
      • The Data Miner can modify the original data via randomization, blocking, or reconstruction. The modification often has a negative effect on the utility of the data.
      • The Data Miner needs to strike a balance between privacy and utility. The implications of privacy and utility vary with the characteristics of the data and the purpose of the mining task.
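The trade-off in randomization can be seen in a small experiment: adding zero-mean noise hides individual values while aggregates stay roughly intact. The sketch below is an assumption-laden illustration (synthetic salaries, Gaussian noise, a hypothetical `perturb` helper), not a method taken from the slides.

```python
import random
from statistics import mean

def perturb(values, sigma, seed=None):
    """Add independent zero-mean Gaussian noise to every value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Synthetic data, generated only for this example.
data_rng = random.Random(0)
salaries = [data_rng.uniform(30_000, 90_000) for _ in range(10_000)]
noisy = perturb(salaries, sigma=5_000, seed=1)

print(round(mean(salaries)), round(mean(noisy)))  # aggregate stays close
print(round(salaries[0]), round(noisy[0]))        # individual values are masked
```

A larger `sigma` masks individual records more strongly but makes estimates from the perturbed data less accurate, which is the negative effect on utility mentioned above.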
  • 39. Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.
  • 40. [Figure: data mining roles. Labels: Data, Database, Data Provider, Data Collector, Data Miner, Extracted Info., Information Transmitter, Decision Maker]
  • 41. The privacy concerns of the Decision Maker are:
      • How to prevent unwanted disclosure of sensitive mining results
      • How to evaluate the credibility of the received mining results
  • 42.
      • 1st issue: legal measures, e.g. making a contract with the Data Miner that forbids the miner from disclosing the mining results to a third party.
      • 2nd issue: the Decision Maker can utilize methodologies from data provenance, credibility analysis of web information, or other related research fields.
  • 43. DATA PROVENANCE
      • Data provenance is the information that helps determine the derivation history of the data, starting from the original source.
      • Provenance, which describes where the data came from and how it evolved over time, can help people evaluate the credibility of the data.
      • Provenance contains two kinds of information:
          • Ancestral data from which the current data evolved
          • Transformations applied to the ancestral data that helped produce the current data
      • However, in most cases the provenance of data mining results is not available.
      • The major approach to representing provenance information is adding annotations to the data.
  • 44. WEB INFORMATION CREDIBILITY: Users can differentiate false information from the truth based on:
      • Authority: the real author of false information is usually unclear
      • Accuracy: false information does not contain accurate data
      • Objectivity: false information is often prejudicial
      • Currency: for false information, the data about its source, time, and place of origin is incomplete, out of date, or missing
      • Coverage: false information usually contains no effective links to other information online
  • 45.
  • 46.
      • Game theory provides a formal approach to modeling situations where a group of agents have to choose optimal actions while considering the mutual effects of other agents' decisions.
      • The essential elements of a game are: players, actions, payoffs, and information.
      • Players have actions that they can perform at designated times in the game. As a result of the performed actions, players receive payoffs.
  • 47.
      • PRIVATE DATA COLLECTION AND PUBLICATION
          • In this data collection game, the level of privacy protection has a significant influence on each player's actions and payoffs.
      • PRIVACY-PRESERVING DISTRIBUTED DATA MINING
          • SMC-based privacy-preserving distributed data mining
          • Recommender systems
          • Linear regression as a non-cooperative game
      • DATA ANONYMIZATION
  • 48. Game model:
      • Define the elements of the game, namely the players, the actions, and the payoffs.
      • Determine the type of the game: static or dynamic, complete information or incomplete information.
      • Solve the game to find the equilibria.
      • Analyze the equilibria to obtain implications for practice.
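The modeling steps above can be walked through on a toy static game with complete information between a Data Provider and a Data Collector. The actions and payoff numbers below are entirely hypothetical and chosen only to show how pure-strategy Nash equilibria are found by checking unilateral deviations.

```python
from itertools import product

# Step 1: players, actions, and (hypothetical) payoffs.
PROVIDER_ACTIONS = ["opt_in", "opt_out"]
COLLECTOR_ACTIONS = ["high_protection", "low_protection"]

# PAYOFFS[(provider_action, collector_action)] = (provider_payoff, collector_payoff)
PAYOFFS = {
    ("opt_in", "high_protection"): (3, 2),
    ("opt_in", "low_protection"): (-1, 4),
    ("opt_out", "high_protection"): (0, -1),
    ("opt_out", "low_protection"): (0, 0),
}

def pure_nash_equilibria(payoffs, row_actions, col_actions):
    """Step 3: return action pairs where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(row_actions, col_actions):
        row_best = all(payoffs[(r, c)][0] >= payoffs[(alt, c)][0] for alt in row_actions)
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, alt)][1] for alt in col_actions)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

print(pure_nash_equilibria(PAYOFFS, PROVIDER_ACTIONS, COLLECTOR_ACTIONS))
# [('opt_out', 'low_protection')]
```

With these illustrative payoffs the only equilibrium is that the provider opts out, even though both players would be better off with opt-in and high protection; the practical implication (step 4) is that some additional incentive or commitment is needed to reach the better outcome.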
  • 49.
  • 50.
      • The Data Collector wants Data Providers to participate in the data mining activity, i.e. hand over their private data, but the Data Providers may choose to opt out because of privacy concerns. To get useful data mining results, the Data Collector needs to design mechanisms that encourage Data Providers to opt in.
      • Mechanisms for truthful data sharing
          • A mechanism requires agents to report their preferences over the outcomes.
      • Privacy auctions
  • 51.
  • 52.
      • Laws and regulations
          • USA: Privacy Act of 1974
          • European Commission: General Data Protection Regulation (proposed in 2012)
      • Industry conventions
          • Agreements between organizations on how to collect, analyze, and store personal data should help create a privacy-safe environment.
      • Better education to increase awareness of information security
  • 53.
  • 54.
      • Personalized privacy preserving
          • Developing practical personalized anonymization methods.
          • Introducing personalized privacy into other types of PPDP/PPDM.
      • Data customization
          • A concept called “Reverse Data Management” (RDM), which is similar to inverse data mining, has been introduced. RDM covers many data problems: inversion mappings, provenance, data generation, view update, constraint-based repair, etc. RDM may be considered a family of data customization methods.
      • Provenance for data mining
          • New techniques and mechanisms that can support provenance in the data mining context should receive more attention.
  • 55.
  • 56. Each user role has its own privacy concerns and its own approaches to preserving privacy while maintaining data utility.
  • 57.
  • 58. Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren, “Information Security in Big Data: Privacy and Data Mining,” IEEE Access, 2014.