This document discusses the roles involved in data mining processes and privacy concerns. It describes the roles of data provider, data collector, data miner, and decision maker. For each role, it outlines their privacy concerns and approaches that can be used to address those concerns, such as limiting data access, anonymization techniques, and secure multi-party computation. The goal of privacy-preserving data mining is to protect sensitive information while still allowing for useful knowledge discovery from data.
Privacy and Data Security in Data Mining, by Abhishek L.R
A presentation on privacy and security in data mining: the mining process, how it is carried out, and the major threats that arise along the way.
2. Introduction
Data Mining Roles
Data Provider
Data Collector
Data Miner
Decision Maker
Game Theory
Non-Technical Solutions
Future Research Area
Conclusion
References
3. Big Data
Is a term that describes large volumes of data, both structured and unstructured.
Is a term used for data sets so large or complex that they are difficult to process using traditional database and software techniques.
Data Mining
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
Data mining has been successfully applied to many domains, such as business intelligence, web search, scientific discovery, digital libraries, etc.
5. Data mining is also referred to as "Knowledge Discovery from Data" (KDD).
Useful knowledge is obtained from data through the following steps:
Step 1: Data Preprocessing (data selection, cleaning, and integration)
Step 2: Data Transformation (transform the data into a form appropriate for the mining task)
Step 3: Data Mining (extract data patterns)
Step 4: Pattern Evaluation and Presentation (present the knowledge in an easy-to-understand form)
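The four steps above can be sketched end to end in a few lines. This is a minimal illustration only: the records, the field names, and the "average purchases per age group" pattern are all hypothetical, not part of the original presentation.

```python
# Minimal sketch of the four KDD steps; records, fields, and the mined
# pattern (average purchases per age group) are hypothetical illustrations.
from collections import defaultdict

records = [
    {"age": 34, "zip": "47677", "purchases": 12},
    {"age": 34, "zip": "47602", "purchases": 3},
    {"age": None, "zip": "47678", "purchases": 7},  # incomplete record
    {"age": 51, "zip": "47905", "purchases": 15},
]

# Step 1: Data Preprocessing -- select and clean (drop incomplete records).
clean = [r for r in records if r["age"] is not None]

# Step 2: Data Transformation -- derive a form suited to the mining task.
transformed = [{"age_group": r["age"] // 10 * 10, "purchases": r["purchases"]}
               for r in clean]

# Step 3: Data Mining -- extract a simple pattern from the transformed data.
by_group = defaultdict(list)
for r in transformed:
    by_group[r["age_group"]].append(r["purchases"])
patterns = {group: sum(v) / len(v) for group, v in by_group.items()}

# Step 4: Pattern Evaluation and Presentation -- report in an
# easy-to-understand form.
for group, avg in sorted(patterns.items()):
    print(f"ages {group}-{group + 9}: average purchases = {avg:.1f}")
```

Real pipelines replace step 3 with proper mining algorithms (association rules, clustering, classification), but the staging is the same.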
6. Data mining technologies bring serious threats to the security of individuals' sensitive information.
To reduce the privacy risk brought by data mining operations, we need to modify the data in such a way that data mining algorithms can still be performed effectively without compromising the security of the sensitive information contained in the data.
7. An individual's privacy may be violated due to unauthorized access to personal data. Thus there is a conflict between data mining and privacy security.
Privacy Preserving Data Mining (PPDM)
Deals with the privacy issues in data mining.
The objective of PPDM is to safeguard sensitive information from unsolicited or unsanctioned disclosure and, meanwhile, preserve the utility of the data.
Considerations of PPDM are:
1. Sensitive raw data (IDs, phone numbers, etc.) should not be used in data mining.
2. Sensitive mining results whose disclosure would result in privacy violations should be excluded.
8. Data flow: Data Provider → Data Collector → Database → Data Miner → Extracted Info. → Information Transmitter → Decision Maker
Data Provider: the user who owns some data that are desired by the data mining task.
Data Collector: the user who collects data from data providers and then publishes it to the data miner.
Data Miner: the user who performs data mining tasks on the data.
Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.
9. Privacy Concerns of Each Role
Approaches to Privacy Protection
Data Provider
Data Collector
Data Miner
Decision Maker
10. Data Provider: the user who owns some data that are desired by the data mining task
11. If the Data Provider reveals his data to the Data Collector, his privacy might be compromised due to an unexpected data breach.
The privacy concern of the Data Provider is whether he can control what kind of, and how much, information other people can obtain from his data.
The Data Provider should be able to make his sensitive data inaccessible to the Data Collector. However, the Data Provider has to provide some data, and should get enough compensation for the possible loss in privacy.
12. Limit the Access
Security tools developed for the Internet environment to protect data:
Anti-tracking extensions (Do Not Track Me, Ghostery, etc.)
Advertisement and script blockers (AdBlock Plus, NoScript, FlashBlock, etc.)
Encryption tools (MailCloak, TorChat, etc.)
Trade Privacy
The Data Provider needs to make a trade-off between the loss of privacy and the benefit brought by participating in data mining.
The Data Provider needs to know how to negotiate with the Data Collector, so that he will get enough compensation for any possible loss in privacy.
The Data Provider may be willing to provide his sensitive data to a Data Collector who promises that his sensitive information will not be revealed.
Provide False Data
Using "sockpuppets" to hide one's true activities
Using a fake identity to create phony information
Using security tools to mask one's identity
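One principled way for a Data Provider to "provide false data" is randomized response, a classic survey technique (not named in the slides, so treat it as a supplementary illustration): each provider answers truthfully only with a known probability, so no single answer exposes the true value, yet aggregate statistics remain recoverable. The probabilities and survey values below are hypothetical.

```python
# Randomized response sketch: report the truth with probability p, otherwise
# report a fair coin flip. No individual report is reliable on its own, but
# the population rate can still be estimated from the noisy reports.
import random

def randomized_response(truth: bool, p: float = 0.5) -> bool:
    """Report the truth with probability p; otherwise report a coin flip."""
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports, p: float = 0.5) -> float:
    """Invert the noise: observed = p * true + (1 - p) * 0.5,
    so true = (observed - (1 - p) / 2) / p."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) / 2) / p

random.seed(0)
true_answers = [True] * 300 + [False] * 700   # true population rate: 30%
reports = [randomized_response(t) for t in true_answers]
print(round(estimate_true_rate(reports), 2))  # close to 0.30
```

The larger the population, the closer the estimate gets to the true rate, even though every individual retains plausible deniability.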
13. Data Collector: the user who collects data from data providers and then publishes it to the data miner
14. Data flow (recap): Data Provider → Data Collector → Database → Data Miner → Extracted Info. → Information Transmitter → Decision Maker
15. The original data collected from Data Providers usually contains sensitive information about individuals. If the Data Collector does not take sufficient precautions before releasing the data to the public or to data miners, that sensitive information may be disclosed.
It is therefore necessary for the Data Collector to modify the original data before releasing it to others, so that sensitive information about the Data Providers cannot be found.
The modifications should retain sufficient utility of the data.
16. 1. Basics of PPDP
2. Privacy-Preserving Publishing of Social Media
3. Attack Model
4. Privacy-Preserving Publishing of Trajectory Data
17. Basics of PPDP
The data modification process adopted by the Data Collector, with the goal of preserving privacy and utility simultaneously, is usually called Privacy-Preserving Data Publishing (PPDP).
The original data is assumed to be a private table consisting of multiple records; each record contains: Identifier (ID), Quasi-Identifier (QID), Sensitive Attribute (SA), and Non-Sensitive Attribute (NSA).
The table should be anonymized before being published to others: IDs should be removed and QIDs should be modified.
k-anonymity is the most widely used privacy model, among other privacy models.
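As a concrete illustration of k-anonymity: a table satisfies it if every combination of quasi-identifier values is shared by at least k records. The check below sketches this; the table and column names are hypothetical examples, not data from the presentation.

```python
# k-anonymity check sketch: group records by their quasi-identifier (QID)
# values and require every group to contain at least k records.
from collections import Counter

def is_k_anonymous(records, qid_keys, k):
    """Return True if every QID combination appears in at least k records."""
    groups = Counter(tuple(r[key] for key in qid_keys) for r in records)
    return all(count >= k for count in groups.values())

table = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cold"},
    {"age": "50-59", "zip": "479**", "disease": "flu"},
    {"age": "50-59", "zip": "479**", "disease": "asthma"},
]
print(is_k_anonymous(table, ["age", "zip"], k=2))  # True: each QID group has 2 rows
print(is_k_anonymous(table, ["age", "zip"], k=3))  # False: no group reaches 3 rows
```

An attacker who knows someone's age range and ZIP prefix can narrow them down to a group of k records at best, which bounds the re-identification probability by 1/k.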
18. Basics of PPDP: Anonymization operations
Generalization: replace some values with a parent value
Suppression: replace some values with a special value, e.g. '*'
Anatomization: de-associate the relationship between the QID and the sensitive attribute
Permutation: de-associate the relationship between the QID and the numerical sensitive attribute
Perturbation: replace the original data values with synthetic data values, so that computations on the perturbed data do not differ significantly from computations on the original data
The anonymization operations reduce the utility of the data; there are various metrics for measuring the information loss.
A fundamental problem of PPDP is how to make a trade-off between privacy and utility.
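The first two operations can be sketched directly. The record, the ten-year age ranges, and the three-digit ZIP prefix below are illustrative assumptions, not parameters prescribed by the presentation.

```python
# Generalization replaces a value with a parent value (exact age -> range);
# suppression replaces part of a value with the special value '*'.

def generalize_age(age: int) -> str:
    """Generalize an exact age to a ten-year range (its parent value)."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"

def suppress_zip(zipcode: str, keep: int = 3) -> str:
    """Suppress the trailing digits of a ZIP code with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

record = {"id": "A123", "age": 34, "zip": "47677", "disease": "flu"}
anonymized = {
    # the explicit identifier ("id") is removed entirely before publishing
    "age": generalize_age(record["age"]),
    "zip": suppress_zip(record["zip"]),
    "disease": record["disease"],  # sensitive attribute kept for utility
}
print(anonymized)  # {'age': '30-39', 'zip': '476**', 'disease': 'flu'}
```

Coarser ranges or shorter prefixes strengthen privacy but lose more information, which is exactly the privacy-utility trade-off named above.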
20. Privacy-Preserving Publishing of Social Media
A social network is usually modeled as a graph, where a vertex represents an entity and an edge represents the relationship between two entities.
PPDP in the context of social networks mainly deals with anonymizing graph data, which is more challenging than anonymizing relational data tables.
There are three challenges in social networks:
Modeling the adversary's background knowledge about the network is much harder.
Measuring the information loss in anonymizing social network data is harder than for relational data.
Devising anonymization methods for social network data is much harder than for relational data.
21. ATTACK MODEL Given the anonymized network data, adversaries usually rely on background knowledge to de-
anonymize individuals and learn the relationships between de-anonymized individuals.
The goal of the attack model is to find the social relationships between the de-anonymized individuals.
Types of background knowledge:
Attributes of vertices, vertex degrees, link relationships, neighborhoods, embedded subgraphs,
and graph metrics
A proposed algorithm called 'Seed-and-Grow' identifies users from an anonymized social graph.
The algorithm identifies a seed sub-graph, which is either planted by an attacker or divulged by the
collusion of a small group of users, then grows the seed based on existing knowledge of the
user's social relations. Examples of attacks: structural attack, mutual-friend attack, friendship
attack, degree attack.
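The 'Seed-and-Grow' details are not reproduced here, but the simplest of the listed attacks, the degree attack, is easy to sketch: if an attacker knows each target's number of friends, any vertex with a unique degree is immediately re-identified. The graph and names below are invented for illustration:

```python
# A published graph with labels removed (vertices n0..n3), as adjacency sets.
anon_graph = {
    "n0": {"n1", "n2", "n3"},
    "n1": {"n0"},
    "n2": {"n0", "n3"},
    "n3": {"n0", "n2"},
}
# Background knowledge: the attacker knows each target's number of friends.
background = {"Alice": 3, "Bob": 1}

degrees = {v: len(nbrs) for v, nbrs in anon_graph.items()}
for person, deg in background.items():
    candidates = [v for v, d in degrees.items() if d == deg]
    if len(candidates) == 1:  # a unique degree de-anonymizes the vertex
        print(f"{person} must be {candidates[0]}")
```

Alice (degree 3) can only be `n0` and Bob (degree 1) can only be `n1`; only `n2` and `n3`, which share degree 2, remain ambiguous. This is exactly why degree-based anonymity models for graphs exist.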
22.
23.
24. ATTACK MODEL Privacy Model
To protect the privacy of relationships from the mutual-friend attack, a variant of k-anonymity
called k-NMF anonymity has been introduced.
If the network satisfies k-NMF anonymity, then for each edge e there will be at least k - 1 other
edges with the same number of mutual friends as e. It can then be guaranteed that the probability of
an edge being identified is not greater than 1/k.
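The k-NMF condition can be verified directly: compute the number of mutual friends for every edge, then check that each mutual-friend value is shared by at least k edges. A minimal sketch on a toy undirected graph (adjacency sets):

```python
from collections import Counter

def mutual_friend_counts(graph):
    """Number of mutual friends for every edge (u, v), listed once with u < v."""
    counts = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            if u < v:
                counts[(u, v)] = len(graph[u] & graph[v])
    return counts

def satisfies_k_nmf(graph, k):
    """k-NMF: every mutual-friend value occurs on at least k edges."""
    freq = Counter(mutual_friend_counts(graph).values())
    return all(n >= k for n in freq.values())

# A triangle: every edge has exactly one mutual friend.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(satisfies_k_nmf(triangle, k=3))  # True: 3 edges share the value 1
print(satisfies_k_nmf(triangle, k=4))  # False: only 3 such edges exist
```

In the triangle, an adversary who knows "these two people have exactly one mutual friend" still cannot distinguish among the three edges, so the identification probability is at most 1/3.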
25. ATTACK MODEL Data Utility
In the context of network data anonymization, data utility means whether, and to what
extent, properties of the graph are preserved.
Most existing k-anonymization algorithms for network data publishing perform edge insertion
and/or deletion operations, and try to minimize the resulting utility loss.
26. PRIVACY-PRESERVING PUBLISHING OF
TRAJECTORY DATA Location-Based Services (LBS) utilize the location information of individuals,
e.g. to locate a restaurant or to monitor traffic congestion levels.
The use of private location information raises privacy issues in LBS, in particular when
publishing individuals' trajectory data.
k-anonymity has been redefined for trajectories, leading to the proposed (k, δ)-anonymity model.
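The source does not spell out the (k, δ) definition, but its intuition can be sketched: a trajectory is protected if at least k - 1 other published trajectories stay within distance δ of it at every timestamp. A simplified, illustrative check (not the published algorithm):

```python
def within_delta(traj_a, traj_b, delta):
    """Two trajectories are co-localized if their positions are at most
    delta apart (Euclidean distance) at every timestamp."""
    return all(
        ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5 <= delta
        for (xa, ya), (xb, yb) in zip(traj_a, traj_b)
    )

def satisfies_k_delta(trajectories, k, delta):
    """Simplified (k, delta)-anonymity check: every trajectory must be
    co-localized with at least k - 1 others."""
    for i, t in enumerate(trajectories):
        peers = sum(
            within_delta(t, other, delta)
            for j, other in enumerate(trajectories) if j != i
        )
        if peers < k - 1:
            return False
    return True

# Trajectories as lists of (x, y) positions, one per timestamp.
trajs = [[(0, 0), (1, 1)], [(0.5, 0), (1, 1.5)], [(10, 10), (11, 11)]]
print(satisfies_k_delta(trajs, k=2, delta=1.0))  # False: the third user is isolated
```

The first two users move together within δ = 1.0 of each other, but the third user's trajectory is unique, so the set as a whole fails the check.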
27. The user who performs data mining tasks on the
data.
28. Data flow: Data Provider → Data Collector (Database) → Data Miner → Extracted Info. → Information Transmitter → Decision Maker
29. Personal information can be directly observed in the data, and a data breach happens
if the Data Miner is able to find out information underlying the data. (Sometimes
data mining may reveal sensitive information about the data owners.)
The Data Miner also faces the privacy-utility trade-off problem.
The main concern of the Data Miner is HOW to prevent sensitive information from
appearing in the mining results.
To perform privacy-preserving data mining, the Data Miner usually needs to modify
the data obtained from the Data Collector.
30. Based on the distribution of the data, PPDM approaches can be classified into:
Approaches for centralized data mining
Approaches for distributed data mining:
Horizontally partitioned data
Vertically partitioned data
31. For distributed data mining, Secure Multi-party Computation (SMC) is widely
used.
The goal of SMC is to ensure that each participant obtains the correct data
mining result without revealing their own data to the others.
Participants: P1, P2, P3, …, Pm
Private data: X1, X2, X3, …, Xm
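The classic introductory SMC protocol is secure sum: each participant Pi splits its private value Xi into random additive shares, one per participant, so that no single party ever sees another party's raw input, yet the sum of all pooled shares equals the true total. A minimal sketch (the protocol simulated in one process, which is an idealization):

```python
import random

def secure_sum(private_values, modulus=10**9):
    """Secure-sum sketch: each participant splits its value into random
    additive shares modulo `modulus`, one per participant. Summing all
    pooled shares recovers the exact total without pooling raw inputs."""
    m = len(private_values)
    pooled = [0] * m  # pooled[j] = sum of the shares participant j receives
    for x in private_values:
        shares = [random.randrange(modulus) for _ in range(m - 1)]
        last = (x - sum(shares)) % modulus  # final share fixes the total
        for j, s in enumerate(shares + [last]):
            pooled[j] = (pooled[j] + s) % modulus
    return sum(pooled) % modulus

# Participants P1..P4 with private data X1..X4:
data = [12, 7, 30, 1]
print(secure_sum(data))  # 50 — the correct result, computed from shares only
```

Each `pooled[j]` is uniformly random on its own, so an honest-but-curious participant learns nothing about any individual Xi; only the final aggregate is revealed, which is exactly the SMC goal stated above.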
33. PRIVACY-PRESERVING ASSOCIATION
RULE MINING Privacy-Preserving Association Rule Mining
Association rule mining finds interesting associations and correlation relationships among
large sets of data items (e.g. market basket analysis).
Some of the rules are considered sensitive, so a sanitized data set is generated (rule hiding):
Heuristic distortion approaches
Heuristic blocking approaches
Probabilistic distortion approaches
Reconstruction-based approaches:
Hybrid partial hiding (HPH)
Inverse frequent set mining (IFM)
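Association rules are scored by support and confidence, and heuristic distortion hides a sensitive rule by deleting items from supporting transactions until the rule drops below a threshold. A toy basket-analysis sketch (the transactions are invented):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "beer"},
    {"bread", "milk"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(lhs, rhs, txns):
    """Confidence of the rule lhs -> rhs."""
    return support(lhs | rhs, txns) / support(lhs, txns)

# Sensitive rule {milk} -> {bread}: confidence 3/4 before sanitization.
print(confidence({"milk"}, {"bread"}, transactions))  # 0.75

# Heuristic distortion (rule hiding): delete 'bread' from one supporting
# transaction so the rule falls below, say, a 0.7 confidence threshold.
transactions[0].discard("bread")
print(confidence({"milk"}, {"bread"}, transactions))  # 0.5
```

The distortion hides the sensitive rule at a utility cost: non-sensitive rules involving bread are also weakened, which is the same privacy-utility tension seen elsewhere in the deck.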
34.
35.
36. PRIVACY-PRESERVING
CLASSIFICATION Privacy-Preserving Classification
Classification is a form of data analysis that extracts models describing important data
classes.
Data classification can be seen as a two-step process:
Step 1: the learning step, in which a classification algorithm is employed to build a classifier
(classification model).
Step 2: the classifier is used for classification.
Classification models:
Decision Tree
Naïve Bayesian Classification
Support Vector Machine
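The two-step process maps directly onto one of the listed models, naïve Bayesian classification: step 1 estimates class priors and per-feature likelihoods from labeled samples; step 2 scores an unseen sample against each class. A self-contained toy sketch (the weather data is invented):

```python
from collections import Counter, defaultdict

# Step 1 (learning): estimate class priors and per-feature value counts.
def train_naive_bayes(samples):
    priors = Counter(label for _, label in samples)
    likelihoods = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            likelihoods[(label, i)][value] += 1
    return priors, likelihoods

# Step 2 (classification): pick the label maximizing prior * likelihoods.
def classify(features, priors, likelihoods):
    total = sum(priors.values())
    def score(label):
        p = priors[label] / total
        for i, value in enumerate(features):
            counts = likelihoods[(label, i)]
            p *= (counts[value] + 1) / (sum(counts.values()) + 2)  # Laplace smoothing
        return p
    return max(priors, key=score)

data = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes"),
        (("sunny", "mild"), "no"), (("rainy", "hot"), "yes")]
priors, likelihoods = train_naive_bayes(data)
print(classify(("rainy", "hot"), priors, likelihoods))  # 'yes'
```

In a privacy-preserving setting, the same two steps run on randomized or reconstructed data, so the counts in step 1 are estimates rather than exact tallies.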
38. The Data Miner can modify the original data via randomization, blocking, or
reconstruction. The modification often has a negative effect on the utility of the
data.
The Data Miner needs to strike a balance between privacy and utility. The implications
of privacy and utility vary with the characteristics of the data and the purpose of the
mining task.
39. The user who makes decisions based on the data
mining results in order to achieve certain goals.
40. Data flow: Data Provider → Data Collector (Database) → Data Miner → Extracted Info. → Information Transmitter → Decision Maker
41. The privacy concerns of the Decision Maker are:
How to prevent unwanted disclosure of sensitive mining result
How to evaluate the credibility of the received mining result.
42. 1st Issue:
Legal measures
Make a contract with the Data Miner that forbids the miner from disclosing the mining results to a
third party.
2nd Issue:
The Decision Maker can utilize methodologies from data provenance, credibility
analysis of web information, or other related research fields.
43. DATA PROVENANCE Data Provenance:
The information that helps determine the derivation history of the data, starting from its original sources.
Provenance, which describes where the data came from and how the data evolved over time, can
help people evaluate the credibility of the data.
Provenance contains two kinds of information:
The ancestral data from which the current data evolved.
The transformations applied to the ancestral data that helped produce the current data.
However, in most cases the provenance of data mining results is not available.
The major approach to presenting provenance information is to add annotations to the data.
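The annotation approach can be sketched as data values carrying a provenance record with exactly the two kinds of information listed above: the ancestral source and the transformations applied. All names below (`sensor_17/raw_feed`, the transformation labels) are illustrative:

```python
# Each value carries an annotation recording its ancestral source and the
# transformations applied to it (all names here are invented for illustration).
record = {
    "value": 72.5,
    "provenance": {
        "source": "sensor_17/raw_feed",                       # ancestral data
        "transformations": ["deduplicate", "celsius_to_fahrenheit"],  # how it evolved
    },
}

def describe(rec):
    """Render the derivation history a Decision Maker would inspect."""
    p = rec["provenance"]
    steps = " -> ".join(p["transformations"])
    return f"{rec['value']} came from {p['source']} via {steps}"

print(describe(record))
```

A Decision Maker receiving such annotated results can trace each figure back to its source and judge its credibility, which is precisely what is missing when mining results arrive without provenance.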
44. WEB INFORMATION CREDIBILITY
Web Information Credibility
Users can differentiate false information from the truth based on:
Authority: the real author of false information is usually unclear
Accuracy: false information does not contain accurate data
Objectivity: false information is often prejudicial
Currency: for false information, the data about its source, and the time and place of its origin, is
incomplete, out of date, or missing
Coverage: false information usually contains no effective links to other online information
45.
46. Game theory provides a formal approach to modeling situations where a group of
agents must choose optimal actions considering the mutual effects of the other
agents' decisions.
The essential elements of a game are: players, actions, payoffs, and information.
Players have actions that they can perform at designated times in the game; as a
result of the performed actions, players receive payoffs.
47. PRIVATE DATA COLLECTION AND PUBLICATION
In this data collection game, the level of privacy protection has a significant influence on
each player's actions and payoffs.
PRIVACY-PRESERVING DISTRIBUTED DATA MINING
SMC-based privacy-preserving distributed data mining
Recommender systems
Linear regression as a non-cooperative game
DATA ANONYMIZATION
48. Game Model:
Define the elements of the game, namely the players, the actions, and the payoffs.
Determine the type of the game: static or dynamic, complete information or incomplete
information.
Solve the game to find its equilibria.
Analyze the equilibria to obtain implications for practice.
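The four modeling steps above can be walked through on a tiny static game of complete information. The payoffs below are invented to illustrate a "share data vs. withhold" privacy game; the solver finds pure-strategy Nash equilibria by checking that neither player gains from a unilateral deviation:

```python
# Step 1 - elements: two players, actions, and payoff matrices.
# Rows index player 1's action, columns index player 2's action.
actions = ["share", "withhold"]
payoff1 = [[3, 0], [2, 1]]   # player 1's payoff for each (a1, a2) - illustrative
payoff2 = [[3, 2], [0, 1]]   # player 2's payoff - illustrative

# Step 2 - type: static, complete information (both matrices are common knowledge).

# Step 3 - solve: a cell is a pure Nash equilibrium if neither player can
# improve their own payoff by deviating alone.
def pure_nash_equilibria(p1, p2):
    eq = []
    for i in range(2):
        for j in range(2):
            best1 = p1[i][j] >= max(p1[k][j] for k in range(2))
            best2 = p2[i][j] >= max(p2[i][k] for k in range(2))
            if best1 and best2:
                eq.append((actions[i], actions[j]))
    return eq

print(pure_nash_equilibria(payoff1, payoff2))
```

Step 4, the analysis: this toy game has two equilibria, (share, share) and (withhold, withhold), so cooperation in data sharing is sustainable but not guaranteed; which equilibrium arises depends on the players' expectations, which is the kind of practical implication the modeling process is meant to surface.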
49.
50. The Data Collector wants Data Providers to participate in the data mining
activity, i.e. to hand over their private data, but Data Providers may choose to
opt out because of privacy concerns. In order to get useful data mining results,
the Data Collector needs to design mechanisms that encourage Data Providers to
opt in.
Mechanisms for truthful data sharing:
A mechanism requires agents to report their preferences over the outcomes.
Privacy auctions
51.
52. Laws and regulations
USA: Privacy Act of 1974
European Commission: General Data Protection Regulation (proposed 2012)
Industry conventions:
Agreements between organizations on how to collect, analyze, and store personal data
should help create a privacy-safe environment.
Enhance education to increase awareness of information security.
53.
54. Personalized Privacy Preservation
Developing practical personalized anonymization methods.
Introducing personalized privacy into other types of PPDP/PPDM.
Data Customization
A concept called "Reverse Data Management" (RDM), similar to inverse data mining, has been
introduced. RDM covers many data problems: inversion mapping, provenance, data generation,
view updates, constraint-based repair, etc.
(We may consider RDM to be a family of data customization methods.)
Provenance for Data Mining
New techniques and mechanisms that can support provenance in the data mining context
should receive more attention.
55.
56. Each user role has its own privacy concerns and its own approaches for preserving
privacy while maintaining data utility.
57.
58. Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan and Yong Ren,
"Information Security in Big Data: Privacy and Data Mining",
IEEE Access, 2014