Data warehouses contain sensitive data that must be secured in two ways: by defining appropriate access rights for users and by preventing potential data inferences. Inspired by development methods for information systems, the first way of securing a data warehouse has been treated in the literature during the early phases of the development cycle. However, despite the high risk of inferences, the second way is not sufficiently taken into account in the design phase; it is instead left to the administrator of the data warehouse. Managing inferences during the exploitation phase, however, may induce high maintenance costs and complex OLAP server administration. In this paper, we propose an approach that, starting from the conceptual model of the data sources, assists the designer of the data warehouse in identifying multidimensional sensitive data and data that may be subject to inferences.
Securing Data Warehouses: A Semi-automatic Approach for Inference Prevention at the Design Level
1. Securing Data Warehouses: A Semi-automatic Approach for Inference Prevention at the Design Level
Salah Triki
Hanene Ben-Abdallah (Mir@cl, University of Sfax)
Nouria Harbi, Omar Boussaid (ERIC, University of Lyon)
4. Introduction
• A data warehouse is a collection of data:
– integrated
– subject-oriented
– nonvolatile
– historized
– available for querying and analysis
• A DW can be deployed in various domains:
Commerce, Hospital ...
5. Introduction
• Data warehouses contain:
– Sensitive data
– Some personal/proprietary data
• Legal requirements:
– HIPAA
– GLBA
– Safe Harbor
– Sarbanes-Oxley
• Organizations must comply with these laws
9. Data warehouse
• The types of inferences:
– Precise inference
– Partial inference
[Figure labels: Query; Not Authorized Data; Authorized Data]
• At the physical level
Securing Data Warehouses
10. • Prevention of inferences at the physical level
[Haibing et al. 2008, Cuzzocrea 2009, Zhang et al. 2011]
can induce:
– high administrative costs
– high maintenance costs.
• Prevention of inferences at the design level
[Steger et al. 2000, Blanco et al. 2010]:
– does not take into account the potential inferences from the available data
– is specific to a particular application domain.
Securing Data Warehouses
12. • Assumptions:
– The data sources’ class diagram is available.
– The star schema is already designed.
– The star schema is mapped to the data sources’ class diagram.
An approach for assisting the design of secure DW
14. • Inferences graph: a set of nodes connected by directed arcs.
– The nodes represent the data:
● Node colored in gray: sensitive data
● Node colored in white: non-sensitive data
– The arcs indicate the direction of inference:
● Solid arc: precise inference
● Dotted arc: partial inference
[Figure: example inferences graph with nodes A, B and C]
Inferences graph construction
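The graph described on this slide can be represented with a very small data structure. The following is a minimal sketch, not the authors' implementation: the class name `InferenceGraph`, its methods, and the two example arcs are all hypothetical, since the slide does not say which arcs connect A, B and C.

```python
from dataclasses import dataclass, field
from enum import Enum


class InferenceType(Enum):
    PRECISE = "precise"  # drawn as a solid arc
    PARTIAL = "partial"  # drawn as a dotted arc


@dataclass
class InferenceGraph:
    # node name -> True for a sensitive (gray) node, False for a white node
    sensitive: dict = field(default_factory=dict)
    # (source, target) -> InferenceType; the arc points in the direction of inference
    arcs: dict = field(default_factory=dict)

    def add_node(self, name, is_sensitive=False):
        self.sensitive[name] = is_sensitive

    def add_arc(self, source, target, kind):
        if source not in self.sensitive or target not in self.sensitive:
            raise KeyError("both endpoints must be added as nodes first")
        self.arcs[(source, target)] = kind

    def arcs_into_sensitive(self):
        """Return the arcs whose target is a sensitive node, sorted for stable output."""
        return sorted(
            (s, t, k.value) for (s, t), k in self.arcs.items() if self.sensitive[t]
        )


# Hypothetical example: A is sensitive, B and C are not.
graph = InferenceGraph()
graph.add_node("A", is_sensitive=True)
graph.add_node("B")
graph.add_node("C")
graph.add_arc("B", "A", InferenceType.PARTIAL)  # assumed arc, for illustration only
graph.add_arc("C", "A", InferenceType.PRECISE)  # assumed arc, for illustration only
```

Keeping the node coloring and the arc type as separate mappings mirrors the two legends on the slide (gray/white nodes, solid/dotted arcs).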
18. Types of inferences
• The automatic construction of the inferences graph does not indicate the type of each inference: partial or precise.
• Unfortunately, this type cannot be deduced automatically.
• The security designer must identify the partial inferences (drawn as dotted arcs).
19. Detection of new inferences
• Calculation of the transitive closure
[Figure: inferences graph with nodes A, B, C, D and E; legend: partial path, precise path]
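One way to compute the transitive closure while keeping track of the inference type is a Warshall-style pass over the arcs. The sketch below rests on two assumptions that the slides do not state explicitly: a path counts as precise only if every arc on it is precise, and when several paths connect the same pair of nodes the precise one is kept. The function name `close_inferences` and the example arcs are hypothetical; the arcs were chosen so that the derived inferences match the D:A (partial) and E:A (precise) annotations shown on the following slide.

```python
def close_inferences(arcs):
    """Return the transitive closure of `arcs`.

    `arcs` maps (source, target) to "precise" or "partial".
    Assumption: a derived path is precise only if all of its arcs are precise.
    """
    closure = dict(arcs)
    nodes = {node for arc in arcs for node in arc}
    for k in nodes:  # k is the intermediate node, as in Warshall's algorithm
        for i in nodes:
            for j in nodes:
                if i == j:
                    continue
                if (i, k) in closure and (k, j) in closure:
                    both_precise = closure[(i, k)] == closure[(k, j)] == "precise"
                    kind = "precise" if both_precise else "partial"
                    # Never downgrade an inference already known to be precise.
                    if closure.get((i, j)) != "precise":
                        closure[(i, j)] = kind
    return closure


# Hypothetical arcs, typed by the security designer.
example_arcs = {
    ("D", "B"): "partial",
    ("B", "A"): "precise",
    ("E", "C"): "precise",
    ("C", "A"): "precise",
}
closure = close_inferences(example_arcs)
new_inferences = {arc: kind for arc, kind in closure.items() if arc not in example_arcs}
```

Here `new_inferences` holds only the inferences that were not drawn explicitly, which are the ones the designer would otherwise miss.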
20. Enrichment of the star schema
[Figure: inferences graph with nodes A, B, C, D and E; legend: partial path, precise path]
<<Partial Inference : D:A>>
<<Precise Inference : E:A>>
<<Sensitive Data >>
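The annotations above can be generated mechanically once the inferences toward sensitive nodes are known. This is a minimal sketch under stated assumptions: the function name `annotate` is hypothetical, and attaching each inference annotation to the source element of the inference is an assumption, since the slide does not say which schema element carries it.

```python
def annotate(inferences, sensitive):
    """Build stereotype-style annotations for star schema elements.

    `inferences` maps (source, target) to "precise" or "partial";
    `sensitive` is the set of sensitive element names.
    Assumption: an inference annotation is attached to its source element.
    """
    annotations = {}
    for node in sorted(sensitive):
        annotations.setdefault(node, []).append("<<Sensitive Data>>")
    for (source, target), kind in sorted(inferences.items()):
        if target in sensitive:  # only inferences that reach sensitive data matter
            label = "Partial" if kind == "partial" else "Precise"
            annotations.setdefault(source, []).append(
                f"<<{label} Inference : {source}:{target}>>"
            )
    return annotations


# The two inferences shown on the slide, with A as the sensitive element.
slide_inferences = {("D", "A"): "partial", ("E", "A"): "precise"}
result = annotate(slide_inferences, {"A"})
```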
26. • Inference type specification
Example
<<Partial Inference : Date : Illness>>
<<Partial Inference : Time : Illness>>
<<Sensitive Data>>
<<Partial Inference : Transfer : Critical Illness>>
28. • An approach to produce a conceptual multidimensional model annotated with information for inference prevention:
– A graph of inferences based on the class diagram of the data sources.
– The class diagram allows us to identify the elements that lead to precise/partial inferences.
• Studying how to transfer the annotations defined at the design level to the logical level.
Conclusion