Farshad Badie, Computer Science MSc, University of Debrecen
Database Security Overview
Database security concerns the use of a broad range of information security controls to protect databases against compromises of their confidentiality, integrity and availability. It involves various types or categories of controls, such as technical, procedural/administrative and physical controls. Database security is a specialist topic within the broader realms of computer security, information security and risk management.
Introduction to OLAP & Data Warehouses
Nowadays, On-Line Analytical Processing (OLAP) systems and data warehouses are used to store and analyze the information of an organization. The security of OLAP systems and data warehouses is crucial to the interests of both organizations and individuals. OLAP systems help analysts gain insight into large amounts of stored data from different perspectives. Because the data models differ, access control techniques designed for traditional data management systems are usually not directly applicable to these systems.
Data Warehouse
Task
• Using enterprise data
Background
• Stores data collected from multiple data sources, such as transactional databases, throughout an organization
Description
• The data are organized according to a star schema, which has a fact table whose attributes are split into dimensions and measures.
• Each dimension is associated with a dimension table that encodes a dimension hierarchy.
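To make the star schema concrete, here is a minimal sketch, assuming a single hypothetical `location` dimension whose hierarchy is city → country, and a fact table with one measure, `sales`. The classes `DimensionTable` and `FactRow` and the function `aggregate` are illustrative names, not part of any particular warehouse product. Rolling a fact up along the dimension hierarchy and summing the measure is the basic operation OLAP builds on.

```python
from dataclasses import dataclass

@dataclass
class DimensionTable:
    """A dimension table: maps each leaf value to its value at every level
    of the dimension hierarchy (finest level first)."""
    name: str
    levels: list
    rows: dict  # leaf value -> {level: value}

    def roll_up(self, leaf, level):
        # Generalize a leaf value to a coarser level of the hierarchy.
        return self.rows[leaf][level]

@dataclass
class FactRow:
    """One row of the fact table: dimension attributes plus numeric measures."""
    dims: dict      # dimension name -> leaf value
    measures: dict  # measure name -> number

def aggregate(facts, dim_tables, group_by, measure):
    """SUM `measure` after rolling each fact up to the levels in `group_by`
    (a dict mapping dimension name -> hierarchy level)."""
    totals = {}
    for f in facts:
        key = tuple(dim_tables[d].roll_up(f.dims[d], lvl) for d, lvl in group_by.items())
        totals[key] = totals.get(key, 0) + f.measures[measure]
    return totals

# Hypothetical example data.
location = DimensionTable(
    "location", ["city", "country"],
    {"Debrecen": {"city": "Debrecen", "country": "Hungary"},
     "Budapest": {"city": "Budapest", "country": "Hungary"},
     "Vienna": {"city": "Vienna", "country": "Austria"}},
)
facts = [
    FactRow({"location": "Debrecen"}, {"sales": 10}),
    FactRow({"location": "Budapest"}, {"sales": 5}),
    FactRow({"location": "Vienna"}, {"sales": 7}),
]
by_country = aggregate(facts, {"location": location}, {"location": "country"}, "sales")
```

Here `by_country` holds the per-country sales totals obtained by rolling each city up one level in the hierarchy.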
OLAP
Task
• Analyzing business data collected from daily transactions
Main purpose
• To enable analysts to construct a mental image of the underlying data by exploring it from different perspectives
Architectures
1. ROLAP (Relational OLAP): provides a front-end tool that translates multidimensional queries into corresponding SQL queries, which are processed by the relational back end.
2. MOLAP (Multidimensional OLAP): instead of relying on the relational model, materializes the multidimensional views.
3. HOLAP (Hybrid OLAP): uses MOLAP for the dense parts of the data and ROLAP for the rest.
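The ROLAP idea of translating a multidimensional request into SQL can be sketched as follows. This is a simplified illustration, assuming a request consists only of a list of dimensions to group by and one measure; `to_sql` and the `sales` table are hypothetical, and Python's built-in `sqlite3` stands in for the relational back end.

```python
import sqlite3

def to_sql(fact_table, group_by, measure, agg="SUM"):
    """Translate a simple multidimensional request (dimensions to group by and
    one measure) into an SQL query for a relational back end.
    Identifiers are assumed to be trusted; they are not escaped."""
    cols = ", ".join(group_by)
    select = f"{cols}, {agg}({measure})" if group_by else f"{agg}({measure})"
    sql = f"SELECT {select} FROM {fact_table}"
    if group_by:
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql

# A tiny in-memory fact table standing in for the relational back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "milk", 10.0), ("East", "bread", 4.0), ("West", "milk", 5.0)],
)
query = to_sql("sales", ["region"], "amount")
rows = conn.execute(query).fetchall()
```

A MOLAP engine would instead keep the aggregated views materialized in its own multidimensional store rather than generating SQL per request.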
Analyzing OLAP
An OLAP system must be highly efficient in answering queries. OLAP organizes and generalizes data along multiple dimensions and dimension hierarchies. To obtain an abstract model for this purpose, we use the data cube model. The data cube was proposed as an SQL operator to support common OLAP tasks. Although such tasks are usually possible with standard SQL queries, those queries may become very complex and lead to poor performance, which is why a dedicated operator is defined.
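What the data cube computes can be sketched in a few lines: one SUM per subset of the dimensions, with rolled-up dimensions marked `ALL`. In plain SQL this would be a UNION of 2^n GROUP BY queries, which illustrates the complexity the operator avoids (several SQL dialects now offer it as `GROUP BY CUBE(...)`). The function name `data_cube` and the sample rows are hypothetical.

```python
from itertools import combinations

def data_cube(rows, dims, measure, all_marker="ALL"):
    """Compute SUM(measure) for every subset of `dims` (2**len(dims) group-bys).
    Rolled-up dimensions are shown as `all_marker`."""
    cube = {}
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for r in rows:
                cell = tuple(r[d] if d in kept else all_marker for d in dims)
                cube[cell] = cube.get(cell, 0) + r[measure]
    return cube

rows = [
    {"region": "East", "product": "milk", "amount": 10},
    {"region": "East", "product": "bread", "amount": 4},
    {"region": "West", "product": "milk", "amount": 5},
]
cube = data_cube(rows, ["region", "product"], "amount")
```

The result contains the core cuboid (region, product), the two one-dimensional cuboids and the single (ALL, ALL) cell; empty core cells such as (West, bread) simply do not appear.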
Related Work
In relational databases, access to sensitive data is regulated by various models:
o Discretionary Access Control (DAC): uses owner-specified grants and revokes to achieve owner-centric control of objects.
o Role-Based Access Control (RBAC): simplifies access control tasks by introducing an intermediate tier of roles that aggregates and bridges users and permissions.
o Flexible Authorization Framework (FAF): provides a universal solution for handling conflicts in access control policies through authorization derivation and conflict resolution logic rules.
"The proposed methods can roughly be classified into restriction-based techniques and perturbation-based techniques."
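The difference between DAC and RBAC can be sketched with two toy classes. This is a simplified model under stated assumptions: in the hypothetical `DAC` class only the owner may grant or revoke (no delegated grant option), and in the hypothetical `RBAC` class permissions are opaque values attached to roles. Neither class models FAF's derivation or conflict resolution rules.

```python
from collections import defaultdict

class DAC:
    """Owner-centric control: the owner of an object grants and revokes access.
    Simplification: only the owner may grant or revoke."""
    def __init__(self):
        self.owner = {}
        self.grants = defaultdict(set)  # object -> users allowed to access it

    def create(self, user, obj):
        self.owner[obj] = user
        self.grants[obj].add(user)

    def grant(self, grantor, grantee, obj):
        if self.owner.get(obj) != grantor:
            raise PermissionError("only the owner may grant")
        self.grants[obj].add(grantee)

    def revoke(self, grantor, grantee, obj):
        if self.owner.get(obj) != grantor:
            raise PermissionError("only the owner may revoke")
        self.grants[obj].discard(grantee)

    def can_access(self, user, obj):
        return user in self.grants[obj]

class RBAC:
    """Roles form an intermediate tier between users and permissions."""
    def __init__(self):
        self.user_roles = defaultdict(set)  # user -> roles
        self.role_perms = defaultdict(set)  # role -> permissions

    def assign(self, user, role):
        self.user_roles[user].add(role)

    def permit(self, role, perm):
        self.role_perms[role].add(perm)

    def can(self, user, perm):
        return any(perm in self.role_perms[r] for r in self.user_roles[user])
```

With RBAC, granting a new analyst access to every analyst view is a single `assign` call instead of one grant per object.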
Inference Control Methods
Restriction-based
• Prevent malevolent inferences by denying unsafe queries.
• Determine the safety of queries based on …
 1. The minimal number of values aggregated by a query
 2. The maximal number of common values aggregated by different queries
 3. The maximal rank of a matrix representing answered queries
Perturbation-based
• Prevent inferences by inserting random noise into:
 1. Sensitive data,
 2. Answers to queries, or
 3. The database structure
• Have been proposed for preserving privacy in data mining.
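The two families can be contrasted with a minimal sketch. `is_safe` applies the first two restriction criteria (a minimum query-set size and a maximum overlap with previously answered queries); the thresholds are hypothetical parameters, and the rank criterion is omitted here. `noisy_sum` perturbs query answers with zero-mean Gaussian noise, which is just one possible noise model, assumed for illustration.

```python
import random

def is_safe(query_set, answered, min_size, max_overlap):
    """Restriction-based check: deny a query whose query set is too small, or
    which shares too many values with an already answered query."""
    if len(query_set) < min_size:
        return False
    return all(len(query_set & prev) <= max_overlap for prev in answered)

def noisy_sum(values, query_set, noise_sd, rng):
    """Perturbation-based answer: the true SUM plus zero-mean Gaussian noise."""
    return sum(values[i] for i in query_set) + rng.gauss(0.0, noise_sd)

values = {0: 30, 1: 45, 2: 50, 3: 20}  # hypothetical sensitive values
answered = [frozenset({0, 1, 2})]      # query sets already answered
```

For example, with `min_size=2` and `max_overlap=1`, the query set `{2, 3}` is accepted, while `{0}` (too small) and `{0, 1, 3}` (two values shared with `{0, 1, 2}`) are denied.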
And …
Cell Suppression
• To protect census data released in statistical tables, cells that contain small COUNT values are suppressed; possible inferences of the suppressed cells are then detected and removed using linear-programming-based techniques.
Partitioning
• Defines a partition on the sensitive data and restricts queries to aggregating only complete blocks of the partition.
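Both techniques can be sketched as small checks. `primary_suppression` hides COUNT cells below a hypothetical threshold; the follow-up step that detects and removes inferences of the suppressed cells with linear programming is not shown. `allowed_by_partition` accepts a query only if every block of the partition is either entirely inside or entirely outside the query set. Both function names are illustrative.

```python
def primary_suppression(table, threshold):
    """Replace COUNT cells below `threshold` with None (suppressed).
    The linear-programming step that removes inferences of suppressed
    cells is not implemented here."""
    return {cell: (None if count < threshold else count)
            for cell, count in table.items()}

def allowed_by_partition(query_set, blocks):
    """Allow a query only if it aggregates complete blocks of the partition:
    each block is either fully contained in the query set or disjoint from it."""
    return all(block <= query_set or not (block & query_set) for block in blocks)

table = {("A", "x"): 12, ("A", "y"): 2, ("B", "x"): 7}  # hypothetical counts
blocks = [frozenset({0, 1}), frozenset({2, 3})]          # partition of records 0..3
```

Under this partition, a query over records `{1, 2}` is rejected because it splits both blocks.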
The Threat of Inferences
Unlike traditional databases, where unauthorized accesses are the main security concern, an OLAP system allows an adversary to infer prohibited data more easily from the answers to legitimate queries.
The Requirements
In OLAP systems, we need to combine access control and inference control to remove these security threats and find a good solution. Providing security should not adversely reduce the usefulness of DW and OLAP systems.
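A tiny example shows why access control alone is not enough. Assume a hypothetical salary table and a policy that only permits SUM queries over at least two people. Both queries below pass that check, yet their difference discloses one individual's salary.

```python
salaries = {"alice": 6000, "bob": 4000, "carol": 5000}  # hypothetical sensitive data

def sum_query(names):
    """A legitimate aggregate query: it never returns a single individual's value."""
    if len(names) < 2:
        raise PermissionError("queries must aggregate at least two values")
    return sum(salaries[n] for n in names)

q1 = sum_query({"alice", "bob", "carol"})  # allowed
q2 = sum_query({"bob", "carol"})           # allowed
inferred_alice = q1 - q2                   # prohibited value, inferred anyway
```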
The main challenge lies in the inherent trade-off between the following objectives …
• Security: sensitive data stored in the underlying DW should be guarded from both unauthorized accesses and malicious inferences.
• Applicability: the security provided by a solution should not rely on any unrealistic assumptions about OLAP systems.
• Efficiency: OLAP systems are interactive by nature, so a desired security solution must be computationally efficient, especially with respect to on-line overhead.
• Availability: data should be readily available to legitimate users who have sufficient privileges.
• Practicality: a practical security solution should not demand significant modifications to the existing infrastructure of an OLAP system.
A Three-Tier Security Architecture
Security in statistical databases usually has two tiers: sensitive data and aggregation queries. Applying such a two-tier architecture to OLAP systems has some inherent drawbacks:
1. Checking queries for inferences at run time may bring unacceptable delay to query processing.
2. Under the two-tier architecture, inference control methods cannot take advantage of the special characteristics of an OLAP application.
We therefore define a three-tier architecture, with access control between the first (query) tier and the second (aggregation) tier, and inference control between the second (aggregation) tier and the third (data) tier.
[Figure: three tiers, User Queries (Q), Pre-defined Aggregations (A) and Data Set (D); access control governs the relation R_AQ between A and Q, and inference control governs the relation R_DA between D and A]
This architecture helps to reduce the performance overhead of inference control in several ways …
1. The aggregation tier can be pre-computed, so the computation-intensive part of inference control is shifted to off-line processing.
2. The on-line part enforces access control, i.e. it checks whether a query can be rewritten using the aggregation tier.
3. Both of the above reduce the size of the inputs to inference control algorithms and consequently reduce their complexity.
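The division of work between the tiers can be sketched as follows, assuming the aggregation tier is a set of named blocks that partition the data tier and have already been checked off-line for inferences (that check is not implemented here). On-line, a query is answered only if it can be rewritten as a union of blocks the user is authorized for. The class `ThreeTier` and its methods are hypothetical names.

```python
class ThreeTier:
    """Data tier D, pre-defined aggregation tier A, and on-line queries Q."""

    def __init__(self, data, blocks):
        # Off-line: fix and precompute the aggregation tier. The blocks are
        # assumed to partition the data cells and to be inference-free.
        self.blocks = {name: frozenset(cells) for name, cells in blocks.items()}
        self.precomputed = {name: sum(data[c] for c in cells)
                            for name, cells in self.blocks.items()}

    def rewrite(self, query):
        """Return the block names whose union is exactly `query`, or None."""
        used = [n for n, cells in self.blocks.items() if cells <= query]
        covered = frozenset().union(*(self.blocks[n] for n in used))
        return used if covered == query else None

    def answer(self, query, allowed):
        """On-line: rewriting plus access control only, no inference checking."""
        names = self.rewrite(frozenset(query))
        if names is None or not set(names) <= allowed:
            return None
        return sum(self.precomputed[n] for n in names)

data = {"c1": 10, "c2": 20, "c3": 5, "c4": 7}  # hypothetical data tier
tiers = ThreeTier(data, {"a1": {"c1", "c2"}, "a2": {"c3", "c4"}})
```

The on-line cost is a subset test per block, independent of the inference control algorithm used off-line.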
SUM-only Data Cubes
As a limitation inherited from statistical databases, only SUM aggregations are considered; moreover, only the core cuboid is regarded as sensitive. Improved results can be obtained by exploiting the unique structure of data cubes. The dependency relation between aggregation cells and core cells can be modeled as a system of linear equations.
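Once the dependencies are written as linear equations, a standard linear-algebra test detects SUM-only inferences: a core cell is disclosed exactly when its unit vector lies in the row space of the matrix of answered queries. The sketch below uses exact `Fraction` arithmetic to avoid floating-point rank errors; `rank` and `disclosed_cells` are illustrative names, and the two small cubes are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank_so_far = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(rank_so_far, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank_so_far], m[pivot] = m[pivot], m[rank_so_far]
        for i in range(len(m)):
            if i != rank_so_far and m[i][c] != 0:
                factor = m[i][c] / m[rank_so_far][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank_so_far])]
        rank_so_far += 1
    return rank_so_far

def disclosed_cells(queries, n):
    """Indices of core cells whose exact value follows from the answered SUM
    queries: cell i is disclosed iff the unit vector e_i lies in the row space
    of the query matrix, i.e. appending e_i does not raise the rank."""
    base = rank(queries)
    disclosed = []
    for i in range(n):
        unit = [1 if j == i else 0 for j in range(n)]
        if rank(queries + [unit]) == base:
            disclosed.append(i)
    return disclosed

# 2x2 core cuboid, cells x0=(a,p), x1=(a,q), x2=(b,p), x3=(b,q);
# answered queries: the two row sums and the two column sums.
full = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
# Same cube with cell (b,q) empty: only x0=(a,p), x1=(a,q), x2=(b,p) remain.
sparse = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
```

For the full cube no cell is disclosed, but a single empty cell is enough to disclose all three remaining cells, which shows why the structure of the cube instance matters.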
Cardinality-Based Method (a sample method based on the described methodology)
Specification
• Determines the existence of inferences based on the number of answered queries.
• Aggregations are pre-defined based on the dimension hierarchy.
• Queries are limited to data cube cells.
• Here, we only consider a one-level dimension hierarchy, where each dimension has only two attributes: the attribute in the core cuboid and ALL.
• We need only consider the values that appear in at least one non-empty cell of the given data cube instance.
• The value in any non-empty cell is unknown, hence the cell is denoted by an unknown variable.
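The model set up in this specification can be sketched for a 2-D cube: each non-empty core cell becomes an unknown, and each aggregation cell (i, ALL), (ALL, j) and (ALL, ALL) becomes one linear equation. `singleton_slices` shows only the simplest cardinality argument (a row or column with exactly one non-empty cell reveals it); it is an illustration, and the actual cardinality bound used by the method is not reproduced here. Both function names and the mask are hypothetical.

```python
def build_equations(mask):
    """Model a 2-D data cube with a one-level hierarchy as linear equations.
    mask[i][j] is True for a non-empty core cell; each such cell becomes an
    unknown. Each row, each column and the (ALL, ALL) cell contribute one
    equation as a 0/1 coefficient row (rows/columns with no unknowns are skipped)."""
    variables = [(i, j) for i, row in enumerate(mask)
                 for j, filled in enumerate(row) if filled]
    equations = []
    for i in range(len(mask)):                      # cells (i, ALL)
        coeffs = [1 if v[0] == i else 0 for v in variables]
        if any(coeffs):
            equations.append(coeffs)
    for j in range(len(mask[0])):                   # cells (ALL, j)
        coeffs = [1 if v[1] == j else 0 for v in variables]
        if any(coeffs):
            equations.append(coeffs)
    equations.append([1] * len(variables))          # cell (ALL, ALL)
    return variables, equations

def singleton_slices(mask):
    """Simplest cardinality argument: a row or column containing exactly one
    non-empty cell reveals that cell through its subtotal."""
    revealed = set()
    for i, row in enumerate(mask):
        cols = [j for j, filled in enumerate(row) if filled]
        if len(cols) == 1:
            revealed.add((i, cols[0]))
    for j in range(len(mask[0])):
        rows = [i for i in range(len(mask)) if mask[i][j]]
        if len(rows) == 1:
            revealed.add((rows[0], j))
    return revealed

mask = [[True, True], [True, False]]  # hypothetical instance: cell (1, 1) is empty
variables, equations = build_equations(mask)
```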
Generic Data Cubes
The previous method can only deal with SUM-only data cubes, a limitation inherited from statistical databases. It has been shown that even detecting inferences caused by queries involving both MAX and SUM is intractable. We therefore extend the method to data cubes with generic aggregation types. (The extension does not directly detect inferences.)
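A small brute-force example illustrates how mixing MAX and SUM creates inferences that a linear, SUM-only model cannot express. It assumes a hypothetical small integer domain and enumerates every assignment consistent with the answers; the enumeration grows as |domain|^n, so this is only a toy, not a practical detection method.

```python
from itertools import product

def consistent_assignments(n, domain, answers):
    """Enumerate every assignment of n cells over `domain` that agrees with all
    answered queries. `answers` holds (func, cell_indices, value) triples."""
    for values in product(domain, repeat=n):
        if all(func(values[i] for i in cells) == value
               for func, cells, value in answers):
            yield values

def inferred(n, domain, answers):
    """Cells whose value is the same in every consistent assignment."""
    sols = list(consistent_assignments(n, domain, answers))
    if not sols:
        return {}
    return {i: sols[0][i] for i in range(n) if len({s[i] for s in sols}) == 1}

domain = range(11)  # hypothetical: cell values are integers 0..10
sum_only = [(sum, [0, 1], 10)]
sum_and_max = [(sum, [0, 1], 10), (max, [0, 1], 5)]
```

The SUM alone leaves both cells open, but adding MAX = 5 forces both cells to 5, since neither may exceed 5 and together they must reach 10.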
Access Control
Limiting access control to the core cuboid is not always appropriate, because values in aggregation cuboids may also carry sensitive information. Security requirements may therefore cover values in both the core cuboid and the aggregation cuboids. The data cube will be partitioned along the dependency relation between cuboids.
• Sometimes the data cube should also be partitioned along different dimensions.
To meet such security requirements …
We describe a framework for specifying authorization objects in data cubes. The object specification satisfies the following desired property …
1. For any cell in an object, the object also includes all the ancestors of that cell.
2. Ancestors of a sensitive cell contain more detailed information and should therefore also be regarded as sensitive.
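The ancestor-closure property can be sketched directly. Following the slide's terminology, an ancestor of a cell is a cell that is at least as detailed: every `ALL` (written `*` here) may be kept or replaced by any value of that dimension, while concrete values stay fixed. The dimension domains are hypothetical, and this sketch includes all combinations, whereas a real implementation might restrict itself to non-empty cells.

```python
from itertools import product

ALL = "*"

def ancestors(cell, domains):
    """All cells at least as detailed as `cell` (including the cell itself):
    each ALL may stay ALL or become any value of its dimension."""
    choices = [[v] if v != ALL else [ALL, *dom] for v, dom in zip(cell, domains)]
    return set(product(*choices))

def authorization_object(cells, domains):
    """Close a set of sensitive cells under the ancestor relation, so that
    every ancestor of a protected cell is protected as well."""
    closed = set()
    for c in cells:
        closed |= ancestors(c, domains)
    return closed

domains = [["East", "West"], ["milk", "bread"]]  # hypothetical dimensions
```

Protecting the subtotal `("East", ALL)` therefore also protects the more detailed core cells `("East", "milk")` and `("East", "bread")`.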
Lattice-based inference method
Three kinds of basic decision rule sets are produced, respectively, from:
• the dual formal concept lattice,
• the object-oriented concept lattice, and
• the attribute-oriented concept lattice
of a formal context. Based on these sets, two types of decision inference are established via an inclusion degree. The decisions obtained by these inferences are proved to be the lower and the upper approximated decisions. Thus, total decision rules, described as the lower and upper approximated decision rules, are obtained, and they are consistent with the basic decision rule sets.
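As background for the lattices above, the sketch below enumerates the formal concepts of a tiny formal context: pairs (extent, intent) where the extent is exactly the set of objects sharing the intent, and vice versa. It covers only the ordinary formal concept lattice; the dual, object-oriented and attribute-oriented lattices and the inclusion-degree inference are not implemented. The context and the function name are hypothetical, and the brute force over attribute subsets only suits small contexts.

```python
from itertools import combinations

def formal_concepts(context, attributes):
    """All formal concepts (extent, intent) of a formal context, where
    `context` maps each object to its set of attributes."""
    objects = set(context)

    def extent(attrs):  # objects having every attribute in attrs
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):   # attributes shared by every object in objs
        return frozenset(a for a in attributes if all(a in context[o] for o in objs))

    concepts = set()
    for k in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), k):
            ext = extent(frozenset(attrs))
            concepts.add((ext, intent(ext)))  # (B', B'') is always a concept
    return concepts

context = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"b"}}  # hypothetical context
concepts = formal_concepts(context, {"a", "b", "c"})
```

Ordering these concepts by inclusion of their extents yields the concept lattice from which decision rules are read off.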
Lattice-based inference method
This method can be implemented on the basis of the three-tier inference control model:
• The authorization object computed through the above iterative process comprises the data tier.
• The complement of the object is the aggregation tier, since it does not cause any inferences to the data tier.
• The first property of the three-tier model is satisfied: the number of cuboids is constant compared to the number of cells, so the size of the aggregation tier is polynomial in the size of the data tier.
• Because the aggregation tier is a collection of descendant closures of single cuboids, it naturally forms a partition on the data tier, satisfying the second property.
• The aggregation tier evidently satisfies the last property.
Conclusion
We have argued that the most challenging security threat lies in the fact that sensitive data stored in a data warehouse may be disclosed through seemingly innocent OLAP queries. We then described two different methodologies specifically proposed for securing OLAP data cubes:
A. The first one adapts existing inference control methods from statistical databases.
B. The second one aims to remove many limitations of the first.
Both types of methods can be implemented on the basis of a three-tier inference control architecture that is particularly suitable for OLAP systems.