2. Breach Reporting
- Report accesses to unauthorized data
- Typically, access controls restrict access to sensitive data
- Unfortunately, access controls are difficult to configure
- Misconfigurations allow unauthorized data to be accessed
3. Goal
- Goal: Given a DB, the operations executed on the DB, an incorrect (old) policy, and a correct (new) policy, find all queries that disclosed unauthorized data.
4. Example
- Medical records are stored in hospital databases
- Security and privacy of patient records are important
- Patient data is sensitive (e.g., disease, medication)
- An access control policy restricts access to medical records
- When a misconfiguration occurs, patient information is inappropriately accessed
5. New Legal Requirements for Reporting Medical Data Breaches
- Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, USA
  - Expanded security and privacy protections
  - Monetary fines for disclosure of patient data
  - Covered entities (e.g., hospitals) must report breaches
- New mechanisms are needed to report breaches
6. Outline
- Motivation
- Finding Queries That Disclose Unauthorized Data
- Framework Components
- Improving Misconfiguration Response Performance
- Evaluation
7. Finding Queries That Disclose Unauthorized Data
- What does it mean for a query to be suspicious?
- How do we find these suspicious queries?
- Straw-man approaches:
  - Database auditing techniques
  - Annotation/provenance techniques
8. Database Auditing Techniques [Agrawal '04]
- Applicable to logs containing only queries (no updates)
- During normal execution, record the SQL text of all operations
- At audit time:
  - The auditor specifies the sensitive data
  - Retrieve the queries that used that sensitive data
(Figure: sensitive data in the Patients table is matched against the SQL operation log to identify suspicious queries)
9. Misconfiguration as an Audit Problem
- For misconfigurations, the sensitive data is: data in the DB accessible under the incorrect (old) policy, but no longer accessible under the corrected (new) policy
- Consider the policies:
  - Old policy (Patients): age < 30
  - New policy (Patients): age < 18
(Figure: sensitive data in the Patients table is matched against the SQL operation log to identify suspicious queries)
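Under this formulation, the sensitive data is simply the set difference between what the two policies expose. A minimal sketch as a Python simulation (the table contents and column names are hypothetical, and this is an illustration of the definition, not the paper's implementation):

```python
# Modeling the sensitive data created by a policy misconfiguration:
# rows accessible under the incorrect (old) policy but NOT under the
# corrected (new) policy are exactly the unauthorized data.

patients = [
    {"name": "Alice", "age": 12, "disease": "asthma"},
    {"name": "Bob",   "age": 25, "disease": "flu"},
    {"name": "Carol", "age": 45, "disease": "diabetes"},
]

old_policy = lambda row: row["age"] < 30   # incorrect policy
new_policy = lambda row: row["age"] < 18   # corrected policy

# Sensitive data: visible under the old policy, hidden under the new one.
sensitive = [r for r in patients if old_policy(r) and not new_policy(r)]

print(sensitive)  # only Bob (age 25) falls in the unauthorized range
```

For the example policies above, this is every patient with 18 <= age < 30.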
10-12. Limitations: Data Modifications
- Old policy (Patients): age < 30
- New policy (Patients): age < 18
- Temp: no restrictions
- Sensitive rows are copied from the Patients table into the unrestricted Temp table; a later query reads them from Temp
- The query on Temp is judged not suspicious, so the audit misses the propagation of information!
(Figure: sensitive data flowing from the Patients table into the Temp table)
13. Annotation/Provenance Techniques
- During normal execution:
  - Record the SQL text of all operations
  - Record the dependencies between rows (e.g., Bhagwat '04)
- At audit time:
  - The auditor specifies the sensitive data
  - Retrieve the queries that use data derived from sensitive data
14. Example (cont.)
- The query is suspicious: it accesses a row that depends on sensitive data
(Figure: sensitive data propagating from the Patients table to the Temp table, where it is read by the suspicious query)
15. Annotation/Provenance Techniques
- Track the derivation of data
- Solve the 'copy' problem
- Limitations:
  - Empty results: there are no annotations to analyze
  - More generally, the lack of a row in a result can disclose information
16. Previous Work Is Not Applicable
- Data modification operations
  - Explicit flow of information (copy)
  - Implicit flow of information (e.g., UPDATE Patients SET age = 999 WHERE disease = 'flu')
- Empty results
- Information learned from multiple queries
17. Our Solution: The Misconfiguration Response (MR) Query
- Conceptually, replay the log under the new policy
- Compare query results between the old and new policies
- Return the queries that disclosed unauthorized data
18. Misconfiguration Response Query
- Observation: unauthorized (sensitive) data did not contribute to the result of a query if the query's result is the same when the log is completely replayed under both the incorrect and the correct policy
- If the results differ, no guarantees can be made about the data disclosed
19. Misconfiguration Response Query
- Cleanly addresses the previously discussed limitations
- Replaying the log and comparing results captures:
  - Differences in the information learned between the policies
  - Data modifications
  - Empty results and missing rows
20. Misconfiguration Response Query
- Naïve algorithm:
  1. Copy the database as of the time of the misconfiguration
  2. Replay the log of operations under the new policy
  3. For each query, execute it on the old and new DBs; if the query's result is not the same under both policies, mark it as suspicious
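The naïve algorithm can be sketched as a toy in-memory simulation. Here read-only queries are modeled as functions over the visible rows, and all names are illustrative; the real system replays the log against the DBMS itself, including updates:

```python
# Toy sketch of the naive MR-query algorithm: replay each logged query
# under both policies and flag queries whose results differ.

def visible(table, policy):
    """Rows of the table that a policy exposes."""
    return [r for r in table if policy(r)]

def naive_mr_query(table, log, old_policy, new_policy):
    suspicious = []
    for name, query in log:
        old_result = query(visible(table, old_policy))
        new_result = query(visible(table, new_policy))
        if old_result != new_result:   # results differ -> disclosure possible
            suspicious.append(name)
    return suspicious

patients = [
    {"name": "Alice", "age": 12, "disease": "asthma"},
    {"name": "Bob",   "age": 25, "disease": "flu"},
]
log = [
    ("q1", lambda rows: sorted(r["name"] for r in rows)),                   # all names
    ("q2", lambda rows: sorted(r["name"] for r in rows if r["age"] < 10)),  # young patients only
]
old_policy = lambda r: r["age"] < 30
new_policy = lambda r: r["age"] < 18

print(naive_mr_query(patients, log, old_policy, new_policy))  # ['q1']
```

q1 sees Bob only under the old policy, so its results differ; q2 returns the empty set under both policies, so it is not suspicious.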
21. Outline
- Motivation
- Finding Queries That Disclose Unauthorized Data
- Framework Components
- Improving Misconfiguration Response Performance
- Evaluation
22. Framework Components
- Components should integrate easily into an existing DBMS
- Row-level access control
  - Rewrites operations with an added selection condition
  - Restricts users to a subset of rows
  - E.g., Oracle Fine Grained Access Control
- Operation log
  - Stores all DB events, e.g., the (username, SQL text) of each operation executed on the DB
  - Separate from the recovery log
  - Available in Oracle, SQL Server, and DB2
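One way row-level access control is commonly realized is by rewriting the query so the protected table is replaced by a policy-filtered subquery, in the spirit of Oracle Fine Grained Access Control. A simplified string-level sketch (real systems rewrite the parsed query, not raw SQL text; the query and predicate below are hypothetical):

```python
# Sketch of row-level access control via query rewriting: inject the
# policy predicate as an added selection condition around the table.

def rewrite(sql, table, predicate):
    """Wrap the protected table in a policy-filtered subquery."""
    secured = "(SELECT * FROM {} WHERE {}) AS {}".format(table, predicate, table)
    return sql.replace(table, secured, 1)

q = "SELECT name FROM Patients WHERE disease = 'flu'"
print(rewrite(q, "Patients", "age < 18"))
# SELECT name FROM (SELECT * FROM Patients WHERE age < 18) AS Patients WHERE disease = 'flu'
```

The rewritten query can only ever see rows the policy exposes, which is what lets the framework replay the log under either policy.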
23. Framework Components
- Temporal (historical) databases
  - Recreate the database state that existed at a previous time
  - One possible implementation [Jensen '91]: backlog tables
    - Append-only (inserts and deletes)
    - Additional metadata is stored to reconstruct a database state
(Figure: the Patients table at time = 2, reconstructed from the Patients backlog table)
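The backlog idea can be sketched as follows: each entry records a timestamped insert or delete, and replaying entries up to time t reconstructs the state at t (field names and contents are illustrative, not the [Jensen '91] schema):

```python
# Sketch of reconstructing a past database state from an append-only
# backlog table of timestamped inserts and deletes.

def state_at(backlog, t):
    """Replay backlog entries (sorted by time) up to and including t."""
    rows = []
    for time, op, row in backlog:
        if time > t:
            break
        if op == "insert":
            rows.append(row)
        elif op == "delete":
            rows.remove(row)
    return rows

backlog = [
    (1, "insert", {"name": "Alice", "age": 12}),
    (2, "insert", {"name": "Bob", "age": 25}),
    (3, "delete", {"name": "Alice", "age": 12}),
]

print(state_at(backlog, 2))  # both rows present at time = 2
print(state_at(backlog, 3))  # only Bob remains after the delete
```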
24. Performance Considerations
- The naïve approach can be costly:
  - Copies large amounts of data
  - Replays the entire log
  - Executes each query twice (once on the old DB and once on the new DB)
25. Outline
- Motivation
- Finding Queries That Disclose Unauthorized Data
- Framework Components
- Improving Misconfiguration Response Performance
  - Static Pruning
  - Delta Tables
  - Partial and Simultaneous Re-execution
- Evaluation
26. Static Pruning (Queries Only)
- Guarantee that the query never accesses unauthorized data
- Method:
  - Analyze the SQL text of (i) the policies and (ii) the query
  - Data-independent analysis
- Example:
  - Old policy (Patients): age < 30
  - New policy (Patients): age < 18
  - A query whose predicate excludes the difference between the policies is prunable
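The paper uses a constraint solver over the SQL text for this check; the sketch below shrinks the idea down to single-attribute predicates of the form "age < c" to show the core test: a query is prunable only if its predicate can never select a row in the difference between the two policies. This is a drastically simplified, hypothetical stand-in for the solver:

```python
# Simplified static-pruning check for policies and queries that are all
# upper bounds on one attribute, "age < bound". The unauthorized region
# is new_upper <= age < old_upper; a query is safe if it stays below it.

def prunable(query_upper, old_upper, new_upper):
    sensitive_lo = min(old_upper, new_upper)  # start of the unauthorized range
    return query_upper <= sensitive_lo

OLD, NEW = 30, 18              # old: age < 30, new: age < 18
print(prunable(10, OLD, NEW))  # True: 'age < 10' never touches ages 18-29
print(prunable(25, OLD, NEW))  # False: 'age < 25' can see unauthorized rows
```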
27-28. Pruning With Data Modifications
- The static approach is applicable only to logs that contain just queries
- Example:
  - Old policy (Patients): age < 30
  - New policy (Patients): age < 18
  - An update can create data that is not initially contained in the difference between the policies, so a query over that data should not be pruned
(Figure: sensitive data in the old and new Patients tables)
29. Handling Updates
- Delta tables: store the differences between the old and new backlog tables
  - The set of rows added under the old policy but not the new policy (delta minus)
  - The set of rows added under the new policy but not the old policy (delta plus)
(Figure: the old and new Patients backlog tables and the resulting Patients delta-minus backlog table)
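As a sketch, the two delta tables are set differences between the rows maintained under each policy (in the actual system they are maintained over the backlog tables; row contents here are illustrative):

```python
# Sketch of computing delta tables: delta minus holds rows present only
# under the old policy, delta plus holds rows present only under the new.

def delta_tables(old_rows, new_rows):
    key = lambda r: tuple(sorted(r.items()))
    old_keys = {key(r) for r in old_rows}
    new_keys = {key(r) for r in new_rows}
    delta_minus = [r for r in old_rows if key(r) not in new_keys]  # old only
    delta_plus = [r for r in new_rows if key(r) not in old_keys]   # new only
    return delta_minus, delta_plus

old_backlog = [{"name": "Alice", "age": 12}, {"name": "Bob", "age": 25}]
new_backlog = [{"name": "Alice", "age": 12}]

minus, plus = delta_tables(old_backlog, new_backlog)
print(minus)  # Bob was added only under the old policy
print(plus)   # nothing was added only under the new policy
```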
30. Pruning With Data Modifications
- Can prune if:
  - The static pruning condition is satisfied, and
  - All rows from the delta tables are filtered out by the query
- Example (cont.): a row of the Patients delta-minus backlog table is not filtered by the query, so the query is not prunable
31. Pruning With Data Modifications
- Can prune if:
  - The static pruning condition is satisfied, and
  - All rows from the delta tables are filtered out by the query
- This is no longer a static pruning condition, but the delta tables are typically smaller than the full tables
- There is no longer a need to copy the database: the old DB and the delta tables can be used to create the new DB
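The combined condition can be sketched as follows (names and example rows are illustrative; in the real system the filter test runs against the delta backlog tables):

```python
# Sketch of the extended pruning condition for logs with updates: an
# operation is prunable only when the static condition holds AND the
# query's predicate filters out every row of the delta tables.

def prunable_with_updates(static_ok, query_pred, delta_minus, delta_plus):
    touches_delta = any(query_pred(r) for r in delta_minus + delta_plus)
    return static_ok and not touches_delta

delta_minus = [{"name": "Bob", "age": 25}]
delta_plus = []

# Query 'age < 10' filters out every delta row -> still prunable.
print(prunable_with_updates(True, lambda r: r["age"] < 10, delta_minus, delta_plus))  # True
# Query 'age < 30' selects a delta row -> not prunable.
print(prunable_with_updates(True, lambda r: r["age"] < 30, delta_minus, delta_plus))  # False
```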
32. Re-Execution
- When an operation cannot be pruned, re-execute it to test whether it is suspicious
- Executing a query on both the old and new DBs wastes work; e.g., the old and new queries may join the same rows
- Can we improve re-execution performance?
33. Simultaneous Re-Execution
- Observation: same operation, different data (old vs. new)
- Can the shared computation be done simultaneously?
  - Combine the data from the old and new databases
  - Flags track the origin of each row (new, old, or common)
  - The flags are used to ensure correctness
34. Partial & Simultaneous Re-Execution
- A query is not suspicious if only common rows appear in its result
- Partial re-execution: stop mid-execution if only common rows exist on a cut in the query plan
(Figure: a cut in the query plan used for partial re-execution)
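The flag mechanism can be sketched as follows: the old and new inputs are merged into one tagged input, the operation runs once over it, and the result is suspicious exactly when some surviving row originates from only one policy. A toy Python model with illustrative names (the real system does this in SQL inside the query plan):

```python
# Sketch of simultaneous re-execution: combine old and new rows into one
# input, tagging each row with its origin ('old', 'new', or 'common').

def tag_rows(old_rows, new_rows):
    key = lambda r: tuple(sorted(r.items()))
    old_keys = {key(r) for r in old_rows}
    new_keys = {key(r) for r in new_rows}
    tagged = []
    for r in old_rows:
        tagged.append((r, "common" if key(r) in new_keys else "old"))
    for r in new_rows:
        if key(r) not in old_keys:
            tagged.append((r, "new"))
    return tagged

def suspicious(tagged_result):
    # Suspicious iff any result row originates from only one policy.
    return any(flag != "common" for _, flag in tagged_result)

old_db = [{"name": "Alice", "age": 12}, {"name": "Bob", "age": 25}]
new_db = [{"name": "Alice", "age": 12}]

tagged = tag_rows(old_db, new_db)

# Query selecting age < 18: only the common row survives.
result = [(r, f) for r, f in tagged if r["age"] < 18]
print(suspicious(result))  # False: only common rows in the result

# Query selecting everything: Bob appears only under the old policy.
print(suspicious(tagged))  # True
```

Partial re-execution corresponds to applying the same only-common test at an intermediate cut and stopping early when it already holds.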
35. Outline
- Motivation
- Finding Queries That Disclose Unauthorized Data
- Framework Components
- Improving Misconfiguration Response Performance
- Evaluation
36. Implementation
- Implemented on top of PostgreSQL
- A constraint solver is used for static pruning
- Goal: understand how the workload and data affect performance
- Synthetic data and workload; parameters:
  - Operation selectivity
  - Select-to-update ratio
  - Misconfiguration size
  - Number of operations
37. Static Pruning Results (Queries Only)
- Fewer operations are re-executed
- Higher selectivity and a larger misconfiguration reduce the number of queries pruned
- Setup: 500 queries, 1% selectivity, 250 K rows
38. Performance With Updates (Small Misconfiguration)
- Simultaneous re-execution improves on the naïve method
- Setup: 500 operations, 0.9 select-to-update ratio, 250 K rows, 1% selectivity
39. Performance With Updates (Small Misconfiguration)
- Pruning improves performance for the common cases
- Setup: 500 operations, 0.9 select-to-update ratio, 250 K rows, 1% selectivity
40. Summary of Additional Results
- For large misconfigurations, the naïve approach can be better:
  - The cost of tracking the differences between the old and new DBs is high
  - Pruning is not effective
- Future work: an optimizer to choose the MR-query method given the parameters
41. Summary: PolicyReplay
- Policy misconfigurations are a security concern
- Existing approaches are not able to find all breaches
- Presented the misconfiguration response (MR) query
- Optimizations to improve performance
44-46. Annotation Limitation
- Old policy (Patients): age < 30
- New policy (Patients): age < 18
- Temp: no restrictions
- The user learns that Bob has the flu (sensitive data), then deletes the rows in Temp with the same disease
- A later query's empty result reveals that someone in the Patients table has the flu, yet there are no annotations in the empty result to analyze!
(Figure: sensitive data flowing from the Patients table into the Temp table)
Editor's Notes
- RBAC. A reactive mechanism is needed to find breaches in any relational database.
- The goal of the PolicyReplay system is as follows. Log with updates (data modifications).
- Hospital databases store patient medical records. The data is sensitive; for example, it lists the medications a person is prescribed. Because of this sensitive nature, access controls are used to restrict access so that each user can only access a specific subset of the database.
- Data that is accessed, acquired, or disclosed as a result of such a breach.
- Builds on previous work; a different problem.
- Model the misconfiguration problem as an audit problem. Pediatrics doctor example.
- The next logical approach, for a log with updates, is to track the dependencies between data in the database.
- Added metadata to track annotation dependencies.
- The lack of a row in a result can be used to learn information (e.g., via MINUS).
- Addresses different problems. How can we find those queries that accessed unauthorized data?
- Only sees data visible under the correct policy. Did not contribute to the query result.
- Deals with updates: different rows are added under the old and new policies. Empty result: if the result is empty under one policy but not the other, the query is marked as suspicious.
- Integrates easily with an existing DB.
- Need to go back in time to the old DB state to compare query results.
- Optimizations to improve performance.
- If there are no updates, the only sensitive data is in the difference between the policies. If the query filters out all rows that are accessible under one policy but not the other, then prune.
- Static pruning is not appropriate when the log contains updates (updates create data not initially contained in the difference between the policies). We want to expand the pruning condition to manage logs with updates.
- Intuition: none of the data selected differs between the two policies, thus all the data is common.
- Intuition: none of the data selected differs between the two policies, thus all the data is common. While this condition is no longer static, for many common cases the size of these tables is smaller than the full tables.
- Done in SQL. Consideration for join correctness: common rows are only joined once.
- Tune parameters to determine under which workloads the optimizations improve MR-query efficiency; the goal is not to produce a realistic workload.
- X-axis: size of the misconfiguration, as the percentage of the database impacted. Y-axis: time in seconds to evaluate the MR-query. Static: runs the naïve method but prunes queries. Selectivity: randomly selected using a uniform distribution. 40 MB/s of read per query.
- Ratio: 0.9; 250 K rows initially; 1% selectivity. Order of magnitude.
- Ratio: 0.9; 250 K rows initially; 1% selectivity. Order of magnitude.
- Need an optimizer to choose between the approaches given the workload/misconfiguration.
- Logs with updates. Replay the log and compare query results.
- No annotations in the result to analyze. Other weaknesses, for example when the policy is too restrictive.