PolicyReplay: Misconfiguration-Response Queries for Data Breach Reporting. Daniel Fabbri*, Kristen LeFevre*, Qiang Zhu+ (*University of Michigan; +University of Michigan, Dearborn)
Breach Reporting. Report accesses to unauthorized data. Typically, access controls restrict access to sensitive data. Unfortunately, access controls are difficult to configure, and misconfigurations allow unauthorized data to be accessed.
Goal. Given a DB, the operations executed on the DB, an incorrect (old) policy, and a correct (new) policy, find all queries that disclosed unauthorized data.
Example. Medical records are stored in hospital databases. Security and privacy of patient records are important: patient data is sensitive (e.g., disease, medication). An access control policy restricts access to medical records. When a misconfiguration occurs, patient information is inappropriately accessed.
New Legal Requirements for Reporting Medical Data Breaches. Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009, USA: expanded security and privacy protections; monetary fines for disclosure of patient data; covered entities (e.g., hospitals) must report breaches. New mechanisms are needed to report breaches.
Outline. Motivation; Finding Queries That Disclose Unauthorized Data; Framework Components; Improving Misconfiguration Response Performance; Evaluation.
Finding Queries That Disclose Unauthorized Data. What does it mean for a query to be suspicious? How do we find these suspicious queries? Straw-man approaches: database auditing techniques and annotation/provenance techniques.
Database Auditing Techniques [Agrawal ’04]. Applicable for logs with only queries (no updates). During normal execution, record the SQL text of all operations. At audit time, the auditor specifies the sensitive data and retrieves those queries that used it. [Figure labels: Sensitive Data, Suspicious Query, Patients Table, SQL Operation Log]
Misconfiguration As Audit Problem. For misconfigurations, the sensitive data is the data in the DB that was accessible under the incorrect (old) policy but is no longer accessible under the corrected (new) policy. Consider the policies: Old Policy (Patients): age < 30; New Policy (Patients): age < 18. [Figure labels: Sensitive Data, Suspicious Query, Patients Table, SQL Operation Log]
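
A minimal Python sketch of this audit view of a misconfiguration. Only the two policies (age < 30 and age < 18) come from the slide; the table contents, the row layout, and the helper names are hypothetical and chosen purely for illustration.

```python
# Hypothetical in-memory stand-in for the Patients table.
patients = [
    {"name": "Alice", "age": 10, "disease": "cold"},
    {"name": "Bob", "age": 25, "disease": "flu"},
    {"name": "Carol", "age": 40, "disease": "flu"},
]


def old_policy(row):
    """Incorrect (old) policy from the slide: age < 30."""
    return row["age"] < 30


def new_policy(row):
    """Corrected (new) policy from the slide: age < 18."""
    return row["age"] < 18


def sensitive_rows(rows):
    """Rows visible under the old policy but hidden under the new one."""
    return [r for r in rows if old_policy(r) and not new_policy(r)]


def audit_flags_query(query_predicate, rows):
    """Audit-style check: did the query select any sensitive row?"""
    return any(query_predicate(r) for r in sensitive_rows(rows))


flu_query_flagged = audit_flags_query(lambda r: r["disease"] == "flu", patients)
child_query_flagged = audit_flags_query(lambda r: r["age"] < 15, patients)
```

This check only looks at rows that are sensitive in the base table, so it applies to logs with queries only.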
Limitations: Data Modifications. Old Policy (Patients): age < 30; New Policy (Patients): age < 18; Temp: no restrictions. [Figure labels: Sensitive Data, Patients Table, Temp Table] Result: marked Not Suspicious. The audit misses the propagation of information!
Annotations/Provenance Techniques. During normal execution, record the SQL text of all operations and the dependencies between rows (e.g., Bhagwat ’04). At audit time, the auditor specifies the sensitive data and retrieves those queries that use data derived from it.
Example (cont.). [Figure labels: Sensitive Data, Patients Table, Temp Table, Suspicious Query] The query is suspicious because it accesses a row that depends on sensitive data.
Annotations/Provenance Techniques. Tracks the derivation of data, which solves the ‘copy’ problem. Limitations: empty results leave no annotations to analyze; more generally, the lack of a row in a result can disclose information.
Previous Work Is Not Applicable. Data modification operations: explicit flow of information (copy) and implicit flow of information (UPDATE Patients SET age = 999 WHERE disease = 'flu'). Empty results. Information learned from multiple queries.
Our Solution: The Misconfiguration-Response (MR) Query. Conceptually, replay the log under the new policy and compare query results from the old and new policies. Returns the queries that disclosed unauthorized data.
Misconfiguration-Response Query. Observation: unauthorized (sensitive) data did not contribute to the result of a query if the query’s result is the same when the log is completely replayed under the incorrect and correct policies. If the results differ, there are no guarantees about the data disclosed.
Misconfiguration-Response Query. Cleanly addresses the previously discussed limitations. Replaying the log and comparing results captures: different information learned between the policies, data modifications, and empty results and missing rows.
Misconfiguration-Response Query, Naïve Algorithm. (1) Copy the database as of the time of the misconfiguration. (2) Replay the log of operations under the new policy. (3) For each query, execute it on the old and new DBs; if the query result is not the same under both policies, mark it as suspicious.
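
The naïve algorithm can be sketched in Python with the standard-library sqlite3 module. This is an illustration under stated assumptions, not the authors' PostgreSQL implementation: the initial rows, the `patients_base` table, the filtered view standing in for row-level access control, and the functions `replay` and `mr_query` are all hypothetical. For simplicity the sketch rebuilds and replays both databases from the same initial state instead of copying an existing one.

```python
import sqlite3

# Hypothetical database state at the time of the misconfiguration.
INITIAL_ROWS = [("Alice", 10, "cold"), ("Bob", 25, "flu"), ("Carol", 40, "flu")]


def replay(policy_predicate, log):
    """Replay the whole log on a fresh DB under one policy.

    Returns {log index: sorted result rows} for the SELECT statements.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients_base (name TEXT, age INTEGER, disease TEXT)")
    conn.executemany("INSERT INTO patients_base VALUES (?, ?, ?)", INITIAL_ROWS)
    conn.execute("CREATE TABLE Temp (name TEXT, age INTEGER, disease TEXT)")
    # Row-level access control, approximated here by a filtered view.
    conn.execute(
        f"CREATE VIEW Patients AS SELECT * FROM patients_base WHERE {policy_predicate}"
    )
    results = {}
    for i, sql in enumerate(log):
        cursor = conn.execute(sql)
        if sql.lstrip().upper().startswith("SELECT"):
            results[i] = sorted(cursor.fetchall())
    conn.close()
    return results


def mr_query(old_policy, new_policy, log):
    """Return the log indexes of queries whose results differ between policies."""
    old_results = replay(old_policy, log)
    new_results = replay(new_policy, log)
    return [i for i in old_results if old_results[i] != new_results[i]]


log = [
    "SELECT name FROM Patients WHERE age < 15",
    "INSERT INTO Temp SELECT * FROM Patients WHERE disease = 'flu'",
    "SELECT name FROM Temp",
]
suspicious = mr_query("age < 30", "age < 18", log)
```

In this toy log the last query reads only from Temp, yet it is flagged, because the copy made under the old policy propagated a row that the new policy would have hidden.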
Outline. Motivation; Finding Queries That Disclose Unauthorized Data; Framework Components; Improving Misconfiguration Response Performance; Evaluation.
Framework Components. Components should integrate easily into an existing DBMS. Row-level access control: rewrites operations with an added selection condition, restricting users to a subset of rows (e.g., Oracle Fine Grained Access Control). Operation log: stores all DB events, e.g., (username, SQL text) of operations executed on the DB; separate from the recovery log; available in Oracle, SQL Server, DB2.
Framework Components. Temporal (historical) databases: create the database state that existed at a previous time. One possible implementation [Jensen ’91]: backlog tables, which are append-only (inserts and deletes), with additional metadata stored to reconstruct the database state. [Figure labels: Patients Table (time = 2), Patients Backlog Table]
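
A minimal Python sketch of reconstructing a past table state from an append-only backlog. The `BacklogEntry` layout (timestamp, operation, row) and the function `state_at` are assumptions made for illustration; the slide says only that inserts and deletes are appended together with extra metadata.

```python
from typing import NamedTuple


class BacklogEntry(NamedTuple):
    time: int   # logical timestamp of the operation (assumed metadata)
    op: str     # "insert" or "delete"
    row: tuple  # the affected row


def state_at(backlog, t):
    """Rebuild the table contents as of time t from an append-only backlog."""
    rows = []
    for entry in sorted(backlog, key=lambda e: e.time):
        if entry.time > t:
            break
        if entry.op == "insert":
            rows.append(entry.row)
        elif entry.op == "delete":
            rows.remove(entry.row)
    return rows


patients_backlog = [
    BacklogEntry(1, "insert", ("Alice", 10, "cold")),
    BacklogEntry(2, "insert", ("Bob", 25, "flu")),
    BacklogEntry(3, "delete", ("Alice", 10, "cold")),
]
patients_at_time_2 = state_at(patients_backlog, 2)
```

In this representation an update would be logged as a delete followed by an insert.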
Performance Considerations. The naïve approach can be costly: it copies large amounts of data, replays the entire log, and executes queries twice (once each on the old and new DBs).
Outline. Motivation; Finding Queries That Disclose Unauthorized Data; Framework Components; Improving Misconfiguration Response Performance (Static Pruning, Delta Tables, Partial and Simultaneous Re-execution); Evaluation.
Static Pruning (Queries Only). Guarantee that the query never accesses unauthorized data. Method: analyze the SQL text of (i) the policies and (ii) the query; the analysis is data-independent. Example: Old Policy (Patients): age < 30; New Policy (Patients): age < 18. [Figure label: Prunable]
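
A simplified Python sketch of the static pruning test. The implementation described later in the talk uses a constraint solver; this sketch swaps in plain interval arithmetic and assumes that every predicate is a half-open range `[lo, hi)` over the single attribute age, and that both policies are upper bounds of the form age < k. All function names are hypothetical.

```python
import math


def interval_overlap(a, b):
    """True if half-open intervals a = [lo, hi) and b = [lo, hi) intersect."""
    return max(a[0], b[0]) < min(a[1], b[1])


def policy_difference(old, new):
    """Age range where exactly one of the two policies grants access.

    Assumes both policies are upper bounds (age < k), i.e. (-inf, k).
    """
    lo, hi = sorted((old[1], new[1]))
    return (lo, hi)


def can_prune(query, old, new):
    """Prune when the query's range cannot touch the policy difference."""
    return not interval_overlap(query, policy_difference(old, new))


OLD_POLICY = (-math.inf, 30)  # age < 30
NEW_POLICY = (-math.inf, 18)  # age < 18

prune_children = can_prune((-math.inf, 10), OLD_POLICY, NEW_POLICY)  # age < 10
prune_twenties = can_prune((20, 26), OLD_POLICY, NEW_POLICY)         # 20 <= age < 26
```

The check never reads the data, which is why it is sound only for logs without updates.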
Pruning With Data Modifications. The static approach is applicable only for logs that contain queries and no updates. Example: Old Policy (Patients): age < 30; New Policy (Patients): age < 18. [Figure labels: Sensitive Data, Patients Table (Old), Patients Table (New)] Result: Should Not Be Pruned.
Handling Updates. Delta tables store the differences between the old and new backlog tables: the set of rows added under the old but not the new policy (Delta Minus), and the set of rows added under the new but not the old policy (Delta Plus). [Figure labels: Patients Backlog Table (New), Patients Backlog Table (Old), Patients Delta Minus Backlog Table]
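
A minimal Python sketch of delta tables as set differences. It treats each backlog table as a set of row tuples, which is an assumption made for brevity; the function names and the example rows are hypothetical.

```python
def delta_tables(old_rows, new_rows):
    """Return (delta_minus, delta_plus) for two versions of a backlog table."""
    old_set, new_set = set(old_rows), set(new_rows)
    delta_minus = old_set - new_set  # added under the old policy only
    delta_plus = new_set - old_set   # added under the new policy only
    return delta_minus, delta_plus


def rebuild_new(old_rows, delta_minus, delta_plus):
    """Recreate the new table from the old table and the two deltas."""
    return (set(old_rows) - delta_minus) | delta_plus


# Hypothetical Temp backlog after a copy that saw Bob only under the old policy.
temp_old = [("Alice", 10, "cold"), ("Bob", 25, "flu")]
temp_new = [("Alice", 10, "cold")]
temp_minus, temp_plus = delta_tables(temp_old, temp_new)
```

Because the new state can be rebuilt from the old state and the deltas, only the differences need to be stored.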
Pruning With Data Modifications. Can prune if the static pruning condition is satisfied and all rows from the delta tables are filtered by the query. Example (cont.): [Figure labels: Patients Delta Minus Backlog Table; Not Filtered By Query; Not Prunable]
Pruning With Data Modifications (cont.). Can prune if the static pruning condition is satisfied and all rows from the delta tables are filtered by the query. This is no longer a static pruning condition, but the delta tables are typically smaller than the full tables. There is also no longer a need to copy the database: the old DB and the delta tables can be used to create the new DB.
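
The two-part pruning condition can be sketched in Python as follows. The static result is passed in as a boolean, the query is modeled as a row predicate, and the delta tables are sets of `(name, age, disease)` tuples; all of these representations and names are assumptions for illustration.

```python
def can_prune_with_deltas(static_ok, query_predicate, delta_minus, delta_plus):
    """Prune only if the static test passes and the query filters out
    every row of both delta tables."""
    if not static_ok:
        return False
    delta_rows = list(delta_minus) + list(delta_plus)
    return not any(query_predicate(row) for row in delta_rows)


# Hypothetical delta tables: one row was added under the old policy only.
delta_minus = {("Bob", 25, "flu")}
delta_plus = set()

# The query "age < 15" filters out the delta row, so it can be pruned.
prunable = can_prune_with_deltas(True, lambda r: r[1] < 15, delta_minus, delta_plus)
# The query "disease = 'flu'" selects the delta row, so it must be re-executed.
not_prunable = can_prune_with_deltas(
    True, lambda r: r[2] == "flu", delta_minus, delta_plus
)
```

The second test depends on the data, but it scans only the delta rows instead of the full tables.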
Re-Execution. When an operation cannot be pruned, re-execute it to test whether it is suspicious. Executing a query on both the old and new DBs wastes work; e.g., the old and new queries may join the same rows. Can we improve re-execution performance?
Simultaneous Re-Execution. Observation: same operation, different data (old vs. new). Can we do the shared computation simultaneously? Combine the data from the old and new databases; flags track the origin of each row (new, old, common) and are used to ensure correctness.
Partial & Simultaneous Re-Execution. A query is not suspicious if only common rows are in the result. Partial re-execution: stop mid-execution if only common rows exist on a cut. [Figure: cut in query plan for partial re-execution]
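
A minimal Python sketch of flag-based simultaneous re-execution with an early stop. It assumes a query plan that is just a chain of selection predicates over one table, with each point between predicates treated as a possible cut; joins and aggregates need extra care that is not shown here. The function names and the sample rows are hypothetical.

```python
def tag_rows(old_rows, new_rows):
    """Combine old and new data, flagging each row's origin."""
    old_set, new_set = set(old_rows), set(new_rows)
    tagged = [(r, "common") for r in old_set & new_set]
    tagged += [(r, "old") for r in old_set - new_set]
    tagged += [(r, "new") for r in new_set - old_set]
    return tagged


def partial_reexecute(tagged, predicates):
    """Apply the selections once over the combined data.

    Stops as soon as only common rows remain, since the old and new
    executions cannot differ after that point.
    Returns (suspicious, number of predicates actually applied).
    """
    rows = tagged
    steps_run = 0
    for predicate in predicates:
        if all(flag == "common" for _, flag in rows):
            break  # cut reached: the rest of the plan is shared
        rows = [(r, flag) for r, flag in rows if predicate(r)]
        steps_run += 1
    suspicious = any(flag != "common" for _, flag in rows)
    return suspicious, steps_run


old_rows = [("Alice", 10, "cold"), ("Bob", 25, "flu")]
new_rows = [("Alice", 10, "cold")]
tagged = tag_rows(old_rows, new_rows)

# Two selections: age < 15, then disease = 'cold'. The second is skipped.
early_stop = partial_reexecute(tagged, [lambda r: r[1] < 15, lambda r: r[2] == "cold"])
# One selection that keeps the old-only row.
flagged = partial_reexecute(tagged, [lambda r: r[2] == "flu"])
```

Each row is processed once no matter how many database versions it belongs to, which is where the shared computation is saved.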
Outline. Motivation; Finding Queries That Disclose Unauthorized Data; Framework Components; Improving Misconfiguration Response Performance; Evaluation.
Implementation. Implemented on top of PostgreSQL; a constraint solver is used for static pruning. Goal: understand how the workload and data affect performance. Synthetic data and workload. Parameters: operation selectivity, select-to-update ratio, misconfiguration size, number of operations.
Static Pruning Results (Queries Only). Fewer operations are re-executed. Setup: 500 queries, 1% selectivity, 250 K rows. Higher selectivity or a larger misconfiguration reduces the number of queries pruned.
Performance With Updates (Small Misconfiguration). Simultaneous re-execution improves on the naïve method. Setup: 500 operations, 0.9 select-to-update ratio, 250 K rows, 1% selectivity.
Performance With Updates (Small Misconfiguration). Pruning improves performance for common cases. Setup: 500 operations, 0.9 select-to-update ratio, 250 K rows, 1% selectivity.
Summary of Additional Results. For large misconfigurations, the naïve approach can be better: the cost of tracking differences between old and new is high, and pruning is not effective. Future work: an optimizer to choose the MR-query method given the parameters.
Summary: PolicyReplay. Policy misconfigurations are a security concern. Existing approaches are not able to find all breaches. Presented the misconfiguration-response query and optimizations to improve performance.
Questions? More info at: http://www.eecs.umich.edu/db
Backup
Annotation Limitation. Old Policy (Patients): age < 30; New Policy (Patients): age < 18; Temp: no restrictions. [Figure labels: Patients Table, Sensitive Data, Temp Table] Learns that Bob has the flu.
Annotation Limitation. Old Policy (Patients): age < 30; New Policy (Patients): age < 18; Temp: no restrictions. [Figure labels: Patients Table, Sensitive Data, Temp Table] Deletes rows with the same disease.
Annotation Limitation. Old Policy (Patients): age < 30; New Policy (Patients): age < 18; Temp: no restrictions. [Figure labels: Patients Table, Sensitive Data, Temp Table] Learns that someone in the Patients table has the flu. No annotations in the empty result!


Editor's Notes

  1. RBAC. A reactive mechanism is needed to find breaches in any relational database.
  2. The goal of the PolicyReplay system is as follows. Log with updates (data modifications).
  3. Hospital databases store patient medical records. Data is sensitive; for example, it lists the medications a person is prescribed. Because of this sensitive nature, access controls are used to restrict access so that each user can only access a specific subset of the database.
  4. Data that is accessed, acquired, or disclosed as a result of such breach.
  5. Build on previous work. Different problem.
  6. Model the misconfiguration problem as an audit problem. Pediatrics doctor example.
  7. Next logical approach, for a log with updates, is to track the dependencies between data in the database.
  8. Added metadata to track annotation dependencies.
  9. Lack of a row in a result can be used to learn information. MINUS.
  10. Addresses different problems. How can we find those queries that accessed unauthorized data?
  11. Only sees data visible under correct policy. Did not contribute to query result.
  12. Deals with updates: different rows added under the old and new policies. Empty result: if the result is empty under one policy, but not the other, the query is marked as suspicious.
  13. Integrate with existing DB easily.
  14. Need to go back in time to the OLD DB state to compare query results.
  15. Optimizations to improve performance.
  16. If no updates, only sensitive data in the difference between the policies. Filter out all rows that are accessible under one policy, but not the other, then prune.
  17. See that static pruning is not appropriate when the log contains updates (they create data initially not contained by the difference in policies). Want to expand the pruning condition to manage logs with updates.
  18. Intuition: None of the data selected is different between the two policies, thus all the data is common.
  19. Intuition: None of the data selected is different between the two policies, thus all the data is common. While this condition is no longer static, for many common cases, the size of these tables is smaller than the full tables.
  20. Done in SQL. Consideration for join correctness: common rows only joined once.
  21. Tune parameters to determine under what workloads the optimizations improve MR-query efficiency. Not to produce a real workload.
  22. X: size of misconfiguration in terms of percentage of database that is impacted. Y: time in seconds to evaluate the MR-query. Static: runs naïve but prunes queries. Selectivity: randomly selected using uniform distribution. 40 MB/s off read per query.
  23. Ratio: 0.9, 250 K rows initially, 1% selectivity. Order of magnitude.
  24. Ratio: 0.9, 250 K rows initially, 1% selectivity. Order of magnitude.
  25. Need optimizer to choose between approaches to use given workload/misconfiguration.
  26. Logs with updates. Replay log and compare query results.
  27. No annotations in the result to analyze. Other weaknesses, for example when the policy is too restrictive.