LDV: Light-weight Database Virtualization


Presented at ICDE 2015


  1. LDV: Light-weight Database Virtualization
      Quan Pham (2), Tanu Malik (1), Boris Glavic (3), and Ian Foster (1, 2)
      (1) Computation Institute, University of Chicago and Argonne National Laboratory
      (2) Department of Computer Science, University of Chicago
      (3) Department of Computer Science, Illinois Institute of Technology
      Thursday, April 23, 2015
  2. Share and Reproduce
      Alice wants to share her models and simulation output with Bob, and Bob wants to re-execute Alice’s application to validate her inputs and outputs.
  3. Significance
      [Figure: excerpt from the "Reporting Checklist for Life Sciences Articles", a checklist "used to ensure good reporting standards and to improve the reproducibility of published results". The excerpt covers figure legends (exact sample size, replicates, statistical tests) and statistics and general methods (sample size, inclusion/exclusion criteria, randomization, blinding).]
      Metrics aims to improve the reproducibility of scientific research. (NY Times, Dec. 2014)
  4. Alice’s Options
      1. A tar and gzip
      2. Submit to a repository
      3. Build website with code, parameters, and data
      4. Create a virtual machine
  5. Bob’s Frustration
      1-3. I cannot find the lib.so required for building the model.
      4. How do I?
      Lack of easy and efficient methods for sharing and reproducibility.
      [Figure: axes "Amount of pain Alice suffers" and "Amount of pain Bob suffers"]
  6. Application Virtualization
      [Figure: Alice’s Machine and Bob’s Machine]
  8. Application Virtualization for DB Applications
      [Figure: on Alice's computer, application virtualization (AV) observes the calls the Application makes to the Operating System, for example chdir("/usr") followed by open("lib/libc.so.6"), and copies the File System slice that was accessed into a package (Pkg).]
  9. Application Virtualization for DB Applications
      [Figure: the same setup as on the previous slide, with a DB Server added next to the application.]
  10. Application Virtualization for DB Applications
      • Applications that interact with a relational database
      • Examples:
        • Text-mining applications that download data, preprocess it, and insert it into a personal DB
        • Analysis scripts using parts of a hosted database
      [Figure: Alice's computer with Application, Operating System, File System, and DB Server; AV copies a File System slice into a package (Pkg) while observing chdir("/usr") and open("lib/libc.so.6").]
  12. Why doesn’t it work?
      • Application virtualization methods are oblivious to the semantics of data in a database system
      • The database state at the time of sharing the application may not be the same as at the start of the application
      Paper excerpt shown on the slide:
      Alice would preferably like to share this application in the form of package P with Bob, who may want to re-execute the application in its entirety, or may want to validate just the analysis task, or provide his own data inputs to examine the analysis result. If Alice wants Bob to re-execute and build upon her database application, then Bob must have access to an environment that consists of application binaries and data, any extension modules that the code depends upon (e.g., dynamically linked libraries), a database server, and a database on which the application can be re-executed. Ideally, it would be useful if Alice’s environment can be virtualized and thus automatically set up for Bob.
      [Figure 1 (caption cut off on the slide): "Alice’s experiment with processes P1 and P2 uses tuple ...". The figure shows Alice’s experiment (files f1 and f2, processes P1 and P2, an Insert and a Query over tuples t1 to t4 in the Database) next to other experiments (P3, P4).]
  13. LDV: Light-weight Database Virtualization
      • Goal: Easily and efficiently share and repeat DB applications.
  14. Key Ideas
      • DB application = Application (OS) part + DB part
      • Use data provenance to capture interactions between the application side and the database side
        • Limited formal mechanisms so far to combine the two kinds of provenance models
      • Create a virtualized package that can be re-executed
        • Either include the server and data, or replay interactions (for licensed databases)
        • No virtualization mechanism for database replay
  15. Related Work
      • Application virtualization
        • Linux Containers, CDE [USENIX ’11]
      • Packaging with annotations
        • Docker
      • Packaging with provenance
        • PTU [1] [TaPP ’13], ReproZip [TaPP ’13], Research Objects
      • Unified provenance models
        • Based on program instrumentation [TaPP ’12]
      [1] Q. Pham, T. Malik, and I. Foster. Using provenance for repeatability. In Theory and Practice of Provenance (TaPP), 2013.
  16. How does LDV work?
      Alice’s Machine: ldv-audit db-app
      • Monitoring system calls
      • Monitoring SQL
      • Server-included packages
      • Server-excluded packages
      • Execution traces
      • Relevant DB and filesystem slices
      [Figure: on Alice's computer, LDV monitors the Application, Operating System, File System, and DB Server, and copies the Execution Trace, the DB Server, the DB Slice, and the File System Slice into a package (Pkg).]
  17. How does LDV work?
      Bob’s Machine: ldv-exec db-app
      • Redirecting file access
      • Redirecting DB access
      • Server-included packages
      • Server-excluded packages
      [Figure: on Bob's computer, LDV redirects the Application's file and DB accesses to the package (Pkg), which holds the Execution Trace, the DB Server, the DB Slice, and the File System Slice.]
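
The "redirecting file access" step can be pictured as a path mapping. The sketch below is only an illustration under stated assumptions: the function name `redirect_path` and the package location `/home/bob/app-pkg` are hypothetical, and the slides do not say how LDV implements redirection. It shows how a path requested after chdir("/usr") and open("lib/libc.so.6") could be re-rooted under a package directory.

```python
import posixpath


def redirect_path(path: str, cwd: str, pkg_root: str) -> str:
    """Map a path requested by the application to its copy inside the package.

    Hypothetical helper for illustration; not taken from the LDV code base.
    """
    # Resolve relative paths against the traced working directory,
    # e.g. chdir("/usr") followed by open("lib/libc.so.6").
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    # Re-root the absolute path under the package directory.
    return posixpath.join(pkg_root, absolute.lstrip("/"))


print(redirect_path("lib/libc.so.6", cwd="/usr", pkg_root="/home/bob/app-pkg"))
```

A real tool would apply such a mapping while intercepting system calls; the mapping alone is what this sketch covers.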
  18. Example
        Alice:~$ ldv-audit app.sh
        Application package created as app-pkg
        Alice:~$ ls
        app-pkg app.sh src data
        Alice:~$ echo "Hi Bob, Please find the pkg --Alice" | mutt -s "Sharing DB Application" -a "./app-pkg" -- bob-vldb2015@gmail.com

        Bob:~$ ls .
        app-pkg
        Bob:~$ cd app-pkg
        Bob:~$ ls
        app.sh src data
        Bob:~$ ldv-exec app.sh
        Running app-pkg....
      Platforms shown on the slide: Ubuntu 14.04 (Kernel 3.13) + Postgres 9.1; CentOS 6.2 (Kernel 2.6.32) + MySQL
  19. LDV Issues
      • Monitoring system calls
      • Monitoring SQL
      • Execution traces
      • Relevant DB slices
      • Redirecting file access
      • Server-included packages
      • Server-excluded packages
      • Redirecting DB access
  21. An Execution Trace
      Uses provenance entities and activities to model the execution of a DB application.
      [Figure 2: An execution trace with processes and database operations. Nodes: files A, B, and C; processes P1 and P2; DB operations Insert1, Insert2, and Query; tuples t1 to t5. Edges carry temporal annotations: [1, 6], [7, 8], [5, 5], [8, 8], [5, 5], [5, 5], [8, 8], [9, 9] (seven times), [7, 12]. Legend: a file, a process, a tuple, a query, temporal annotations.]
  22. Data Dependencies from Provenance Systems
      • Fine-grained DB provenance, using Perm [2] (Fig. 3: PLin trace and data dependencies)
      • File operations, using PTU [1] (Fig. 4: PBB trace and data dependencies)
      • A DB execution trace has more edges than those determined by individual provenance systems

      Paper excerpts shown on the slide:

      A combined execution trace models the execution of a DB application including its processes, file operations, and DB accesses based on an OS and a DB provenance model.

      Definition 6 (Combined Execution Trace). Let PDB and POS be DB and OS provenance models. Every execution trace for PDB+OS is a combined execution trace for PDB and POS.

      Example 3. A combined execution trace for the PLin and PBB models is shown in Figure 2. This trace models the execution of two processes P1 and P2. Process P1 reads two files A and B, and executes two insert statements (at time 5 and 8 respectively). These insert statements create three tuple versions t1, t2, and t3. Process P2 executes a query which returns tuples t4 and t5. These tuples depend on tuples t1 and t3. Finally, process P2 writes file C.

      VI. DATA DEPENDENCIES
      The above definitions describe interactions of activities and entities in an execution trace of a provenance model, but do not model data dependencies, i.e., dependencies between entities. In our model, a dependency is an edge between two entities e and e' where a change to the input node (e') may result in a change to the output node (e). Given a provenance model, dependency information may or may not be explicitly available; it depends upon the granularity at which information about entities and activities is tracked and stored. For instance, the blackbox provenance model PBB operates at the granularity of processes and files and may not compute exact dependency information. Consider a process P that reads from files A and ...

      In the following we will use Lin(Q, t) to denote the Lineage of a tuple t in the result of query Q.

      Fig. 5: Annotated relation sales and query result

      sales
      Lineage   id   price
      {t1}      1    5
      {t2}      2    11
      {t3}      3    14

      result
      Lineage     ttl
      {t2, t3}    25

      Example 4. Consider the sales table shown in Figure 5. The Lineage of each tuple in the sales table is a singleton set containing the tuple’s identifier. The result of the query SELECT sum(price) AS ttl FROM sales WHERE price > 10 is a single row with ttl = 11 + 14 = 25. The Lineage contains all tuples (t2 and t3) that were used to compute this result.

      We define data dependencies in the PLin model based on Lineage. We connect each tuple t in the result of a query Q to all input tuples of the query that are in t’s Lineage. Similarly, we connect a modified tuple t in the result of an update to the corresponding tuple t' in the input of the update.

      Definition 7 (PLin Data Dependencies). Let G be a PLin trace. Let Lin(s, t) denote the Lineage of tuple t in the result of DB operation s, and let t and t' denote entities (tuples). The dependencies D(G) ⊂ D × D of G are defined as: ... (the definition is cut off on the slide)

      [1] Q. Pham, T. Malik, and I. Foster. Using provenance for repeatability. In Theory and Practice of Provenance (TaPP), 2013.
      [2] B. Glavic et al. Perm: Processing Provenance and Data on the same Data Model through Query Rewriting. In ICDE, 2009.
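
The Lineage example (Fig. 5 and Example 4) can be reproduced in a few lines. This is a minimal sketch, not how Perm computes provenance: the function name `sum_with_lineage` is hypothetical, the relation is held in a plain dictionary, and the query is assumed to sum the price column, which matches ttl = 11 + 14 = 25.

```python
# Annotated sales relation from Fig. 5: tuple identifier -> (id, price).
sales = {"t1": (1, 5), "t2": (2, 11), "t3": (3, 14)}


def sum_with_lineage(relation, min_price):
    """Evaluate SUM(price) WHERE price > min_price and collect the Lineage.

    The Lineage of the single result row is the set of input tuples
    that passed the filter and therefore contributed to the sum.
    """
    total = 0
    lineage = set()
    for tuple_id, (_row_id, price) in relation.items():
        if price > min_price:
            total += price
            lineage.add(tuple_id)
    return total, lineage


ttl, lin = sum_with_lineage(sales, 10)
print(ttl, sorted(lin))  # 25 ['t2', 't3']
```

Under the PLin dependency definition described above, the result row would be connected to t2 and t3 but not to t1.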
  23. Key Question
      Can we use temporal annotations and known direct data dependencies to infer a sound and complete set of dependencies that helps us determine the smallest repeatability package?
  24. Axioms for Dependency Inference
      • No direct data dependencies implies there is no data flow
      • The state of a node at a point in time depends on past interactions only
      • The flow of data should not violate temporal causality
  25. Inferring Dependencies
      • To determine whether information has flowed from A to C, find an increasing sequence of times for the edges such that each time lies in its edge’s interval.
      [Figure: three traces over the path A, P1, B, P2, C, with one temporal interval per edge]
      (a) No dependency between C and A. Intervals: [2, 3], [6, 7], [1, 5], [6, 6]. Times shown: 2, 6, 5; no such sequence exists.
      (b) C depends on A at time 4. Intervals: [1, 1], [4, 7], [2, 5], [1, 6]. Sequence shown: 1, 4, 5, 6.
      (c) No dependency between C and A.
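
The temporal check on this slide can be sketched as a greedy scan along one path. The code below makes several assumptions that the slide does not confirm: the function name `witness_times` is hypothetical, times are integers, the intervals are listed in path order from A to C, and "increasing" is treated as strictly increasing by default (the `strict` flag switches to non-decreasing). It covers only the temporal condition, not the axiom about known direct data dependencies.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


def witness_times(intervals: List[Interval], strict: bool = True) -> Optional[List[int]]:
    """Return one increasing sequence of times, one per edge, or None.

    Greedy rule: give each edge the earliest time that lies in its
    interval and respects the time chosen for the previous edge.
    Picking the earliest time never rules out a later edge.
    """
    times: List[int] = []
    previous: Optional[int] = None
    for low, high in intervals:
        if previous is None:
            candidate = low
        else:
            candidate = max(low, previous + 1 if strict else previous)
        if candidate > high:
            return None  # this edge cannot be scheduled after the previous one
        times.append(candidate)
        previous = candidate
    return times


# Intervals of traces (a) and (b) from the slide.
print(witness_times([(2, 3), (6, 7), (1, 5), (6, 6)]))  # None
print(witness_times([(1, 1), (4, 7), (2, 5), (1, 6)]))  # [1, 4, 5, 6]
```

For trace (b) the greedy scan yields the same sequence that the slide shows; for trace (a) the third edge's interval [1, 5] ends before the earliest time allowed after the second edge, so no sequence exists.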
  26. Experiments
      • 3 metrics: performance, usability, generality
      [Figure 7: Execution time of each step in an execution of TPC-H. Steps: Test Prepare, Inserts, First Select, Other Selects, Updates. The plot is cut off on the slide.]
      [Figure 8: Re-execution time of each step in an execution of TPC-H, panels (a) Q1 and (b) Q2. Y-axis: execution time (seconds), log scale from 1e-05 to 1000.]

      Values printed under panel (a) Q1, in seconds:
      Scenario                        Test Prepare   Inserts    First Select   Other Selects   Updates
      PostgreSQL                      0.03           0.00357    0.562          0.3753          0.00084
      Open-Source DB Server scenario  178.2          0.003042   0.492          0.3921          0.001
      Proprietary DB Server scenario  0.016          4E-05      0.016          0.0003          0.00018

      Values printed under panel (b) Q2, in seconds (last column cut off on the slide):
      Scenario                        Test Prepare   Inserts    First Select   Other Selects
      PostgreSQL                      0.03           0.00348    0.088          0.02872
      Open-Source DB Server scenario  34.09          0.003072   0.18           0.08532
      Proprietary DB Server scenario  0.036          0.000376   0.218          0.1555

      TABLE III: Content of PTU and LDV packages. PTU packages contain the data directory of the full database, whereas Open-Source DBS LDV packages contain the data directory of an empty database (created by the initdb command).
      Package           Software binaries   Server binaries   Data directory   Database provenance
      PTU               yes                 yes               yes (full)       no
      Open-Source DBS   yes                 yes               yes (empty)      yes
      Proprietary DBS   yes                 no                no               yes

      [Figure: total package size (MB), axis from 100 to 500, for Application Virtualization + Database, LDV + Open-source DB, and LDV + Proprietary DB.]
  27. TPC-H Queries
      • Most TPC-H queries touch large fractions of tables
      • Modified by varying parameters and selectivity

      TABLE II: The 18 TPC-H benchmark queries used in our experiments

      Q1-1 to Q1-5
        SQL:   SELECT l_quantity, l_partkey, l_extendedprice, l_shipdate, l_receiptdate FROM lineitem WHERE l_suppkey BETWEEN 1 AND PARAM
        PARAM: 10, 20, 50, 100, 250
        Sel.:  1%, 2%, 5%, 10%, 25%

      Q2-1 to Q2-4
        SQL:   SELECT o_comment, l_comment FROM lineitem l, orders o, customer c WHERE l.l_orderkey = o.o_orderkey AND o.o_custkey = c.c_custkey AND c.c_name LIKE '%PARAM%';
        PARAM: 0000000, 000000, 00000, 0000
        Sel.:  66%, 6.6%, 0.66%, 0.06%

      Q3-1 to Q3-4
        SQL:   SELECT count(*) FROM lineitem l, orders o, customer c WHERE l.l_orderkey = o.o_orderkey AND o.o_custkey = c.c_custkey AND c.c_name LIKE '%PARAM%';
        PARAM: 0000000, 000000, 00000, 0000
        Sel.:  66%, 6.6%, 0.66%, 0.06%

      Q4-1 to Q4-5
        SQL:   SELECT o_orderkey, AVG(l_quantity) AS avgQ FROM lineitem l, orders o WHERE l.l_orderkey = o.o_orderkey AND l_suppkey BETWEEN 1 AND PARAM GROUP BY o_orderkey;
        PARAM: 10, 20, 50, 100, 250
        Sel.:  1%, 2%, 5%, 10%, 25%

      [Figure: execution time (seconds, log scale from 0.001 to 10000) during (a) Audit and (b) Replay for PostgreSQL + PTU, Server-included package, and Server-excluded package. The plots are cut off on the slide.]
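
Table II describes four SQL templates, each instantiated with several PARAM values. The sketch below shows one way the 18 query strings could be generated; the names `TEMPLATES` and `build_queries` are hypothetical and the slides do not show how the authors produced their queries. The SQL text and parameter values are copied from the table.

```python
# Query templates and PARAM values from Table II; {param} marks the PARAM slot.
TEMPLATES = {
    1: ("SELECT l_quantity, l_partkey, l_extendedprice, l_shipdate, l_receiptdate "
        "FROM lineitem WHERE l_suppkey BETWEEN 1 AND {param}",
        [10, 20, 50, 100, 250]),
    2: ("SELECT o_comment, l_comment FROM lineitem l, orders o, customer c "
        "WHERE l.l_orderkey = o.o_orderkey AND o.o_custkey = c.c_custkey "
        "AND c.c_name LIKE '%{param}%';",
        ["0000000", "000000", "00000", "0000"]),
    3: ("SELECT count(*) FROM lineitem l, orders o, customer c "
        "WHERE l.l_orderkey = o.o_orderkey AND o.o_custkey = c.c_custkey "
        "AND c.c_name LIKE '%{param}%';",
        ["0000000", "000000", "00000", "0000"]),
    4: ("SELECT o_orderkey, AVG(l_quantity) AS avgQ FROM lineitem l, orders o "
        "WHERE l.l_orderkey = o.o_orderkey AND l_suppkey BETWEEN 1 AND {param} "
        "GROUP BY o_orderkey;",
        [10, 20, 50, 100, 250]),
}


def build_queries(templates):
    """Return a dict mapping names such as 'Q1-3' to instantiated SQL text."""
    queries = {}
    for number, (sql, params) in templates.items():
        for index, param in enumerate(params, start=1):
            queries[f"Q{number}-{index}"] = sql.format(param=param)
    return queries


queries = build_queries(TEMPLATES)
print(len(queries))
print(queries["Q1-3"])
```

Varying PARAM changes how many tuples each query touches, which is what the selectivity column of the table records.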
  28. Size Comparison
      • LDV packages are significantly smaller than PTU packages when queries have low selectivity
      • The VMI is 8.2 GB: 80 times larger than the average LDV package (100 MB)
      [Figure 9: LDV packages are significantly smaller than PTU packages when queries have low selectivity. X-axis: query (1-1 to 4-5); y-axis: package size (MB), log scale from 10 to 10000; series: PTU package, Server-included package, Server-excluded package.]
      Paper excerpt shown on the slide: To evaluate runtime performance, we instantiate this VMI using the same number of cores and memory as in our machine to execute our queries.
  29. Audit and Replay
      LDV amortizes audit cost significantly at replay time.
      [Figure 7: Execution time of each step in an application with query Q1-1. Steps: Initialization, Inserts, First Select, Other Selects, Updates. The plot is cut off on the slide.]
      [Figure 8: Execution time for each query, during audit (left) and replay (right). X-axis: query (1-1 to 4-5); y-axis: execution time (seconds), log scale; series: PostgreSQL + PTU, Server-included package, Server-excluded package, VM.]

      TABLE III: Package contents. PTU packages contain all data files of the full DB, whereas server-included LDV packages contain the data files of an empty DB.
      Package type          Software binaries   DB server   Data files    DB provenance
      PTU                   yes                 yes         yes (full)    no
      LDV server-included   yes                 yes         yes (empty)   yes
      LDV server-excluded   yes                 no          no            yes

      Paper excerpt shown on the slide (clipped at the edges): ... tuples needed to re-execute the application, which, for our queries, is at most ~25% of all tuples. Server-excluded packages are often yet smaller, because they contain only query results, which, for many of our experiment queries, are smaller than the tuples required for re-execution. However, recall that server-excluded packages have less flexibility than do server-included packages.
  30. Summary
      • LDV permits sharing and repeating DB applications
      • LDV combines OS and DB provenance to determine file and DB slices
      • LDV creates light-weight virtualized packages based on combined provenance
      • Results show LDV is efficient, usable, and general
      • LDV at http://github.com/lordpretzel/ldv.git
  31. Q&A
  32. Inferring Dependencies
      [Figure 6: Example traces with different temporal annotations, each over the path A, P1, B, P2, C]
      (a) No dependency between C and A. Intervals: [2, 3], [6, 7], [1, 5], [6, 6].
      (b) C depends on A at time 4. Intervals: [1, 1], [4, 7], [2, 5], [1, 6].
      (c) No dependency between C and A. Intervals: [1, 1], [4, 7], [2, 5], [1, 6].
