Tradeoffs in Automatic Provenance Capture


Presentation from IPAW 2016. Preprint


Slide notes:
  • What should I recommend?
    It depends.
  • These questions cannot be answered by intuition alone.
  • Traeger focused on using benchmarks to measure filesystem/storage performance.
    His observation is just as valid for benchmarks that measure other types of performance.
  • SPEC includes several sub-benchmarks that may be atypical for provenance analysis, e.g. a discrete event simulator or a quantum computer simulator.
    If we also want to measure precision/recall, it is hard to get the ground truth for the provenance generated by these benchmarks.
  • 1. execl-xput: How fast the current process image can be replaced with a new one, as a result of an execve system call.
    2. fcopy-256, fcopy-1024, fcopy-4096: Speed of a file-to-file copy using different buffer sizes.
    3. pipe-xput, pipe-cs: Speed of communication over pipes. In the first test, the reads and writes on the pipe happen from a single process. In the second test a second process is spawned, so the communication also includes a context switch between the two processes.
    4. spawn-xput: A simple fork-wait loop to measure how much time is needed to create and then destroy a process.
    5. shell-1, shell-8: Execution speed for the processing of a data file. The processing is implemented using common unix utilities, wrapped in a shell script. The two tests differ in the number of concurrently executing scripts.
    6. syscall: System call overhead. The test uses getpid to measure this. The specific system call is chosen because it requires minimal in-kernel processing, so its main overhead comes from the switch between kernel and user mode.
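    The syscall measurement can be sketched in a few lines of C. This is an illustrative sketch, not the UnixBench source; getpid is invoked through syscall() to sidestep any libc-level caching, so the measured time is dominated by the user/kernel mode switch:

    ```c
    /* Sketch of a UnixBench-style syscall-overhead measurement (POSIX assumed).
     * getpid requires minimal in-kernel processing, so its cost approximates
     * the overhead of the kernel/user mode switch itself. */
    #include <stdio.h>
    #include <time.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Average cost of one getpid system call, in nanoseconds. */
    double syscall_ns(long iterations) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            syscall(SYS_getpid);          /* direct syscall, bypasses libc caching */
        clock_gettime(CLOCK_MONOTONIC, &end);
        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        return ns / iterations;
    }

    int main(void) {
        printf("getpid: %.1f ns/call\n", syscall_ns(1000000));
        return 0;
    }
    ```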
  • Degradation depends on method used.
  • LPM [Bates 15] already supports the SPADE DSL reporter.
  • Reason for being slower: Lack of buffering. A new connection was opened each time we needed to output a piece of provenance.
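    The cost of that pattern is easy to see in a simplified, file-backed sketch (all names here are illustrative; the actual reporter wrote each record over a freshly opened network connection rather than a file):

    ```c
    /* Hypothetical sketch of the buffering problem: emitting each provenance
     * record over a freshly opened channel vs. batching all records through
     * one buffered stream. A file stands in for the network connection. */
    #include <stdio.h>

    /* Per-record style: open, write one record, close -- once per record. */
    void emit_per_connection(const char *path, int nrecords) {
        remove(path);                     /* start from an empty file */
        for (int i = 0; i < nrecords; i++) {
            FILE *f = fopen(path, "a");
            if (!f) return;
            fprintf(f, "vertex:%d\n", i);
            fclose(f);
        }
    }

    /* Buffered style: one open channel, records flushed in large chunks. */
    void emit_buffered(const char *path, int nrecords) {
        FILE *f = fopen(path, "w");
        if (!f) return;
        for (int i = 0; i < nrecords; i++)
            fprintf(f, "vertex:%d\n", i);
        fclose(f);
    }

    int main(void) {
        emit_per_connection("prov_per_conn.txt", 1000);
        emit_buffered("prov_buffered.txt", 1000);
        return 0;
    }
    ```

    Both variants produce identical output; only the per-record setup/teardown cost differs, which is exactly the overhead buffering removes.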
  • SPADEv2 provides easy interfacing with other provenance systems via the DSL reporter.
    Linux Provenance Modules [Bates 15] already support it.
    This makes it a good platform for measuring qualitative features (such as # of edges/vertices) and also to run queries that would verify if the ground truth was captured.
  • Execl: execve speed
    Fcopy-*: file copy with different buffer sizes
    Pipe-*: pipe communication
    Spawn: fork-wait loop (process creation/destruction speed)
    Shell-*: Unix utilities wrapped in a script. Similar to what coreutils testing would yield.
    Syscall: system call overhead (uses getpid as the most “lightweight” system call)
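    For concreteness, the spawn-style fork-wait loop can be sketched as follows (a minimal POSIX sketch, not the actual benchmark code):

    ```c
    /* Minimal sketch of a spawn-xput style fork-wait loop: each iteration
     * creates a child that exits immediately and is reaped by the parent,
     * so the loop measures process creation/destruction cost (POSIX assumed). */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Returns the number of children successfully created and reaped. */
    long fork_wait_loop(long iterations) {
        long completed = 0;
        for (long i = 0; i < iterations; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);                       /* child exits immediately */
            if (pid > 0 && waitpid(pid, NULL, 0) == pid)
                completed++;                    /* parent reaps the child */
        }
        return completed;
    }

    int main(void) {
        printf("%ld iterations completed\n", fork_wait_loop(100));
        return 0;
    }
    ```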
  • Tradeoffs in Automatic Provenance Capture

    1. Trade-offs in Automatic Provenance Capture. Manolis Stamatogiannakis, Hasanat Kazmi, Hashim Sharif, Remco Vermeulen, Ashish Gehani, Herbert Bos, and Paul Groth
    2. Capturing Provenance. Disclosed provenance: + accuracy, + high-level semantics, – intrusive, – manual effort. Examples: CPL (Macko ’12), Trio (Widom ’09), PrIME (Miles ’09), Taverna (Oinn ’06), VisTrails (Freire ’06). Observed provenance: + non-intrusive, + minimal manual effort, – false positives, – semantic gap. Examples: ES3 (Frew ’08), TREC (Vahdat ’98), PASSv2 (Holland ’08), DTrace Tool (Gessiou ’12), OPUS (Balakrishnan ’13)
    3. SPADEv2 – Provenance Collection (gehani/SPADE/wiki). • Strace Reporter – Programs run under strace; the produced log is parsed to extract provenance. • LLVMTrace – Instrumentation added to function boundaries at compile time. • DataTracker – Dynamic taint analysis; bytes are associated with metadata which is propagated as the program executes.
    4. SPADEv2 flow
    5. Current Intuition
    6. Current Intuition
    7. Incomplete Picture. • Faster, but how much? • What is the performance “price” for fewer false positives? • Is a compile-time solution worth the effort?
    8. How can one get more insight? Run a benchmark!
    9. Which one? • LMBench, UnixBench, Postmark, BLAST, SPECint… • [Traeger 08]: “Most popular benchmarks are flawed.” • No matter what you choose, there will be blind spots.
    10. Start simple: UnixBench. • Well-understood sub-benchmarks. • Emphasizes the performance of system calls. • System calls are commonly used for the extraction of provenance. • More insight into which collection backend would suit specific applications. • We’ll have a performance baseline to improve the specific implementations.
    11. UnixBench Results
    12. TRADEOFFS
    13. Performance vs. Integration Effort. • Capturing provenance from completely unmodified programs may degrade performance. • Modification of either the source (LLVMTrace) or the platform (LPM, Hi-Fi) should be considered for a production deployment.
    14. Performance vs. Provenance Granularity. • We couldn’t verify this intuition for the case of the strace reporter compared to LLVMTrace. – The strace reporter implementation is not optimal. • Tracking fine-grained provenance may interfere with existing optimizations. – E.g., buffered I/O does not benefit DataTracker.
    15. Performance vs. False Positives/Analysis Scope. • “Brute-forcing” a low false-positive ratio with the “track everything” approach of DataTracker is prohibitively expensive. • Limiting the analysis scope gives a performance boost. • If we exploit known semantics, we can have the best of both worlds. – Pre-existing semantic knowledge: LLVMTrace. – Dynamically acquired knowledge: ProTracer [Ma 2016].
    16. TAKEAWAYS
    17. Takeaway: System Event Tracing. • A good start for quick deployments. • Simple versions may be expensive. • What happens in the binary?
    18. Takeaway: Compile-time Instrumentation. • A middle ground between disclosed and automatic provenance collection. • But you have to have access to the source.
    19. Takeaway: Taint Analysis. • Prohibitively expensive for computation-intensive programs. • Likely to remain so, even after optimizations. • Reserved for provenance analysis of unknown/legacy software. • Offline approach (Stamatogiannakis TAPP’15).
    20. Generalizing the Results. • Only one implementation was tested for each method. • Repeating the tests with alternative implementations will provide confidence in the insights gained. • More confidence when choosing a specific collection method. (Figure: different methods vs. different implementations.)
    21. Implementation Details Matter. • Our results are influenced by the specifics of the implementation. • Anecdote: the initial implementation of LLVMTrace was actually slower than the strace reporter.
    22. Provenance Quality. • Qualitative features of the provenance are also very important. • How many vertices/edges are contained in the generated provenance graph? • Precision/recall based on provenance ground truth. (Figure: performance benchmarks vs. qualitative benchmarks.)
    23. Where to go next? • UnixBench is a basic benchmark. • SPEC: comprehensive in terms of performance evaluation. – Hard to get the provenance ground truth and assess the quality of the captured provenance. • Better directions: – coreutils-based micro-benchmarks. – Macro-benchmarks (e.g. Postmark, compilation benchmarks).
    24. Conclusion. • Automatic provenance capture is an important part of the ecosystem. • There are trade-offs between the different capture modes. • Benchmarking informs the choice. • Common platforms are essential.
    25. The End
    26. UnixBench Results