Traeger focused on using benchmarks to measure filesystem/storage performance. His observations largely carry over to benchmarking other types of performance.
SPEC includes several sub-benchmarks that may be atypical for provenance analysis, e.g. a discrete event simulator or a quantum computer simulator. If we also want to measure precision/recall, it is hard to obtain the ground truth for the provenance generated by these benchmarks.
1. execl-xput: How fast the current process image can be replaced with a new one, as a result of an execve system call.
2. fcopy-256, fcopy-1024, fcopy-4096: Speed of a file-to-file copy using different buffer sizes.
3. pipe-xput, pipe-cs: Speed of communication over pipes. In the first test, the reads and writes on the pipe happen from a single process. In the second test a second process is spawned, so the communication also includes a context switch between the two.
4. spawn-xput: A simple fork-wait loop to measure how much time is needed to create and then destroy a process.
5. shell-1, shell-8: Execution speed for the processing of a data file. The processing is implemented using common Unix utilities, wrapped in a shell script. The two tests differ in the number of concurrently executing scripts.
6. syscall: System call overhead. The test uses getpid to measure this. This specific system call is chosen because it requires minimal in-kernel processing, so its main overhead comes from the switch between kernel and user mode.
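As a rough illustration of what the syscall sub-benchmark measures, here is a minimal Python sketch that counts getpid round-trips per second. Interpreter and timer overhead are included, so the absolute number is only indicative of relative differences between capture methods:

```python
import os
import time

def syscalls_per_second(duration=0.5):
    """Count getpid() calls completed in `duration` seconds.

    Mirrors the idea of UnixBench's `syscall` test: getpid requires
    minimal in-kernel work, so the loop mostly measures the cost of
    crossing the user/kernel boundary (plus Python overhead here).
    """
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        os.getpid()
        count += 1
    return count / duration

rate = syscalls_per_second()
print(f"approx. {rate:,.0f} getpid calls/second")
```

Running the same loop under a provenance capture tool and comparing the two rates gives a quick estimate of per-syscall overhead.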
The degradation depends on the capture method used.
LPM [Bates 15] already supports the SPADE DSL reporter.
Reason for being slower: lack of buffering. A new connection was opened each time a piece of provenance had to be output.
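The effect of this missing buffering can be sketched with two hypothetical reporter classes. The class names and the `open_connection` factory are illustrative, not SPADEv2's actual API; the point is only that connection setup cost is paid per record in one design and per batch in the other:

```python
class UnbufferedReporter:
    """Opens a fresh connection for every provenance record --
    the pattern that made the initial implementation slow."""
    def __init__(self, open_connection):
        self.open_connection = open_connection  # hypothetical connection factory

    def emit(self, record):
        conn = self.open_connection()   # setup cost paid once per record
        conn.send(record)
        conn.close()

class BufferedReporter:
    """Accumulates records and sends each batch over one connection."""
    def __init__(self, open_connection, batch_size=256):
        self.open_connection = open_connection
        self.batch_size = batch_size
        self.buffer = []

    def emit(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        conn = self.open_connection()   # setup cost amortized over the batch
        for record in self.buffer:
            conn.send(record)
        conn.close()
        self.buffer.clear()
```

With the unbuffered design, emitting n records opens n connections; the buffered design opens roughly n / batch_size.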
SPADEv2 provides easy interfacing with other provenance systems via the DSL Reporter. Linux Provenance Modules [Bates 15] already support it. This makes it a good platform for measuring qualitative features (such as the number of edges/vertices) and also for running queries that verify whether the ground truth was captured.
• Execl: execve speed
• Fcopy-*: file copy with different buffer sizes
• Pipe-*: pipe communication
• Spawn: fork-wait loop (process creation/destruction speed)
• Shell-*: Unix utilities wrapped in a script. Similar to what coreutils testing would yield.
• Syscall: system call overhead (uses getpid as the most “lightweight” system call)
Trade-offs in Automatic Provenance Capture
Manolis Stamatogiannakis, Hasanat Kazmi,
Hashim Sharif, Remco Vermeulen,
Ashish Gehani, Herbert Bos, and Paul Groth
• Strace Reporter
– Programs run under strace. The produced log is parsed to extract provenance.
• LLVMTrace
– Instrumentation added to function boundaries at compile time.
• DataTracker
– Dynamic Taint Analysis. Bytes are associated with metadata, which is propagated as the program executes.
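A minimal sketch of the strace reporter's log-parsing step, assuming strace's usual textual output for open/openat calls (the exact format varies with strace version and flags, so the regex is illustrative):

```python
import re

# Matches lines like:
#   openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
# The exact format depends on the strace version/flags; this is a sketch.
OPEN_RE = re.compile(
    r'^(?:open|openat)\((?:AT_FDCWD, )?"([^"]+)", ([^)]*)\) = (-?\d+)'
)

def extract_file_accesses(strace_lines):
    """Yield (path, flags, fd) for each successful open/openat call."""
    for line in strace_lines:
        m = OPEN_RE.match(line)
        if m and int(m.group(3)) >= 0:   # skip failed opens (negative return)
            yield m.group(1), m.group(2), int(m.group(3))

log = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    'openat(AT_FDCWD, "/missing", O_RDONLY) = -1 ENOENT (No such file or directory)',
]
print(list(extract_file_accesses(log)))  # → [('/etc/passwd', 'O_RDONLY', 3)]
```

Each extracted access would then be turned into provenance vertices/edges (process used file, etc.) by the reporter.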
SPADEv2 – Provenance Middleware
• Faster, but how much?
• What is the performance “price” for fewer false positives?
• Is a compile-time solution worth the effort?
How can one get more insight?
Run a benchmark!
• LMBench, UnixBench, Postmark, BLAST, …
• [Traeger 08]: “Most popular benchmarks are flawed.”
• No matter what you choose, there will be shortcomings.
Start simple: UnixBench
• Well understood sub-benchmarks.
• Emphasizes the performance of system calls.
• System calls are commonly used for the extraction of provenance.
• More insight into which collection backend would suit specific applications.
• We’ll have a performance baseline from which to improve the specific implementations.
Performance vs. Integration
• Capturing provenance from completely unmodified programs may degrade performance significantly.
• Modifying either the source (LLVMTrace) or the platform (LPM, Hi-Fi) should be considered for a production deployment.
Performance vs. Provenance Granularity
• We couldn’t verify this intuition for the case of the strace reporter compared to LLVMTrace.
– Strace reporter implementation is not optimal.
• Tracking fine-grained provenance may
interfere with existing optimizations.
– E.g., buffered I/O does not benefit fine-grained tracking, since the per-byte work remains the same.
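Why buffered I/O fails to help fine-grained tracking can be shown with a toy byte-level taint propagator: the work is proportional to the number of bytes copied, regardless of chunk size. The names and label format are illustrative, not DataTracker's actual representation:

```python
def tainted_copy(src_bytes, src_taint, chunk_size=4096):
    """Copy data while propagating per-byte taint labels.

    src_taint[i] is the set of labels attached to src_bytes[i].  The
    tracking work is proportional to the number of bytes, not the
    number of read/write calls -- which is why larger I/O buffers do
    not reduce the cost of fine-grained tracking.
    """
    dst_bytes = bytearray()
    dst_taint = []
    for off in range(0, len(src_bytes), chunk_size):
        dst_bytes += src_bytes[off:off + chunk_size]
        # Per-byte propagation: one label-set copy per byte in the chunk.
        dst_taint.extend(set(labels) for labels in src_taint[off:off + chunk_size])
    return bytes(dst_bytes), dst_taint

data = b"abc"
taint = [{"file:/in.txt"}] * len(data)
out, out_taint = tainted_copy(data, taint, chunk_size=2)
```

Doubling `chunk_size` halves the number of loop iterations but leaves the number of per-byte label copies unchanged.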
False Positives/Analysis Scope
• “Brute-forcing” a low false-positive ratio with the “track everything” approach of DataTracker is expensive.
• Limiting the analysis scope gives a performance boost.
• If we exploit known semantics, we can have the
best of both worlds.
– Pre-existing semantic knowledge: LLVMTrace
– Dynamically acquired knowledge: ProTracer [Ma 16]
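The “known semantics” idea can be sketched as function summaries that replace instruction-level tracking for well-understood calls. All names here are hypothetical, not ProTracer's or LLVMTrace's actual mechanism:

```python
def propagate_with_summary(call, taint_of):
    """Propagate taint across a function call using a semantic summary
    when one is known, instead of tracking every executed instruction.

    `call` is a (function_name, dst_buffer, src_buffer) triple;
    `taint_of` maps buffer names to label sets.  Names are hypothetical.
    """
    # Summaries encode pre-existing knowledge about a function's
    # information flow; e.g. memcpy's output depends only on its input.
    SUMMARIES = {
        "memcpy": lambda src_taint: set(src_taint),
        "strlen": lambda src_taint: set(),  # coarse choice: length treated as untainted
    }
    name, dst, src = call
    if name not in SUMMARIES:
        raise NotImplementedError("no summary: fall back to instruction-level tracking")
    taint_of[dst] = SUMMARIES[name](taint_of.get(src, set()))
    return taint_of[dst]

taint = {"buf_in": {"file:/etc/passwd"}}
propagate_with_summary(("memcpy", "buf_out", "buf_in"), taint)
```

One set-union per summarized call replaces thousands of per-instruction propagation steps, which is the performance win of exploiting semantics.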
Takeaway: System Event Tracing
• A good start for quick deployments
• Simple versions may be expensive
• What happens in the binary?
Takeaway: Compile-time Instrumentation
• Middle-ground between disclosed and automatic provenance collection.
• But you have to have access to the source code.
Takeaway: Taint Analysis
• Prohibitively expensive
• Likely to remain so,
even after optimizations.
• Reserved for provenance analysis of unmodified binaries.
• Offline approach.
Generalizing the Results
• Only one implementation
was tested for each method.
• Repeating the tests with additional implementations will provide confidence in the insights gained.
• More confidence when choosing a specific collection method.
Implementation Details Matter
• Our results are influenced by the specifics
of the implementation.
• Anecdote: the initial implementation of LLVMTrace was actually slower than the strace reporter.
• Qualitative features of the provenance are also very important.
• How many vertices/edges are contained in the generated graph?
• Precision/recall based on provenance ground truth.
Where to go next?
• UnixBench is a basic benchmark.
• SPEC: comprehensive in terms of workload coverage.
– Hard to get the provenance ground truth needed to assess the quality of captured provenance.
• Better directions:
– Coreutils based micro-benchmarks.
– Macro-benchmarks (e.g. Postmark, BLAST).
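A coreutils-based micro-benchmark could be driven by a small harness like the following sketch. The pipeline, repetition count, and the idea of prefixing the command with a capture tool (e.g. `strace -f -o /dev/null`) are illustrative choices, not part of the talk's tooling:

```python
import subprocess
import time

def time_pipeline(command, runs=3):
    """Return the median wall-clock time of a shell pipeline.

    Run once natively, then under each capture tool (e.g. the same
    command prefixed with `strace -f -o /dev/null`) and compare the
    medians to estimate the capture overhead.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, shell=True, check=True,
                       stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return sorted(timings)[len(timings) // 2]

# Example pipeline built from common coreutils commands.
baseline = time_pipeline("printf 'b\\na\\nb\\n' | sort | uniq -c")
```

Because each coreutils command exercises a small, well-understood set of system calls, the provenance ground truth for such pipelines is much easier to establish than for SPEC workloads.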
Conclusions
• Automatic provenance capture is an important part of the provenance ecosystem.
• Trade-offs exist between the different capture modes.
• Benchmarking helps inform the choice of capture method.
• Common platforms are essential for comparisons.