Provenance-based Security Audits and its Application
to COVID-19 Contact Tracing Apps
Andreas Schreiber1, Tim Sonnekalb1, Thomas S. Heinze1,
Lynn von Kurnatowski1, Jesus M. Gonzalez-Barahona2, Heather Packer3
1 German Aerospace Center (DLR), Germany
2 Universidad Rey Juan Carlos, Spain
3 University of Southampton, United Kingdom
> IPAW 2021 > A. Schreiber et al. • Provenance-based Security Audits and its Application to COVID-19 Contact Tracing Apps > 19.07.2021
DLR.de • Chart 1
Coronavirus “Contact Tracing Apps”
German “Corona Warn App” (CWA)
• App for Exposure Notification
• Based on APIs by Apple and Google
• Developed as Open-Source Software
by SAP and Telekom
• External contributors (via pull requests)
• https://github.com/corona-warn-app
• 12 repositories (update: 23)
Our Mission
• To analyze the quality of CWA and its Open-Source
development process
• To generate advice for other government apps
Image: © 2020 Marlene Brüggemann
Development of the “Corona Warn App”
https://cauldron.io/project/3860
Getting Knowledge from git-based Projects
1. Repository Mining
• Extraction of provenance information from git projects
(files, issues, pull requests, etc.) in PROV format
→ Directed Acyclic Graphs (DAGs)
• Tools: Git2PROV, GitHub2PROV, GitLab2PROV
2. Graph Storage
• Storing provenance in graph databases
→ Property Graphs
• Tools: Neo4j, prov-db-connector, prov2neo
3. Generate Insights
• Graph analytics and graph visualization
• Tools: Cypher, Neo4j Bloom, Gephi, Mathematica
Repository Mining: Extraction of Provenance Information from git Projects
[Diagram: extraction pipeline]
• GitHub organization corona-warn-app with git repositories cwa-server, cwa-app-ios, cwa-app-android, cwa-website, cwa-documentation, …
• Extract provenance: Git*2PROV produces PROV (JSON / RDF)
• prov2neo loads the PROV data into the graph database (Neo4j)
• Extract additional data: request contributors/team (PyGithub)
• MERGE the additional data into the graph database with a Cypher query
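The "Extract additional data" step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the node label `Agent` and the properties `login` and `team_member` are hypothetical, as are the sample logins. In practice the two login lists could come from PyGithub (for example `Organization.get_members()` and `Repository.get_contributors()`), and each pair could be passed to a Neo4j driver session.

```python
from typing import Iterable

# Hypothetical Cypher statement: assumes developers are stored as nodes
# labelled `Agent` with a `login` property (not confirmed by the slides).
MERGE_AGENT = (
    "MERGE (a:Agent {login: $login}) "
    "SET a.team_member = $team_member"
)


def classify_contributors(contributors: Iterable[str], team: Iterable[str]):
    """Yield (query, params) pairs that flag each contributor as team or external."""
    # GitHub logins are case-insensitive, so compare in lower case.
    team_logins = {login.lower() for login in team}
    for login in sorted(set(contributors)):
        params = {"login": login, "team_member": login.lower() in team_logins}
        yield MERGE_AGENT, params


# Made-up logins for illustration only.
statements = list(classify_contributors(["alice", "bob", "alice"], ["Alice"]))
for query, params in statements:
    print(query, params)
```

Using query parameters instead of string formatting keeps the statement constant and avoids quoting problems with unusual logins.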
GitHub2PROV
• See paper at 11th International Workshop on
Theory and Practice of Provenance (TaPP 2019),
Philadelphia, June 2019
• https://www.usenix.org/conference/tapp2019/presentation/packer
Based on Git2PROV (by de Nies et al.)
• Extends the PROV model of Git2PROV
Provenance Graph – Example
Visualization with Graphviz/dot
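A small provenance graph of this kind can be written out as Graphviz DOT text with a few lines of Python. The sketch below assumes the general Git2PROV idea that a commit is an activity, a file version is an entity, and an author is an agent; all identifiers are made up for illustration, and the node shapes follow the common PROV drawing convention rather than anything stated on the slide.

```python
# PROV relations for one hypothetical commit (subject, relation, object).
RELATIONS = [
    ("file:README.md@c2", "wasGeneratedBy", "commit:c2"),
    ("commit:c2", "wasAssociatedWith", "agent:alice"),
    ("file:README.md@c2", "wasDerivedFrom", "file:README.md@c1"),
    ("commit:c2", "wasInformedBy", "commit:c1"),
]

# Entities as ellipses, activities as boxes, agents as houses.
SHAPES = {"file": "ellipse", "commit": "box", "agent": "house"}


def to_dot(relations):
    """Return DOT source for a list of (subject, relation, object) triples."""
    nodes = sorted({node for s, _, o in relations for node in (s, o)})
    lines = ["digraph prov {", "  rankdir=BT;"]
    for node in nodes:
        kind = node.split(":", 1)[0]
        lines.append(f'  "{node}" [shape={SHAPES[kind]}];')
    for subject, relation, obj in relations:
        lines.append(f'  "{subject}" -> "{obj}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot(RELATIONS)
print(dot_source)
```

Saving the output to a file such as `prov.dot` and running `dot -Tpng prov.dot -o prov.png` would render it.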
Which files have commits by
team members as well as
external contributors?
Query Data for Visualization from Neo4j with Cypher Queries
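The question of which files have commits by both groups boils down to a set test per file. The sketch below is a plain-Python stand-in for the Cypher query, which the slides show only as an image; it assumes the query result has already been flattened into `(file_path, author_login, is_team_member)` tuples, and all sample values are invented.

```python
from collections import defaultdict


def files_with_mixed_authorship(commits):
    """Return files touched by at least one team member and one external contributor.

    commits: iterable of (file_path, author_login, is_team_member) tuples.
    """
    groups = defaultdict(set)
    for path, _author, is_team in commits:
        groups[path].add(bool(is_team))
    # A file qualifies only if both True (team) and False (external) occur.
    return sorted(path for path, kinds in groups.items() if kinds == {True, False})


# Invented sample data for illustration only.
commits = [
    ("README.md", "alice", True),
    ("README.md", "carol", False),
    ("pom.xml", "alice", True),
    ("docs/faq.md", "dave", False),
]
mixed = files_with_mixed_authorship(commits)
print(mixed)
```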
Visualization: Contributions of Team Members and External Contributors
(Tool: Gephi)
[Figure: contribution graph for project cwa-documentation. Legend: File (Entity), Developer (Agent), external contribution, team member contribution.]
[Figure: contribution graph for project cwa-server. Legend: File (Entity), Developer (Agent), external contribution, team member contribution.]
Static Application Security Testing (SAST) Pipeline
[Diagram: SAST pipeline]
• Query (JSON): commit hashes from the graph database (Neo4j)
• Snapshot: git merge <commit hash> on a git repository of the GitHub organization corona-warn-app (cwa-server, …)
• Code filter (file paths)
• Static code analysis: PMD, Xanitizer, Infer, SpotBugs, Detect, FlowDroid
• Security findings (JSON)
• Parse results; store results with commit hashes in the SAST database
SAST Database Schema
• tool: id INTEGER, name TEXT, config TEXT, version TEXT
• repo: id INTEGER, name TEXT, url TEXT
• snapshot: id TEXT, committer_date TEXT, author_date TEXT, commit_message TEXT, repo INTEGER
• branches: id INTEGER, branch TEXT, snapshot TEXT
• run: id INTEGER, snapshot TEXT, tool INTEGER, success INTEGER
• warning: id INTEGER, message TEXT, location TEXT, severity TEXT, run INTEGER
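As a rough sketch, the schema above could be created in SQLite from Python like this. Table and column names and types are taken from the slide; the primary keys, the foreign-key references, and the reading of `snapshot.id` as the commit hash are assumptions added for illustration.

```python
import sqlite3

# Column names and types follow the slide; PRIMARY KEY and REFERENCES
# clauses are assumptions, since the slide lists only names and types.
SCHEMA = """
CREATE TABLE tool (id INTEGER PRIMARY KEY, name TEXT, config TEXT, version TEXT);
CREATE TABLE repo (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE snapshot (
    id TEXT PRIMARY KEY,              -- assumed to hold the commit hash
    committer_date TEXT,
    author_date TEXT,
    commit_message TEXT,
    repo INTEGER REFERENCES repo(id)
);
CREATE TABLE branches (
    id INTEGER PRIMARY KEY,
    branch TEXT,
    snapshot TEXT REFERENCES snapshot(id)
);
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    snapshot TEXT REFERENCES snapshot(id),
    tool INTEGER REFERENCES tool(id),
    success INTEGER                   -- assumed 1 for success, 0 for failure
);
CREATE TABLE warning (
    id INTEGER PRIMARY KEY,
    message TEXT,
    location TEXT,
    severity TEXT,
    run INTEGER REFERENCES run(id)
);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```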
Number of Code Analysis Warnings for cwa-server Repository
[Figure: two plots of the number of warnings (y-axis) over date (x-axis, Jul 2020 to Mar 2021), one for PMD and one for Xanitizer]
Four Steps of the Provenance-driven Code Analysis
[Diagram: four-step workflow]
• Query the graph database (Neo4j) with Cypher to obtain commit hashes (DataFrame)
• Store the commit hashes and query the SAST database (SQLite) with SQL to obtain results
• Filter and clean the results
• Analyze and plot to generate diagrams, reports, …
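The hand-over from the graph database to the SAST database can be illustrated with a short sketch. It assumes the commit hashes have already been fetched from Neo4j and are available as a plain Python list; the table layout mirrors the `run` and `warning` tables of the SAST schema, while the hashes, severities, and the convention that `success = 1` marks a successful run are invented for the example.

```python
import sqlite3


def warnings_per_commit(conn, commit_hashes):
    """Return {commit_hash: number of warnings} for successful analysis runs."""
    commit_hashes = list(commit_hashes)
    if not commit_hashes:
        return {}
    # One "?" placeholder per hash keeps the query parameterized.
    placeholders = ",".join("?" for _ in commit_hashes)
    sql = f"""
        SELECT run.snapshot, COUNT(warning.id)
        FROM run LEFT JOIN warning ON warning.run = run.id
        WHERE run.success = 1 AND run.snapshot IN ({placeholders})
        GROUP BY run.snapshot
    """
    return dict(conn.execute(sql, commit_hashes))


# Tiny in-memory stand-in for the SAST database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (id INTEGER PRIMARY KEY, snapshot TEXT, tool INTEGER, success INTEGER);
CREATE TABLE warning (id INTEGER PRIMARY KEY, message TEXT, location TEXT,
                      severity TEXT, run INTEGER);
INSERT INTO run VALUES (1, 'abc123', 1, 1), (2, 'def456', 1, 1), (3, 'fff000', 1, 1);
INSERT INTO warning VALUES
    (1, 'm1', 'A.java:1', 'high', 1),
    (2, 'm2', 'A.java:9', 'low', 1),
    (3, 'm3', 'B.java:3', 'low', 3);
""")

external_commits = ["abc123", "def456"]  # would come from the Cypher query
counts = warnings_per_commit(conn, external_commits)
print(counts)
```

The `LEFT JOIN` keeps commits whose runs produced no warnings, so they appear with a count of zero instead of dropping out of the result. If several tools ran on the same snapshot, this query sums their warnings.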
Cypher Query for Getting Commits by External Contributors
[Figure: histogram of warnings per commit (x-axis) against the sum of commits with that number of warnings (y-axis), for external contributors and team members]
Distribution of Number of all SAST Warnings for Commits
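A distribution like this can be computed by counting, per group, how many commits have each warning count. The following is a minimal sketch under the assumption that warning counts per commit and the set of external commits are already available; the data is invented and does not reproduce the results shown on the slide.

```python
from collections import Counter


def warning_distribution(warnings_by_commit, external_commits):
    """Count commits per number of warnings, split by contributor group."""
    dist = {"external": Counter(), "team": Counter()}
    for commit, n_warnings in warnings_by_commit.items():
        group = "external" if commit in external_commits else "team"
        dist[group][n_warnings] += 1
    return dist


# Invented data: commit hash -> number of SAST warnings at that commit.
warnings_by_commit = {"c1": 0, "c2": 3, "c3": 3, "c4": 0, "c5": 12}
external_commits = {"c2", "c5"}

dist = warning_distribution(warnings_by_commit, external_commits)
for group, counter in dist.items():
    print(group, sorted(counter.items()))
```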
[Figure: histogram of the changes in number of warnings induced by a commit (x-axis) against the sum of commits with that number of diffs (y-axis, log scale)]
Distribution of Change in Number of SAST Warnings Caused by Commits
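The change in warnings caused by a commit can be approximated as the difference between its warning count and that of its predecessor. The sketch below assumes a linear history ordered from oldest to newest, which is a simplification (merge commits have several parents); the numbers are invented.

```python
from collections import Counter


def warning_deltas(history):
    """Map each commit to the change in warning count relative to its predecessor.

    history: list of (commit_hash, warning_count), ordered oldest to newest.
    The first commit has no predecessor and is therefore omitted.
    """
    return {
        commit: count - prev_count
        for (_, prev_count), (commit, count) in zip(history, history[1:])
    }


# Invented linear history for illustration only.
history = [("c1", 10), ("c2", 12), ("c3", 12), ("c4", 7)]
deltas = warning_deltas(history)

# Number of commits per delta value, i.e. the data behind a histogram.
histogram = Counter(deltas.values())
print(deltas, dict(histogram))
```

A negative delta means the commit removed warnings, a positive one means it introduced them, and zero means the count was unchanged.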
Current & Future Work
Applying the methodology to other projects
• DLR Inner Source: aerospace software
• Apps with high public relevance:
Luca App, CovPass App, …
Automation and visual analytics
• Easy setup for new projects
(GitHub/GitLab)
• (Public) interactive dashboard
Adding additional data sources
• App execution traces
• Social media mentions
Thank You!
Questions?
Andreas Schreiber
Andreas.Schreiber@dlr.de
DLR Institute for Software Technology,
Intelligent and Distributed Systems
http://www.DLR.de/sc/ivs
@onyame | @DLR_software
