2. What is it?
OVERVIEW
Container Safety Determination (CSD) is a scanning and monitoring tool that lets engineers examine the safety state of their containers.
The tool works for both images and containers, and can be configured to run without user intervention.
WORKING
CSD works by detecting suspicious files. It compares all the files of a given image against a database of known malicious and non-malicious binaries to determine how safe the image is. The security engineer reviews the feedback for a particular image and takes action accordingly.
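As a minimal sketch of the comparison described above, each file can be looked up in a set of known-good digests and a set of known-malicious digests. This sketch assumes exact SHA-256 matches as a simple stand-in for CSD's sdhash similarity comparison; the function names and the three verdict labels are hypothetical.

```python
import hashlib
from typing import Dict, Set


def sha256_hex(data: bytes) -> str:
    """Exact-match digest, used here only as a stand-in for an sdhash digest."""
    return hashlib.sha256(data).hexdigest()


def classify_files(
    files: Dict[str, bytes],
    known_good: Set[str],
    known_bad: Set[str],
) -> Dict[str, str]:
    """Label each file path as 'malicious', 'known-good', or 'unknown'."""
    labels = {}
    for path, content in files.items():
        digest = sha256_hex(content)
        if digest in known_bad:
            labels[path] = "malicious"
        elif digest in known_good:
            labels[path] = "known-good"
        else:
            labels[path] = "unknown"
    return labels


def image_verdict(labels: Dict[str, str]) -> str:
    """Hypothetical rule: any malicious file -> unsafe; any unknown file -> potentially unsafe."""
    values = set(labels.values())
    if "malicious" in values:
        return "unsafe"
    if "unknown" in values:
        return "potentially unsafe"
    return "safe"
```

Unmatched files only downgrade the verdict to "potentially unsafe", since the tool cannot say anything conclusive about files it has never seen.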
4. Agile Development
Sprint planning before start of sprint
Break down requirements into simple tasks that can be completed during sprint
Prioritize and assign tasks
Status meetings every other day - address blocking issues
Weekly meetings with mentor - review progress, design
Sprint review at end of sprint
Discuss lessons learned and how to improve in next sprint
Used Trello for sprint planning
Used Slack for team communication and Google Hangouts for meetings
8. Use cases
USE CASE 1
What if the image used to launch a container contains suspicious files?
The deployment engineer should have confidence that the image is free of suspicious elements.
USE CASE 2
What if the container was launched from a safe image but was compromised afterwards?
There should be a way to determine how safe a container is at regular intervals. Scanning continuously at a fixed interval can help detect compromised containers.
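The periodic scan in use case 2 can be sketched as a loop that rescans every running container once per fixed interval. A minimal sketch; `list_containers` and `scan_container` are hypothetical callables standing in for the Docker host query and the actual scan, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


def periodic_scan(
    list_containers: Callable[[], Iterable[str]],
    scan_container: Callable[[str], str],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Scan all running containers `rounds` times, waiting a fixed interval between rounds."""
    history = []
    for i in range(rounds):
        # Re-list containers each round so newly launched ones are picked up.
        results = {cid: scan_container(cid) for cid in list_containers()}
        history.append(results)
        if i < rounds - 1:
            sleep(interval_seconds)  # fixed interval before the next round
    return history
```

A production version would run indefinitely instead of for a fixed number of rounds; the `rounds` parameter exists only to keep the sketch bounded.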
11. SDHASH
Example: sdhashes of two ~14 KB text files with mostly the same content, except that a few sentences were deleted from file2.
[Figure: file1 hash | file2 hash]
sdhash similarity index: 96
Sdhash is a tool that compares two blobs of data and measures their similarity based on the strings they have in common. It returns results quickly, which helps in the initial triage and investigation of files. Its digests are also compact: about 2-3% of the original file size.
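To make the idea of a similarity digest concrete, here is a toy sketch in the same spirit: it hashes every 16-byte window of the input into a Bloom filter (represented by the set of its 1-bit positions) and scores two filters from 0 to 100 by their shared bits. This is not sdhash's actual algorithm, which selects statistically improbable features and stores them in a sequence of Bloom filters; the window size, filter size, hash count, and scoring rule are illustrative assumptions.

```python
import hashlib
from typing import Set

WINDOW = 16               # bytes per feature (illustrative; sdhash uses 64-byte features)
FILTER_BITS = 1 << 20     # size of the toy Bloom filter
HASHES_PER_FEATURE = 3    # bit positions set per feature


def toy_digest(data: bytes) -> Set[int]:
    """Return the 1-bit positions of a Bloom filter holding every WINDOW-byte substring."""
    bits = set()
    for start in range(len(data) - WINDOW + 1):  # empty range for inputs shorter than WINDOW
        h = hashlib.sha1(data[start:start + WINDOW]).digest()
        for i in range(HASHES_PER_FEATURE):
            # 4 bytes of the SHA-1 output per hash function select one bit position
            bits.add(int.from_bytes(h[4 * i:4 * i + 4], "big") % FILTER_BITS)
    return bits


def toy_similarity(a: Set[int], b: Set[int]) -> int:
    """Score 0-100: bits shared by both filters, relative to the sparser filter."""
    if not a or not b:
        return 0
    return round(100 * len(a & b) / min(len(a), len(b)))
```

Inputs shorter than one window produce an empty digest and a score of 0, a crude version of the small-file weakness listed under the limitations.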
12. REFERENCE DATASETS
Official images available on Docker Hub
NSRL dataset
Self-evaluated dataset
Datasets by third parties: ClamAV, VirusShare.com
14. Flow of image upload
[Diagram: image upload flow]
Components: Registry, Broadcaster (ENDPOINT), RabbitMQ, Instance, Elasticsearch (sdhash index), Scanner
Example push: # docker push 10.10.10.10:5000/xyz:latest
Steps shown: find base (centos:7); index different files; calculate sdhash; compare centos:7 vs xyz:latest; scan different files
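The "find base", "index different files", and "compare centos:7 vs xyz:latest" steps amount to a difference between two file indexes. A minimal sketch, assuming each image has already been flattened into a hypothetical mapping from file path to content digest:

```python
from typing import Dict, List


def changed_files(base: Dict[str, str], image: Dict[str, str]) -> List[str]:
    """Paths that are new in `image` or whose digest differs from the base image."""
    return sorted(
        path for path, digest in image.items()
        if base.get(path) != digest  # missing from base, or content changed
    )


base = {"/bin/sh": "d1", "/etc/os-release": "d2"}            # e.g. centos:7
image = {"/bin/sh": "d1", "/etc/os-release": "d2x", "/app/run.py": "d3"}  # e.g. xyz:latest
print(changed_files(base, image))  # ['/app/run.py', '/etc/os-release']
```

Files deleted relative to the base are ignored here, since only new or modified content needs to be hashed and scanned.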
15. Container Scan
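[Diagram: container scan flow]
Components: DOCKER HOST, Elasticsearch (sdhash index), Scanner
Example: scan ae3894fea89 (base image centos:7)
Steps shown: get containers; get container diff and find files; compare against the sdhash index; scan different files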
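One way to "get container diff and find files" is the `docker diff` command, which lists paths changed in a container's filesystem relative to its image, each prefixed with `A` (added), `C` (changed), or `D` (deleted). Whether CSD itself uses `docker diff` is an assumption; the sketch below parses that output and keeps the paths worth scanning.

```python
import subprocess
from typing import List, Tuple


def parse_docker_diff(output: str) -> List[Tuple[str, str]]:
    """Turn `docker diff` lines such as 'A /tmp/x' into (kind, path) pairs."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, path = line.partition(" ")  # path keeps any embedded spaces
        entries.append((kind, path))
    return entries


def files_to_scan(output: str) -> List[str]:
    """Added or changed paths; deleted paths have no content left to scan.

    Note: 'C' entries include directories (e.g. 'C /etc'), which a real
    scanner would have to skip when reading file contents.
    """
    return [path for kind, path in parse_docker_diff(output) if kind in ("A", "C")]


def container_diff(container_id: str) -> str:
    """Run `docker diff` for one container (requires a local Docker daemon)."""
    return subprocess.run(
        ["docker", "diff", container_id],
        check=True, capture_output=True, text=True,
    ).stdout
```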
17. Lessons learnt
Working as a team in an agile environment
Working with technologies such as Docker, sdhash, Elasticsearch, and RabbitMQ
Internals of Docker and docker-registry
Working with cloud platforms
Configuring and packaging code within containers, and distributing containers
18. Limitations and future plans
LIMITATIONS
Sdhash is not ideal for comparing small files and can produce false positives
Only indicates whether an image is safe or potentially unsafe with respect to known files; the tool can be improved to give more conclusive verdicts on image safety
Sdhash does not work well with binary files
The current reference dataset is very small and relies on the assumption that the official images are safe. We need a bigger dataset of malicious files
FUTURE PLANS
Enable a plugin architecture for adding new vulnerability-detection modules, so that developers can integrate their own detection engines without hassle and improve results
Enable a master-slave model in which the master spins up more containers as the load increases
Add a tool for comparing binary files
Add a tool for comparing small files
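The proposed plugin architecture could take the form of a small interface that every detection engine implements, with the scanner running all registered engines over each suspicious file. A minimal sketch; the `DetectionEngine` protocol, its `name`/`scan` members, `SignatureEngine`, and `Scanner` are all hypothetical.

```python
from typing import Dict, List, Protocol


class DetectionEngine(Protocol):
    """Hypothetical interface a third-party detection module would implement."""
    name: str

    def scan(self, path: str, content: bytes) -> bool:
        """Return True if the file looks malicious."""
        ...


class SignatureEngine:
    """Toy engine: flags files containing any of the given byte signatures."""

    def __init__(self, name: str, signatures: List[bytes]) -> None:
        self.name = name
        self.signatures = signatures

    def scan(self, path: str, content: bytes) -> bool:
        return any(sig in content for sig in self.signatures)


class Scanner:
    """Runs every registered engine over a file and collects per-engine verdicts."""

    def __init__(self) -> None:
        self.engines: List[DetectionEngine] = []

    def register(self, engine: DetectionEngine) -> None:
        self.engines.append(engine)

    def scan_file(self, path: str, content: bytes) -> Dict[str, bool]:
        return {engine.name: engine.scan(path, content) for engine in self.engines}
```

Keeping the interface to a single `scan` method means a new engine (for example, one specialised in small or binary files) can be added without touching the scanner itself.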
Container Safety Determination (CSD) is a scanning tool that helps determine the safety state of containers and images. It provides security-related feedback about code that is introduced or modified in an image or container, so engineers can use CSD as a monitoring tool to keep track of the safety of their images.
CSD works by listening to an assigned Docker registry. Whenever someone pushes a new image, CSD pulls that image and computes the sdhash of each of its files. It then compares each file's sdhash against an Elasticsearch database containing the sdhashes of all the reference image files.
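The upload-flow diagram shows the registry listener as a broadcaster endpoint feeding RabbitMQ. As a hedged sketch of the listening step, a Docker registry can be configured to POST JSON notifications to an endpoint; the function below extracts `host/repository:tag` references for pushed manifests from such a payload. The field names (`events`, `action`, `target.mediaType`, `target.repository`, `target.tag`) follow the registry's notification format as we understand it and should be checked against the registry version in use; `pushed_images` itself is hypothetical.

```python
import json
from typing import List


def pushed_images(body: str, registry_host: str) -> List[str]:
    """Return 'host/repo:tag' for each manifest push event in a notification body."""
    envelope = json.loads(body)
    refs = []
    for event in envelope.get("events", []):
        target = event.get("target", {})
        # Layer (blob) uploads also produce push events; only manifests name an image.
        if event.get("action") != "push" or "manifest" not in target.get("mediaType", ""):
            continue
        tag = target.get("tag")
        if tag:  # pushes by digest only carry no tag and are skipped in this sketch
            refs.append(f"{registry_host}/{target['repository']}:{tag}")
    return refs
```

Each returned reference could then be published to a queue for a worker instance to pull and scan.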
We built our software using several technologies, including RabbitMQ, Docker, Elasticsearch, sdhash, and ClamAV. We will discuss each technology in more depth shortly.
Agile has been a huge part of our project and a great help in establishing a strong team dynamic. We kept up with our biweekly sprints by following agile development practices: we held sprint planning before each sprint and assigned tasks to each team member, which helped us keep track of the project timeline.
The project timeline was split into six sprints spanning our spring semester.
In a typical production environment, a developer pulls an image from a public registry, modifies it, packages their code into it, and pushes it to a private registry. The cloud/server administrator then runs containers from that image on the production nodes.
This deployment model raises two questions: how secure is my image, and how secure is my container? Our first use case is determining whether an image is safe, and the second is determining whether a launched container is safe.
We address both by scanning at two points: when an image is pushed to the registry, and periodically on the containers running in the production environment.
Sdhash, short for similarity digest hash, is a tool that compares two blobs of data and measures their similarity based on the features (strings) they have in common. It first generates a digest of each blob and then computes a similarity index from how many bits the digests' underlying Bloom filters have in common. The similarity index ranges from 0 to 100. We use sdhash because, unlike a cryptographic hash, it tells us how similar two files are rather than only whether they are identical.
Here is an example of sdhashes generated from two files of about 14 KB each. The files have mostly the same content, except that a few sentences are deleted from one of them. The similarity index reported by sdhash is 96.
For reference datasets, we use official images from Docker Hub.
For each image we scan, we determine its original base image and compare the image against that base image, on the assumption that the official base image is safe.
We also use reference hashes from the National Software Reference Library (NSRL).
We are also incrementally building our own dataset from images that users have marked as safe.
We also use other third-party datasets, such as ClamAV and VirusShare.
Our overall workflow is to pull an image from the registry, compute the sdhashes of the files in the image, and compare each sdhash against our reference dataset of sdhashes to determine whether the image contains any suspicious files. We also scan suspicious files with ClamAV.
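The workflow above can be sketched as one function with each stage passed in as a callable, so real sdhash and ClamAV integrations could be plugged in later. A minimal sketch; the parameter names, the similarity threshold of 90, and the stub stages in the test are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def scan_image(
    files: Dict[str, bytes],                 # path -> content, from the pulled image
    digest: Callable[[bytes], str],          # stand-in for computing an sdhash
    compare: Callable[[str, str], int],      # stand-in for a 0-100 similarity score
    reference: Iterable[str],                # reference digests (e.g. from Elasticsearch)
    av_scan: Callable[[str, bytes], bool],   # stand-in for a ClamAV scan; True = infected
    threshold: int = 90,                     # hypothetical cut-off for a "known" file
) -> Dict[str, List[str]]:
    """Flag files with no close reference match, then antivirus-scan only those."""
    refs = list(reference)
    suspicious: List[str] = []
    infected: List[str] = []
    for path, content in files.items():
        d = digest(content)
        best = max((compare(d, r) for r in refs), default=0)
        if best < threshold:  # no sufficiently similar reference file
            suspicious.append(path)
            if av_scan(path, content):
                infected.append(path)
    return {"suspicious": suspicious, "infected": infected}
```

Running the antivirus scan only on suspicious files keeps the more expensive step limited to content that did not match the reference dataset.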
DockerUI is an unofficial open-source project we found on GitHub; we modified it to integrate our two use cases.
Most of our group members had not worked on an agile team before, so it was a valuable experience, and we all learnt a lot from the development methodology.
We also learnt new tools such as Docker, sdhash, and Elasticsearch. With Docker in particular, we dug into its internals and set up our own registry.
As for limitations, sdhash is not ideal for small files, and our current dataset is not big enough, so our program may produce incorrect verdicts.
In the future, we plan to add tools to scan small files and binary files.
We also plan to provide more ways for developers to integrate their detection engines into our project, and we're