By: Kostas, Jeremy, Renqing, Rahul
Mentor: Sastry S Duri (IBM Research)
What is it?
OVERVIEW
Container Safety Determination (CSD) is a
scanning and monitoring tool that lets
engineers examine the safety state of their
containers.
The tool works for both images and
containers, and can be configured to work
without user intervention.
WORKING
CSD works by detecting suspicious files. It
compares all the files of a given image with a
database of known malicious and non-malicious
binaries in order to determine how safe an image
is. The security engineer works on the feedback
received for a particular image and takes action
accordingly.
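
The verdict step described above can be sketched roughly as follows. This is a minimal illustration, not CSD's actual code: the `FileResult` type, the label strings, and the `MATCH_THRESHOLD` value are all assumptions, since the slides do not say which score counts as a match.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the slides do not state what similarity score CSD treats as a match.
MATCH_THRESHOLD = 80


@dataclass
class FileResult:
    path: str
    best_malicious: int  # highest similarity (0-100) against known-malicious references
    best_benign: int     # highest similarity (0-100) against known non-malicious references


def classify_file(result: FileResult) -> str:
    """Label one file from its best scores against the two reference sets."""
    if result.best_malicious >= MATCH_THRESHOLD:
        return "suspicious"
    if result.best_benign >= MATCH_THRESHOLD:
        return "known-good"
    return "unknown"


def classify_image(results) -> str:
    """Roll per-file labels up into a single image verdict."""
    labels = [classify_file(r) for r in results]
    if "suspicious" in labels:
        return "potentially unsafe"
    if "unknown" in labels:
        return "needs review"
    return "safe"
```

A malicious match is checked first so that a file resembling both reference sets is still flagged for the security engineer.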
Technologies Used
sdhash
Agile Development
- Sprint planning before start of sprint
- Break down requirements into simple tasks that can be completed during sprint
- Prioritize and assign tasks
- Status meetings every other day - address blocking issues
- Weekly meetings with mentor - review progress, design
- Sprint review at end of sprint
- Discuss lessons learned and how to improve in next sprint
- Used Trello for sprint planning
- Used Slack for team communication and hangouts for meetings
Sprint Planning
[Timeline figure: Sprint 1 to Sprint 6, covering Analysis & Design, Registry, Sdhash, Usecase 1, Usecase 2, Action plan, Elasticsearch, Endpoints, Docker crawl, Docker UI, Scalability, Documentation, UI and video]
1.
DOCKER REGISTRY
Let’s start with docker registry
Typical Deployment Model
[Diagram: Developer, Docker Public Registry, Docker Private Registry, and four PRODUCTION hosts]
Usecases
USECASE 1
What if the image used to launch a container
contains suspicious files?
The deployment engineer should have some
confidence that the image is free of
suspicious elements.
USECASE 2
What if the container was launched from a safe
image but was compromised afterwards?
There should be a way to determine how safe a
container is at regular intervals. Repeating the
scan at a fixed interval can help identify
compromised containers.
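
As a rough sketch of the fixed-interval idea in Usecase 2, the helper below picks the containers that are due for another scan. The function name, the shape of the `last_scanned` mapping, and the one-hour interval in the example are assumptions made for illustration. The slides do not describe how CSD schedules its scans.

```python
from datetime import datetime, timedelta


def containers_due(last_scanned, now, interval):
    """Return the IDs of containers whose last scan is older than `interval`.

    `last_scanned` maps a container ID to the datetime of its last scan,
    or to None if the container has never been scanned.
    """
    due = [
        container_id
        for container_id, scanned_at in last_scanned.items()
        if scanned_at is None or now - scanned_at >= interval
    ]
    return sorted(due)


# Example: scan every hour (hypothetical interval).
now = datetime(2016, 5, 1, 12, 0)
last_scanned = {
    "ae3894fea89": now - timedelta(minutes=90),  # overdue
    "b1f0c2d3e4a": now - timedelta(minutes=10),  # scanned recently
    "c2a9d8e7f6b": None,                         # never scanned
}
due_now = containers_due(last_scanned, now, timedelta(hours=1))
```

A scheduler could call this on a timer and hand each returned ID to the scanner.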
Areas of Interest
[Same deployment diagram, annotated with two scan points:]
- Scan images stored in the registry
- Scan running containers
2.
SDHASH
SDHASH
Sdhash is a tool that compares two blobs of data and scores their similarity
based on the strings they have in common. It returns results quickly, which helps in the initial
triage of files. The digest it produces is about 2-3% of the size of the original file.
Example: sdhashes for two text files (~14KB) with mostly the same content,
except for a few sentences deleted from file2.
[Figure: file1 hash, file2 hash]
sdhash similarity index: 96
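
To show how a similarity digest can yield a 0-100 score, here is a toy stand-in. It is not sdhash's algorithm and does not reproduce its scores or its digest size. It hashes every 32-byte window of the input into a fixed-size bit set and reports the share of set bits that the two digests have in common. `WINDOW`, `FILTER_BITS`, and both function names are hypothetical.

```python
import hashlib
import random

WINDOW = 32             # bytes per hashed window (hypothetical choice)
FILTER_BITS = 1 << 16   # size of the toy digest in bits (hypothetical choice)


def toy_digest(data: bytes) -> int:
    """Set one bit per sliding window; the resulting int acts as a bit set."""
    bits = 0
    for i in range(len(data) - WINDOW + 1):
        h = hashlib.blake2b(data[i:i + WINDOW], digest_size=8).digest()
        bits |= 1 << (int.from_bytes(h, "big") % FILTER_BITS)
    return bits


def toy_similarity(a: bytes, b: bytes) -> int:
    """Score 0-100: percentage of set bits shared by the two digests."""
    da, db = toy_digest(a), toy_digest(b)
    union = bin(da | db).count("1")
    if union == 0:
        return 0  # inputs shorter than WINDOW produce no features
    return round(100 * bin(da & db).count("1") / union)


# Two ~14KB inputs that differ only by a deleted stretch, plus an unrelated one.
rng = random.Random(0)
file1 = bytes(rng.getrandbits(8) for _ in range(14000))
file2 = file1[:5000] + file1[5200:]
unrelated = bytes(rng.getrandbits(8) for _ in range(14000))

print(toy_similarity(file1, file1))      # identical inputs score 100
print(toy_similarity(file1, file2))      # near-duplicate scores high
print(toy_similarity(file1, unrelated))  # unrelated data scores low
```

Because the windows slide one byte at a time, a deletion only disturbs the windows that overlap it, so the rest of the file still matches.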
REFERENCE DATASETS
- Official images available on Docker Hub
- NSRL dataset
- Self-evaluated dataset
- Datasets by third parties: ClamAV, VirusShare.com
Overall Design
Flow of image upload
[Diagram. Components: Registry Instance, Broadcaster, ENDPOINT, Rabbitmq, Scanner, Elasticsearch (sdhash index). Steps shown: `# docker push 10.10.10.10:5000/xyz:latest`; find base (centos:7); compare centos:7 vs xyz:latest; index different files; calculate sdhash; scan different files]
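
One way the ENDPOINT in this flow could learn about a push is the Docker registry's webhook notifications, which POST a JSON envelope of events to a configured URL. Whether CSD used that mechanism is an assumption. The sketch below only shows how such an envelope could be reduced to the list of image references to scan, and the function name is hypothetical.

```python
import json


def pushed_images(envelope_json: str):
    """Extract image references from a registry notification envelope.

    Only manifest pushes are kept; layer (blob) pushes are skipped because
    they do not identify a complete image.
    """
    envelope = json.loads(envelope_json)
    images = []
    for event in envelope.get("events", []):
        target = event.get("target", {})
        if event.get("action") != "push":
            continue
        if "manifest" not in target.get("mediaType", ""):
            continue
        repository = target.get("repository")
        tag = target.get("tag")
        if tag:
            images.append(f"{repository}:{tag}")
        else:
            # Some registry versions omit the tag; fall back to the digest.
            images.append(f"{repository}@{target.get('digest')}")
    return images


# Trimmed-down sample envelope (field values are made up for illustration).
sample = json.dumps({
    "events": [
        {"action": "push",
         "target": {"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                    "repository": "xyz", "tag": "latest", "digest": "sha256:abc"}},
        {"action": "push",
         "target": {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
                    "repository": "xyz", "digest": "sha256:def"}},
        {"action": "pull",
         "target": {"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
                    "repository": "centos", "tag": "7", "digest": "sha256:123"}},
    ]
})
to_scan = pushed_images(sample)
```

Each returned reference could then be published to the message queue for the scanner to pick up.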
Container Scan
[Diagram. Components: DOCKER HOST (centos:7), Scanner, Elasticsearch (sdhash index). Steps shown: get containers; scan ae3894fea89; get container diff and find files; compare; scan different files]
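
The "get container diff and find files" step can be sketched by parsing the output of `docker diff <container>`, which prints one line per changed path, prefixed with `A` (added), `C` (changed), or `D` (deleted). The function name is hypothetical, and how CSD actually collects the diff is not stated in the slides.

```python
def paths_to_scan(diff_output: str):
    """Return added or changed paths from `docker diff` output.

    Deleted paths are skipped because there is nothing left to hash.
    Changed entries can be directories, so a caller would still need to
    filter for regular files before hashing.
    """
    paths = []
    for line in diff_output.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, path = line.partition(" ")
        if kind in ("A", "C") and path:
            paths.append(path)
    return paths


# Sample output as `docker diff ae3894fea89` might print it (paths are made up).
sample_diff = """C /etc
A /etc/cron.d/job
D /tmp/old.log
"""
changed = paths_to_scan(sample_diff)
```

Only these paths need to be hashed and compared, which keeps a periodic container scan much smaller than a full image scan.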
Docker UI Demo
Lessons learnt
- Working as a team in agile environment
- Working with technologies such as docker, sdhash,
elasticsearch, and rabbitmq
- Internals of docker and docker-registry
- Working with Cloud platforms
- Configuring/packaging code within containers, distribution of
containers
Limitations and future plans
LIMITATIONS
- Sdhash is not ideal for comparing small files – can result in
false positives
- Indicates if an image is safe or potentially unsafe only for
known files. The tool can be improved to provide more
conclusive verdicts on image safety
- Sdhash does not work well with binary files
- Current reference dataset is very small and relies on the fact
that the official images would be correct. We need to have a
bigger dataset of malicious files
FUTURE PLANS
- Enable plugin architecture for adding new modules to
detect vulnerabilities. In this way, developers can
integrate their detection engines without any hassles
and we can have better results
- Enable master-slave model where master can spin-up
containers as the load increases
- Add a tool which works with comparing binary files
- Add tool which works with comparing small files
Thanks!
Any questions?
You can find us at:-
gladius@bu.edu
jmwenda@bu.edu
konpap94@bu.edu
rahuls@ccs.neu.edu

BU_DEMO


Editor's Notes

  1. Container Safety Determination (CSD) is a scanning tool that helps one determine the safety state of containers and images. It provides security-related feedback about the code that is introduced or modified in an image or container. That way, engineers can use CSD as a monitoring tool to keep up with the safety of their images.
  2. CSD works by listening to an assigned Docker registry. Whenever someone pushes a new image, CSD pulls that image and computes the hash of each of its files. CSD then uses sdhash to compare each file against an Elasticsearch database containing the hashes of all the reference image files.
  3. We built our software using multiple technologies, including RabbitMQ, Docker, Elasticsearch, sdhash, and ClamAV. We will soon discuss each technology in greater depth.
  4. Agile has been a huge part of our project and a great help in establishing a strong team dynamic. We kept up with our biweekly sprints by using agile development methods and techniques. We held a sprint planning session before each sprint and assigned tasks to each individual, which helped us keep track of the project timeline.
  5. The project timeline was split into 6 sprints spanning our spring semester.
  6. In a typical production environment, the developer pulls an image from a public registry, modifies the image, packages their code, and submits it to a private registry. The cloud/server administrator then runs that Docker container on the production nodes.
  7. The deployment model discussed raises two questions: how secure is my image, and how secure is my container? Our first usecase is determining whether an image is safe, and the second is determining whether a launched container is still safe.
  8. This can be achieved by performing scans at two locations: scanning an image when it is stored in the registry, and periodically scanning the containers running in the production environment.
  9. Sdhash, short for similarity digest hash, is a tool that compares two blobs of data and checks their similarity based on common strings within the blobs. The tool first generates a digest of each blob and then calculates a similarity index based on the Hamming distance between the digests. The similarity index is between 0 and 100. We are using sdhash because it enables us to determine similarity between files. Here is an example of sdhashes generated from two files, each about 14KB. The files have mostly the same content, except that a few sentences are deleted from one of them. The similarity index generated by the sdhash comparison is 96.
  10. For reference datasets, we're using official images from Docker Hub: for each image we are scanning, we determine the original base image and compare our image against the base image. The assumption is that the official base image is safe. We're also using reference hashes from the National Software Reference Library. We are also incrementally generating our own dataset based on images that the user has marked as safe. Finally, we're using other third-party datasets such as ClamAV and VirusShare.
  11. Our overall workflow includes pulling an image from the registry, computing sdhashes of the files in the image, and comparing each sdhash against our reference dataset of sdhashes to determine if there are any suspicious files in the image. We also run a scan on suspicious files using ClamAV.
  12. DockerUI is an unofficial open source project we found on GitHub. We modified it to integrate our two usecases.
  13. Most of our group members hadn't worked on an agile team before, so it was a good experience and we all learnt a lot from the development methodology. We also learnt some new tools like Docker, sdhash, and Elasticsearch. For Docker especially, we dug into its internals and set up our own registry.
  14. Talking about the limitations of our project, sdhash is not ideal when dealing with small files, and our current dataset is not big enough, so our program may produce wrong verdicts. In the future we're planning to add tools to scan small files and binary files. We're also planning to provide more ways to let developers integrate their detection engines into our project, and we're