This presentation briefly introduces the use of Docker for data science.
It covers topics such as managing containers and creating new Docker images.
8. Data Science Lifecycle
The Team Data Science Process lifecycle consists of the following steps:
• Business Understanding
• Data Acquisition and Understanding
  • Data ingestion
  • Data exploration
  • Set up data pipeline
• Modeling
  • Feature engineering
  • Model training
• Deployment
  • Operationalize a model: deploy the model and pipeline to a production or production-like environment for application consumption.
• Customer Acceptance
13. Challenges in Data Science
The data science lifecycle highlights some challenges:
• Download and install libraries
• Manage versions and dependencies
• Upgrade libraries
• Isolate dependencies between projects
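To make the version and dependency challenge concrete, here is a minimal Python sketch that compares a project's pinned library versions with what is actually installed in the current environment. The function name `check_pins` and the pinned versions are hypothetical and chosen only for illustration; the sketch relies on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def check_pins(pins):
    """Compare pinned versions with what is installed in this environment.

    Returns a dict mapping package name to (wanted, found) for every
    mismatch; found is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins for one project; another project may need different ones.
    project_pins = {"numpy": "1.26.4", "pandas": "2.2.2"}
    for name, (wanted, found) in check_pins(project_pins).items():
        print(f"{name}: wanted {wanted}, found {found or 'not installed'}")
```

When two projects on the same machine need different pins, a check like this fails for at least one of them, which is the isolation problem listed above.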
20. Containerization
Containers offer many attractive benefits for developers,
data science teams, and operations teams.
• Abstraction of the host system away from the containerized application
• Easy scalability
• Simple dependency management and application versioning
• Extremely lightweight, isolated execution environments
• Shared layering
• Composability and predictability
24. Containerization in Data Science
Containerization solves many problems at once:
• Containers make it easy to use libraries with complicated setups
  • CPU version vs. GPU version (e.g. TensorFlow)
  • Different environments (e.g. Python 2 vs. Python 3)
  • Etc.
• They make outputs reproducible
• They make it easy to prototype and deploy complex algorithms
• They make it easy to set up isolated Python / R / Scala
data science development environments
26. Containerization vs Virtualization
Virtual Machines (VMs)
• Represent hardware-level virtualization
• Heavyweight
• Slow provisioning
• Limited performance
• Fully isolated and hence more secure
Containers
• Represent operating-system-level virtualization
• Lightweight
• Real-time provisioning and scalability
• Native performance
• Process-level isolation and hence less secure
27. Docker and Containerization
Figure 1: Containers isolate individual applications and use operating
system resources that have been abstracted by Docker. Containers can
be built by "layering", with multiple containers sharing underlying layers,
decreasing resource usage.
30. Run a Docker container
Docker runs processes in isolated containers. The docker run
command must specify an image to derive the container from. An
image developer can define image defaults related to:
• Detached or foreground running
• Container identification
• Network settings
• Runtime constraints on CPU and memory
The operator can add to or override these defaults with docker run options.
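As a rough sketch of how the four kinds of settings listed above look on the command line, the script below sets each of them explicitly. The container name ds-limited, the limits (2 CPUs, 4 GB of memory) and the run helper are illustrative assumptions, not part of the original slides. The --cpus option assumes Docker 1.13 or later. The helper only prints the command unless DRY_RUN=0 is set, so the script can be read and tried without a Docker daemon.

```shell
# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# -d                 run detached (omit it to stay in the foreground)
# --name             container identification (the name is an assumption)
# --network bridge   attach the container to Docker's default bridge network
# --cpus / --memory  runtime constraints on CPU and memory (example values)
run sudo docker run -d \
  --name ds-limited \
  --network bridge \
  --cpus 2 \
  --memory 4g \
  alessandroadamo/ubuntu-ds-python3
```

Running the script with DRY_RUN=0 in the environment starts the container for real. Any option left out falls back to the image or Docker default.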
33. Interactive and Detached mode
Docker supports two running modes: interactive and detached.
Interactive mode
$ sudo docker run -t -i --name mycontainer \
    alessandroadamo/ubuntu-ds-python3 /bin/bash
NB: To exit an interactive container, type the exit command.
Detached mode
$ sudo docker run -t -d -p 8888:8888 \
    -v /home/user/notebooks:/home/ds/notebooks \
    --name mycontainer-daemon \
    alessandroadamo/ubuntu-ds-python3
37. List Containers
To list information about container status, use the docker ps command.
Running containers
$ sudo docker ps
All containers (running and stopped)
$ sudo docker ps -a
Latest created container
$ sudo docker ps -l
Container IDs only (quiet)
$ sudo docker ps -q
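41. Start and Stop Containers
Start a container
$ sudo docker start mycontainer
Stop a container
$ sudo docker stop mycontainer
Attach to a running container
$ sudo docker attach mycontainer
Detach from a running container (started with -t -i) and leave it running
[ Ctrl + P ], then [ Ctrl + Q ]
NB: [ Ctrl + C ] sends SIGINT to the container's main process and usually stops it.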
45. Building Process
• Docker can build images automatically by reading the
instructions from a Dockerfile.
• A Dockerfile is a text document that contains all the
commands a user could call on the command line to assemble
an image.
• Using docker build users can create an automated build that
executes several command-line instructions in succession.
47. Dockerfile
An example Dockerfile for a minimal Ubuntu Linux image:
FROM ubuntu:16.04
MAINTAINER Alessandro Adamo "alessandro.adamo@gmail.com"
ENV REFRESHED_AT 2017-06-15
RUN apt-get update && apt-get dist-upgrade -y
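To make the build step concrete, here is a minimal sketch that writes the minimal Ubuntu Dockerfile into a scratch directory and passes that directory to docker build as the build context. The temporary directory, the tag example/ubuntu-minimal:latest and the run helper are assumptions made for illustration. apt-get is called with -y so that the build does not stop at a confirmation prompt. By default the helper only prints the build command. Set DRY_RUN=0 to execute it.

```shell
# Scratch build context and image tag; both are hypothetical choices.
build_dir="$(mktemp -d)"
image_tag="example/ubuntu-minimal:latest"

# Write the Dockerfile; the quoted EOF keeps the shell from expanding anything.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu:16.04
MAINTAINER Alessandro Adamo "alessandro.adamo@gmail.com"
ENV REFRESHED_AT 2017-06-15
RUN apt-get update && apt-get dist-upgrade -y
EOF

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# -t names the resulting image; the last argument is the build context.
run sudo docker build -t "$image_tag" "$build_dir"
```

Docker reads the instructions from top to bottom, so the RUN line executes on top of the ubuntu:16.04 base image.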
53. Dockerfile Commands 1 / 3
Environment Variable
ENV <key> <value>
ENV <key>=<value>
Working Directory
WORKDIR ${foo}
Change User
USER username
Run a Command in a New Layer
RUN ["executable", "param1", "param2"]
Default for Container
CMD ["executable", "param1", "param2"]
Metadata
LABEL version="1.0"
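The instructions above could be combined roughly as follows. This is only a sketch: the variable APP_HOME, the user ds and the path /home/ds are hypothetical names chosen for the example, and the script just writes the Dockerfile to a temporary file and prints it rather than building an image.

```shell
# Write an example Dockerfile to a temporary file (the location is arbitrary).
dockerfile="$(mktemp)"

# The quoted EOF stops the shell from expanding ${APP_HOME};
# Docker itself substitutes it when it processes WORKDIR.
cat > "$dockerfile" <<'EOF'
FROM ubuntu:16.04
LABEL version="1.0"
ENV APP_HOME=/home/ds
RUN ["useradd", "--create-home", "--home-dir", "/home/ds", "ds"]
WORKDIR ${APP_HOME}
USER ds
CMD ["/bin/bash"]
EOF

cat "$dockerfile"
```

RUN uses the exec form here, which is not processed by a shell, so the home directory is written out literally instead of through ${APP_HOME}. USER comes after the useradd step because the account has to exist before later instructions and the container's default command can run as it.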