This document discusses using Docker and Ferry to share and deploy big data applications. It covers:
1) Packaging a simple Python/Bokeh application with Docker so that it is easy to install and run.
2) Using Ferry to orchestrate the application across multiple containers (a web frontend and a Cassandra database) and to specify the overall application configuration.
3) How Ferry makes it easy to share the application and deploy it to different environments, such as local machines, cloud instances, and container orchestration platforms.
1. Ferry - Share & Deploy Big Data Applications with Docker
James Horey
2. • Writing a simple application with Bokeh
• Packaging our application with Docker
• Orchestrating our application with Ferry
Technical material can be found at:
https://github.com/jhorey/pydata
8. Let’s share
#!/bin/bash

# Make sure we have 'pip' installed
apt-get install python-pip

# Install packages in right order
apt-get --yes install g++ python-dev
pip install bokeh

# Now download the data
python geography.py data/
python population economic Kentucky data/

# Start the web server
python webserver data/
• Your script didn’t work
• Oh, I was supposed to run this as sudo?
• Ok, it still didn’t work
• I get this funny error
• Oh yeah, I’m running Red Hat
• Ok I’m at my desk, just use my computer
9. Docker
• Encapsulates applications in isolated containers
• Makes it easy and safe to distribute applications
• Easy to get started
10. Our Dockerfile
Start from a clean Precise image
Install stuff
Add our files
Run this when starting
$ docker build -t ferry/pydata .
$ docker push ferry/pydata
11. Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
[Diagram: container p1 running on top of the shared kernel and hardware]
12. Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
$ docker run -p 8001:8000 --name p2 -d ferry/pydata
$ docker run -p 8002:8000 --name p3 -d ferry/pydata
[Diagram: containers p1, p2, and p3 running side by side on top of the shared kernel and hardware]
• Containers share basic kernel and hardware capabilities
• No virtualization
• Containers are isolated
• Access via port forwarding
You can run these commands now!
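The pattern in the commands above, several containers started from one image with each published on its own host port, can be sketched in Python. This is only an illustrative helper: the function name `docker_run_commands` and the choice of counting host ports up from 8000 are assumptions, not part of the original material. It only builds the command strings and does not call Docker.

```python
def docker_run_commands(image, count, base_host_port=8000, container_port=8000):
    """Build one `docker run` command per container (hypothetical helper).

    Every container listens on the same internal port, but each one is
    published on a different host port so the instances do not collide.
    """
    commands = []
    for i in range(count):
        name = "p%d" % (i + 1)          # p1, p2, p3, ...
        host_port = base_host_port + i  # 8000, 8001, 8002, ...
        commands.append(
            "docker run -p %d:%d --name %s -d %s"
            % (host_port, container_port, name, image)
        )
    return commands


if __name__ == "__main__":
    for command in docker_run_commands("ferry/pydata", 3):
        print(command)
```

Because the containers are isolated from each other, all three can bind port 8000 internally; only the host side of each mapping has to be unique.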
13. Cassandra
• Highly scalable and fault-tolerant
• Great for storing streaming data (sensors, messages)
CREATE KEYSPACE census WITH REPLICATION = {
    'class' : 'SimpleStrategy',
    'replication_factor' : 1 };

USE census;

CREATE TABLE acs_economic_data (
    state_cd TEXT,
    state_name TEXT,
    county_cd TEXT,
    county_name TEXT,
    median INT,
    mean INT,
    capita INT,
    PRIMARY KEY(county_cd, state_cd)
);
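In a Cassandra compound primary key, the first column is the partition key and the remaining columns are clustering columns that order rows within a partition. Assuming the key of the table above is meant to be the county code followed by the state code, the effect can be modelled in plain Python. This is a sketch only: it is not a Cassandra client, and the helper name and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Column order of the table's primary key: the first column is the
# partition key, the second is the clustering column.
PRIMARY_KEY = ("county_cd", "state_cd")


def group_by_partition(rows):
    """Group rows by partition key, then sort each group by the
    clustering column. A plain-Python model, not a Cassandra client."""
    partition_col, clustering_col = PRIMARY_KEY
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_col]].append(row)
    for group in partitions.values():
        group.sort(key=lambda r: r[clustering_col])
    return dict(partitions)


# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"county_cd": "001", "state_cd": "21", "median": 40000},
    {"county_cd": "003", "state_cd": "21", "median": 42000},
    {"county_cd": "001", "state_cd": "18", "median": 45000},
]

if __name__ == "__main__":
    for key, group in sorted(group_by_partition(SAMPLE_ROWS).items()):
        print(key, [r["state_cd"] for r in group])
```

If county codes repeat across states, one partition ends up holding rows from several states, which is worth keeping in mind when deciding which queries the table should serve.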
14. Orchestration
Web + DB (one container)
• Simple
• Full control
• More work for you
Web, DB (separate containers)
• Simpler Dockerfile
• More extensible
• How to orchestrate?
15. Ferry
• Specify the containers that constitute your application in YAML
• Support for Hadoop, Cassandra, GlusterFS, and OpenMPI
• It’s a little bit like pip for your Docker-based runtime environment
http://ferry.opencore.io
16. Our Application
backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "ferry/pydata-cassandra"
     ports: ["8000:8000"]

# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra

# Place the start scripts in the events directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh
18. What’s it doing?
$ ferry start cassandra.yml
[Diagram: Ferry generates the configuration for one web container and two Cassandra (C*) containers]
root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12
19. What’s it doing?
$ ferry start yarn
[Diagram: a client container connected to two YARN (Y) containers and two GlusterFS (G) containers]
root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
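The `env | grep BACK` output suggests that an application running in the client container can discover its backends from environment variables. A minimal sketch of how application code might read them is shown below; the helper name `read_backend` is hypothetical, and only the variable names that appear in the output above are assumed to exist.

```python
import os


def read_backend(kind, environ=None):
    """Return (type, ip) for a backend such as "storage" or "compute".

    Reads BACKEND_<KIND>_TYPE and BACKEND_<KIND>_IP, the variable names
    shown in the `env | grep BACK` output. Returns None when either
    variable is missing. The helper itself is hypothetical.
    """
    environ = os.environ if environ is None else environ
    prefix = "BACKEND_%s_" % kind.upper()
    backend_type = environ.get(prefix + "TYPE")
    backend_ip = environ.get(prefix + "IP")
    if backend_type is None or backend_ip is None:
        return None
    return backend_type, backend_ip


if __name__ == "__main__":
    for kind in ("storage", "compute"):
        print(kind, read_backend(kind))
```

Reading the address at start-up rather than hard-coding it lets the same image run unchanged against whichever backend the stack was started with.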
21. Next steps
$ ferry share sa-df8d0aa6
[Diagram: the application stack (web container w and two Cassandra containers c*) reproduced on three separate sets of hardware]
22. Next steps
$ ferry deploy sa-df8d0aa6
[Diagram: the application stack (w, c*, c*) moved from a single machine onto multiple hardware nodes in the cloud; labels: VPC, EC2, S3]
23. • Even simple applications can be complicated to install and run
• Docker helps quite a bit with this
• Ferry helps build out big data applications