SlideShare a Scribd company logo
Ferry - Share & Deploy Big
Data Applications with Docker
James Horey
• Writing a simple application with Bokeh
• Packaging our application with Docker
• Orchestrating our application with Ferry
Technical material can be found at:
https://github.com/jhorey/pydata
Bokeh
U.S. Census
http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06
Median income All counties California
Download some data
Let’s install Bokeh
$ pip install bokeh
>> Downloading/unpacking bokeh
>> SystemError: Cannot compile 'Python.h'. Perhaps you need
to install python-dev|python-devel.
$ apt-get install python-dev & pip install bokeh
>> "gcc: error trying to exec 'cc1plus': execvp: No such file
or directory
$ apt-get install g++
$ pip install bokeh
RuntimeError: bokeh sample data directory does not exist, please
execute bokeh.sampledata.download()
$ python
>>> import bokeh.sampledata
A simple application
$ python plot.py Kentucky
Louisville
Let’s share
#!/bin/bash
!
# Make sure we have ‘pip’ installed
apt-get install python-pip
!
# Install packages in right order
apt-get —-yes install g++ python-dev
pip install bokeh
!
# Now download the data
python geography.py data/
python population economic Kentucky
data/
!
# Start the web server
python webserver data/
• Your script didn’t work
• Oh, I was supposed to run this as
sudo?
• Ok, it still didn’t work
• I get this funny error
• Oh yeah, I’m running Redhat
• Ok I’m at my desk, just use my
computer
• Encapsulates applications in isolated containers
• Makes it easy and safe to distribute applications
• Easy to get started
Our Dockerfile
Start from a
clean Precise
image
Install stuff
Add our files
Run this when
starting
$ docker build -t ferry/pydata .
$ docker push ferry/pydata
Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 -name p1 —d ferry/pydata
p1
Kernel
Hardware
Sharing made simple
$ docker pull ferry/pydata
$ docker run -p 8000:8000 -name p1 —d ferry/pydata
$ docker run -p 8001:8000 -name p2 —d ferry/pydata
$ docker run -p 8002:8000 -name p3 —d ferry/pydata
p1 p2 p3
Kernel
Hardware
• Containers share basic kernel
and H.W. capabilities
• No virtualization
• Containers are isolated
• Access via port forwarding
You can run these commands now!
• Highly scalable and fault-tolerant
• Great for storing streaming data (sensors,
messages)
CREATE KEYSPACE census WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1 };
!
USE census;
!
CREATE TABLE acs_economic_data (
state_cd TEXT,
state_name TEXT,
county_cd TEXT,
county_name TEXT,
median INT,
mean INT,
capita INT,
PRIMARY KEY(count_cd, state_cd)
);
Orchestration
Web DB
Web + DB
• Simple
• Full control
• More work for you
• Simpler Dockerfile
• More extensible
• How to orchestrate?
• Specify the containers that constitute your
application in YAML
• Support for Hadoop, Cassandra, GlusterFS, and
OpenMPI
• It’s a little bit like pip for your Docker-based
runtime environment
Ferry
http://ferry.opencore.io
Our Application
backend:
- storage:
personality: "cassandra"
instances: 1
connectors:
- personality: "ferry/pydata-cassandra"
ports: ["8000:8000"]
# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra
!
# Place the start scripts in the events directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh
+
Easy to share (again)
$ ferry start cassandra.yml
sa-df8d0aa6
$ ferry ps
UUID Storage Compute Connectors Status Base Time
---- ------- ------- ---------- ------ ---- ----
sa-df8d0aa6 se-54ed4e93 se-a5350a8d running cassandra.yml
$ ferry ssh sa-df8d0aa6
root@client-se-a5350a8d:~# ps -eaf | grep python
root 144 1 0 19:49 ? 00:00:00 python /home/ferry/
pydata/bokeh/webserver.py /home/ferry/pydata/data
What’s it doing?
$ ferry start cassandra.yml
Web C* C*
root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12
Generate!
Config
What’s it doing?
$ ferry start yarn
Client
Y Y
root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
G G
What’s it doing?
$ ferry stop sa-c6cbb572
Client
Y Y
G G
Next steps
$ ferry share sa-df8d0aa6
w c* c*
Hardware
w c* c*
Hardware
w c* c*
Hardware
Next steps
$ ferry deploy sa-df8d0aa6
w c* c*
Hardware
w
c* c*
Hardware
Hardware Hardware
VPCEC2
S3
• Even simple applications can be complicated to
install and run
• Docker helps quite a bit with this
• Ferry helps build out big data applications
Thank you!
!
James
jlh@opencore.io
!
Ferry
ferry.opencore.io
@open_core_io

More Related Content

What's hot

Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
andymccurdy
 

What's hot (18)

YARN Services
YARN ServicesYARN Services
YARN Services
 
Fixing Growing Pains With Puppet Data Patterns
Fixing Growing Pains With Puppet Data PatternsFixing Growing Pains With Puppet Data Patterns
Fixing Growing Pains With Puppet Data Patterns
 
Mitchell Hashimoto, HashiCorp
Mitchell Hashimoto, HashiCorpMitchell Hashimoto, HashiCorp
Mitchell Hashimoto, HashiCorp
 
Hashicorp: Delivering the Tao of DevOps
Hashicorp: Delivering the Tao of DevOpsHashicorp: Delivering the Tao of DevOps
Hashicorp: Delivering the Tao of DevOps
 
ContainerCon 2016: Finding (and Fixing!) Performance Anomalies in Large Scale...
ContainerCon 2016: Finding (and Fixing!) Performance Anomalies in Large Scale...ContainerCon 2016: Finding (and Fixing!) Performance Anomalies in Large Scale...
ContainerCon 2016: Finding (and Fixing!) Performance Anomalies in Large Scale...
 
Phoenix for Rails Devs
Phoenix for Rails DevsPhoenix for Rails Devs
Phoenix for Rails Devs
 
Alfresco Devcon 2019 - Lightning Talk - The Alfresco fat JAR experiment
Alfresco Devcon 2019 - Lightning Talk - The Alfresco fat JAR experimentAlfresco Devcon 2019 - Lightning Talk - The Alfresco fat JAR experiment
Alfresco Devcon 2019 - Lightning Talk - The Alfresco fat JAR experiment
 
Best Practices of Infrastructure as Code with Terraform
Best Practices of Infrastructure as Code with TerraformBest Practices of Infrastructure as Code with Terraform
Best Practices of Infrastructure as Code with Terraform
 
Python Deployment with Fabric
Python Deployment with FabricPython Deployment with Fabric
Python Deployment with Fabric
 
(WEB307) Scalable Site Management Using AWS OpsWorks | AWS re:Invent 2014
(WEB307) Scalable Site Management Using AWS OpsWorks | AWS re:Invent 2014(WEB307) Scalable Site Management Using AWS OpsWorks | AWS re:Invent 2014
(WEB307) Scalable Site Management Using AWS OpsWorks | AWS re:Invent 2014
 
Build Automation 101
Build Automation 101Build Automation 101
Build Automation 101
 
Network automation (NetDevOps) with Ansible
Network automation (NetDevOps) with AnsibleNetwork automation (NetDevOps) with Ansible
Network automation (NetDevOps) with Ansible
 
Creating and Deploying Static Sites with Hugo
Creating and Deploying Static Sites with HugoCreating and Deploying Static Sites with Hugo
Creating and Deploying Static Sites with Hugo
 
Dockerizing Windows Server Applications by Ender Barillas and Taylor Brown
Dockerizing Windows Server Applications by Ender Barillas and Taylor BrownDockerizing Windows Server Applications by Ender Barillas and Taylor Brown
Dockerizing Windows Server Applications by Ender Barillas and Taylor Brown
 
Spark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit EU talk by William Benton
Spark Summit EU talk by William Benton
 
Cachopo - Scalable Stateful Services - Madrid Elixir Meetup
Cachopo - Scalable Stateful Services - Madrid Elixir MeetupCachopo - Scalable Stateful Services - Madrid Elixir Meetup
Cachopo - Scalable Stateful Services - Madrid Elixir Meetup
 
Bosh 2.0
Bosh 2.0Bosh 2.0
Bosh 2.0
 
Chef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK BoxChef ignited a DevOps revolution – BK Box
Chef ignited a DevOps revolution – BK Box
 

Similar to James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with Docker

Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & how
dotCloud
 

Similar to James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with Docker (20)

Midwest php 2013 deploying php on paas- why & how
Midwest php 2013   deploying php on paas- why & howMidwest php 2013   deploying php on paas- why & how
Midwest php 2013 deploying php on paas- why & how
 
Word press and containers
Word press and containersWord press and containers
Word press and containers
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with Kubernetes
 
Docker Advanced registry usage
Docker Advanced registry usageDocker Advanced registry usage
Docker Advanced registry usage
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
Docker
DockerDocker
Docker
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
 
Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
Ceph, Docker, Heroku Slugs, CoreOS and Deis OverviewCeph, Docker, Heroku Slugs, CoreOS and Deis Overview
Ceph, Docker, Heroku Slugs, CoreOS and Deis Overview
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
 
Ansible+docker (highload++2015)
Ansible+docker (highload++2015)Ansible+docker (highload++2015)
Ansible+docker (highload++2015)
 
Auto Scaling for Multi-Tier Containers Topology
Auto Scaling for Multi-Tier Containers TopologyAuto Scaling for Multi-Tier Containers Topology
Auto Scaling for Multi-Tier Containers Topology
 
Cloud Foundry: Hands-on Deployment Workshop
Cloud Foundry: Hands-on Deployment WorkshopCloud Foundry: Hands-on Deployment Workshop
Cloud Foundry: Hands-on Deployment Workshop
 
Containerdays Intro to Habitat
Containerdays Intro to HabitatContainerdays Intro to Habitat
Containerdays Intro to Habitat
 
Docker and Microservice
Docker and MicroserviceDocker and Microservice
Docker and Microservice
 
On demand-block-storage-for-docker
On demand-block-storage-for-dockerOn demand-block-storage-for-docker
On demand-block-storage-for-docker
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
The Docker "Gauntlet" - Introduction, Ecosystem, Deployment, Orchestration
The Docker "Gauntlet" - Introduction, Ecosystem, Deployment, OrchestrationThe Docker "Gauntlet" - Introduction, Ecosystem, Deployment, Orchestration
The Docker "Gauntlet" - Introduction, Ecosystem, Deployment, Orchestration
 

More from PyData

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
zahraomer517
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Domenico Conte
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with Docker

  • 1. Ferry - Share & Deploy Big Data Applications with Docker James Horey
  • 2. • Writing a simple application with Bokeh • Packaging our application with Docker • Orchestrating our application with Ferry Technical material can be found at: https://github.com/jhorey/pydata
  • 6. Let’s install Bokeh $ pip install bokeh >> Downloading/unpacking bokeh >> SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. $ apt-get install python-dev & pip install bokeh >> "gcc: error trying to exec 'cc1plus': execvp: No such file or directory $ apt-get install g++ $ pip install bokeh RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download() $ python >>> import bokeh.sampledata
  • 7. A simple application $ python plot.py Kentucky Louisville
  • 8. Let’s share #!/bin/bash ! # Make sure we have ‘pip’ installed apt-get install python-pip ! # Install packages in right order apt-get —-yes install g++ python-dev pip install bokeh ! # Now download the data python geography.py data/ python population economic Kentucky data/ ! # Start the web server python webserver data/ • Your script didn’t work • Oh, I was supposed to run this as sudo? • Ok, it still didn’t work • I get this funny error • Oh yeah, I’m running Redhat • Ok I’m at my desk, just use my computer
  • 9. • Encapsulates applications in isolated containers • Makes it easy and safe to distribute applications • Easy to get started
  • 10. Our Dockerfile Start from a clean Precise image Install stuff Add our files Run this when starting $ docker build -t ferry/pydata . $ docker push ferry/pydata
  • 11. Sharing made simple $ docker pull ferry/pydata $ docker run -p 8000:8000 -name p1 —d ferry/pydata p1 Kernel Hardware
  • 12. Sharing made simple $ docker pull ferry/pydata $ docker run -p 8000:8000 -name p1 —d ferry/pydata $ docker run -p 8001:8000 -name p2 —d ferry/pydata $ docker run -p 8002:8000 -name p3 —d ferry/pydata p1 p2 p3 Kernel Hardware • Containers share basic kernel and H.W. capabilities • No virtualization • Containers are isolated • Access via port forwarding You can run these commands now!
  • 13. • Highly scalable and fault-tolerant • Great for storing streaming data (sensors, messages) CREATE KEYSPACE census WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; ! USE census; ! CREATE TABLE acs_economic_data ( state_cd TEXT, state_name TEXT, county_cd TEXT, county_name TEXT, median INT, mean INT, capita INT, PRIMARY KEY(count_cd, state_cd) );
  • 14. Orchestration Web DB Web + DB • Simple • Full control • More work for you • Simpler Dockerfile • More extensible • How to orchestrate?
  • 15. • Specify the containers that constitute your application in YAML • Support for Hadoop, Cassandra, GlusterFS, and OpenMPI • It’s a little bit like pip for your Docker-based runtime environment Ferry http://ferry.opencore.io
  • 16. Our Application backend: - storage: personality: "cassandra" instances: 1 connectors: - personality: "ferry/pydata-cassandra" ports: ["8000:8000"] # The cassandra-client base comes with the various drivers # pre-installed. FROM ferry/cassandra-client NAME ferry/pydata-cassandra ! # Place the start scripts in the events directories so they # are started when the connector is brought up. ADD ./scripts/startcas.sh /service/runscripts/start/ ADD ./scripts/restartcas.sh /service/runscripts/restart/ RUN chmod a+x /service/runscripts/start/startcas.sh RUN chmod a+x /service/runscripts/restart/restartcas.sh +
  • 17. Easy to share (again) $ ferry start cassandra.yml sa-df8d0aa6 $ ferry ps UUID Storage Compute Connectors Status Base Time ---- ------- ------- ---------- ------ ---- ---- sa-df8d0aa6 se-54ed4e93 se-a5350a8d running cassandra.yml $ ferry ssh sa-df8d0aa6 root@client-se-a5350a8d:~# ps -eaf | grep python root 144 1 0 19:49 ? 00:00:00 python /home/ferry/ pydata/bokeh/webserver.py /home/ferry/pydata/data
  • 18. What’s it doing? $ ferry start cassandra.yml Web C* C* root@client-se-a5350a8d:~# env | grep BACK BACKEND_STORAGE_TYPE=cassandra BACKEND_STORAGE_IP=10.1.0.12 Generate! Config
  • 19. What’s it doing? $ ferry start yarn Client Y Y root@client-se-b597cb21:~# env | grep BACK BACKEND_STORAGE_TYPE=gluster BACKEND_STORAGE_IP=10.1.0.18 BACKEND_COMPUTE_TYPE=yarn BACKEND_COMPUTE_IP=10.1.0.15 G G
  • 20. What’s it doing? $ ferry stop sa-c6cbb572 Client Y Y G G
  • 21. Next steps $ ferry share sa-df8d0aa6 w c* c* Hardware w c* c* Hardware w c* c* Hardware
  • 22. Next steps $ ferry deploy sa-df8d0aa6 w c* c* Hardware w c* c* Hardware Hardware Hardware VPCEC2 S3
  • 23. • Even simple applications can be complicated to install and run • Docker helps quite a bit with this • Ferry helps build out big data applications