Ferry - Share & Deploy Big Data Applications with Docker
James Horey
Pydata2014

Slides from PyData SV 2014

Published in: Data & Analytics, Technology
Transcript of "Pydata2014"

  1. Ferry - Share & Deploy Big Data Applications with Docker
     James Horey
  2. • Writing a simple application with Bokeh
     • Packaging our application with Docker
     • Orchestrating our application with Ferry
     Technical material can be found at: https://github.com/jhorey/pydata
  3. Bokeh
  4. U.S. Census: median income for all counties in California
     http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06
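A minimal sketch of how the query URL on this slide can be assembled in Python, using only the standard library. The helper name `census_query_url` is hypothetical; the endpoint and parameters are copied from the slide, and the Census API's URL layout may have changed since 2014.

```python
from urllib.parse import urlencode

# Endpoint for the 2011 ACS 5-year estimates, as written on the slide.
ACS5_2011 = "http://api.census.gov/data/2011/acs5"


def census_query_url(variable, geo_for, geo_in):
    """Build a Census API query URL (hypothetical helper).

    ':' and '*' are left unescaped so the result matches the
    hand-written URL on the slide.
    """
    params = {"get": variable, "for": geo_for, "in": geo_in}
    return ACS5_2011 + "?" + urlencode(params, safe=":*")


# DP03_0062E (median income) for every county in California (state code 06).
url = census_query_url("DP03_0062E", "county:*", "state:06")
print(url)
```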
  5. Download some data
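The Census API answers with a JSON array whose first row is the header. A minimal sketch of downloading that response and saving it as CSV, assuming standard-library tools only; `fetch_json`, `rows_to_dicts`, `download`, and any output file name are hypothetical, and `fetch` is a parameter so the network call can be swapped out (for example, in tests).

```python
import csv
import json
import os
import urllib.request


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rows_to_dicts(rows):
    """Turn [[header...], [values...], ...] into a list of dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def download(url, path, fetch=fetch_json):
    """Save the raw rows (header first) as CSV; return the data-row count."""
    rows = fetch(url)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return max(len(rows) - 1, 0)
```

For example, `download("http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06", "data/ca_median_income.csv")` would store the California county table under `data/`.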
  6. Let’s install Bokeh
     $ pip install bokeh
     >> Downloading/unpacking bokeh
     >> SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
     $ apt-get install python-dev && pip install bokeh
     >> gcc: error trying to exec 'cc1plus': execvp: No such file or directory
     $ apt-get install g++
     $ pip install bokeh
     >> RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download()
     $ python
     >>> import bokeh.sampledata
     >>> bokeh.sampledata.download()
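Each failure above is a missing build prerequisite: the `Python.h` header (shipped in `python-dev`/`python-devel`) and the C++ compiler behind `cc1plus` (shipped with `g++`). A small preflight check, sketched under the assumption that these two are the only prerequisites; the function name is hypothetical, and whether they are still needed depends on the Bokeh version and platform.

```python
import os
import shutil
import sysconfig


def build_prerequisites():
    """Report whether the build tools pip needed in the transcript exist."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Python.h comes from python-dev / python-devel.
        "Python.h": os.path.exists(os.path.join(include_dir, "Python.h")),
        # cc1plus is the C++ compiler proper, installed with g++.
        "g++": shutil.which("g++") is not None,
    }


if __name__ == "__main__":
    for name, ok in build_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```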
  7. A simple application
     $ python plot.py Kentucky
     (Figure: the resulting plot, with the label "Louisville")
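The slide shows only the command line, not the contents of `plot.py`. A hypothetical skeleton of its argument handling and data selection, assuming the data was saved as a CSV with `state_name`, `county_name`, and `median` columns (names borrowed from the Cassandra table later in the talk). It prints the counties ranked by median income instead of drawing the Bokeh plot, which is omitted.

```python
import argparse
import csv


def counties_in_state(records, state_name):
    """Return (county, median income) pairs for one state, highest first."""
    pairs = [
        (r["county_name"], int(r["median"]))
        for r in records
        if r["state_name"] == state_name
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Census data for one state.")
    parser.add_argument("state", help="state name, e.g. Kentucky")
    # Hypothetical default location for the downloaded data.
    parser.add_argument("--data", default="data/economic.csv", help="CSV file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    with open(args.data, newline="") as f:
        records = list(csv.DictReader(f))
    for county, median in counties_in_state(records, args.state):
        print(f"{county}: {median}")


if __name__ == "__main__":
    main()
```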
  8. Let’s share
     #!/bin/bash

     # Make sure we have ‘pip’ installed
     apt-get install python-pip

     # Install packages in right order
     apt-get --yes install g++ python-dev
     pip install bokeh

     # Now download the data
     python geography.py data/
     python population economic Kentucky data/

     # Start the web server
     python webserver data/

     • Your script didn’t work
     • Oh, I was supposed to run this as sudo?
     • Ok, it still didn’t work
     • I get this funny error
     • Oh yeah, I’m running Red Hat
     • Ok, I’m at my desk, just use my computer
  9. Docker
     • Encapsulates applications in isolated containers
     • Makes it easy and safe to distribute applications
     • Easy to get started
  10. Our Dockerfile
      (Annotated Dockerfile: start from a clean Ubuntu Precise image; install stuff; add our files; run this when starting.)
      $ docker build -t ferry/pydata .
      $ docker push ferry/pydata
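The Dockerfile itself appears only as an annotated image. A hypothetical reconstruction of its four annotated steps, written as a small Python generator so each step is explicit. The package list and `CMD` are assumptions (the paths follow the `ps` output shown later in the talk), and Ubuntu 12.04 "Precise" is long past end of life, so the base image is of historical interest only.

```python
import json


def render_dockerfile(base, apt_packages, pip_packages, src, dest, cmd):
    """Render the slide's four annotated steps as Dockerfile instructions."""
    lines = [
        # 1. Start from a clean base image.
        f"FROM {base}",
        # 2. Install stuff: system packages first, then Python packages.
        "RUN apt-get update && apt-get install -y " + " ".join(apt_packages),
        "RUN pip install " + " ".join(pip_packages),
        # 3. Add our files.
        f"ADD {src} {dest}",
        # 4. Run this when starting (exec form is a JSON array).
        "CMD " + json.dumps(cmd),
    ]
    return "\n".join(lines) + "\n"


print(render_dockerfile(
    base="ubuntu:precise",
    apt_packages=["python-pip", "python-dev", "g++"],
    pip_packages=["bokeh"],
    src=".",
    dest="/home/ferry/pydata",
    cmd=["python", "/home/ferry/pydata/bokeh/webserver.py", "/home/ferry/pydata/data"],
))
```

Generating the file is only a way to label the steps; in practice the four instructions would simply be written into a `Dockerfile` next to the application.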
  11. Sharing made simple
      $ docker pull ferry/pydata
      $ docker run -p 8000:8000 --name p1 -d ferry/pydata
      (Diagram: container p1 running on the host's kernel and hardware)
  12. Sharing made simple
      $ docker pull ferry/pydata
      $ docker run -p 8000:8000 --name p1 -d ferry/pydata
      $ docker run -p 8001:8000 --name p2 -d ferry/pydata
      $ docker run -p 8002:8000 --name p3 -d ferry/pydata
      (Diagram: containers p1, p2, and p3 sharing one kernel and hardware)
      • Containers share the host's kernel and hardware capabilities
      • No hardware virtualization (no hypervisor)
      • Containers are isolated from each other
      • Access via port forwarding
      You can run these commands now!
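The three `docker run` commands differ only in the host port and the container name; every container listens on port 8000 internally. A sketch that generates them, assuming Python 3.8+ for `shlex.join`; the function name is hypothetical.

```python
import shlex


def docker_run_commands(image, count, container_port=8000, first_host_port=8000):
    """One 'docker run' per copy, each forwarded from its own host port."""
    cmds = []
    for i in range(count):
        cmds.append([
            "docker", "run",
            "-p", f"{first_host_port + i}:{container_port}",  # host:container
            "--name", f"p{i + 1}",
            "-d", image,  # run detached
        ])
    return cmds


for cmd in docker_run_commands("ferry/pydata", 3):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to start the containers; keeping the arguments as a list avoids shell quoting issues.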
  13. Cassandra
      • Highly scalable and fault-tolerant
      • Great for storing streaming data (sensors, messages)

      CREATE KEYSPACE census WITH REPLICATION =
        { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

      USE census;

      CREATE TABLE acs_economic_data (
        state_cd TEXT,
        state_name TEXT,
        county_cd TEXT,
        county_name TEXT,
        median INT,
        mean INT,
        capita INT,
        PRIMARY KEY(county_cd, state_cd)
      );
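A sketch of loading census records into `acs_economic_data` from Python, assuming the DataStax `cassandra-driver` package and records whose keys match the column names; `INSERT_CQL`, `to_params`, and `load` are hypothetical names. The driver is imported inside `load`, so the rest of the sketch runs without it installed.

```python
# Parameterized insert; the driver substitutes %s placeholders safely.
INSERT_CQL = (
    "INSERT INTO acs_economic_data "
    "(state_cd, state_name, county_cd, county_name, median, mean, capita) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def to_params(record):
    """Order a record's fields to match INSERT_CQL, casting the INT columns."""
    return (
        record["state_cd"], record["state_name"],
        record["county_cd"], record["county_name"],
        int(record["median"]), int(record["mean"]), int(record["capita"]),
    )


def load(records, host):
    """Insert records into the census keyspace on the given Cassandra host."""
    from cassandra.cluster import Cluster  # optional dependency

    cluster = Cluster([host])
    session = cluster.connect("census")
    try:
        for record in records:
            session.execute(INSERT_CQL, to_params(record))
    finally:
        cluster.shutdown()
```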
  14. Orchestration
      (Diagram: Web and DB in separate containers vs. a single Web + DB container)
      • Simple
      • Full control
      • More work for you
      • Simpler Dockerfile
      • More extensible
      • How to orchestrate?
  15. Ferry (http://ferry.opencore.io)
      • Specify the containers that constitute your application in YAML
      • Support for Hadoop, Cassandra, GlusterFS, and OpenMPI
      • It’s a little bit like pip for your Docker-based runtime environment
  16. Our Application
      backend:
        - storage:
            personality: "cassandra"
            instances: 1
      connectors:
        - personality: "ferry/pydata-cassandra"
          ports: ["8000:8000"]

      # The cassandra-client base comes with the various drivers
      # pre-installed.
      FROM ferry/cassandra-client
      NAME ferry/pydata-cassandra

      # Place the start scripts in the events directories so they
      # are started when the connector is brought up.
      ADD ./scripts/startcas.sh /service/runscripts/start/
      ADD ./scripts/restartcas.sh /service/runscripts/restart/
      RUN chmod a+x /service/runscripts/start/startcas.sh
      RUN chmod a+x /service/runscripts/restart/restartcas.sh
  17. Easy to share (again)
      $ ferry start cassandra.yml
      sa-df8d0aa6
      $ ferry ps
      UUID         Storage      Compute  Connectors   Status   Base           Time
      ----         -------      -------  ----------   ------   ----           ----
      sa-df8d0aa6  se-54ed4e93           se-a5350a8d  running  cassandra.yml
      $ ferry ssh sa-df8d0aa6
      root@client-se-a5350a8d:~# ps -eaf | grep python
      root  144  1  0 19:49 ?  00:00:00 python /home/ferry/pydata/bokeh/webserver.py /home/ferry/pydata/data
  18. What’s it doing?
      $ ferry start cassandra.yml
      (Diagram: a Web container and two Cassandra (C*) containers, with the labels "Generate!" and "Config")
      root@client-se-a5350a8d:~# env | grep BACK
      BACKEND_STORAGE_TYPE=cassandra
      BACKEND_STORAGE_IP=10.1.0.12
  19. What’s it doing?
      $ ferry start yarn
      (Diagram: a client container with two YARN (Y) and two GlusterFS (G) containers)
      root@client-se-b597cb21:~# env | grep BACK
      BACKEND_STORAGE_TYPE=gluster
      BACKEND_STORAGE_IP=10.1.0.18
      BACKEND_COMPUTE_TYPE=yarn
      BACKEND_COMPUTE_IP=10.1.0.15
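The connector containers on these two slides advertise their backends through `BACKEND_*` environment variables. A minimal sketch of reading them in Python, assuming the `BACKEND_<ROLE>_TYPE` / `BACKEND_<ROLE>_IP` pattern seen above is the whole convention; `backend_config` is a hypothetical helper, and accepting a plain dict instead of `os.environ` makes it easy to test.

```python
import os


def backend_config(env=None):
    """Collect the BACKEND_* variables visible inside a connector."""
    env = os.environ if env is None else env
    config = {}
    for role in ("STORAGE", "COMPUTE"):
        kind = env.get(f"BACKEND_{role}_TYPE")
        if kind:  # only report roles the stack actually has
            config[role.lower()] = {"type": kind, "ip": env.get(f"BACKEND_{role}_IP")}
    return config
```

Inside the Cassandra connector from slide 18, for example, `backend_config()["storage"]["ip"]` would return `"10.1.0.12"`, which a client could use as its contact point.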
  20. What’s it doing?
      $ ferry stop sa-c6cbb572
      (Diagram: the client, YARN (Y), and GlusterFS (G) containers)
  21. Next steps
      $ ferry share sa-df8d0aa6
      (Diagram: the same web (w) and two Cassandra (c*) containers, repeated on three separate machines)
  22. Next steps
      $ ferry deploy sa-df8d0aa6
      (Diagram: w, c*, c* stacks on separate hardware and on AWS VPC, EC2, and S3)
  23. • Even simple applications can be complicated to install and run
      • Docker helps quite a bit with this
      • Ferry helps build out big data applications
  24. Thank you!
      James: jlh@opencore.io
      Ferry: ferry.opencore.io, @open_core_io