This is a talk I gave at the recent Advanced AWS Meetup: a detailed guide to how I installed and set up Spinnaker to work with our infrastructure at Stitch Fix. I go over the problems I ran into and how I solved them. I hope it is useful to anyone setting up Spinnaker, or thinking about setting it up, for their own purposes.
**Big thanks to Armory for recording the talks! Video for this talk can be found here: https://youtu.be/ywzPblFpIE0 (I'm the second speaker)**
5. 100% of Infrastructure on AWS
3 Peered VPCs
Isolate environments into different VPCs:
● TEST
○ testing deployments before pushing to prod
● PROD
○ all production deployments
● INFRA
○ tools that both prod and test need to use
[Diagram: three peered VPCs (prod, test, infra); infra hosts jenkins, artifactory, spinnaker and flotilla]
7. Process Overview
Definition of Application (the one-time setup to create an application: app “scaffolding” on AWS):
● create ELB
● create Route53 (route53 points to ELB)
● create spec (rpm built from this recipe)
Repeatable Deployment Process (iterative process for deploying new versions):
● make changes to code
● build RPM
● bake AMI
● launch ASG
● attach to ELB
8. Step 1: Build RPM from Spec
Wrote simple tools to create the RPM:
● Create spec file from template
● Customize spec file
● Jenkins job to build RPM
The process appears complex:
● The spec file seems scary to the user
● But it makes deployment easy down the line!
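A minimal sketch of the "create spec file from template" step, assuming the tool does plain string substitution. The function name `render_spec_header` and the template variable names are hypothetical; the field values mirror the sf-helloworld example, and this is not the actual Stitch Fix tooling.

```python
from string import Template
from typing import List

# Header fields copied from the sf-helloworld example; only the $-placeholders vary.
SPEC_HEADER_TEMPLATE = Template("""\
Name: sf-$app_name
Version: $version
Release: $release
Summary: $summary
Group: Development/Libraries
License: stitchfix-internal
BuildArch: noarch
AutoReqProv: no
Requires: $requires
""")


def render_spec_header(app_name: str, version: str, summary: str,
                       requires: List[str], release: int = 1) -> str:
    """Fill in the spec header for one application (hypothetical helper)."""
    return SPEC_HEADER_TEMPLATE.substitute(
        app_name=app_name,
        version=version,
        release=release,
        summary=summary,
        requires=", ".join(requires),
    )


if __name__ == "__main__":
    print(render_spec_header("helloworld", "0.0.1", "YOUR SUMMARY HERE!",
                             ["sf-base", "sf-aa", "sf-nginx"]))
```

The user then customizes the generated file (the %install, %files and %post sections) before the Jenkins job builds the RPM.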
Name: sf-helloworld
Version: 0.0.1
Release: 1
Summary: YOUR SUMMARY HERE!
Group: Development/Libraries
License: stitchfix-internal
BuildArch: noarch
AutoReqProv: no
BuildRequires:
Requires: sf-base, sf-aa, sf-nginx
%install
mkdir -p $RPM_BUILD_ROOT{/stitchfix,/etc/init.d}
cp -R %{_sourcedir} $RPM_BUILD_ROOT/stitchfix/%{base_name}
cp %{_topdir}/SCRIPTS/sf-%{base_name} $RPM_BUILD_ROOT/etc/init.d/sf-%{base_name}
%files
/stitchfix/%{base_name}
/etc/init.d/sf-%{base_name}
%post
ln -s /etc/nginx/sites-available/sf-app.conf /etc/nginx/sites-enabled/sf-app.conf
/usr/bin/pip-2.7 install -e /stitchfix/%{base_name}
chkconfig --add %{name}
chkconfig --levels 345 %{name} on
sf-helloworld.spec
9. Step 2: Bake AMI
● Used aminator (also from Netflix) to create AMIs
● Jenkins job for baking
How does the AMI get baked?
1. Create volume from base AMI id
2. Attach and mount volume
3. Chroot into volume
4. Install RPM on volume
5. Create snapshot from volume
6. Register AMI from snapshot
[Diagram: the EC2 instance (baking machine) gets the RPM from Artifactory (RPM repo) and installs the RPM on the volume]
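The six baking steps above can be written down as an ordered dry-run plan. This is only a sketch: the `BakeStep` type and `bake_plan` function are hypothetical, the AMI id is a placeholder, and the "mechanism" column names the EC2 API actions and local commands I would expect to be involved, not what aminator literally executes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BakeStep:
    order: int
    description: str
    mechanism: str  # EC2 API action or local command (illustrative only)


def bake_plan(base_ami_id: str, rpm_name: str) -> List[BakeStep]:
    """Return the six baking steps from the slide, in order."""
    steps = [
        (f"Create volume from base AMI {base_ami_id}", "EC2 CreateVolume"),
        ("Attach and mount volume", "EC2 AttachVolume, then mount"),
        ("Chroot into volume", "chroot"),
        (f"Install {rpm_name} on volume", "yum or rpm inside the chroot"),
        ("Create snapshot from volume", "EC2 CreateSnapshot"),
        ("Register AMI from snapshot", "EC2 RegisterImage"),
    ]
    return [BakeStep(i, desc, mech) for i, (desc, mech) in enumerate(steps, start=1)]


if __name__ == "__main__":
    for step in bake_plan("ami-12345678", "sf-helloworld"):
        print(f"{step.order}. {step.description} [{step.mechanism}]")
```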
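10. Step 3: Deploy
[Diagram: the RPM is baked into the AMI; the AMI and the Launch Config are both used to create the ASG, which runs the EC2 instances (immutable servers); Route53 sends internet traffic to the ELB, which routes traffic to the EC2 instances]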
11. Why Spinnaker?
80 Data Scientists
10 Platform Engineers
Our data scientists are responsible for:
● Building ETLs
● Deploying Dashboards and Services
We value self service!
13. Key Differences from the Netflix Setup
And how to handle them
1. Amazon Linux instead of Ubuntu
a. Adding RPM support to Gradle
b. System V instead of Upstart
2. Nginx instead of Apache
3. Secured Redis on AWS
4. No Cassandra in Existing Architecture
14. Diff #1
You drew the short straw with Amazon Linux (Red Hat) instead of Ubuntu
15. Adding RPM Support to Gradle
Create the buildRpm block:
● add our rpm repo in /etc/yum.repos.d on the bake machine
● add dependency rpms inside the block
● make sure to build all the other spinnaker rpms and push them to your rpm repo
./gradlew buildRpm
// Ubuntu
buildDeb {
    requires('redis-server', '3.0.5', GREATER | EQUAL)
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    requires('apache2')
}
// Centos
buildRpm {
    requires('sf-nginx')
    requires('sf-base')
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    os = LINUX // ⇐ YOU NEED THIS MAGIC LINE!
}
[spinnaker] build.gradle
16. Upstart on Amazon Linux
Different startup systems:
● We use System V (ancient)
○ service nginx start
○ startup scripts in /etc/init.d
○ chkconfig for starting on bootup
● Spinnaker uses upstart
○ initctl start spinnaker
○ conf files in /etc/init
Another issue:
● Amazon Linux ships upstart 0.6.5, which is way older than the 1.4 on Ubuntu
description "rosco"
start on filesystem or runlevel [2345]
# not supported in old version
# so for amazon linux we remove these lines:
setuid spinnaker
setgid spinnaker
expect fork
stop on stopping spinnaker
env HOME=/home/spinnaker
exec /opt/rosco/bin/rosco 2>&1 > /var/log/spinnaker/rosco/rosco.log &
[rosco] /etc/init/rosco.conf
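Removing those lines by hand from every Spinnaker service conf gets tedious, so here is a small sketch of how it could be scripted. It assumes the only unsupported stanzas are setuid and setgid, as on the slide; the function name `strip_unsupported` is hypothetical.

```python
from typing import Iterable

# Stanzas the slide says upstart 0.6.5 rejects; extend if yours rejects others.
UNSUPPORTED_STANZAS = ("setuid", "setgid")


def strip_unsupported(conf_text: str,
                      stanzas: Iterable[str] = UNSUPPORTED_STANZAS) -> str:
    """Drop every line whose first word is one of the unsupported stanzas."""
    blocked = set(stanzas)
    kept = []
    for line in conf_text.splitlines():
        words = line.split()
        if words and words[0] in blocked:
            continue  # skip e.g. "setuid spinnaker"
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        'description "rosco"\n'
        "start on filesystem or runlevel [2345]\n"
        "setuid spinnaker\n"
        "setgid spinnaker\n"
        "expect fork\n"
        "stop on stopping spinnaker\n"
    )
    print(strip_unsupported(sample), end="")
```

Note that without setuid the service no longer runs as the spinnaker user, so anything that depends on that user's environment (such as HOME) has to be set explicitly.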
18. Namespace Gate and Rosco in Nginx
● include /etc/nginx/sites-enabled in main nginx conf
● on deploy, symlink /etc/nginx/sites-available/spinnaker.conf => /etc/nginx/sites-enabled/spinnaker.conf
[spinnaker] /etc/nginx/sites-available/spinnaker.conf
# all services on the same machine
server {
    listen 80;
    location / {
        root /opt/deck/html;
    }
    # namespacing gate
    location ~* ^/gate/ {
        rewrite ^/gate/(.*) /$1 break;
        proxy_pass http://localhost:8084;
    }
    # namespacing rosco
    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}
[Diagram: spinnaker.<internal-domain>.com points at the ELB (HTTP 80 ⇒ HTTP 80), which forwards to nginx on port 80 of the EC2 instance]
/ => /opt/deck/html
/gate/health => localhost:8084/health
/rosco/health => localhost:8087/health
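To make the namespacing concrete, here is a small simulation of what the two rewrite rules do to a request path. It only mimics the routing logic for illustration; the `route` function and `UPSTREAMS` table are hypothetical, and nginx itself does the real work.

```python
import re

# prefix -> local upstream, mirroring the two namespaced location blocks
UPSTREAMS = {
    "gate": "http://localhost:8084",
    "rosco": "http://localhost:8087",
}
DECK_ROOT = "/opt/deck/html"


def route(path: str) -> str:
    """Return where a request path ends up under the namespaced config.

    Simplification: only lowercase prefixes are handled, because the
    rewrite patterns in the config are case-sensitive.
    """
    match = re.match(r"^/(gate|rosco)/(.*)", path)
    if match is None:
        return DECK_ROOT + path  # static deck files served from disk
    prefix, rest = match.group(1), match.group(2)
    # "rewrite ^/gate/(.*) /$1 break" strips the prefix before proxying
    return f"{UPSTREAMS[prefix]}/{rest}"


if __name__ == "__main__":
    for p in ("/index.html", "/gate/health", "/rosco/health"):
        print(p, "=>", route(p))
```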
19. Diff #3
You happily use AWS ElastiCache for Redis, but find out Spinnaker angers it
20. AWS ElastiCache is Special
AWS Redis won’t let you issue CONFIG commands!
● Redis version has to be >= 2.8.0
● On the AWS ElastiCache console, add notify-keyspace-events=Egx to a new parameter group
○ this enables redis keyevent notifications for generic commands and expired events
● In gate.yml, add redis.configuration.secure=true
server:
  port: ${services.gate.port:8084}
  address: ${services.gate.host:localhost}
...
redis:
  connection: ${services.redis.connection}
  # add the following two lines if using aws redis
  configuration:
    secure: true
[spinnaker] /config/gate.yml
[Diagram: spinnaker talks to AWS Redis 2.8.0, which uses a parameter group with notify-keyspace-events=Egx]
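The Egx value is a string of single-character flags. A small decoder, written from the flag meanings in the Redis keyspace notifications documentation, shows what each character turns on; the `describe` function itself is just an illustrative helper.

```python
from typing import List

# Flag characters accepted by notify-keyspace-events
FLAG_MEANINGS = {
    "K": "keyspace events (published on __keyspace@<db>__ channels)",
    "E": "keyevent events (published on __keyevent@<db>__ channels)",
    "g": "generic commands such as DEL, EXPIRE, RENAME",
    "$": "string commands",
    "l": "list commands",
    "s": "set commands",
    "h": "hash commands",
    "z": "sorted set commands",
    "x": "expired events",
    "e": "evicted events",
    "A": "alias for g$lshzxe",
}


def describe(flags: str) -> List[str]:
    """Explain each flag character, failing loudly on unknown ones."""
    unknown = [f for f in flags if f not in FLAG_MEANINGS]
    if unknown:
        raise ValueError(f"unknown notify-keyspace-events flags: {unknown}")
    return [FLAG_MEANINGS[f] for f in flags]


if __name__ == "__main__":
    for line in describe("Egx"):
        print(line)
```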
22. Quick EBS Backed Cassandra Node
We don’t want an entire cluster and we want a fast setup, so we create a single-node Cassandra:
● EBS backed store for cassandra data
● Startup script remaps the route53 entry on each deployment
○ Point straight to EC2, not ELB
On redeploy or termination:
● EBS detaches, so data is not lost
● cassandra.<internal-domain>.com is mapped to the new EC2
[Diagram: cassandra.<internal-domain>.com points at the Cassandra EC2 instance, which stores its data on the EBS volume at /cassandra-storage]
# change all store dirs to EBS
data_file_directories:
    - /cassandra-storage/data
commitlog_directory: /cassandra-storage/commitlog
saved_caches_directory: /cassandra-storage/saved_caches
# point all to private route53 entry
seed_provider:
    parameters:
        - seeds: cassandra.<internal-domain>.com
listen_address: cassandra.<internal-domain>.com
rpc_address: cassandra.<internal-domain>.com
/etc/cassandra/conf/cassandra.yaml
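The route53 remap in the startup script boils down to one UPSERT of an A record. Below is a sketch of the change batch such a script could send; the function name, the TTL of 60 and the example IP are assumptions, and the hosted zone id is left to the caller.

```python
from typing import Any, Dict


def upsert_a_record(name: str, ip_address: str, ttl: int = 60) -> Dict[str, Any]:
    """Build a Route53 change batch that points `name` at `ip_address`."""
    return {
        "Comment": "remap single-node cassandra to the new instance",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record or overwrite the old one
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = upsert_a_record("cassandra.example.internal.", "10.0.1.23")
    print(batch)
    # With boto3 installed and credentials configured, the batch could be sent as:
    #   boto3.client("route53").change_resource_record_sets(
    #       HostedZoneId=zone_id, ChangeBatch=batch)
    # where zone_id is the id of your private hosted zone.
```

A short TTL keeps clients from holding on to the address of a terminated instance for long.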
25. SSL + Auth on Spinnaker
● Where to Terminate SSL?
● Glory and the Beast of Self Signed Certs
● Google OAuth2.0 Redirects Mess up Nginx Rewrites
● Tomcat Ignores Client Certs for Client Auth
Get ready to read a lot of stack traces
27. Nginx to Terminate SSL for Deck, Rosco
● Configure nginx with cert and key and turn ssl on
● Nginx can no longer start on bootup because the key needs a password
○ Put the password in a file and point nginx at it with ssl_password_file
● Now our healthcheck is broken
○ Add a listener on port 5000 for an easy ELB healthcheck
● Optional 80 => 443 redirect
● Notice how the gate rewrite is gone…
○ this has to do with oauth redirects
server {
    listen 5000;
    location / {
        add_header Content-Type text/plain;
        return 200 'POOOOOOOOP';
    }
}
# optional redirect here
server {
    listen 80;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    ssl_password_file /etc/keys/spinnaker.pass;
    ssl_certificate /opt/spinnaker/ssl/server.crt;
    ssl_certificate_key /opt/spinnaker/ssl/server.key;
    location / {
        root /opt/deck/html;
    }
    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}
[spinnaker] /etc/nginx/sites-available/spinnaker.conf
28. For Gate, Pass Through SSL Directly to Server
We want the ELB to just pass traffic through to gate without decrypting:
● Bypass nginx for gate: ports 8084 ⇒ 8084 for gate SSL
Gate is responsible for all types of authentication:
● Have client certificate?
○ Authenticate client certificate - this is why gate needs to terminate SSL
● No client certificate?
○ Send to google oauth
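[Diagram: spinnaker.<internal-domain>.com points at the ELB, which forwards to the EC2 instance]
ELB listeners:
HTTP 80 ⇒ HTTP 80
TCP 443 ⇒ TCP 443
TCP 8084 ⇒ TCP 8084
On the EC2 instance, nginx listens on 443 (with 80 ⇒ 443) and gate listens on 8084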
29. SSL: Dilemma #2
Self signed certs? Meet your new best friends, the Java TrustStores
30. Tomcat Needs CA to Be in Trust Store
Because we are using self-signed certs, it’s important to have our self-created CA in the truststore:
● Add spinnaker cert to java keystore using keytool utility
● Add keystore/truststore file location to gate-local.yml config
server:
  ssl:
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop
/opt/spinnaker/conf/gate-local.yml
But at some point I still had problems, so here’s a quick hack - add your CA to the default java CA file: $JAVA_HOME/jre/lib/security/cacerts
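For reference, a sketch of the keytool invocation for importing a CA cert into a keystore, built as an argument list so a deploy script can run it. The file name ca.crt and the alias spinnaker-ca are placeholders, not the names I actually used.

```python
import shlex
from typing import List


def keytool_import_cmd(cert_file: str, keystore: str,
                       alias: str, storepass: str) -> List[str]:
    """Build the keytool command that imports a trusted cert into a keystore."""
    return [
        "keytool", "-importcert",
        "-noprompt",       # do not ask for interactive confirmation
        "-trustcacerts",
        "-alias", alias,
        "-file", cert_file,
        "-keystore", keystore,
        "-storepass", storepass,
    ]


if __name__ == "__main__":
    cmd = keytool_import_cmd(
        cert_file="ca.crt",                                # placeholder path
        keystore="/opt/spinnaker/ssl/keystore.jks",
        alias="spinnaker-ca",                              # placeholder alias
        storepass="poop",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The same command works for the quick hack by pointing the keystore argument at the cacerts file; the stock JDK password for that file is changeit unless someone has changed it.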
32. Remove Namespacing for Gate & Bypass Nginx
● Set redirect_uri to our gate address: https://spinnaker.<internal-domain>.com:8084/login
● Gate can no longer be namespaced because on redirect, /gate in the path gets lost, as only $host is recorded
[Diagram: OAuth 2.0 flow between the web browser (deck javascript), Spinnaker (gate) at https://spinnaker.<internal-domain>.com:8084/login, and the Google auth server]
1. User authorization request
2. User authorizes application
3. Auth code grant
4. Access token request
5. Access token grant
34. Make Tomcat Request Client Cert for Client Auth
We need to enable scripts to post tasks to spinnaker with client authentication:
● Create certs for client
● Configure gate tomcat to validate client cert
[Diagram: Beakhead (Spinnaker client) calls Spinnaker Gate at spinnaker.<internal-domain>.com:8084]
x509:
  enabled: true
  subjectPrincipalRegex: CN=(.*?)
server:
  ssl:
    clientAuth: want
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop
/opt/spinnaker/conf/gate-local.yml
POST /tasks: include the client cert in the request
● Layer based authentication on gate
● Tomcat validates the cert: it has to recognize the cert authority from the truststore
● Returns response if authenticated
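From the client side, a script that posts a task with a client cert could look roughly like this, using only the Python standard library. This is a sketch, not the Beakhead code: the function names, the file paths passed in and the payload shape are assumptions for illustration.

```python
import http.client
import json
import ssl
from typing import Any, Dict, Tuple


def build_task_request(payload: Dict[str, Any]) -> Tuple[str, Dict[str, str], bytes]:
    """Return (path, headers, body) for a POST /tasks call."""
    body = json.dumps(payload).encode("utf-8")
    return "/tasks", {"Content-Type": "application/json"}, body


def post_task(host: str, port: int, ca_file: str, cert_file: str,
              key_file: str, payload: Dict[str, Any]) -> int:
    """POST a task to gate, presenting a client certificate. Returns the HTTP status."""
    # Trust our self-created CA so the server's self-signed cert verifies.
    context = ssl.create_default_context(cafile=ca_file)
    # Present the client cert and key during the TLS handshake.
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    path, headers, body = build_task_request(payload)
    conn = http.client.HTTPSConnection(host, port, context=context, timeout=30)
    try:
        conn.request("POST", path, body=body, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Payload fields here are illustrative only.
    path, headers, body = build_task_request({"application": "helloworld"})
    print(path, headers, body.decode("utf-8"))
```

Because gate is configured with clientAuth: want, a request without a cert is not rejected at the TLS layer; it simply falls through to the OAuth path.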
37. I learned a lot about SSL, OAuth 2.0 and Client Authentication. Like a lot.
38. Thanks for Listening!
We are very much looking forward to having Spinnaker in production.
Find me on spinnaker slack: @dtkachenko
All pictures used in this presentation credit to Allie Brosh
hyperboleandahalf.blogspot.com