Leonard Austin (Ravelin) - DevOps in a Machine Learning World

DevOps in a
Machine Learning World
@leonardaustin

As machine learning moves from niche to
mainstream tech stacks, how do DevOps engineers
prepare for a very different set of problems? A brief
look at the new issues that arise from machine
learning, an overview of cutting-edge "old school"
solutions, and how to drag data science (kicking and
screaming) into a world of automation.

Leonard Austin
Cofounder at Ravelin
CTO, Software Engineer, DevOps, Recruiter...
@leonardaustin

Ravelin
Fraud Detection. Ravelin examines your visitor and
payment data in real time, telling your systems
which customers are fraudsters. We use Machine
Learning, Rule Engines, Graph Networks and
Industry Expertise to respond with scores in
milliseconds. Perfect for an on-demand world.
Raised $2m last year. Fintech. Hiring

$14B
Lost to fraud
Growing rapidly as fraudsters move online

One fraudster leads to lots of cost

Stack
Go + Python
AWS
MicroServices
Storage: Cassandra, Postgres, ElasticSearch, Redis, Graph Database X, ZooKeeper
Queue: NSQ, Kinesis
Instrumentation: InfluxDB, Grafana
Docker - but only for local dev

Doing Things The
Right Way
Terraform
100% Automation
Horizontally Scalable
Continuous Integration
No need for SSH access
100% Visibility - Metrics & Logs

Servers & MicroServices
“Livestock, not pets. It gets sick, terminate it” - DevOps guy on the internet

Machine Learning
Challenges
> Data Warehousing
Resource on Demand
Deploy
Hardware Requirements
Life Cycle
(Explore, Train, Deploy)

Data Warehousing
What?
Why we need it for Ravelin
How much data

$10m
IBM, Oracle, Microsoft
v1

$1m
Massively Parallel Processing - MPP
IBM, Oracle, Microsoft, Teradata, Vertica, GreenPlum
v1.5

$200k
Hadoop MapReduce, Spark, Hive, Impala
v2

We ♡ BigQuery
Costs - $5 per terabyte, 5c per range query per terabyte
Managed - no reserved compute resources needed!
Distributed columnar storage - easy to append
Dataflow
Restrictions:
Can’t Update
No Indexes
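The slide's pricing and restrictions work together. Storage is columnar with no indexes, so in this simplified model (which ignores any minimum charge per query) a query is billed for every byte of each column it reads. Because rows cannot be updated, a change is written as a new row, and the newest row per key wins at read time. The Python sketch below shows both ideas. The $5 per terabyte figure comes from the slide. The decimal terabyte, the column sizes and the `id`/`updated_at` field names are assumptions. In BigQuery itself, the "latest wins" step would normally be a SQL window function such as `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)`.

```python
from typing import Dict, Iterable, List

PRICE_PER_TB_USD = 5.0   # on-demand query price quoted on the slide
BYTES_PER_TB = 10**12    # assumption: decimal terabyte


def bytes_for_columns(column_sizes: Dict[str, int], selected: List[str]) -> int:
    # Columnar storage: only the selected columns are read, but with no
    # indexes each of them is read in full (simplified model).
    return sum(column_sizes[name] for name in selected)


def query_cost_usd(bytes_scanned: int) -> float:
    # Multiply before dividing to keep round numbers exact in floating point.
    return bytes_scanned * PRICE_PER_TB_USD / BYTES_PER_TB


def latest_versions(rows: Iterable[dict]) -> Dict[str, dict]:
    """Append-only workaround for "Can't Update": keep the newest row per id."""
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return latest


if __name__ == "__main__":
    # Hypothetical table: two narrow columns and one wide payload column.
    sizes = {"customer_id": 200 * 10**9, "amount": 100 * 10**9, "payload": 3 * 10**12}
    print(query_cost_usd(bytes_for_columns(sizes, ["customer_id", "amount"])))  # 1.5
    print(query_cost_usd(bytes_for_columns(sizes, list(sizes))))                # 16.5
    rows = [
        {"id": "c1", "updated_at": 1, "status": "new"},
        {"id": "c1", "updated_at": 3, "status": "fraud"},
        {"id": "c2", "updated_at": 2, "status": "ok"},
    ]
    print({key: row["status"] for key, row in latest_versions(rows).items()})
    # {'c1': 'fraud', 'c2': 'ok'}
```

Selecting only the columns a query needs is the main cost lever here. Reading the wide `payload` column multiplies the bill by eleven in this example.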

Also worth mentioning: AWS Redshift

Stack
Go + Python
AWS & Google Cloud Platform
MicroServices
DB: Cassandra, Postgres, ElasticSearch, Redis, Graph Databases, ZooKeeper
Queue: NSQ, Kinesis, Google Pub/Sub
Warehouse: BigQuery, Dataflow

Machine Learning
Challenges
Data Warehousing
> Resource on Demand
Deploy
Life Cycle

Work on the Cloud!
“Stephen’s laptop was measurably heavier because of the amount
of data he had on it. We asked him nicely to move everything to
the cloud and now the internet is a little heavier” - Science 2016

Data
“Single point of success” - Jose, CTO, Hailo, 2014
AWS
32 Cores 244GB RAM
Google Cloud Platform
32 Cores 208GB RAM
Azure
16 Cores 112GB RAM
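The later summary's "1 Machine (because clustering is expensive)" comes down to a capacity check: does the working set fit in one of these big cloud machines? The Python below is a minimal sketch that uses the machine sizes listed above, as they stood at the time of the talk. The `headroom` multiplier is a hypothetical assumption, not a rule from the talk. It stands for extra memory used during training, on top of the raw data.

```python
from typing import List

# Machine sizes from the slide: (cores, RAM in GB).
MACHINES = {
    "AWS": (32, 244),
    "Google Cloud Platform": (32, 208),
    "Azure": (16, 112),
}


def machines_that_fit(dataset_gb: float, headroom: float = 2.0) -> List[str]:
    """Return the machines whose RAM covers dataset_gb * headroom, smallest first.

    headroom is a hypothetical safety factor for working memory during training.
    """
    needed_gb = dataset_gb * headroom
    fitting = [(ram, name) for name, (_cores, ram) in MACHINES.items() if ram >= needed_gb]
    return [name for _ram, name in sorted(fitting)]


if __name__ == "__main__":
    for size_gb in (50, 100, 150):
        print(size_gb, machines_that_fit(size_gb))
    # 50 ['Azure', 'Google Cloud Platform', 'AWS']
    # 100 ['Google Cloud Platform', 'AWS']
    # 150 []
```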

Machine Learning
Challenges
Data Warehousing
Resource on Demand
> Deploy
Life Cycle

Deploying Models
Train - sample
Pickle
S3
Deploy
Simple
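The deployment slide reads as a four-step pipeline: train on a sample, pickle the model, put the pickle on S3, and load it in the serving process. The Python below is a minimal sketch of that flow. The toy `ThresholdModel`, the 95th-percentile rule and the object key are hypothetical. `LocalObjectStore` is a stand-in for S3 so the example runs offline. With boto3, the equivalent calls would be `put_object` and `get_object` on an S3 client.

```python
import pickle
import random
import tempfile
from pathlib import Path
from typing import List


class ThresholdModel:
    """Hypothetical toy fraud model: flags orders whose amount exceeds a threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, amount: float) -> bool:
        return amount > self.threshold


def train_on_sample(amounts: List[float], sample_size: int, seed: int = 0) -> ThresholdModel:
    # "Train - sample": fit on a random sample rather than the full history.
    rng = random.Random(seed)
    sample = sorted(rng.sample(amounts, min(sample_size, len(amounts))))
    # Illustrative rule only: the 95th-percentile amount becomes the threshold.
    return ThresholdModel(sample[int(0.95 * (len(sample) - 1))])


class LocalObjectStore:
    """Stand-in for S3: stores blobs under a local directory, keyed like object keys."""

    def __init__(self, root: Path):
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def publish(model: ThresholdModel, store: LocalObjectStore, key: str) -> None:
    # "Pickle" then "S3": serialise the trained model and upload the bytes.
    store.put(key, pickle.dumps(model))


def load(store: LocalObjectStore, key: str) -> ThresholdModel:
    # "Deploy": the serving process downloads and unpickles the model.
    return pickle.loads(store.get(key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalObjectStore(Path(tmp))
        model = train_on_sample([float(x) for x in range(1, 1001)], sample_size=200)
        publish(model, store, "models/fraud/v1.pkl")
        served = load(store, "models/fraud/v1.pkl")
        print(served.threshold == model.threshold)  # True
```

Pickle is only safe for artifacts you produced yourself, because unpickling can run arbitrary code. The model class must also be importable under the same module path in the serving process.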

Hardware - GPUs
Specific for Deep Learning
AWS has GPU instances, but $$$
No virtualization
Buy and build your own server
Q. How Deep is your problem?
Speech, Video, Images

Summary
Data Warehousing
BigQuery
Dataflow
On-Demand Resources
1 Machine (because clustering is expensive)
Big Machines on the Cloud
Persistent Disks on Google Compute Engine

Hiring Smart People
DevOps - Mid Level & Senior
Data Scientist - Junior & Mid Level
Software Engineer - Junior, Mid Level & Senior
Product Owner

Thanks
@leonardaustin
@ravelinhq
ravelin.com
leonard.austin@ravelin.com
Remember we are hiring
