Intro - End to end ML with Kubeflow @ SignalConf 2018

End to End ML
With Kubeflow
& friends
@holdenkarau
Signal
2018
Legit-enough

Some links (slides & recordings will be at):
http://bit.ly/2QgsqF9
^ Slides & code-lab links
(after)
CatLoversShow

Holden:
● Preferred pronouns are she/her
● Developer Advocate at Google
● Apache Spark PMC/Committer, contribute to many other projects
● previously IBM, Alpine, Databricks, Google, Foursquare & Amazon
● co-author of Learning Spark & High Performance Spark
● Twitter: @holdenkarau
● Slide share http://www.slideshare.net/hkarau
● Code review livestreams: https://www.twitch.tv/holdenkarau /
https://www.youtube.com/user/holdenkarau
● Spark Talk Videos http://bit.ly/holdenSparkVideos

Who do I think you all are?
● Nice people*
● Interested in Machine Learning
● Possibly Familiar with one of Java, Scala, or Python
Amanda

What is in store for our adventure?
● We have 30 minutes :)
● Brief intros to what Kubernetes, Spark, and Kubeflow are
● How to train a model (ish)
● How to serve a model (ish)
● Scaling (ish)
● Updating models and other scary thoughts
Ada Doglace

What is Spark?
● General purpose distributed system
○ With a really nice API including Python :)
● Apache project
● Faster than Hadoop Map/Reduce
● Good when too big for a single machine
● Built on top of two abstractions for distributed data: RDDs & Datasets
● Has ML Libraries
● WIP Kubeflow integration PR 1467

The different pieces of Spark
[Diagram of the Spark stack. Core: Apache Spark. Components: SQL, DataFrames & Datasets; Structured Streaming; Spark ML; MLLib; Streaming; bagel & Graph X; Graph Frames. Language labels: "Scala, Java, Python, & R" and "Scala, Java, Python".]
Paul Hudson

Train Model on ML-Rig
Photo by Tomomi

Problem:
Models are Cool,
Feature prep is Hard,
Training is Tedious,
Everyone Forgot Deployment

What is Kubeflow?
“Data Scientists”
Model Serving On Kube
Model Training
*

What is Kubeflow?
“Kubeflow is a Cloud Native platform for machine learning based on
Google’s internal machine learning pipelines.”
or:
● The recognition that just a bunch of model weights isn’t enough
● Designed to support the ecosystem of tools needed (from data
prep to serving)
● Open source project :)
Ada Doglace

Really just want to replace this:
Photo by: Milestoned

What’s Next?!
Step away from keyboard
Think about type(s) of model
Look at the components directory and see which tool is a fit
Don’t know? Choose Jupyter and deal with the details live
Can’t find it?

Containers Buffet
argo
automation
chainer-job
core
credentials-pod-preset
katib
mpi-job
mxnet-job
openmpi
pachyderm
pytorch-job
seldon
tf-serving
weaveflux

What about just the basics?*
./scripts/kfctl.sh init ${KFAPP} --platform gcp --project ${PROJECT}
cd ${KFAPP}
../scripts/kfctl.sh generate platform
../scripts/kfctl.sh apply platform
../scripts/kfctl.sh generate k8s
../scripts/kfctl.sh apply k8s

What about just tensorflow?*
ks registry add kubeflow \
  github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}

Ok well I need to be able to access Jupyter too...
kubectl port-forward -n ${NAMESPACE} \
  $(kubectl get pods -n ${NAMESPACE} --selector=service=ambassador \
    -o jsonpath='{.items[0].metadata.name}') 8080:80

Your Special ML Training Goes here
Don’t have any pressing projects but still want to have fun? Check
out Michelle’s notebook for Github Issue summarization.
Or want to see mnist again? here :)

...
import numpy as np
from keras.callbacks import CSVLogger, ModelCheckpoint

script_name_base = 'tutorial_seq2seq'
# Log per-epoch metrics to tutorial_seq2seq.log
csv_logger = CSVLogger('{:}.log'.format(script_name_base))
# Write a checkpoint only when the validation loss improves
model_checkpoint = ModelCheckpoint(
    '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base),
    save_best_only=True)

history = seq2seq_Model.fit(
    [encoder_input_data, decoder_input_data],
    np.expand_dims(decoder_target_data, -1),
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.12,
    callbacks=[csv_logger, model_checkpoint])
Really just check out Michelle’s notebook for Github Issue
summarization.
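
A note on the checkpoint filename above: the doubled braces survive the first format call as single braces, leaving a template that gets filled in once per epoch. A minimal sketch of that two-stage formatting in plain Python. The epoch number and loss below are made-up illustration values, and the second format call is only a stand-in for what Keras does with the path:

```python
script_name_base = 'tutorial_seq2seq'

# Stage 1: only '{:}' is substituted; '{{' and '}}' collapse to '{' and '}'.
template = '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base)

# Stage 2: stand-in for the per-epoch fill-in of the template.
# The epoch and val_loss here are illustrative, not real training results.
example_path = template.format(epoch=3, val_loss=0.12345)

print(template)
print(example_path)
```

So each saved checkpoint carries its epoch and validation loss in the file name, which makes picking the best one by hand a lot less painful.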

But what about [special foo-baz-inator] or
[special-yak-shaving-tool]?
Write a Dockerfile and build an image, use FROM so you’re not
starting from scratch.
FROM gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu
RUN pip install py-special-yak-shaving-tool
Then set it as a param for your training/serving job as needed:
ks param set tfjob-v1alpha2 image "my-special-image-goes-here"

What about that magical feature prep?
For now it’s a mostly write-by-hand situation
However TFX has some cool tools we can use today (like
TF.Transform) if we’re ok with DirectRunner or Dataflow (with Flink
support in the works indirectly)

Enter: TF.Transform
● For pre-processing of your data
● e.g. where you spend 90% of your dev time anyways
● Integrates into serving time :D
● OSS
● Runs on top of Apache Beam, but current release not yet
scalable outside of GCP
● On Apache Beam master this can run-ish on Flink, but rough
● Please don’t use this in production today unless you’re on GCP/Dataflow
Kathryn Yengel

Defining a Transform processing function
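import tensorflow_transform as tft

def preprocessing_fn(inputs):
    x = inputs['x']
    y = inputs['y']
    s = inputs['s']
    x_centered = x - tft.mean(x)        # subtract the dataset-wide mean
    y_normalized = tft.scale_to_0_1(y)  # rescale into [0, 1]
    s_int = tft.string_to_int(s)        # map each string to a vocabulary index
    return {
        'x_centered': x_centered,
        'y_normalized': y_normalized,
        's_int': s_int,
    }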

Analyzers
● e.g. mean, stddev, quantiles
● Reduce (full pass)
● Implemented as a distributed data pipeline
Transforms
● e.g. normalize, multiply, bucketize
● Instance-to-instance (don’t change batch dimension)
● Pure TensorFlow

Analyze
[Diagram: the Analyze step runs the analyzers (mean, stddev, quantiles) over the data and replaces them with constant tensors, leaving a graph of transforms (normalize, multiply, bucketize).]
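
To make the analyzer/transform split concrete, here is a rough sketch in plain Python (no TensorFlow or Beam) of the same three features as the preprocessing function. The analyze and transform helper names and the sample rows are hypothetical, and the sorted vocabulary is a simplification, not how tft assigns indices:

```python
from statistics import mean


def analyze(rows):
    """Full pass over the dataset: reduce it to a handful of constants."""
    xs = [r['x'] for r in rows]
    ys = [r['y'] for r in rows]
    # Simplification: a sorted vocabulary gives a deterministic string -> int map.
    vocab = {s: i for i, s in enumerate(sorted({r['s'] for r in rows}))}
    return {'x_mean': mean(xs), 'y_min': min(ys), 'y_max': max(ys), 'vocab': vocab}


def transform(row, constants):
    """Instance-to-instance: needs only this row plus the precomputed constants."""
    y_range = constants['y_max'] - constants['y_min']
    return {
        'x_centered': row['x'] - constants['x_mean'],
        'y_normalized': (row['y'] - constants['y_min']) / y_range if y_range else 0.0,
        's_int': constants['vocab'][row['s']],
    }


# Made-up sample data for illustration.
rows = [
    {'x': 1.0, 'y': 10.0, 's': 'cat'},
    {'x': 2.0, 'y': 20.0, 's': 'dog'},
    {'x': 3.0, 'y': 30.0, 's': 'cat'},
]
constants = analyze(rows)
transformed = [transform(r, constants) for r in rows]
print(transformed)
```

The point of the split: analyze is the expensive distributed part and runs once at training time, while transform is cheap and can be replayed unchanged at serving time using the saved constants.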

Some common use-cases...
● Scale to ...: tft.scale_to_z_score
● Bag of Words / N-Grams: tf.string_split, tft.ngrams, tft.string_to_int
● Bucketization: tft.quantiles, tft.apply_buckets
● Feature Crosses: tf.string_join, tft.string_to_int
● ...
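
If those use-case names are unfamiliar, this is roughly what three of them mean, sketched in plain Python. These helpers are hypothetical stand-ins written for illustration, not the tft functions, and details such as bucket boundary handling may differ from the real library:

```python
from bisect import bisect_right


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bucketize(value, boundaries):
    """Index of the bucket a value falls in, given sorted boundaries."""
    return bisect_right(boundaries, value)


def feature_cross(a, b, sep='_'):
    """Combine two categorical values into a single feature string."""
    return sep.join([a, b])


# Made-up inputs for illustration.
tokens = 'kubeflow makes ml fun'.split()
bigrams = ngrams(tokens, 2)
bucket = bucketize(37, [18, 30, 50])
crossed = feature_cross('cat', 'indoor')
print(bigrams, bucket, crossed)
```

In a real pipeline the bucket boundaries would come from an analyzer pass (quantiles) rather than being hard-coded, and the crossed strings would then be mapped to integers.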

BEAM Beyond the JVM: Current release
● Non JVM BEAM doesn’t work outside of Google’s environment yet
● tl;dr : uses grpc / protobuf
○ Similar to the common design but with more efficient representations (often)
● But exciting new plans to unify the runners and ease the support of different
languages (called SDKs)
○ See https://beam.apache.org/contribute/portability/
● If this is exciting, you can come join me on making BEAM work in Python3
○ Yes we still don’t have that :(
○ But we're getting closer & you can come join us on BEAM-2874 :D
Emma

Serving: TF is probably easiest for now...
MODEL_COMPONENT=my-model-server
MODEL_NAME=cat-finder-3k
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} deployHttpProxy true
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks apply ${KF_ENV} -c ${MODEL_COMPONENT}

Or use Seldon Core & friends*
Seldon Core is an OSS platform for deploying ML models on
Kubernetes supported by Kubeflow.
Supports Many Model types/formats:
● Tensorflow
● Sklearn
● Spark ML**
● R
● H2O

Set up seldon core for serving
# Gives cluster-admin role to the default service account
kubectl create clusterrolebinding seldon-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=${NAMESPACE}:default
# Install the kubeflow/seldon package
ks pkg install kubeflow/seldon
# Generate the seldon component and deploy it
ks generate seldon seldon --name=seldon

Build an image with your model*
docker run -v $(pwd):/my_model \
  seldonio/core-python-wrapper:0.7 /my_model \
  IssueSummarization 0.1 gcr.io --base-image=python:3.6 \
  --image-name=gcr-repository-name/my-image-name

And kick off the new model:
ks generate seldon-serve-simple new-serving-magic \
  --name=model-name \
  --image=gcr.io/gcr-repository-name/model:version \
  --namespace=${NAMESPACE} \
  --replicas=2
ks apply ${KF_ENV} -c new-serving-magic

Wait so how do I use this?
Your favourite REST library goes here*
Timeouts matter!
Doing recommendations? Have fall-backs
Have multiple models? fall-backs
*Need to use in batch? Maybe skip seldon, tf-serving &
friends and integrate the library into your code. Or
not.
Trish Hamme
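
The "timeouts matter" and "have fall-backs" advice can be sketched with nothing but the Python standard library. The URL, the payload shape, and the function names here are all hypothetical (the real path and request format depend on which serving component you deployed), so treat this as a shape to copy, not a client for any particular server:

```python
import json
import urllib.request


def http_predict(url, payload, timeout_s):
    """POST a JSON payload and decode the JSON reply, bounded by a timeout."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode('utf-8'))


def predict_with_fallback(payload, fallback, fetch=http_predict,
                          url='http://localhost:8080/predict', timeout_s=0.5):
    """Return the model's answer, or the fallback if the call fails or is slow."""
    try:
        return fetch(url, payload, timeout_s)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # a malformed JSON reply raises ValueError.
        return fallback


def always_times_out(url, payload, timeout_s):
    """Stub standing in for an overloaded model server."""
    raise TimeoutError('model server too slow')


# With the server "down", the caller still gets something to show.
recommendations = predict_with_fallback(
    {'user': 42}, ['most-popular-item'], fetch=always_times_out)
print(recommendations)
```

Passing the fetch function in as a parameter is a design choice for this sketch: it lets you test the fall-back path without a live model server.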

Scaling - or ruh roh people are using this!
replicas: 1
Becomes
replicas: 10
Factor of 10 =~ “science”

Wait really?
● Early: switch from mini-kube to ${cloud provider} with GPUs
○ “Vertical” scaling
● Next: increase # of workers for training
○ “Horizontal” scaling
○ Auto-scaling also WIP per-backend for the most part
● Serving, # of replicas
○ Auto-scaling is a WIP -
https://github.com/kubeflow/kubeflow/issues/1219
Jennifer C.

What about validation?
TensorFlow Data Validation (TFDV)
Or Roll your own?
● Counters & execution time most common
● Please also check % of data change
Spark-validator (proof of concept)
Please validate your pipelines, and not just for data changes: for code changes too.
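
For the roll-your-own route, a minimal sketch of comparing counters and execution time against the previous run. The counter names, the numbers, and the 20% threshold are made-up assumptions for illustration, and this is not the spark-validator API:

```python
def relative_change(previous, current):
    """Fractional change between two counter values (0.25 means 25%)."""
    if previous == 0:
        return float('inf') if current else 0.0
    return abs(current - previous) / previous


def validate_run(previous, current, max_change=0.2):
    """Names of counters that went missing or changed by more than max_change."""
    failures = []
    for name, old_value in previous.items():
        if name not in current:
            failures.append(name)
        elif relative_change(old_value, current[name]) > max_change:
            failures.append(name)
    return failures


# Illustrative counters from two hypothetical pipeline runs.
last_run = {'records_read': 1000, 'records_written': 990, 'runtime_s': 120}
this_run = {'records_read': 1050, 'records_written': 600, 'runtime_s': 125}
failed_counters = validate_run(last_run, this_run)
print(failed_counters)
```

Here the input size and runtime look normal but far fewer records were written, which is the kind of quiet failure a check like this is meant to catch before a new model ships.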

Previously recorded live demos
● Kubeflow intro https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html & streamed http://bit.ly/kfIntroStream
● Kubeflow E2E with Github issue summarization https://codelabs.developers.google.com/codelabs/cloud-kubeflow-e2e-gis/ & streamed http://bit.ly/kfGHStream
● You can tell they were live streamed by how poorly they went, I promise no video editing has occurred.
● You can do these yourself too (including one of them at our booth)!

Join me & Boo @ Google’s booth @ 5PM
And join my coworker Casey West @ 6 talking about:
Building Captain Obvious:
Understand Faster with Machine Learning APIs

Want to watch work on a Kubeflow PR?
● Join Holden Friday @ 2pm pacific for live coding as she continues
working on her Apache Spark to Kubeflow integration (using the existing
Spark operator as a base)
https://www.youtube.com/watch?v=zHnTdqbjPik
● Or just https://youtube.com/user/holdenkarau & like +
subscribe + click the bell :p

k thnx bye :)
Give feedback on this presentation
http://bit.ly/holdenTalkFeedback
