Presented at All Things Open 2023
Presented by Danny McCormick - Google
Title: Deploying Models at Scale with Apache Beam
Abstract: Apache Beam is an open-source tool for building scalable, distributed data pipelines. This talk explores how Beam can be used to perform common machine learning tasks, with a heavy focus on running inference at scale. It includes a demo showing how Beam can deploy and update models efficiently on both CPUs and GPUs for inference workloads.
Attendees can expect to leave this talk with a high-level understanding of Beam, the challenges of deploying models at scale, and the ability to use Beam to easily parallelize their inference workloads.
Find more info about All Things Open:
On the web: https://www.allthingsopen.org/
Twitter: https://twitter.com/AllThingsOpen
LinkedIn: https://www.linkedin.com/company/all-things-open/
Instagram: https://www.instagram.com/allthingsopen/
Facebook: https://www.facebook.com/AllThingsOpen
Mastodon: https://mastodon.social/@allthingsopen
Threads: https://www.threads.net/@allthingsopen
2023 conference: https://2023.allthingsopen.org/
1. Deploying Models at Scale with Apache Beam
Danny McCormick
docs.google.com/presentation/d/1WBcT69Lh9GIFoDhgx5OwuCJp0LkoCLxRPiU3Wa_67ns
(or shorturl.at/hAKN2)
12. From Flume came Beam
[Diagram: a pipeline DAG reading from and writing to Datastores through Map, Group by Key (Reduce), and Combine stages]
13. Unified Model for Batch and Streaming
● Batch processing is a special case of stream processing
● Batch + Stream = Beam
[Diagram: events arriving over a Monday-through-Friday timeline]
17. Terms
● PCollection - a distributed, multi-element dataset
● Transform - an operation that takes N PCollections and produces M PCollections
● Pipeline - a directed acyclic graph of Transforms and PCollections
24. Distributed Inference with Beam
● Beam takes care of all of this with the RunInference transform
● Loads the model, batches inputs, and plugs into the DAG
RunInference(model_handler=<config>)
30. Automatic Model Refresh
● Hot swaps the model in a live pipeline
● Manages memory for you
● No pipeline downtime (though possibly some inference downtime)
34. Ideal small model configuration
[Diagram: one VM running many worker processes, each doing I/O and holding its own copy of the model]
35. Ideal Large Model Configuration
[Diagram: one VM running many I/O worker processes that all share a single copy of the model]
36. Ideal Multi Large Model Configuration
[Diagram: Option 1 - one VM whose I/O processes all share Model 1. Option 2 - one VM where some I/O processes share Model 2 and the rest share Model 3]
37. How do we map ideal model configurations to this?
[Diagram: one VM running many independent worker processes]
38. Ideal small model configuration
[Diagram: one VM running many worker processes, each doing I/O and holding its own copy of the model]
40. Ideal Large Model Configuration
[Diagram: one VM running many I/O worker processes that all share a single copy of the model]
41. Ideal Large Model Configuration
[Diagram: one VM running many worker processes that all delegate to a single shared inference process]
42. Optional: serve a single model for all processes
>>> model_handler = PytorchModelHandlerTensor(
... model_class=LinearRegression,
... large_model=True,
... model_params={'input_dim': 1, 'output_dim': 1},
... state_dict_path='gs://path/to/model.pt')
>>> pcoll | RunInference(model_handler=model_handler)
43. Ideal Multi Large Model Configuration
[Diagram: Option 1 - one VM whose I/O processes all share Model 1. Option 2 - one VM where some I/O processes share Model 2 and the rest share Model 3]
44. Ideal Multi Large Model Configuration
[Diagram: one VM running many worker processes that share models through a single model manager process]
45. Optional: map keys to different models with KeyedModelHandler
>>> per_key_mhs = [
... KeyModelMapping(['key1', 'key2', 'key3'], model_handler_1),
... KeyModelMapping(['foo', 'bar', 'baz'], model_handler_2)]
>>> mh = KeyedModelHandler(per_key_mhs)
>>> pcoll | RunInference(model_handler=mh)
49. Beam Primitives for GPUs
● Resource hints for heterogeneous worker pools
● Built-in GPU detection + framework-specific responses at the ModelHandler level
● Large model setting (revisited)
50. Ideal Large Model Configuration
[Diagram: one VM running many worker processes that all delegate to a single shared inference process]