Sanket Shahane, Data Scientist & Andy Liu, Senior Data Engineer, Lucidworks. Presentation from ACTIVATE 2019, the Search and AI Conference. http://www.activate-conf.com
2. Productionizing Python ML Models using Fusion 5
Andy Liu – Senior Data Engineer, Lucidworks
Sanket Shahane – Data Scientist, Lucidworks
4. Data Science and Search Development
Data scientists:
• Use the methodology of choice
• Use the packages of choice
• Simple model hand-off to search applications
CONVERGENCE
Search developers:
• Leverage a low-code experience for embedding AI into index and query pipelines
• No need for deep data science expertise
6. YES!!
Introducing native Python model serving in Fusion 5.0
• Real-time ML model serving using native Python
– Just for model serving, not model training
• For data scientists who want to integrate custom-trained Python machine
learning models with Fusion’s index and query pipelines
7. Have your cake and eat it too!
• Use your existing search data collections for modeling
• Bring your own model (BYOM)
• Open integration through Fusion ML microservice
• Leverage scalable search platform with operational pipelines
Fusion ML Microservice
9. Architecture
• Microservices architecture running in K8s
• Exposes endpoints for adding, updating,
and deleting user-defined models, and for running
predictions
• Data Science Integration Toolkit includes
a client library for ease of use
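The four operations named above (add, update, delete, predict) can be pictured with a toy in-memory registry. This is only an illustrative sketch: `ModelRegistry` and its method names are hypothetical, it involves no networking, and it is not the Fusion ML service API or the Data Science Integration Toolkit client.

```python
from typing import Callable, Dict

# A "model" here is just any callable mapping an input dict to an output dict.
PredictFn = Callable[[dict], dict]


class ModelRegistry:
    """Hypothetical in-memory stand-in for a model-serving service."""

    def __init__(self) -> None:
        self._models: Dict[str, PredictFn] = {}

    def add(self, model_id: str, predict_fn: PredictFn) -> None:
        # Adding refuses to silently overwrite an existing model.
        if model_id in self._models:
            raise KeyError(f"model {model_id!r} already exists")
        self._models[model_id] = predict_fn

    def update(self, model_id: str, predict_fn: PredictFn) -> None:
        # Updating requires the model to exist already.
        if model_id not in self._models:
            raise KeyError(f"model {model_id!r} not found")
        self._models[model_id] = predict_fn

    def delete(self, model_id: str) -> None:
        # Raises KeyError if the model is unknown.
        del self._models[model_id]

    def predict(self, model_id: str, model_input: dict) -> dict:
        # Look up the model by id and run it on a single input.
        return self._models[model_id](model_input)
```

In a real deployment these operations would be remote calls to the service rather than dictionary lookups; the sketch only shows the shape of the lifecycle.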
11. Architectural Goals
Building a runtime Python model execution environment
• Extensible
– Pluggable API for data scientists to add data processing and modeling code
• Scalable
– Leverages modern technologies like gRPC and K8s for performance at scale
• Flexible
– Simple yet flexible API to support different kinds of machine learning models
• Integrated
– Fully integrated with Fusion’s Index and Query Pipelines
• Easy to use
– API driven with convenient Python client libraries
15. Process
Train model → Create plugin → Deploy & Test → Integrate
• Create a bundle ZIP file that contains:
– All serialized object files
– A predict.py file, which the ML service executes when a prediction is requested from
a query or index pipeline
• predict.py must define these two functions:
– def init(bundle_path: str): called only once. Place one-time
initialization here, such as loading a serialized model from disk
– def predict(model_input: dict) -> dict: called to
generate a prediction from a single input
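A minimal sketch of what such a predict.py might look like follows. Only the two function signatures come from the slide; the file name `model.pkl`, the `"text"` input key, the `"label"` and `"score"` output keys, and the toy token-weight "model" are assumptions made for illustration.

```python
# predict.py -- hypothetical plugin sketch, not Lucidworks' reference code.
import os
import pickle

# Module-level holder so predict() can reuse what init() loaded.
_model = None

MODEL_FILE = "model.pkl"  # assumed name of the serialized object in the bundle


def init(bundle_path: str) -> None:
    """Called once: load the serialized model from the unpacked bundle."""
    global _model
    with open(os.path.join(bundle_path, MODEL_FILE), "rb") as f:
        _model = pickle.load(f)


def predict(model_input: dict) -> dict:
    """Called per request: score a single input and return a dict."""
    if _model is None:
        raise RuntimeError("init() must be called before predict()")
    text = model_input.get("text", "")  # "text" is an assumed input key
    # Toy stand-in for a trained model: _model is a dict of token -> weight.
    score = sum(_model.get(tok, 0.0) for tok in text.lower().split())
    label = "positive" if score > 0 else "negative"
    return {"label": label, "score": score}
```

Keeping the loaded model in a module-level variable is one simple way to make the one-time work in init available to every later predict call. In this sketch, the file would be zipped together with model.pkl to form the bundle.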