The document discusses deploying a model trained in Azure Databricks onto Azure Machine Learning. It covers model training in Databricks, packaging the model and storing it in Azure Blob Storage, registering the model with Azure ML, deploying it to an Azure Kubernetes Service cluster, and serving it as a web service. Demo sections show training a model for semantic type detection in Databricks and deploying it using Azure ML. The goal is to make model deployment and consumption seamless across Azure services.
Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
2. Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
- Reema Kuvadia (Software Engineer 2)
- Tao Li (Senior Applied Scientist)
3. Agenda
▪ Model Training and Experimenting
▪ Model Deployment
▪ Model Consumption and Azure Website Deployment
4. Azure Resources
▪ Azure Databricks (Model Training): an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Experiment on Azure Databricks; train the model using PySpark.
▪ Azure Blob Storage (Model Storing): a service for storing large amounts of unstructured object data. The published model is stored in Azure Blob Storage.
▪ Azure Machine Learning (Model Deployment): a cloud-based service used to build, test, and deploy predictive analytics solutions based on your data. Register the model to Azure Machine Learning.
▪ Azure Kubernetes Service (AKS) (Model Serving): a managed container orchestration service, based on the open-source Kubernetes system, available on the Azure public cloud. Create the model image and create the endpoint.
▪ Azure Web Service (Model Consumption): Microsoft Azure Web Sites is a cloud computing platform for hosting websites, created and operated by Microsoft. The model is served as a web service on Azure and consumed via a REST API endpoint.
7. Introduction to the Problem
Problem: correctly detecting the semantic types of data (columns of data) is critical for data science tasks such as data cleaning/normalization, data matching, and data enrichment.
▪ Current solutions mostly rely on dictionary/vocabulary, regular expression, and rule-based lookup and matching to identify semantic types. These approaches are:
▪ not robust to dirty and complex data
▪ not generalized to diverse data types
Example columns and their semantic types:
▪ D. James, Kevin Louis, Steven Moring, Thomas V. Beard → Name
▪ Chicago, Seattle, Tenn, TBA → Location
▪ 2019-10-12, Oct 12, 2019, 10/12/2019, 20191012 → Date
8. Model E2E Flow
From raw data to a deployed app, the pipeline has five stages:
▪ Model Training (Azure Databricks): experiment on Azure Databricks; train the model using PySpark.
▪ Model Packaging (Azure Blob Storage): package the model using MLeap; publish the model to Azure Blob Storage (a packaging sketch follows this list).
▪ Define Deployment (Visual Studio Code): define the model environment and dependencies; prepare the scoring script.
▪ Model Deployment (Azure Machine Learning, Azure Kubernetes): register the model to Azure Machine Learning; create the model image; deploy to an Azure Kubernetes web service.
▪ Serve & Consume (Azure Web Service): serve the model as a web service on Azure; consume the model via a REST API endpoint.
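Below is a minimal sketch of the packaging step, assuming a fitted PySpark PipelineModel (model), a training DataFrame (train_df), and the mleap and azure-storage-blob packages; the container name, blob name, and connection string are hypothetical.

import mleap.pyspark  # registers serializeToBundle() on PipelineModel
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401
from azure.storage.blob import BlobServiceClient

# Serialize the fitted pipeline to a local MLeap bundle; MLeap uses a
# transformed sample DataFrame to capture the schema.
model.serializeToBundle("jar:file:/tmp/semantic_mapping_model.zip",
                        model.transform(train_df))

# Publish the bundle to Azure Blob Storage (hypothetical "models" container).
blob_service = BlobServiceClient.from_connection_string(STORAGE_CONNECTION_STRING)
blob_client = blob_service.get_blob_client(container="models",
                                           blob="semantic_mapping_model.zip")
with open("/tmp/semantic_mapping_model.zip", "rb") as f:
    blob_client.upload_blob(f, overwrite=True)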
9. Model Architecture and Training
▪ Featurization
▪ Embedding DataFrame lookup in memory
▪ Spark SQL for featurization using UDFs (user-defined functions); see the sketch after this slide
▪ Modeling
▪ Multi-class classification using Random Forest
The architecture figure shows a four-stage pipeline:
1. Data Source & Table Repository: web tables (Bing RetroIndex), public tables (paper data), and customer tables (demo data). Example table columns: First Name (John, Michael, ..., Richard), Date (2015-11-19, 08/15/2015, ..., May 27, 2016), Phone (1-925-226-7368x212, 830-115-4090, ..., (067)681-4908).
2. Tabular Data & Features: feature extraction on the column data (word embeddings, character distributions, global statistics) and on the column header (header embeddings, header statistics); the two feature sets are concatenated. Label extraction and label cleaning produce labels such as Person.FirstName, Calendar.Date, and Identity.Service.Phone.
3. Training and Testing: the ML model is trained and tested on the labeled features.
4. Semantic Type Detection: a table for scoring (e.g. an Excel table) is passed to the ML model, which returns a predicted type plus a confidence score, e.g. Location.City: 0.8, NA: 0.6, Calendar.Year: 0.9.
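To make the featurization step concrete, here is a minimal, illustrative PySpark UDF; the character-distribution feature and the column names are assumptions for illustration, not the actual feature set used in the model.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, DoubleType

spark = SparkSession.builder.getOrCreate()

def char_distribution(values):
    # Fractions of alphabetic, numeric, and other characters in the column values.
    text = "".join(values)
    if not text:
        return [0.0, 0.0, 0.0]
    n = float(len(text))
    alpha = sum(c.isalpha() for c in text) / n
    digit = sum(c.isdigit() for c in text) / n
    return [alpha, digit, 1.0 - alpha - digit]

char_dist_udf = udf(char_distribution, ArrayType(DoubleType()))

# Each row represents one column of a source table: its header and its values.
columns_df = spark.createDataFrame(
    [("Date", ["2019-10-12", "10/12/2019"]),
     ("Name", ["D. James", "Kevin Louis"])],
    ["header", "values"])
columns_df.withColumn("char_dist", char_dist_udf("values")).show(truncate=False)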
12. Model Deployment
▪ Model training on Azure Databricks.
▪ Package the model and publish it to Azure Blob Storage.
▪ Prerequisites
▪ AML (Azure Machine Learning) Workspace
▪ AKS (Azure Kubernetes Service) Cluster
▪ Azure Machine Learning and Storage SDK
▪ Model Registry
▪ Register the model to store, version, and track metadata about models in your workspace.
▪ Define deployment
▪ Scoring file (named score.py)
▪ Loads the model when the deployed service starts.
▪ Receives data, passes it to the model, and returns a response.
▪ AML environment (software dependencies and libraries)
▪ Deploy the model (see the sketch after this list)
▪ Create the image
▪ Configure the entry script and environment
▪ Configure the runtime (runtime="spark-py")
▪ Configure CPU and memory
▪ Deploy the image as a web app
▪ Deploy the model to the AKS cluster
▪ Get the model endpoint
▪ Consume the model
▪ Use the model via the SDK
▪ Use the model via endpoints
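A condensed sketch of the registration and deployment steps using the azureml-core (v1) SDK; the workspace config, the cluster name "aks-cluster", the service name, and the file paths are assumptions.

from azureml.core import Workspace
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AksWebservice
from azureml.core.compute import AksCompute

ws = Workspace.from_config()

# Model Registry: store, version, and track the model in the workspace.
model = Model.register(workspace=ws,
                       model_path="./model",
                       model_name="semantic_mapping_model",
                       description="Semantic type detection model")

# Define deployment: entry script, Spark runtime, and conda dependencies.
inference_config = InferenceConfig(entry_script="score.py",
                                   runtime="spark-py",
                                   conda_file="env.yml")

# Deploy the image to an existing AKS cluster with explicit CPU and memory.
aks_target = AksCompute(ws, "aks-cluster")
deployment_config = AksWebservice.deploy_configuration(cpu_cores=2, memory_gb=4)
service = Model.deploy(ws, "semantic-type-service", [model],
                       inference_config, deployment_config, aks_target)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)  # the model endpoint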
13. Scoring File (score.py)
The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script contains two functions that load and run the model:
▪ init(): loads the model into a global object. This function is run only once, when the Docker container starts the web service.
▪ run(input_data): uses the model to predict a value based on the input data. Inputs and outputs of run() typically use JSON for serialization and deserialization.

import json
import pickle

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel
from azureml.core.model import Model

def init():
    global spark
    global model
    global word_to_embedding
    spark = SparkSession.builder.getOrCreate()
    model_path = Model.get_model_path('semantic_mapping_model')
    model = PipelineModel.load(model_path)
    embedding_path = Model.get_model_path('word_to_embedding.pkl')
    with open(embedding_path, 'rb') as file:
        word_to_embedding = pickle.load(file)

def run(input_data):
    try:
        data = json.loads(input_data)['data']
        # Featurization_new and names are helpers defined elsewhere in the script.
        features = Featurization_new(data)
        feature_df = spark.createDataFrame([features, ], names)
        predictions_raw = model.transform(feature_df)
        predictions = predictions_raw.select("prediction", "features")
        # Get each scored result
        predictions = predictions.collect()
        preds = [str(x['prediction']) for x in predictions]
        return preds[0]
    except Exception as e:
        return json.dumps({"error": str(e)})
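Once the models are registered and resolvable through Model.get_model_path, the entry script can be smoke-tested by calling its two functions directly; the sample payload below is made up.

import json

init()  # loads the Spark session, the pipeline model, and the embeddings once
payload = json.dumps({"data": ["2019-10-12", "10/12/2019", "20191012"]})
print(run(payload))  # prints the predicted semantic type for the column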
16. Model Consumption and Website Deployment
In this script we register the model and create (or reuse) an environment from a YAML file. We then deploy the model as a web service on AKS, which creates an endpoint that we consume in the website.
▪ Registration: to register a model we need the following:
▪ Path: (string) location of the model
▪ Name: (string) model name
▪ Description: (string) a description of the model
▪ Workspace: the workspace to register the model in, which the web service consumes.
▪ Environment config file: you can create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. An example conda YAML file:

name: project_environment
dependencies:
  - python=3.6.2
  - pip:
    - azureml-defaults
    - scikit-learn
    - numpy
    - inference-schema[numpy-support]

Registering the word-embedding file as a model (Model(ws, name) raises if no model with that name is registered yet, so the lookup is wrapped in a try/except):

from azureml.core.model import Model

try:
    embedding = Model(ws, 'word_to_embedding.pkl')
except Exception:
    embedding = Model.register(model_path="./model/word_to_embedding.pkl",
                               model_name="word_to_embedding.pkl",
                               description="Word to embedding",
                               workspace=ws)
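From the website, consuming the endpoint reduces to a POST against the scoring URI; a minimal sketch using the requests library, where the scoring URI and key come from the deployed AksWebservice and the payload is illustrative.

import json
import requests

scoring_uri = service.scoring_uri  # REST endpoint of the deployed AKS service
api_key, _ = service.get_keys()    # AKS deployments are key-authenticated by default

headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + api_key}
payload = json.dumps({"data": ["Chicago", "Seattle", "Tenn", "TBA"]})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())  # the predicted semantic type for the column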
19. Summary
▪ Spark APIs we used: Spark SQL and UDFs (user-defined functions) for featurization
▪ Microsoft Azure, for making it seamless to integrate with 3rd-party platforms