Vista: A declarative system for large-scale CNN feature transfer for multimodal analytics

Declarative Transfer Learning
from Deep CNNs at Scale
Supun Nakandala and Arun Kumar
{snakanda, arunkk}@eng.ucsd.edu

Growth of unstructured data
Source: ETC@USC
65.5 53
18.1 10.3 5.1
0
50
100
%
What type of data is used by Data
Scientists?
Relation Data Text Data Image Data
Other Video Data
Source: 2017 Kaggle Survey
Dark Data
2
Data growth mainly driven by
unstructured data
e-Commerce Healthcare Social Media
unstructured
structured

Deep Convolution Neural Networks (CNN) provide opportunities to
holistically integrate image data with analytics pipelines.
Opportunity: CNN
3

4
CNN: Hierarchical Feature Extractors
Low level features Mid level features High level features

CNN: Training Limitations
Lot of labelled training data Lot of compute power Time consuming
“Dark art” of
hyperparameter
tuning
5
“Transfer Learning” mitigates
these limitations

Outline
6
Example and Motivations
Our System Vista
Experimental Evaluation

Brand Tags Price
Transfer Learning: CNNs for the other 90%
Structured Data
Image Data
Train ML
Model
Evaluate
7
Pre-trained CNN

Brand Tags Price
Transfer Learning: Bottleneck
Structured Data
Image Data
Train ML
Model
Evaluate
8
Which layer will result in the best
accuracy?

Brand Tags Price
Transfer Learning: Current Practice
Structured Data
Image Data
Train ML
Model
Evaluate
9
+ Efficient CNN inference
- Doesn’t support scalable
processing
+ Scalable processing
+ Fault tolerant
- No support for CNNs

Problems with Current Practice
Usability: Manual management of CNN features.
Efficiency: From image inference for all feature layers has
computational redundancies.
Reliability: CNN layers are big, requires careful memory configuration.
Disk spills
System crashes!
10
l1 l2 l3

Outline
11
Our System Vista
Overview
System Architecture
System Optimizations

Vista: Overview
12
Vista is a declarative system for scalable feature transfer from
deep CNNs for multimodal analytics.
Vista takes in:
Brand Tags Price
Structured Data Image Data
l1 l2 l3
Pre-trained CNN and layers of
interest
ML model
Vista optimizes the CNN feature transfer workload and reliably
runs it.

Outline
13
Our System Vista
Overview
System Architecture

Vista: Architecture
Declarative API Benefit: Usability
Optimizer
Logical Plan Optimizations
Physical Plan Optimizations
System Configuration Optimizations
Benefit: Efficiency
and Reliability
Pre-trained
CNNs
Benefit: Efficiency,
Scalability, and Fault
Tolerance 14

Outline
15
Our System Vista

Current Practice: Repeated Inference
16
Structured Data {S}
Images
l1 l2 l3
Image Features {l1}
⋈
Multimodal Features {S, l1}
ML Model Lazy Materialization
Problem: Repeated inferences

Extract all layers in one go
17
Structured Data {S}
Images
l1 l2 l3
Image Features {l1, l2, l3}
⋈
Multimodal Features {S, l1, l2, l3}
Eager MaterializationML Model
{S, l1}
ML Model
{S, l2}
ML Model
{S, l3} Problem: High Memory
Footprint
- disk spills/cache misses
- system crashes!

Our Novel Plan: Staged CNN Inference
18
Structured Data {S}
Images
l1 l2 l3
Image Features {l1}
Staged Materialization
ML Model
Benefits: No redundant
computations, minimum
memory footprint
⋈
Multimodal Features {S, l1} Multimodal Features
{S, l2}
ML Model
l1 l2 l3
Partial CNN Operator

Our Novel Plan: Staged CNN Inference
19
Structured Data {S}
Images
l1 l2 l3
Image Features {l1}
ML Model
memory footprint
⋈
Multimodal Features {S, l1} Multimodal Features
{S, l2}
ML Model
l1 l2 l3
Problem: High join overhead
14 KB
784 KB

Our Novel Plan: Staged CNN Inference - Reordered
20
Images
l1 l2 l3
Image Features {l1}
ML Model
memory footprint
Multimodal Features {S, l1}
Structured Data {S}
⋈
Multimodal Features
{S, l2}
ML Model
l1 l2 l3
Problem: High join overhead

Outline
21
Our System Vista

Vista: Physical Plan Optimizations
Benefit: Vista automatically picks the physical plan choices.
22
Join Operator
Options: Broadcast vs Shuffle join
Trade-Offs: Memory footprint vs Network cost
Storage Format
Options: Compressed vs Uncompressed
Trade-Offs: Memory footprint vs Compute cost

Outline
23
Our System Vista

Vista: System Configuration Optimizations
24
Memory allocation
Query parallelism
Data partition size

Memory Allocation
25
Challenge: Default configurations won’t work
CNN features are big
Non trivial CNN model inference memory
In-Memory
Storage
Core
Memory
User
Memory
OS Reserved
Memory
CNN Inference
Memory
UDF Execution
Query Processing
e.g. join processing Store intermediate data CNN model footprints
Benefit: Vista frees the data scientist from manual memory
and system configuration tuning.

Query Parallelism and Data Partition Size
26
Benefit: Vista sets Query Parallelism to improve utilization
and reduce disk spills.
Query Parallelism
Increase Query Parallelism  Increase CNN Inference
Memory  Less Storage Memory
Data Partition Size
Too big  System Crash, Too small  High overhead
Benefit: Vista sets optimal data partition size to reduce
overheads and avoid crashes.

Outline
27
Our System Vista

Experimental Setup
8 worker nodes
and 1 master node
32 GB RAM
Intel Xeon @ 2.00GHz CPU with 8 cores
300 GB HDD
Version 2.2.0
Runs in standalone mode
Version 1.3.0 28

Dataset & Workloads
Amazon product reviews dataset
ML model:
Logistic Regression for 10 iterations
29
Number of Records 200,000
Number of Structured Features 200 – price, category embedding, review
embedding
Image Image of each product item
Target Predict each product is popular or not
Pre-trained CNNs:
AlexNet – Last 4 layers
VGG16 – Last 3 layers
ResNet50 – Last 5 layers

End-to-end reliability and efficiency
162.47
345.6
314.43
51.1
X
163.37
44.37
X
156.73
16.33 X X16.8
61.7
35.86
0
50
100
150
200
250
300
350
400
AlexNet VGG ResNet
Spark-TF/Amazon
Lazy-1 Lazy-5 Lazy-7 Eager Vista X System crashes
10X
Runtime(min)
Experimental results for other data systems, datasets, and drill-down experiments can be found in our paper.
System crashes due
to mismanaged
memory
30

Summary of Vista
Declarative system for scalable feature transfer from
CNNs.
Thank You!
Project Webpage: https://adalabucsd.github.io/vista.html
Performs DBMS inspired logical plan, physical plan, and
system configuration optimizations.
Improves efficiency by up to 90% and avoids unexpected
system crashes.
31

Vista: A declarative system for large-scale CNN feature transfer for multimodal analytics

Recommended

Recommended

More Related Content

Similar to Vista: A declarative system for large-scale CNN feature transfer for multimodal analytics

Similar to Vista: A declarative system for large-scale CNN feature transfer for multimodal analytics (20)

Recently uploaded

Recently uploaded (20)

Vista: A declarative system for large-scale CNN feature transfer for multimodal analytics