The IARPA Machine Intelligence from Cortical Networks (MICrONS) program is a research endeavor created to improve neurally-plausible machine-learning algorithms by understanding data representations and learning rules used by the brain through structurally and functionally interrogating a cubic millimeter of mammalian neocortex. This effort requires efficiently storing, visualizing, and processing petabytes of neuroimaging data. The Johns Hopkins University Applied Physics Laboratory (APL) has developed an open-source, highly available service to manage these data, called the Boss. The Boss uses AWS to provide a cloud-native spatial database with an innovative storage hierarchy and auto-scaling capability to balance cost and performance. This system extensively uses serverless components to meet both scalability and cost requirements. In this session, we provide an overview of the Boss, and we focus on how the APL used Amazon DynamoDB, AWS Lambda, and AWS Step Functions for several high-throughput components of the system. We discuss both the challenges and successes with serverless technologies.
Evolution of Netflix's cloud security strategy. Includes cloud-based key management and hybrid security controls that span traditional datacenter and public cloud.
Healthcare systems around the world are looking to Precision Medicine -- care decisions tailored for the individual patient -- as a means to drive better care outcomes at lower cost. Today, the most promising technology that has made this possible in certain diseases like cancer is sequencing a patient's genome. For infectious diseases, sequencing has revolutionized our understanding of outbreaks and how they spread. Genome sequencing has progressed significantly in the past decade, improving throughput and lowering costs by 100X or more. It is a data- and compute-intensive endeavor, which most biomedical research and care delivery networks are not equipped to handle. This session features Dr. Swaine Chen from the Genome Institute of Singapore, and the Broad Institute Cromwell team, discussing the challenges of dealing with the scale of genomic data and how they solved them to deliver results.
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write... - Adrian Cockcroft
Presentation given in October 2011 at the High Performance Transaction Systems Workshop http://hpts.ws - describes how Netflix used AWS to run a set of highly scalable Cassandra benchmarks on hundreds of instances in only a few hours.
A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
(ISM301) Engineering Netflix Global Operations In The Cloud - Amazon Web Services
Operating a massively scalable, constantly changing, distributed global service is a daunting task. We innovate at breakneck speed to attract new customers and stay ahead of the competition. This means more features, more experiments, more deployments, more engineers making changes in production environments, and ever-increasing complexity. Simultaneously improving service availability and accelerating rate of change seems impossible on the surface. At Netflix, operations engineering is both a technical and organizational construct designed to accomplish just that by integrating disciplines like continuous delivery, fault injection, regional traffic management, crisis response, best practice automation, and real-time analytics. In this talk, designed for technical leaders seeking a path to operational excellence, we'll explore these disciplines in depth and how they integrate and create competitive advantages.
Slide deck for a presentation at OSCON 2011 about why Netflix uses web technology for TV user interfaces and how we maximize performance for a broad range of devices.
Moonbot Studios Shoots for the Cloud to Meet Deadlines and Manage Costs
Threatened by deadlines for Academy Award submissions, Moonbot Studios faced a shortage of rendering capacity while working on Taking Flight, its newest animated short film, and other important projects. As a small studio with a matching budget, the team did what it does best: it got creative and solved the problem with what they first called “magic.”
In this webinar, the Moonbot team tells the tale of moving its rendering to Google Compute Engine and how it defied networking odds by caching data close to the animators with an Avere vFXT. Hear Moonbot’s pipeline supervisor explain how the team turned cloud data center distance into a non-issue, met deadlines, and gained quantitative benefits that sparked energy in this small team of creative aviators.
In this session, you will learn:
• What drove Moonbot Studios to move to the cloud
• How they moved complex renders to Google Compute Engine, overcoming data-access roadblocks
• Measurable results, including speed, economics, flexibility, and creative freedom
Moonbot Studios' flight to the cloud will be supported by Google Cloud Platform and Avere Systems for a complete overview of how the technologies help bring new ideas to life.
Who Needs Network Management in a Cloud Native Environment? - Eshed Gal-Or
(This talk was presented at OSS NA 2017 in Los Angeles.)
Network management (and virtual network in particular) is hard.
Cloud app developers find themselves dealing with too many options and too many settings that make no sense to them.
This is because Cloud APIs evolved from legacy IT management.
Cloud-Native apps are revolutionizing how software is developed and deployed.
Why do app developers need to deal with those legacy network knobs and gauges?
Why do we even need to care about IP addresses, routers, or load balancers, in a cloud-native world?
In this presentation, we will explore an alternative approach and how we could implement it *today* with K8S and Dragonflow (an open-source virtual network management project) to provide a more stable, better-performing, and truly scalable cloud-native infrastructure.
Siddhi: A Second Look at Complex Event Processing Implementations - Srinath Perera
Today, vast amounts of data are available from sources like sensors (RFID, Near Field Communication), web activity, transactions, social networks, etc. Making sense of this avalanche of data requires efficient and fast processing.
Processing high volumes of events to derive higher-level information is a vital part of making critical decisions, and Complex Event Processing (CEP) has become one of the most rapidly emerging fields in data processing. e-Science use cases, business applications, financial trading applications, operational analytics applications, and business activity monitoring applications are some use cases that directly employ CEP. This paper discusses different design decisions associated with CEP engines and proposes some approaches to improve CEP performance by using more stream-processing-style pipelines. Furthermore, the paper discusses Siddhi, a CEP engine that implements those suggestions. We present a performance study showing that the resulting CEP engine, Siddhi, has significantly improved performance. The primary contributions of this paper are a critical analysis of CEP engine design, the identification of suggestions for improvement, the implementation of those improvements in Siddhi, and the demonstration of the soundness of those suggestions through empirical evidence.
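The stream-processing-style pipeline the abstract advocates can be illustrated with a rough sketch (hypothetical code, not Siddhi's actual API): a sliding-window aggregate over an event stream, the kind of query a CEP engine compiles into a pipeline of operators.

```python
from collections import deque

def sliding_average(events, window_size, threshold):
    """Emit an alert whenever the average of the last `window_size`
    readings exceeds `threshold` -- a classic CEP-style window query."""
    window = deque(maxlen=window_size)  # the sliding window operator
    alerts = []
    for ts, value in events:
        window.append(value)
        if len(window) == window_size:
            avg = sum(window) / window_size
            if avg > threshold:  # the filter/alert operator
                alerts.append((ts, avg))
    return alerts

# Toy event stream of (timestamp, sensor reading) pairs
events = [(1, 10), (2, 20), (3, 40), (4, 50), (5, 5)]
print(sliding_average(events, window_size=3, threshold=25))
```

A real CEP engine expresses the same logic declaratively and optimizes the operator pipeline, but the window-then-filter shape is the same.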
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your Mind - Avere Systems
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost effective fashion can be cumbersome and difficult at the workload level. At the organizational level, it can quickly become chaos.
You must make choices around cloud deployment, and these choices can have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, or locked-in scenarios. Data management and placement challenges make the ability to automate workflows and processes across multiple clouds a requirement.
In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
(CMP202) Engineering Simulation and Analysis in the Cloud - Amazon Web Services
Building great products, ones that are aesthetically appealing as well as functionally sound, requires cutting-edge design and engineering. Given the high cost of testing physical prototypes, engineering organizations are turning to simulation and analysis using digital models, but compute requirements for these have traditionally required expensive on-premises infrastructure. Now, engineering organizations can use high-performance computing services from AWS and solutions from AWS technology partners to innovate at scale globally, with no up-front capital infrastructure investment.
In this session, AWS Partner Ansys shares how they help customers of all sizes design and engineer better products through digital simulation and analysis using HPC on AWS.
Building a Just-in-Time Application Stack for Analysts - Avere Systems
Slide presentation from Webinar on February 17, 2016.
People in analytical roles are demanding more and more compute and storage to get their jobs done. Instead of building out infrastructure for a few employees or a department, systems engineers and IT managers can find value in creating a compute stack in the cloud to meet the fluctuating demand of their clients.
In this 45-minute webinar, you’ll learn:
- How to identify the right analytical workloads
- How to create a scalable compute environment using the cloud for analysts in under 10 minutes
- How to best manage costs associated with the cloud compute stack
- How to create dedicated client stacks with their own scratch space as well as general access to reference data
Health systems departments, research & development departments, and business analyst groups all face silos of these challenging, compute-intensive use cases. By learning how to quickly build this flexible workflow that can be scaled up and down (or off) instantly, you can support business objectives while efficiently managing costs.
Best Practices for Genomic and Bioinformatics Analysis Pipelines on AWS - Amazon Web Services
AWS is a great fit for both steady-state and episodic computational workloads. Here we present some common architecture patterns for analyzing genomic and other biomedical data on scalable, high-throughput computational clusters on AWS. This talk covers bootstrapping a traditional Beowulf compute cluster on Amazon EC2, data transfer, and storage strategies for Amazon S3.
Building Reliable Data Lakes at Scale with Delta Lake - Databricks
Most data practitioners grapple with data reliability issues—it’s the bane of their existence. Data engineers, in particular, strive to design, deploy, and serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets.
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Built on open standards, Delta Lake employs co-designed compute and storage and is compatible with Spark APIs. It delivers high data reliability and query performance to support big data use cases, from batch and streaming ingest and fast interactive queries to machine learning. In this tutorial we will discuss the requirements of modern data engineering, the challenges data engineers face when it comes to data reliability and performance, and how Delta Lake can help. Through presentation, code examples, and notebooks, we will explain these challenges and the use of Delta Lake to address them. You will walk away with an understanding of how you can apply this innovation to your data architecture and the benefits you can gain.
This tutorial will be both an instructor-led and a hands-on interactive session. Instructions on how to get the tutorial materials will be covered in class.
What you’ll learn:
Understand the key data reliability challenges
How Delta Lake brings reliability to data lakes at scale
Understand how Delta Lake fits within an Apache Spark™ environment
How to use Delta Lake to realize data reliability improvements
Prerequisites
A fully-charged laptop (8-16GB memory) with Chrome or Firefox
Pre-register for Databricks Community Edition
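The ACID guarantee the tutorial centers on rests on an ordered transaction log. As a toy sketch of that idea in pure Python (a deliberately simplified illustration, nothing like Delta Lake's real implementation), each commit is a numbered JSON file published atomically, and readers reconstruct the table by replaying the log:

```python
import json
import os
import tempfile

class ToyTableLog:
    """Toy transaction log: each commit is a numbered JSON file;
    the table state is whatever replaying the commits yields."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def commit(self, rows):
        version = len(os.listdir(self.path))
        target = os.path.join(self.path, f"{version:020d}.json")
        tmp = target + ".tmp"
        with open(tmp, "w") as f:
            json.dump(rows, f)
        os.rename(tmp, target)  # atomic publish: readers never see a partial commit

    def snapshot(self):
        rows = []
        for name in sorted(os.listdir(self.path)):  # replay commits in order
            if name.endswith(".json"):
                with open(os.path.join(self.path, name)) as f:
                    rows.extend(json.load(f))
        return rows

log = ToyTableLog(tempfile.mkdtemp())
log.commit([{"id": 1}])
log.commit([{"id": 2}])
print(log.snapshot())  # prints [{'id': 1}, {'id': 2}]
```

The atomic rename is the crux: a commit either appears in the log completely or not at all, which is what makes concurrent readers see consistent snapshots.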
Those who out-compute can often out-compete. The cloud gives you access to a massive amount of compute power when you need it. This talk presents an introduction to HPC in the cloud, including the benefits of HPC in the cloud, how to get started, some tools to use, and how you can manage data. We will showcase several examples of HPC in the cloud from a number of public sector and commercial customers.
Created by: Dr. Jeff Layton, Principal, Solutions Architect
Netflix designed a massive-scale, cloud-based media transcoding system from scratch for processing professionally produced studio content. We bucked the common industry trend of vertical scaling and instead designed a horizontally scaled, elastic system using AWS to meet the unique scale and time constraints of our business. Come hear how we designed this system, how it continues to get less expensive for Netflix, and how AWS represents a transformative opportunity in the wider media-owning industry.
Getting Cloudy with Remote Graphics and GPU Compute Using G2 instances (CPN21... - Amazon Web Services
Amazon EC2 now offers a new GPU instance capable of running graphics and GPU compute workloads. In this session, we take a deeper look at the remote graphics capabilities of this new GPU instance, the tooling required to get started, and a live demo of applications streamed from our West Coast regions. We also explore the benefits of hosting your 3D graphics applications in the AWS cloud, where you can harness the vast compute and storage resources.
QCon London Presentation - 3/8/16
Abstract:
On December 24th, 2012, AWS US-EAST-1 experienced a region-wide failure that took down the Netflix service for almost 24 hours. Knowing that failure is inevitable in any complex system, we evolved our cloud-based microservice architecture to support multi-region traffic management and failover capabilities. With that foundation in place, we drove initiatives to achieve service ubiquity and rapid global expansion. The overarching theme is #NetflixEverywhere: an amazing, global, highly available movie and TV streaming experience for any member, anytime, on any device, anywhere in the world.
Building and evolving a pervasive, global service requires a multidisciplinary approach that balances requirements around service availability, latency, data replication, compute capacity, and efficiency. In this session, we’ll follow the Netflix journey of failure, innovation, and ubiquity. We'll review the many facets of globalization, then delve deep into the architectural patterns that enable seamless multi-region traffic management, reliable, fast data propagation, and efficient service infrastructure. The patterns presented are broadly applicable to internet services with global aspirations.
A Petascale Database for Large-Scale Neuroscience Powered by Serverless Advan... - Amazon Web Services
The IARPA Machine Intelligence from Cortical Networks (MICrONS) program is a research endeavor that seeks to improve neurally-plausible machine learning algorithms by developing an understanding of the data representations and learning rules employed by the brain through structurally and functionally interrogating a cubic millimeter of mammalian neocortex. This effort requires the efficient storage, visualization, and processing of petabytes of neuroimaging data. The Johns Hopkins University Applied Physics Laboratory has developed an open-source, highly-available service to manage these data called the Boss. The Boss leverages Amazon Web Services to provide a cloud-native spatial database with an innovative storage hierarchy and auto-scaling capability to balance cost and performance. The system leverages serverless components extensively to meet both scalability and cost requirements. In this session we will provide an overview of the Boss, and focus on how JHU/APL leveraged DynamoDB, Lambda, and Step Functions for several high-throughput components of the system. We'll discuss both the challenges faced and successes achieved with serverless technologies. Learn More: https://aws.amazon.com/government-education/
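Cloud-native spatial databases like the one the abstract describes often index volumetric chunks with a space-filling curve. As a hypothetical sketch (not the Boss's actual key scheme), a Morton (Z-order) key interleaves the bits of a chunk's coordinates so that spatially adjacent cuboids sort near each other in a key-value store such as DynamoDB:

```python
def morton_key(x, y, z, bits=21):
    """Interleave the bits of three chunk coordinates into a single
    Z-order key, so spatially adjacent chunks get nearby keys."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)       # x bits land at positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)   # y bits at positions 1, 4, 7, ...
        key |= ((z >> i) & 1) << (3 * i + 2)   # z bits at positions 2, 5, 8, ...
    return key

# Neighboring chunks along each axis differ only in low-order key bits
print(morton_key(3, 1, 0))  # prints 11 (binary 1011)
```

Keys like this make range scans over a spatial neighborhood cheap, which matters when a cutout request must fetch many adjacent cuboids at once.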
Why Scale Matters and How the Cloud is Really Different (at scale) - Amazon Web Services
Cloud computing gives you a number of advantages, such as being able to scale your application on demand. As a new business looking to use the cloud, you inevitably ask yourself, "Where do I start?" Join us in this session to understand best practices for scaling your resources from zero to millions of users. We will show you how to best combine different AWS services, make smarter decisions for architecting your application, and best practices for scaling your infrastructure in the cloud.
Presenter:
Santanu Dutt, Solution Architect, Amazon Internet Services
Vinayak Hegde, Vice President – Engineering, Helpshift
Sunny Saxena, Product Lead, Sprinklr
SRV318 - Research at PNNL: Powered by AWS (Serverless Breakout Session, AWS re:Invent 2017, 11/28/2017 1:00 PM, Tue; presenter: Giardinelli). Pacific Northwest National Laboratory's rich data sciences capability has produced novel solutions in numerous research areas including image analysis, statistical modeling, and social media (and many more!). See how PNNL software engineers utilize AWS to enable better collaboration between researchers and engineers, and to power the data processing systems required to facilitate this work, with a focus on Lambda, EC2, S3, Apache Nifi and other technologies. Several approaches will be covered including lessons learned.
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS - Amazon Web Services
This session will focus on how to get from 'Minimum Viable Product' (MVP) to scale. It will also explain how to deal with unpredictable demand and how to build a scalable business. Attend this session to learn how to:
Scale web servers and app services with Elastic Load Balancing and Auto Scaling on Amazon EC2
Scale your storage on Amazon S3 and S3 Reduced Redundancy Storage
Scale your database with Amazon DynamoDB, Amazon RDS, and Amazon ElastiCache
Scale your customer base by reaching customers globally in minutes with Amazon CloudFront
When you're handling big data in the modern world, you reach a point where a "one size fits all" approach no longer works. However, to get the results you want, you also don't have to spend big money on fire-breathing hardware or expensive software. AWS offers a broad array of open and commercial database choices, from do-it-yourself to fully managed services that handle scaling, and gives you powerful tools to choose the right architecture. You could choose from MySQL, RDS, Oracle, SQL Server, MongoDB, DynamoDB, Cassandra, ElastiCache, Redis, and SimpleDB, and our customers use them for different use cases. Each has different strengths, and this session highlights when you would want to choose each, with examples of how we use each to solve our big data challenges and why we made those decisions. We profile some of the choices available to you - MySQL, RDS, ElastiCache, Redis, Cassandra, MongoDB, and DynamoDB - and three customer case studies on RDS, ElastiCache, and DynamoDB.
Microservices and serverless for MegaStartups - DLD TLV 2017 - Boaz Ziniman
Microservices and Serverless computing allow you to build and run simpler and more efficient applications, while improving your agility and saving a lot of money.
The ability to deploy your applications without provisioning or managing servers opens up new opportunities for startups to build web, mobile, and IoT backends; run stream processing or big data workloads; run chatbots; and more, without investing in hardware or the professional manpower to run that hardware.
In this session, we will learn how to get started with Microservices and Serverless computing with AWS Lambda, which lets you run code without provisioning or managing servers.
This is a must-read for all engineers interested in developing a microservices architecture. Turn your monolithic server into a prolific, multi-instance solution! Includes a well-known example: Netflix. Please contact me for more details.
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi... - Cisco DevNet
Data gravity is a reality when dealing with massive amounts of data in globally distributed systems. Processing this data requires distributed analytics processing across InterCloud. In this presentation we will share our real-world experience with storing, routing, and processing big data workloads on Cisco Cloud Services and Amazon Web Services clouds.
AWS Summit 2013 | India - Web, Mobile and Social Apps on AWS, Kingsley Wood - Amazon Web Services
Build your next-generation, internet-scale, applications with low upfront costs using on demand access to web and application servers with AWS. Start small and grow to any scale with automated scaling. Stop reinventing the wheel, offload the undifferentiated heavy-lifting, and accelerate time to market using scalable storage, databases, content delivery, cache, search and other application services that make it easier to build and run apps that deliver a great customer experience.
Estimating the Total Costs of Your Cloud Analytics Platform - DATAVERSITY
Organizations today need a broad set of enterprise data cloud services with key data functionality to modernize applications and utilize machine learning. They need a platform designed to address multi-faceted needs by offering multi-function Data Management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. They need a worry-free experience with the architecture and its components.
For the Computer Measurement Group workshop in San Diego, November 2013. Also presented to a student class at UC Santa Barbara. Covers: what is Cloud Native; capacity and performance benchmarks; cost optimization techniques - content co-developed with Jinesh Varia of AWS.
How to build forecasting services using ML and deep learn... - Amazon Web Services
Forecasting is an important process for many companies and is used in many areas to accurately predict the growth and distribution of a product, the resources required on production lines, financial projections, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data containing a temporal component and then apply an algorithm that, based on the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to build serverless Big Data applications - Amazon Web Services
The variety and volume of data created every day is accelerating ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters looks like an investment accessible only to established companies. But the elasticity of the cloud and, in particular, serverless services let us break through these limits.
Let's see how it is possible to develop Big Data applications quickly, without worrying about infrastructure, dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing its pace of innovation. Over that period we learned how changing our approach to application development dramatically increased our agility and release velocity and, ultimately, let us build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture, but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances - Amazon Web Services
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can take advantage of Spot Instances, leading to average savings of 70% compared to On-Demand Instances. In this session we will explore the characteristics of Spot Instances and how easily they can be used on AWS. We will also learn how Spreaker uses Spot Instances to run different types of applications, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us the question – how to monetise Open APIs, simplify Fintech integrations and accelerate adoption of various Open Banking business models. Therefore, AWS and FinConecta would like to invite you to Open Finance marketplace presentation on October 20th.
Event Agenda :
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's offering unique in the market with Machine Lea... services - Amazon Web Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, also through a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... - Amazon Web Services
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; they often involved manual activities, occasionally leading to application downtime that interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, guaranteeing greater system reliability and significant improvements to business continuity.
AWS provides AWS OpsWorks as a Configuration Management tool that automates and simplifies the management and deployment of EC2 instances using Chef and Puppet.
Learn how to use AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads - Amazon Web Services
Want to know your options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we will discuss options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and deploying Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment into the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis powered by artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore the possibilities offered by AWS services for applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14th, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in VMware vSphere®-based cloud environments and access a wide range of AWS services, fully exploiting the potential of the AWS cloud while protecting existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS - Amazon Web Services
Many companies today build applications with ledger-like functionality, for example to verify the history of credits and debits in banking transactions, or to track the flow of their products through the supply chain.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but they are complex and costly tools to manage.
Amazon QLDB eliminates the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will learn how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservices architectures and rich mobile and web applications, APIs are more important than ever for delivering a great user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dive into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Databases and VMware Cloud™ on AWS: myths debunked - Amazon Web Services
Many organizations are reaping the benefits of the cloud by migrating their Oracle workloads and securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, along with performance risks that can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to facilitate and simplify the migration of Oracle workloads while accelerating the transformation to the cloud; they dive into the architecture and demonstrate how to take full advantage of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies managing Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
4. Serverless characteristics
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
5. Serverless Computing - AWS Lambda
Run code without provisioning or managing servers - pay only for the compute time you consume.
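As a concrete reference point, here is a minimal Lambda handler sketch in Python (the event shape and function name are illustrative, not from the talk):

```python
# Minimal AWS Lambda handler sketch. Lambda invokes handler(event, context)
# once per trigger; the return value is the function's result.
import json

def handler(event, context):
    # Build a response from the incoming event payload.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}
```

Locally, the handler can be exercised by calling it directly with a dict event and `None` for the context.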
6. Benefits of AWS Lambda
• No servers to manage - AWS Lambda handles operations and management, provisioning and utilization, scaling, and availability and fault tolerance
• Continuous scaling - automatically scales your application, running code in response to each trigger; your code runs in parallel, processing each trigger individually and scaling precisely with the size of the workload
• Subsecond metering - CPU and network scale with allocated RAM (128 MB to 1500 MB); pricing is $0.20 per 1M requests plus a charge per 100 ms of execution
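The pricing bullets above translate into a quick back-of-the-envelope cost estimate. The per-GB-second rate below is the public Lambda list price of this era and is an assumption; check current pricing:

```python
# Lambda cost estimator following the model on the slide: a per-request fee
# plus a per-100 ms compute fee that scales with allocated memory.
# Rates are assumed 2017 list prices.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per 1M requests
PRICE_PER_GB_SECOND = 0.00001667       # compute price per GB-second (assumed)

def lambda_cost(requests, avg_duration_ms, memory_mb):
    """Estimated cost in dollars, ignoring the free tier."""
    if requests == 0:
        return 0
    # Duration is metered in 100 ms increments (ceiling).
    billed_ms = -(-avg_duration_ms // 100) * 100
    gb_seconds = requests * (billed_ms / 1000) * (memory_mb / 1024)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# Example: 10M invocations at 250 ms average with 512 MB allocated.
cost = lambda_cost(10_000_000, 250, 512)
```

Because more memory also buys more CPU, the cheapest configuration is not always the smallest one - hence the slide's advice to tune memory per function.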
9. Benefits of AWS Step Functions
• Productivity - easy to connect and coordinate distributed components and microservices to quickly create apps
• Agility - diagnose and debug problems faster; adapt to change
• Resilience - manages the operations and infrastructure of service coordination to ensure availability at scale and under failure
10. Application Lifecycle in AWS Step Functions
Define in JSON → Visualize in the Console → Monitor Executions
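To make "define in JSON" concrete, here is a minimal Amazon States Language definition sketched as a Python dict (the Task resource ARN is a placeholder, not a real function):

```python
# A minimal Amazon States Language state machine built as a Python dict.
# The Resource ARN is a placeholder; in practice it points at a Lambda
# function or a Step Functions activity.
import json

state_machine = {
    "Comment": "Minimal two-state workflow",
    "StartAt": "DoWork",
    "States": {
        "DoWork": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:do-work",
            "Retry": [{"ErrorEquals": ["States.ALL"],
                       "IntervalSeconds": 60,
                       "MaxAttempts": 4,
                       "BackoffRate": 2.0}],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# This JSON string is what you would pass when creating the state machine.
definition_json = json.dumps(state_machine, indent=2)
```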
11. Amazon DynamoDB
Fast and Flexible NoSQL Database Service
• NoSQL database
• Seamless scalability
• Zero admin
• Single-digit millisecond latency
12. Amazon DynamoDB
Fully Managed NoSQL • Document or Key-Value • Scales to Any Workload • Fast and Consistent • Access Control • Event Driven Programming
14. IARPA MICrONS
Intelligence Advanced Research Projects Activity - Machine Intelligence from Cortical Networks
• MICrONS seeks to revolutionize machine learning by understanding the representations, transformations, and learning rules employed by the brain
• The program is expressly designed as a dialogue between computer science, data science, and neuroscience
[Diagram: a neurally plausible machine learning framework, informed by a loop of behavior experiments, functional imaging, structural imaging, and data analysis]
15. Why Is This Different?
• Current neural networks are "neurally inspired" but not considered biofidelic or neurally plausible
• Previous projects to build algorithms based on the brain exist, but have been focused on macro- and micro-scale information, or lower-fidelity statistics
• Little is known about the brain at the mesoscale
• A "cortical column" is theorized to be on the order of ~1 mm³
• In this program, structure and function co-registration provides a uniquely rich picture of computing circuits
• Researchers are directly measuring mesoscale activity and circuits
[Diagram: scales of brain mapping - microscale (1-100s of neurons), mesoscale (1k-1M neurons, the open question), macroscale (brain regions, e.g. the Human Connectome Project)]
16. Why Is This Different: Functional Imaging
Video Credit: Tianyu Wang (Xu Lab, Cornell University) & Jacob Reimer (Tolias Lab, Baylor College of Medicine)
17. Why Is This Different: Structural Imaging
• Peta-scale structural imaging
• A 1 mm³ region is large enough to contain meaningful circuits never before observed:
  • ~50k-100k neurons
  • ~100,000,000 synapses
  • ~4x4x30 nm voxels
  • ~2-2.5 PB
• Three different techniques:
  • Scanning Electron Microscopy (SEM)
  • Transmission Electron Microscopy (TEM)
  • Fluorescent in situ sequencing (FISSEQ) barcoding
Video credit: Kasthuri, et al. - Cell 2015; Bobby Kasthuri, Daniel Berger, Jeff Lichtman
18. Why Is This Different: Co-registered Data
• Co-registration links structure to function
• For the first time, researchers will measure in the same sample at scale:
  • Stimulus ("input")
  • Behavior ("output")
  • Connectome ("circuit diagram")
  • Neuronal activity ("voltages")
Calcium imaging data: Tolias Lab, Baylor College of Medicine
X-ray tomography and co-registration: Allen Institute for Brain Science
19. Why Can We Succeed Now?
• New imaging techniques and engineering advances allow for interrogation of mesoscale circuits
• Increased computing power has enabled automated analysis with machine learning
• Reduced storage costs have made collection of many petabytes of data possible
• The cloud provides the ability to scale when needed and facilitates sharing and collaboration
We can directly observe and densely reconstruct mesoscale neuronal circuits in vivo for the first time
20. The Boss
Block and Object Storage Service
• The Boss is a multi-dimensional spatial database, provided as a managed service on AWS
• The Boss stores annotation data co-registered to image data
• An annotation is a unique 64-bit identifier applied to a set of voxels, representing its spatial distribution
[Figure: example annotations labeled with IDs 1267, 345345, and 534534799]
22. Boss API Overview
The Boss is accessible through a versioned REST API
[Diagram: on-premise clients connect to the Boss services - Ingest Service, User Service, Group Service, Resource Service, Permission Service, Object Service, Tile Service, Downsample Service, Cutout Service, and Metadata Service]
24. The Boss Leverages Serverless Components
• DynamoDB: experimental metadata, annotation index, cuboid index, tile index
• Lambda: downsampling, ingest, cache page-in and page-out operations, DNS updates
• SQS: ingest upload tasks, reliable Lambda processing
• Step Functions: downsample workflow, ingest workflow, asynchronous delete workflow
• S3: cuboid storage, tile storage, static hosting
26. Heaviside
Python library and DSL for working with AWS Step Functions
• The Step Functions state machine language, while flexible, is hard to write and maintain
• Heaviside is a Python package that provides several components to make Step Functions easy to use:
  • DSL and compiler - greatly simplifies writing and maintaining Step Function JSON definitions
  • Library for creating and executing Step Functions in AWS
  • A framework for running Activities
https://github.com/jhuapl-boss/heaviside
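To see the kind of boilerplate a DSL can factor out, note that the compiled JSON on the next slide repeats an identical Retry block for nearly every task. A small helper in the spirit of Heaviside's compiler (this sketch is not Heaviside's actual API) generates each Task-with-retries state from one call:

```python
# Generate an Amazon States Language Task state with standard retries,
# replacing ~15 lines of hand-written JSON per task. Illustrative helper,
# not Heaviside's real API.
import json

def task(name, resource, next_state=None, interval=60, attempts=4):
    """Build a Task state with the Retry policy used throughout the example."""
    state = {
        "Type": "Task",
        "Comment": name.replace("_", " "),
        "Resource": resource,
        "Retry": [{"ErrorEquals": ["States.ALL"],
                   "IntervalSeconds": interval,
                   "MaxAttempts": attempts,
                   "BackoffRate": 2.0}],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state

# One line per task instead of a hand-maintained JSON block:
states = {
    "delete_metadata": task(
        "delete_metadata",
        "arn:aws:states:REGION:ACCOUNT:activity:delete_metadata"),
}
definition = json.dumps({"StartAt": "delete_metadata", "States": states}, indent=2)
```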
27. Heaviside Example

{
"Comment": "Delete Cuboid\nRemoves all of the different data related to a given cuboid,\nremoves the actual cuboid data, and then cleans up the final\nbookkeeping for the cuboid\n",
"States": {
"Line7": {
"Next": "merge_parallel_outputs",
"Branches": [
{
"States": {
"delete_metadata": {
"Comment": "deletes metadata",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:delete_metadata-integration-boss",
"End": true,
"Retry": [
{
"IntervalSeconds": 60,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
}
},
"StartAt": "delete_metadata"
},
{
"States": {
"delete_id_count": {
"Comment": "deletes from dynamodb table idcount",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:delete_id_count-integration-boss",
"End": true,
"Retry": [
{
"IntervalSeconds": 60,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
}
},
"StartAt": "delete_id_count"
},
{
"States": {
"delete_id_index": {
"Comment": "deletes from dyanmodb table idindex",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:delete_id_index-integration-boss",
"End": true,
"Retry": [
{
"IntervalSeconds": 60,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
}
},
"StartAt": "delete_id_index"
}
],
"Type": "Parallel"
},
"merge_parallel_outputs": {
"Comment": "merges the outputs of all the parallel activities into a single dictionary",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:merge_parallel_outputs-integration-boss",
"Next": "find_s3_index",
"Retry": [
{
"IntervalSeconds": 60,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
},
"find_s3_index": {
"Comment": "finds data to delete from s3index and s3",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:find_s3_index-integration-boss",
"Next": "delete_s3_index",
"Retry": [
{
"IntervalSeconds": 60,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
},
"delete_s3_index": {
"Comment": "deletes data from s3index and s3",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error",
"Next": "notify_admins"
}
],
"Resource": "arn:aws:states:us-east-1:451493790433:activity:delete_s3_index-integration-boss",
"Next": "delete_clean_up",
"Type": "Task",
"Retry": [
{
"IntervalSeconds": 120,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
]
},
"notify_admins": {
"Comment": "sends SNS message to microns topic",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:notify_admins-integration-boss",
"Next": "delete_clean_up",
"Type": "Task"
},
"delete_clean_up": {
"Comment": "cleans up the delete s3 table.",
"Resource": "arn:aws:states:us-east-1:451493790433:activity:delete_clean_up-integration-boss",
"End": true,
"Retry": [
{
"IntervalSeconds": 120,
"MaxAttempts": 4,
"ErrorEquals": [
"States.ALL"
],
"BackoffRate": 2.0
}
],
"Type": "Task"
}
},
"StartAt": "Line7"
}
(The JSON above is the output of the Heaviside compiler.)
28. Downsample Deep Dive: Overview
• Problem description
  • Need to iteratively downsample a dataset to build a resolution hierarchy
  • Enables "zooming out" for large-scale visualization and analysis
  • Workflow is run infrequently and on demand by users
  • Workflow needs to scale from 2 GB to 2 PB of data
• Implementation
  • Use a Step Function to manage failures and iterate processing
  • Since downsampling is "embarrassingly parallel", invoke Lambda in parallel to perform the image processing
• Serverless benefit
  • Can massively scale processing for a short period of time, on demand, without an administrator in the loop
  • Don't need to worry about high availability
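As a rough sketch of the per-invocation work, a 2x downsample can be expressed as averaging 2x2x2 voxel blocks. Pure Python for clarity - this is an illustration, not the Boss's implementation, which operates on compressed 3D cuboids:

```python
# Reduce a 3D volume's resolution by 2x in each dimension by averaging
# 2x2x2 blocks of voxels. Illustrative sketch of the downsample step.
def downsample_2x(volume):
    """volume: nested lists [z][y][x] with even dimensions; returns half-size volume."""
    zs, ys, xs = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for z in range(0, zs, 2):
        plane = []
        for y in range(0, ys, 2):
            row = []
            for x in range(0, xs, 2):
                # Average the 8 voxels of the 2x2x2 block.
                block = [volume[z + dz][y + dy][x + dx]
                         for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
                row.append(sum(block) / 8)
            plane.append(row)
        out.append(plane)
    return out
```

Applying `downsample_2x` iteratively is what builds the resolution hierarchy the slide describes; the Step Function drives one iteration per level.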
30. Downsample Deep Dive: Process
• User requests a channel to be downsampled via the API
• Step Function invokes Lambdas in parallel to downsample data while providing status to the API
• Step Function iterates automatically to build the resolution hierarchy
31. Ingest Deep Dive: Overview
• Problem description
  • Need to transfer large amounts of on-premise image data into the Boss
  • Support both data transfer to the cloud and "ingest" into the Boss format
  • Workflow is run infrequently and on demand by users, but often in "bursts" as teams deliver data for the same deadlines
  • Workflow needs to scale from 2 GB to 2 PB of data
• Implementation
  • Use SQS, S3, Lambda, and DynamoDB to provide a high-throughput, reliable upload and processing pipeline
• Serverless benefits
  • Don't need to keep servers up when the workflow is not running
  • Can massively scale processing for a short period of time, on demand
32. Ingest Deep Dive: Create an Ingest Job
• The ingest process is on demand and can be started at any time
• User uploads a configuration file
• Boss API creates a temporary task queue
33. Ingest Deep Dive: Populate Upload Queue
• Step Function invoked to populate the Upload Task Queue
• First, Lambda is called in parallel to upload messages to SQS
• Next, the Step Function waits to allow SQS to become consistent
• Finally, the Step Function verifies the number of messages in the queue is correct
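The fan-out arithmetic behind this step can be sketched as splitting the full list of tile-upload tasks across parallel Lambda invocations and then checking the expected message count (names are illustrative, not the Boss's actual code):

```python
# Split upload-task population across parallel Lambda invocations, then
# verify no messages were lost. Illustrative sketch of the queue-population
# and verification states.
import math

def partition_tasks(num_tiles, lambdas):
    """Return (start, count) ranges, one per Lambda invocation."""
    per_lambda = math.ceil(num_tiles / lambdas)
    ranges = []
    start = 0
    while start < num_tiles:
        count = min(per_lambda, num_tiles - start)
        ranges.append((start, count))
        start += count
    return ranges

def verify(ranges, expected):
    # Mirrors the final Step Function state: queue depth must match the
    # number of tasks that should have been enqueued.
    return sum(count for _, count in ranges) == expected

ranges = partition_tasks(10_000, 3)   # three parallel Lambdas
ok = verify(ranges, 10_000)
```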
34. Ingest Deep Dive: Upload Tiles
• The ingest client operates distributed and in parallel, uploading tiles as fast as possible to Amazon S3
• Amazon S3 PUT events invoke Lambda to track tiles
• When enough tiles arrive, a second Lambda is asynchronously invoked to ingest the tiles
35. Ingest Deep Dive: Ingest Cuboids
• Lambda function converts image tiles into compressed 3D matrices
• Processed data is written to the final S3 bucket and indexed
• Temporary image files are deleted
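The tile-tracking logic of slides 34-35 can be sketched as a counter that fires once a cuboid's tiles are complete. A dict stands in for the DynamoDB tile index, and the tile count and names are assumptions for illustration:

```python
# Sketch of tile tracking: each S3 PUT event marks a tile as arrived; once
# all tiles of a cuboid are present, ingestion is triggered. An in-memory
# dict stands in for the DynamoDB tile index.
TILES_PER_CUBOID = 16          # assumed number of tiles per cuboid

tile_index = {}                # cuboid_key -> set of arrived tile ids

def on_tile_uploaded(cuboid_key, tile_id, ingest):
    """Called per S3 PUT event; `ingest` runs once the cuboid is complete."""
    arrived = tile_index.setdefault(cuboid_key, set())
    arrived.add(tile_id)
    if len(arrived) == TILES_PER_CUBOID:
        # In the Boss, this is an asynchronous invocation of a second Lambda.
        ingest(cuboid_key)
        del tile_index[cuboid_key]

# Simulate all 16 tiles of one cuboid arriving:
triggered = []
for i in range(TILES_PER_CUBOID):
    on_tile_uploaded("chan1/x0y0z0", i, triggered.append)
```

Because S3 events can arrive out of order and in parallel, the real index lives in DynamoDB, where a conditional update makes the "last tile" check safe across concurrent Lambdas.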
36. Ingest Deep Dive: Benefit of Serverless Ingest
• Ingest rate is limited only by the user's local resources and bandwidth
• Supports multiple users ingesting in parallel
• Does not impact the rest of the system's performance
• Scales automatically
Current transfers have reached >4 Gbps
37. Lambda Design Considerations
• Duration and memory limitations
  • The 5-minute duration and 1.5 GB memory maximums can limit applications
  • More memory = more CPU: your Lambda will run FASTER but cost MORE per 100 ms
  • Optimize allocated memory independently for each Lambda function to minimize cost
  • Code and dependencies (virtualenv) limited to 250 MB
• Lambda capacity is tied to execution duration
  • If your Lambda calls external services (e.g. DynamoDB, S3), network and external latencies WILL affect execution time
  • This can result in interesting failure modes and cascading failures
  • As your Lambda starts to throttle and automatically retry, things can continue to back up even more
  • Circuit breakers and other resilient design patterns are useful
38. DynamoDB Design Considerations
• Object size drives capacity
  • As a read or write grows in size, consumed capacity increases
  • The largest record size uses 400x the capacity of the smallest
• When you pay for capacity you are actually paying for partitions
  • If you need to deal with a hot partition you need to DOUBLE your capacity
• Beware of the hot partition
  • Happens when you read/write heavily to keys in the same partition
  • Can be very confusing, as you have provisioned plenty of capacity but still get throttled
  • Be sure to "spread" your keys across partitions - prepend a hash!
Units of capacity required for writes = number of item writes per second x item size in 1 KB blocks
Units of capacity required for reads = number of item reads per second x item size in 4 KB blocks
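The two capacity formulas above, plus the "prepend a hash" key-sharding trick, can be written as small helpers (illustrative names, not the Boss's code; the read formula here assumes strongly consistent reads):

```python
# DynamoDB capacity math from the slide, plus a hash-prefix key sharder
# to spread hot keys across partitions.
import hashlib
import math

def write_capacity_units(writes_per_sec, item_size_bytes):
    # Writes are billed in 1 KB blocks.
    return writes_per_sec * math.ceil(item_size_bytes / 1024)

def read_capacity_units(reads_per_sec, item_size_bytes):
    # Strongly consistent reads are billed in 4 KB blocks.
    return reads_per_sec * math.ceil(item_size_bytes / 4096)

def sharded_key(key, shards=16):
    """Prepend a stable hash prefix so hot keys spread across partitions."""
    prefix = int(hashlib.md5(key.encode()).hexdigest(), 16) % shards
    return f"{prefix:02d}#{key}"

# 100 writes/s of 2.5 KB items round up to 3 KB blocks each -> 300 WCU.
wcu = write_capacity_units(100, 2500)
```

The hash prefix must be derivable from the key itself so reads can compute the same partition key; randomized prefixes would force a scatter-gather read.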
39. Scaling Up Lambda and DynamoDB
• If you want to scale Lambda you must raise your limits
  • Increase your Lambda concurrency limit
  • If you deploy your Lambda function into a VPC, make sure your network architecture can handle the bandwidth, and increase your ENI limit to match your Lambda capacity
  • If interacting with S3 heavily, pre-shard your bucket
• Use DynamoDB Auto Scaling!
  • DynamoDB can scale up infinitely, but only scales down four times a day
• TEST, then TEST, and then TEST again
  • Attempt to model user behavior with end-to-end regression tests
  • Update your model of user behavior over time
  • Look into error and log aggregators - when things go bad, they go pretty bad, so it's hard to debug
41. Acknowledgements
JHU/APL: Denise D'Angelo, Tim Gion, Sandy Hider, Priya Manavalan, Jordan Matelsky, Derek Pryor, Will Gray Roncal, Brock Wester
IARPA: David A. Markowitz, R. Jacob Vogelstein
JHU: Alex Baden, Kunal Lillaney, Randal Burns
Team 1: David Cox, Hanspeter Pfister, Jeff Lichtman
Team 2: Andreas Tolias, Sebastian Seung, R. Clay Reid, Nuno da Costa
Team 3: George Church, Sandra Kuhlman, Tai Sing Lee, Alan Yuille