SRV318: Research at PNNL: Powered by AWS — AWS re:Invent 2017 Serverless Breakout Session, Giardinelli, Tue 11/28/2017 1:00 PM
Pacific Northwest National Laboratory's rich data sciences capability has produced novel solutions in numerous research areas, including image analysis, statistical modeling, and social media (and many more!). See how PNNL software engineers utilize AWS to enable better collaboration between researchers and engineers, and to power the data processing systems required to facilitate this work, with a focus on Lambda, EC2, S3, Apache NiFi, and other technologies. Several approaches are covered, including lessons learned.
4. PNNL at a glance
• $920.4M in R&D expenditures
• 104 U.S. and foreign patents granted
• 1,058 peer-reviewed publications
• 2 FLC Awards
• 5 R&D 100 Awards
• 4,400 scientists, engineers, and non-technical staff
5. Software engineering at PNNL
• Staff focus is research and innovation, not operations
• Developers work with scientists to enable research
• Limited space and resources for hardware
  • A big driver for moving to AWS!
• Agile is difficult
6. Problem: isolated research
• Who are the researchers?
• Researchers work independently
• Focus is on innovation and novel concepts
• Lack of collaboration with engineers
• Creates long delivery times
• The product usually isn't what the customer envisioned
7. Enabling research with AWS
• Research is the lifeblood of the organization
• Researchers should not be troubled with environment configurations, optimizations, etc.
• Software engineers provide the expertise needed to build applied solutions
• Utilizing AWS has been a turning point
  • AWS has dramatically helped to improve collaboration
  • AWS fits better with our Agile software processes
As a result, researchers can focus on the problem
8. Moving to the cloud
Our progression to AWS
Drivers
• Lack of internal resources (hardware and people)
• Customer deliverables, demands, and deadlines
Concerns
• Cost
• Vendor lock-in
Initial Approach
• Forklift (lift-and-shift) model
  • Missed out on AWS services
  • Still had operational headaches
Current Approach
• Serverless wherever possible
11. Image retrieval and classification: requirements
Research and customer requirements
• Handle static and streaming media
• Scalable, robust, and flexible
• Easily deployed and maintained
• Extensible (add additional models and instantiations)
• Identify optimal ways to collaborate
15. NiFi overview
Where we find benefit and why we use it
• Process and distribute data
• Message/data routing is very flexible and robust
• ETL is painless
• Easy to install, scale, configure, and extend
• Visually see what is going on in your pipelines
• Backpressure and queueing are baked into the flows, which is excellent for systems with brittle endpoints
• Low barrier to entry broadens the user audience
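The backpressure behavior called out above can be illustrated with a plain-Java sketch (this is not NiFi's API): a bounded queue blocks the producer whenever the downstream consumer falls behind, which is conceptually what NiFi's per-connection backpressure thresholds do for a flow feeding a slow or brittle endpoint.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BackpressureSketch {
    // Pushes `items` flowfile stand-ins through a bounded "connection" of the
    // given capacity and returns how many the slow consumer drained.
    // put() blocks when the queue is full, so the fast producer is
    // automatically throttled to the consumer's pace -- that is backpressure.
    public static int runPipeline(int items, int capacity) throws InterruptedException {
        BlockingQueue<String> connection = new ArrayBlockingQueue<>(capacity);
        final int[] drained = {0};

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    connection.take();               // slow, brittle endpoint
                    TimeUnit.MILLISECONDS.sleep(5);  // simulated processing delay
                    drained[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (int i = 0; i < items; i++) {
            connection.put("flowfile-" + i);         // blocks when connection is full
        }
        consumer.join();
        return drained[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("drained " + runPipeline(20, 5) + " items");
    }
}
```

Nothing is lost under pressure; the producer simply waits, which is why the talk highlights this as a good fit for brittle endpoints.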
17. NiFi tuning on AWS
• C4 and M4 EC2 instance types work well
• Scaling: we go vertical first, then horizontal
• Keep normal CPU utilization at 50–60%
• Set provenance to Volatile
• General-purpose SSDs work well
• Follow the NiFi “Configuration Best Practices” in the admin guide
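For reference, the Volatile provenance setting mentioned above is controlled in nifi.properties; a sketch of the relevant lines (the implementation property is from the NiFi admin guide, the buffer value is illustrative and should be tuned to available heap):

```properties
# nifi.properties — keep provenance in memory instead of on disk
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
# Number of provenance events held in memory (illustrative value)
nifi.provenance.repository.buffer.size=100000
```

The trade-off is that provenance history is lost on restart, which the team accepted in exchange for lower disk I/O.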
24. Lambda code example
public void handleRequest(SNSEvent snsEvent, Context context) throws Exception {
    // Get the JSON payload from the first SNS record
    String message = snsEvent.getRecords().get(0).getSNS().getMessage();
    // Parse the JSON into a request object, then download the image from its URL
    BufferedImage image = ImageIO.read(imageUrl);
    // Convert the image to JPEG
    ImageIO.write(image, "jpeg", byteArrayOutputStream);
    // Save the converted image to S3
    s3Client.putObject(new PutObjectRequest(bucketName, fileName, inputStream, metadata));
    // Write metadata to DynamoDB
    Table table = dynamoDB.getTable(dynamoDbTable);
    Item item = new Item()
        .withString("url_hash", request.getHash())
        .withString("url", request.getUrl())
        .withString("s3_bucket", s3Bucket);
    table.putItem(item);
    // Create and send a notification to the downstream SQS queue
    SendMessageRequest sendNewImageCachedMsg = new SendMessageRequest()
        .withQueueUrl(queueUrl)
        .withMessageBody(newImageJson);
    amazonSqs.sendMessage(sendNewImageCachedMsg);
}
25. How research and engineering collaborated on the effort
Collaboration
26. Lessons learned
Where we find benefit and why we use it
• Fantastic for scaling
• An obvious choice
• Very performant once functions are loaded (warm)
• The API is easy to use
• It's just Java
• Used in two key situations:
  • Low-cost development/pilot efforts
  • High volume/throughput
27. Lessons learned (continued)
Where we find benefit and why we use it
• Cold-start performance
  • 30 s (cold) as opposed to 400 ms (warm)
• Legacy code vs. new development
• Limits on JAR sizes
• Message size on Amazon SNS
  • 256 KB limit
• Combine functionality in a single Lambda function!
  • Easier and cheaper to manage
  • Step Functions were too expensive for our use cases
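A common workaround for the SNS size cap noted above (the talk does not show one, so this is a sketch) is a claim-check pattern: measure the serialized payload and, when it would exceed the limit, publish only a pointer to an object stored in S3. The bucket/key names are hypothetical and the actual S3 upload is left as a stubbed comment:

```java
import java.nio.charset.StandardCharsets;

public class ClaimCheck {
    // Published Amazon SNS message size cap
    static final int SNS_LIMIT_BYTES = 256 * 1024;

    // True when the UTF-8 encoded payload would exceed the SNS limit.
    public static boolean needsOffload(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8).length > SNS_LIMIT_BYTES;
    }

    // Returns the message to publish: either the payload itself, or a small
    // JSON pointer to an S3 object holding the real payload.
    public static String toSnsMessage(String payload, String bucket, String key) {
        if (!needsOffload(payload)) {
            return payload;
        }
        // s3Client.putObject(bucket, key, payload);  // upload the oversized payload here
        return "{\"s3_bucket\":\"" + bucket + "\",\"s3_key\":\"" + key + "\"}";
    }
}
```

Subscribers then fetch the body from S3 when they see a pointer instead of a payload.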
31. Amazon Athena
Where else can we use serverless?
• Great for exploring data in Amazon S3
• HQL/SQL support
• Partition support
• Use AWS Glue crawlers
• Complements our Hadoop cluster
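To illustrate the partition support mentioned above, an Athena table over S3 data is typically partitioned by an ingest date so queries can prune to a date range. The helper below assembles an example DDL statement; the table, column, SerDe, and bucket names are hypothetical, not from the talk:

```java
public class AthenaDdlExample {
    // Builds an example CREATE EXTERNAL TABLE statement for JSON data in S3,
    // partitioned by ingest date. All names are illustrative.
    public static String imageMetadataDdl(String bucket) {
        return String.join("\n",
            "CREATE EXTERNAL TABLE IF NOT EXISTS image_metadata (",
            "  url_hash string,",
            "  url string,",
            "  s3_bucket string",
            ")",
            "PARTITIONED BY (ingest_date string)",
            "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'",
            "LOCATION 's3://" + bucket + "/enriched/image-metadata/';");
    }

    public static void main(String[] args) {
        System.out.println(imageMetadataDdl("my-research-bucket"));
    }
}
```

An AWS Glue crawler can generate an equivalent table and discover the partitions automatically, which is the route the slide recommends.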
35. In Summary
• More and more, we lean on AWS serverless services
  • We don't have the resources for operations and maintenance
  • The government customers we support prefer serverless solutions
• Makes it easier to provide researchers and engineers with flexible blueprints for their implementations
  • Focus on solving problems, not setting up infrastructure
  • What are your technical needs? Do we already have something similar?
  • Leverage the AWS environment to provide easy access to data, services, tools, and resources
• Pleased with performance
  • We can “brute force” solutions if we have to
  • Most performance tuning is trivial
• Find the most cost-effective use cases for your needs
  • We have been able to strike a balance between serverless and managed services
  • Periodically do spot checks on cost; upfront calculations may have been incorrect
36. In Summary (cont’d)
• Go-to tech stack
  • Apache NiFi, Amazon S3, AWS Lambda, Amazon SQS, Amazon SNS, Amazon DynamoDB, Amazon RDS, and others as needed
• Take advantage of built-in events/triggers when you can
  • Most of the time, S3 + events are good enough
  • “Free” capability
• We have abandoned Kafka in favor of Apache NiFi site-to-site or Amazon SQS
  • Apache Kafka is great; we just don't have the administrative resources to support it. Use an AWS alternative when possible.
  • Most “streaming” requests from our customers don't really require streaming
• Ask researchers and engineers to catalog their data and follow basic data lake practices
  • Keep raw and enriched/augmented data separate
  • Add metadata to known events and important time frames
  • Enable start/stop and replay to improve evaluation
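The raw-versus-enriched separation recommended above is usually enforced with an S3 key-prefix convention. A minimal sketch, where the zone/dataset/date layout is a hypothetical convention rather than anything prescribed in the talk:

```java
public class DataLakeKeys {
    // Key layout convention (illustrative): <zone>/<dataset>/<date>/<file>
    // Keeping raw and enriched data under distinct top-level prefixes makes
    // replay easy: re-run enrichment over raw/ without touching enriched/.

    public static String rawKey(String dataset, String date, String file) {
        return "raw/" + dataset + "/" + date + "/" + file;
    }

    public static String enrichedKey(String dataset, String date, String file) {
        return "enriched/" + dataset + "/" + date + "/" + file;
    }

    public static void main(String[] args) {
        System.out.println(rawKey("tweets", "2017-11-28", "batch-001.json"));
        System.out.println(enrichedKey("tweets", "2017-11-28", "batch-001.json"));
    }
}
```

Date-based prefixes also line up naturally with the partition columns Athena and Glue crawlers expect.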