© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Research at PNNL: Powered by AWS
Mike Giardinelli and Ralph Perko
Pacific Northwest National Laboratory
November 28, 2017
Reference herein to any specific commercial product, process, or service by trade name,
trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement,
recommendation, or favoring by the United States Government or any agency thereof, or Battelle
Memorial Institute. The views and opinions of authors expressed herein do not necessarily state
or reflect those of the United States Government or any agency thereof.
SRV 318
Senior Software Engineers: Mike Giardinelli and Ralph Perko
The national laboratory system
PNNL at a glance
$920.4M in R&D expenditures
104 U.S. and foreign patents granted
1,058 peer-reviewed publications
2 FLC Awards
5 R&D 100 Awards
4,400 scientists, engineers, and non-technical staff
Software engineering at PNNL
• Staff focus is research and innovation, not operations
• Developers work with scientists to enable research
• Limited space and resources for hardware
• Big driver for moving to AWS!
• Agile is difficult
Problem: isolated research
• Who are the researchers?
• Researchers work independently
• Focus on innovation and novel concepts
• Lack of collaboration with engineers
• Creates long delivery times
• The delivered product usually isn’t what the customer envisioned
Enabling research with AWS
• Research is the lifeblood of the organization
• Researchers should not be troubled with environment
configurations, optimizations, etc.
• Software engineers provide expertise needed to build applied
solutions
• Using AWS has been a turning point
• AWS has dramatically improved collaboration
• AWS fits better with our Agile software processes
As a result, researchers can focus on the problem
Moving to the cloud
Our progression to AWS
Drivers
• Lack of resources internally (hardware and people)
• Customer deliverables and demands / deadlines
Concerns
• Cost
• Vendor lock-in
Initial Approach
• Forklift (lift-and-shift) model
• Missed out on AWS services
• Still had operational headaches
Current Approach
• Serverless wherever possible
Image Classification Pipeline
Overview
Goal
Enable novel image classification research on live,
streaming data
First primarily serverless solution
Image retrieval and classification - requirements
Requirements
• Handle static and streaming media
• Scalable, robust, and flexible
• Easily deployed and maintained
• Extensible (add additional models and instantiations)
• Identify optimal ways to collaborate
Research and customer requirements
Image retrieval and classification
Research and engineering implementation
Image retrieval and classification
How research and engineering collaborated on the effort
Image classification—Apache NiFi
Why NiFi?
NiFi overview
• Process and distribute data
• Message / data routing is very flexible and robust
• ETL is painless
• Easy to install, scale, configure, and extend
• Visually see what is going on with your pipelines
• Backpressure and queueing are baked into the flows—
excellent for systems that have brittle endpoints
• Low barrier to entry; broadens user audience
Where we find benefit and why we use it
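The backpressure point above is what makes NiFi forgiving toward brittle endpoints: each connection is a bounded queue, and once it reaches its configured threshold (object count or data size), NiFi stops scheduling the upstream component until the downstream side drains it. The sketch below illustrates only that idea with a plain `ArrayBlockingQueue`. It is not NiFi code, and the class name and thresholds are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of connection-level backpressure; not NiFi's implementation.
class BackpressureDemo {
    // Bounded queue standing in for a NiFi connection with an object-count threshold.
    private final BlockingQueue<String> connection;

    BackpressureDemo(int threshold) {
        this.connection = new ArrayBlockingQueue<>(threshold);
    }

    /** Upstream side: returns false once the threshold is reached; the caller keeps the item and retries later. */
    boolean offer(String flowFile) {
        return connection.offer(flowFile);
    }

    /** Downstream side: a slow or brittle endpoint takes one item when it is able to (null if empty). */
    String drainOne() {
        return connection.poll();
    }

    int queued() {
        return connection.size();
    }

    public static void main(String[] args) {
        BackpressureDemo demo = new BackpressureDemo(3);
        for (int i = 1; i <= 5; i++) {
            boolean accepted = demo.offer("flowfile-" + i);
            System.out.println("flowfile-" + i + (accepted ? " queued" : " held back (backpressure)"));
        }
        demo.drainOne(); // downstream catches up by one item
        System.out.println("after drain, accepted again: " + demo.offer("flowfile-6"));
    }
}
```

Items 1 to 3 are queued, 4 and 5 are held back, and one drain frees room for the next item. NiFi applies the same rule per connection, so one slow endpoint throttles only the part of the flow feeding it.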
Image cache data flow example
NiFi tuning on AWS
• C4 and M4 EC2 instance types work well
• Scaling: we go vertical, then horizontal
• Keep normal CPU utilization at 50–60%
• Set the provenance repository to Volatile
• General purpose SSDs work well
• Follow the “Configuration Best Practices” section of the NiFi System
Administrator’s Guide
Data flow logic
Read data from Kafka topic:
  {"image":"http://somewebsite.com/puppies.jpeg"},
  {"image":"http://somewebsite.com/kittens.jpeg"},
  {"image":"http://somewebsite.com/koalas.jpeg"}
Filter (“We only care about Koala bears!”):
  {"image":"http://somewebsite.com/koalas.jpeg"}
Create new message and push to Amazon SNS:
  {
    "url": "http://somewebsite.com/koalas.jpeg",
    "hash": "092f6b17d186adb2e121afcdc7e5470b0c6f82a5",
    "name": "koalas.jpeg",
    "type": "jpeg",
    "bucket": "image-classifier-test"
  }
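The filter-and-republish step above runs inside NiFi. A rough Java sketch of the same logic follows. It assumes the image URLs have already been extracted from the Kafka messages and that "koala" in the file name is the filter criterion. It also assumes the `hash` field is a SHA-1 of the URL: the slide's value has 40 hex digits, which matches SHA-1's length, but the slide does not say what was hashed. `KoalaFilter` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the NiFi filter + message-construction step; not PNNL's code.
class KoalaFilter {
    /** Keeps only URLs whose file name mentions koalas (assumed filter criterion). */
    static List<String> filter(List<String> imageUrls) {
        List<String> kept = new ArrayList<>();
        for (String url : imageUrls) {
            if (fileName(url).toLowerCase().contains("koala")) {
                kept.add(url);
            }
        }
        return kept;
    }

    static String fileName(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    /** Lower-case hex SHA-1 digest; hashing the URL is an assumption. */
    static String sha1Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Builds the JSON body to publish to Amazon SNS, using the fields shown on the slide. */
    static String toSnsMessage(String url, String bucket) {
        String name = fileName(url);
        String type = name.substring(name.lastIndexOf('.') + 1);
        // Plain formatting is enough for these quote-free values; use a JSON library in practice.
        return String.format(
            "{\"url\":\"%s\",\"hash\":\"%s\",\"name\":\"%s\",\"type\":\"%s\",\"bucket\":\"%s\"}",
            url, sha1Hex(url), name, type, bucket);
    }

    public static void main(String[] args) {
        List<String> incoming = List.of(
            "http://somewebsite.com/puppies.jpeg",
            "http://somewebsite.com/kittens.jpeg",
            "http://somewebsite.com/koalas.jpeg");
        for (String url : filter(incoming)) {
            System.out.println(toSnsMessage(url, "image-classifier-test"));
        }
    }
}
```

Because the hashing input is an assumption, the printed hash will not necessarily equal the value on the slide.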
Why SNS?
Image classification
AWS SNS
{
  "Records": [
    {
      "EventVersion": "1.0",
      "EventSubscriptionArn": "arn:aws:sns:EXAMPLE",
      "EventSource": "aws:sns",
      "Sns": {
        "SignatureVersion": "1",
        "Timestamp": null,
        "Signature": "EXAMPLE",
        "SigningCertUrl": "EXAMPLE",
        "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e",
        "Message": "{\"hash\": \"092f6b17d186adb2e121afcdc7e5470b0c6f82a5\", \"url\": \"http://somewebsite.com/koalas.jpeg\", \"bucketname\": \"image-classifier-test\", \"name\": \"koalas.jpeg\", \"type\": \"jpeg\"}",
Why Lambda?
Image classification
Why AWS Lambda?
Where we went serverless instead of NiFi
• Hands down best choice for scaling
• Performance, cost, maintenance
• Straightforward code
• 60+ image requests per second
• Required 4+ large EC2 instances clustered
• One month pilot
• AWS Lambda
• 161,490,065 requests; 61,490,065 seconds
• $1,050
• Amazon EC2:
• 4 EC2 m4.10xlarge instances
• $3,329 (reserved)
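As a sanity check on the Lambda figure, the sketch below recomputes it from the pilot's request count and compute seconds. It assumes the standard 2017 Lambda rates ($0.20 per million requests, $0.00001667 per GB-second), the monthly free tier (1 million requests, 400,000 GB-seconds), and a 1 GB memory setting. The slides do not state the memory size, so treat that input as hypothetical.

```java
// Hypothetical back-of-the-envelope estimate; rates, free tier, and memory size are assumptions.
class LambdaCostEstimate {
    static final double PRICE_PER_MILLION_REQUESTS = 0.20; // USD, 2017 list price
    static final double PRICE_PER_GB_SECOND = 0.00001667;  // USD, 2017 list price
    static final long FREE_REQUESTS = 1_000_000L;          // monthly free tier
    static final double FREE_GB_SECONDS = 400_000.0;       // monthly free tier

    /** Monthly cost in USD for a request count, total compute seconds, and memory size in GB. */
    static double monthlyCost(long requests, double computeSeconds, double memoryGb) {
        double billableRequests = Math.max(0L, requests - FREE_REQUESTS);
        double billableGbSeconds = Math.max(0.0, computeSeconds * memoryGb - FREE_GB_SECONDS);
        return billableRequests / 1_000_000.0 * PRICE_PER_MILLION_REQUESTS
             + billableGbSeconds * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        // Pilot figures from the slide; the 1 GB memory setting is assumed, not reported.
        double cost = monthlyCost(161_490_065L, 61_490_065.0, 1.0);
        System.out.printf("Estimated Lambda cost: $%.2f%n", cost);
    }
}
```

Under these assumptions the estimate comes to about $1,050, which lines up with the slide and is roughly a third of the $3,329 reserved EC2 cost. A different memory setting scales the compute term proportionally, so the comparison is sensitive to that choice.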
AWS Lambda
Where we went serverless instead of NiFi
Lambda code example
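// Assumes AWS SDK for Java 1.x, aws-lambda-java-events, and Jackson on the classpath.
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.*;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.events.SNSEvent;
import com.amazonaws.services.s3.*;
import com.amazonaws.services.s3.model.*;
import com.amazonaws.services.sqs.*;
import com.amazonaws.services.sqs.model.SendMessageRequest;
import com.fasterxml.jackson.databind.*;
import java.awt.image.BufferedImage;
import java.io.*;
import java.net.URL;
import javax.imageio.ImageIO;
public class ImageCacheHandler {
    // Clients are created once per container, so warm invocations reuse them.
    private final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
    private final DynamoDB dynamoDB = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient());
    private final AmazonSQS amazonSqs = AmazonSQSClientBuilder.defaultClient();
    private final ObjectMapper mapper = new ObjectMapper();
    public void handleRequest(SNSEvent snsEvent, Context context) throws Exception {
        // get the JSON payload
        String message = snsEvent.getRecords().get(0).getSNS().getMessage();
        // parse JSON (uses the "bucket" key from the NiFi message; the SNS sample shows "bucketname")
        JsonNode request = mapper.readTree(message);
        String url = request.get("url").asText(), hash = request.get("hash").asText();
        String bucketName = request.get("bucket").asText(), fileName = request.get("name").asText();
        // after retrieving the URL, download the image (ImageIO returns null for unsupported formats)
        BufferedImage image = ImageIO.read(new URL(url));
        if (image == null) throw new IOException("No ImageIO reader for " + url);
        // convert to JPEG bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "jpeg", out);
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(out.size());
        // save to S3
        s3Client.putObject(new PutObjectRequest(bucketName, fileName, new ByteArrayInputStream(out.toByteArray()), metadata));
        // write metadata to DynamoDB (DYNAMODB_TABLE and QUEUE_URL are hypothetical environment variables)
        Table table = dynamoDB.getTable(System.getenv("DYNAMODB_TABLE"));
        table.putItem(new Item().withString("url_hash", hash).withString("url", url).withString("s3_bucket", bucketName));
        // create and send a notification (this version simply forwards the original payload)
        amazonSqs.sendMessage(new SendMessageRequest().withQueueUrl(System.getenv("QUEUE_URL")).withMessageBody(message));
    }
}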
How research and engineering collaborated on the effort
Collaboration
Lessons learned
Where we find benefit and why we use it
• Fantastic for scaling
• Obvious choice
• Very performant when functions are loaded (warm)
• API is easy to use
• Just Java
• Used for two key situations
• Low cost development/pilot efforts
• High volume/throughput
Lessons learned (continued)
Where we find benefit and why we use it
• Cold start performance
• 30 s (cold) as opposed to 400 ms (warm)
• Legacy code vs new development
• Limits on deployment package (JAR) size
• Message size on Amazon SNS
• 256 KB limit
• Combine functionality in a single Lambda function!
• Easier and cheaper to manage
• AWS Step Functions was too expensive for our use cases
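Amazon SNS rejects messages larger than 256 KB (262,144 bytes). One common workaround is to check the UTF-8 encoded size before publishing and, for larger payloads, send a small pointer (for example, an S3 key) instead of the payload itself. The following is a minimal stdlib-only sketch. The class name and the pointer format are illustrative assumptions, and the check ignores message attributes, which also count toward the limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-publish guard; message attributes (which also count toward the limit) are ignored.
class SnsSizeGuard {
    static final int SNS_MAX_BYTES = 256 * 1024; // 262,144 bytes

    /** True if the message body, encoded as UTF-8, fits within the SNS size limit. */
    static boolean fits(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length <= SNS_MAX_BYTES;
    }

    /** Returns the message itself, or a small JSON pointer to where the full payload was stored. */
    static String bodyOrPointer(String message, String s3Key) {
        return fits(message) ? message : "{\"s3_key\":\"" + s3Key + "\"}";
    }

    public static void main(String[] args) {
        String small = "{\"url\":\"http://somewebsite.com/koalas.jpeg\"}";
        String large = "x".repeat(300_000);
        System.out.println(fits(small)); // true
        System.out.println(fits(large)); // false
        System.out.println(bodyOrPointer(large, "payloads/large-message.json"));
    }
}
```

Counting bytes rather than characters matters: non-ASCII text takes two or more bytes per character in UTF-8, so a string can be under 262,144 characters and still exceed the limit.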
EC2 vs Lambda - EC2 based solution
EC2 vs Lambda - Lambda based solution
Exploring Serverless
Amazon Athena
• Great for exploring data in Amazon S3
• HQL / SQL support
• Partition support
• Use AWS Glue crawlers
• Complements Hadoop cluster
Where else can we use serverless?
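Athena's partition support, and the Glue crawlers that discover partitions, work well with Hive-style `key=value` prefixes in S3 object keys. Queries that filter on the partition columns then scan only the matching prefixes. The sketch below builds such keys. The `year`/`month`/`day` layout, the zero-padding, and the dataset prefixes are assumptions for illustration, not the layout PNNL used.

```java
import java.time.LocalDate;

// Hypothetical helper for Hive-style partitioned S3 keys; the layout is an illustrative assumption.
class PartitionKeys {
    /** Builds keys such as images/year=2017/month=11/day=28/koalas.jpeg */
    static String key(String dataset, LocalDate date, String fileName) {
        // Zero-padding keeps the lexical order of prefixes equal to chronological order.
        return String.format("%s/year=%04d/month=%02d/day=%02d/%s",
            dataset, date.getYear(), date.getMonthValue(), date.getDayOfMonth(), fileName);
    }

    public static void main(String[] args) {
        System.out.println(key("images", LocalDate.of(2017, 11, 28), "koalas.jpeg"));
    }
}
```

With keys in this shape, an Athena table declared as partitioned by `year`, `month`, and `day` can prune everything outside the requested dates.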
Moving Forward with Serverless
Supporting research
Evaluation of capabilities requires infrastructure
AWS Glue
Lessons Learned
In Summary
• More and more we lean on AWS serverless services
• We don’t have the resources for operations and maintenance
• Government customers we support prefer serverless solutions
• Makes it easier to provide researchers and engineers with flexible
blueprints for their implementations
• Focus on solving problems not setting up infrastructure
• What are your technical needs? Do we already have something similar?
• Leverage AWS environment to provide easy access to data, services, tools,
and resources
• Pleased with performance
• We can “brute force” solutions if we have to
• Most performance tuning is trivial
• Find most cost-effective use cases for your needs
• We have been able to strike a balance between serverless and managed
• Periodically spot-check costs; upfront calculations may turn out to be
incorrect
In Summary Cont’d
• Go-to tech stack
• Apache NiFi, Amazon S3, AWS Lambda, Amazon SQS, Amazon SNS,
Amazon DynamoDB, Amazon RDS, others as needed
• Take advantage of built-in events / triggers when you can
• Most of the time S3 + events are good enough
• “Free” capability
• We have abandoned Kafka in favor of Apache NiFi site-to-site or Amazon SQS
• Apache Kafka is great; we just don’t have the administrative resources to
support it. Use an AWS alternative when possible.
• Most “streaming” requests by our customers don’t really require streaming
• Request that researchers and engineers catalog their data and try to follow basic
data lake practices
• Keep raw and enriched / augmented separate
• Add metadata to known events and important time frames
• Enable start / stop and replay to improve evaluation
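The data lake practices above (keep raw and enriched data separate, tag known events and time frames, support start/stop and replay) can be sketched as a small in-memory catalog. Each record carries a zone and a timestamp, and a replay returns the raw records from a chosen window in time order. Everything here, including the `Zone` values, the record fields, and the method names, is a hypothetical illustration rather than PNNL's catalog.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical data-lake catalog sketch; zones, fields, and names are illustrative assumptions.
class ReplayCatalog {
    /** Raw data is kept separate from enriched/augmented data. */
    enum Zone { RAW, ENRICHED }

    /** One cataloged object: its zone, capture time, S3 key, and an optional event tag (may be null). */
    record Entry(Zone zone, Instant timestamp, String s3Key, String eventTag) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(Entry entry) {
        entries.add(entry);
    }

    /** Raw entries with start <= timestamp < stop, oldest first, ready to re-feed a pipeline. */
    List<Entry> replay(Instant start, Instant stop) {
        return entries.stream()
            .filter(e -> e.zone() == Zone.RAW)
            .filter(e -> !e.timestamp().isBefore(start) && e.timestamp().isBefore(stop))
            .sorted(Comparator.comparing(Entry::timestamp))
            .collect(Collectors.toList());
    }

    /** Entries in either zone tagged with a known event or time frame. */
    List<Entry> tagged(String eventTag) {
        return entries.stream()
            .filter(e -> eventTag.equals(e.eventTag()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReplayCatalog catalog = new ReplayCatalog();
        Instant t0 = Instant.parse("2017-11-28T00:00:00Z");
        catalog.add(new Entry(Zone.RAW, t0.plusSeconds(30), "raw/images/b.json", null));
        catalog.add(new Entry(Zone.RAW, t0.plusSeconds(10), "raw/images/a.json", "pilot-start"));
        catalog.add(new Entry(Zone.ENRICHED, t0.plusSeconds(12), "enriched/images/a.json", "pilot-start"));
        for (Entry e : catalog.replay(t0, t0.plusSeconds(60))) {
            System.out.println(e.s3Key());
        }
    }
}
```

Replaying only the raw zone means an evaluation can be rerun against the original inputs without enriched outputs from an earlier run leaking back in.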
Questions
Thank you!

 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

SRV318: Research at PNNL: Powered by AWS

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT Research at PNNL: Powered by AWS. Mike Giardinelli and Ralph Perko, Pacific Northwest National Laboratory, November 28, 2017. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or Battelle Memorial Institute. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. SRV 318
  • 4. PNNL at a glance • $920.4M in R&D expenditures • 104 U.S. and foreign patents granted • 1,058 peer-reviewed publications • 2 FLC Awards • 5 R&D 100 Awards • 4,400 scientists, engineers, and non-technical staff
  • 5. Software engineering at PNNL • Staff focus is research and innovation, not operations • Developers work with scientists to enable research • Limited space and resources for hardware • Big driver for moving to AWS! • Agile is difficult
  • 6. Problem: isolated research • Who are the researchers? • Researchers work independently • Focus on innovation and novel concepts • Lack of collaboration with engineers • Creates long delivery times • Product usually isn't what the customer had envisioned
  • 7. Enabling research with AWS • Research is the lifeblood of the organization • Researchers should not be troubled with environment configurations, optimizations, etc. • Software engineers provide the expertise needed to build applied solutions • Utilizing AWS has been a turning point • AWS has helped dramatically improve collaboration • AWS fits better with our Agile software processes • As a result, researchers can focus on the problem
  • 8. Moving to the cloud: our progression to AWS • Drivers: lack of internal resources (hardware and people); customer deliverables, demands, and deadlines • Concerns: cost; vendor lock-in • Initial approach: forklift (lift-and-shift) model, which missed out on AWS services and still had operational headaches • Current approach: serverless wherever possible
  • 9. Image Classification Pipeline
  • 10. Overview • Goal: enable novel image classification research on live, streaming data • Our first primarily serverless solution
  • 11. Image retrieval and classification: requirements (research and customer requirements) • Handle static and streaming media • Scalable, robust, and flexible • Easily deployed and maintained • Extensible (add models and instantiations) • Identify optimal ways to collaborate
  • 12. Image retrieval and classification: research and engineering implementation
  • 13. Image retrieval and classification: how research and engineering collaborated on the effort
  • 15. NiFi overview: where we find benefit and why we use it • Processes and distributes data • Message and data routing is very flexible and robust • ETL is painless • Easy to install, scale, configure, and extend • You can visually see what is going on with your pipelines • Backpressure and queueing are built into the flows, which is excellent for systems that have brittle endpoints • Low barrier to entry broadens the user audience
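In NiFi, back pressure is configured per connection: once a queue reaches its object-count or data-size threshold, the component feeding it stops being scheduled until the queue drains. The sketch below is only a plain-Java analogy of that behavior, not NiFi's API: a bounded `ArrayBlockingQueue` either refuses extra items (`offer`) or makes the producer wait (`put`) while a slow, "brittle" consumer catches up. The class name, the capacity of 3, and the 2 ms delay are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java analogy of NiFi connection back pressure; this is NOT NiFi's API.
class BackpressureDemo {

    // Without a consumer, a bounded queue accepts `capacity` items and then
    // refuses more: offer() returns false instead of letting the queue grow.
    static int countRejected(int capacity, int messages) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < messages; i++) {
            if (!queue.offer("message-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    // With a slow consumer, put() makes the producer wait whenever the queue
    // is full, so nothing is dropped and the endpoint is never flooded.
    static int drainWithBackpressure(int capacity, int messages) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger consumed = new AtomicInteger();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    queue.take();
                    Thread.sleep(2); // simulate a slow, brittle endpoint
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < messages; i++) {
                queue.put(i); // blocks while the queue is at capacity
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed.get();
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(3, 10));
        System.out.println("consumed: " + drainWithBackpressure(3, 20));
    }
}
```

The queue never holds more than its capacity, which is the property that protects a fragile downstream system.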
  • 16. Image cache data flow example
  • 17. NiFi tuning on AWS • C4 and M4 EC2 instance types work well • Scaling: we go vertical, then horizontal • Keep normal load at 50–60% CPU utilization • Set the provenance repository to volatile • General Purpose SSD volumes work well • Follow the "Configuration Best Practices" in the NiFi System Administrator's Guide
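The "vertical, then horizontal" rule and the 50–60% CPU target can be written down as a small decision function. This is a hypothetical sketch, not the presenters' tooling: the C4 size ladder, the 60% threshold constant, and the `Action` names are illustrative assumptions, and it only covers scale-out decisions (no scale-down).

```java
import java.util.List;

// Hypothetical "vertical first, then horizontal" scaling rule for a NiFi node.
class NiFiScalingPolicy {

    enum Action { HOLD, SCALE_UP, ADD_NODE }

    // Illustrative C4 size ladder, smallest to largest.
    static final List<String> C4_SIZES = List.of(
            "c4.large", "c4.xlarge", "c4.2xlarge", "c4.4xlarge", "c4.8xlarge");

    // Upper edge of the 50-60% "normal load" band from the slide.
    static final double MAX_NORMAL_CPU_PERCENT = 60.0;

    static Action decide(String instanceType, double avgCpuPercent) {
        int index = C4_SIZES.indexOf(instanceType);
        if (index < 0) {
            throw new IllegalArgumentException("Not in the size ladder: " + instanceType);
        }
        if (avgCpuPercent <= MAX_NORMAL_CPU_PERCENT) {
            return Action.HOLD; // within (or below) the normal band
        }
        // Vertical first: move to the next larger size while one exists...
        if (index < C4_SIZES.size() - 1) {
            return Action.SCALE_UP;
        }
        // ...then horizontal: add another node to the NiFi cluster.
        return Action.ADD_NODE;
    }

    static String nextSize(String instanceType) {
        int index = C4_SIZES.indexOf(instanceType);
        if (index < 0 || index == C4_SIZES.size() - 1) {
            throw new IllegalArgumentException("No larger size for: " + instanceType);
        }
        return C4_SIZES.get(index + 1);
    }

    public static void main(String[] args) {
        System.out.println(decide("c4.2xlarge", 75.0)); // SCALE_UP
        System.out.println(decide("c4.8xlarge", 75.0)); // ADD_NODE
    }
}
```

Growing a single node first keeps the flow on one machine (no cluster coordination) for as long as a larger instance size is available.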
  • 18. Data flow logic • Read data from a Kafka topic: {"image":"http://somewebsite.com/puppies.jpeg"}, {"image":"http://somewebsite.com/kittens.jpeg"}, {"image":"http://somewebsite.com/koalas.jpeg"} • Filter ("We only care about Koala bears!"): {"image":"http://somewebsite.com/koalas.jpeg"} • Create a new message and push it to Amazon SNS: { "url": "http://somewebsite.com/koalas.jpeg", "hash": "092f6b17d186adb2e121afcdc7e5470b0c6f82a5", "name": "koalas.jpeg", "type": "jpeg", "bucket": "image-classifier-test" }
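The filter-and-enrich step can be illustrated outside NiFi in a few lines of Java. Several assumptions: the input is the list of URLs already extracted from the `{"image": ...}` messages, the koala match is a case-insensitive substring test, and `hash` is taken to be the hex SHA-1 digest of the URL (the 40-character value on the slide has SHA-1's length, but the slide does not say what is hashed). The JSON is built by hand only to keep the sketch dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-alone illustration of the filter -> enrich step (not NiFi processor code).
class KoalaFilter {

    // "We only care about Koala bears!": keep URLs that mention koalas.
    static List<String> filter(List<String> imageUrls) {
        List<String> kept = new ArrayList<>();
        for (String url : imageUrls) {
            if (url.toLowerCase(Locale.ROOT).contains("koala")) {
                kept.add(url);
            }
        }
        return kept;
    }

    // ASSUMPTION: "hash" is the hex SHA-1 digest of the URL string.
    static String sha1Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Build the enriched message for SNS; hand-built JSON assumes the URL
    // contains no characters that need JSON escaping.
    static String toMessage(String url, String bucket) {
        String name = url.substring(url.lastIndexOf('/') + 1);
        String type = name.substring(name.lastIndexOf('.') + 1);
        return String.format(
                "{\"url\":\"%s\",\"hash\":\"%s\",\"name\":\"%s\",\"type\":\"%s\",\"bucket\":\"%s\"}",
                url, sha1Hex(url), name, type, bucket);
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://somewebsite.com/puppies.jpeg",
                "http://somewebsite.com/kittens.jpeg",
                "http://somewebsite.com/koalas.jpeg");
        for (String url : filter(urls)) {
            System.out.println(toMessage(url, "image-classifier-test"));
        }
    }
}
```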
  • 20. Amazon SNS event (excerpt) { "Records": [ { "EventVersion": "1.0", "EventSubscriptionArn": "arn:aws:sns:EXAMPLE", "EventSource": "aws:sns", "Sns": { "SignatureVersion": "1", "Timestamp": null, "Signature": "EXAMPLE", "SigningCertUrl": "EXAMPLE", "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e", "Message": "{\"hash\": \"092f6b17d186adb2e121afcdc7e5470b0c6f82a5\", \"url\": \"http://somewebsite.com/koalas.jpeg\", \"bucketname\": \"image-classifier-test\", \"name\": \"koalas.jpeg\", \"type\": \"jpeg\"}", ...
  • 22. Why AWS Lambda? Where we went serverless instead of NiFi • Hands down the best choice for scaling • Performance, cost, maintenance • Straightforward code • 60+ image requests per second • An EC2-based solution required a cluster of 4+ large instances • One-month pilot • AWS Lambda: 161,490,065 requests; 61,490,065 seconds; $1,050 • Amazon EC2: 4 m4.10xlarge instances; $3,329 (reserved)
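The pilot figures on the slide can be sanity-checked with simple arithmetic. Assuming a 30-day month (the slide only says "one month"), 161,490,065 requests is about 62 requests per second, consistent with "60+"; the reported Lambda seconds give an average of about 0.38 s per request; Lambda cost about $6.50 per million requests; and the quoted reserved EC2 figure is about 3.2 times the Lambda bill. The class and method names are illustrative.

```java
// Back-of-the-envelope check of the one-month pilot numbers from the slide.
class PilotCostCheck {
    static final long LAMBDA_REQUESTS = 161_490_065L;
    static final long LAMBDA_SECONDS = 61_490_065L;   // as reported on the slide
    static final double LAMBDA_COST_USD = 1_050.0;
    static final double EC2_COST_USD = 3_329.0;       // 4 x m4.10xlarge, reserved

    // ASSUMPTION: a 30-day month (a 31-day month gives roughly 60 req/s).
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60;

    static double requestsPerSecond() {
        return (double) LAMBDA_REQUESTS / SECONDS_PER_MONTH;
    }

    static double avgSecondsPerRequest() {
        return (double) LAMBDA_SECONDS / LAMBDA_REQUESTS;
    }

    static double lambdaCostPerMillionRequests() {
        return LAMBDA_COST_USD / (LAMBDA_REQUESTS / 1_000_000.0);
    }

    static double ec2ToLambdaCostRatio() {
        return EC2_COST_USD / LAMBDA_COST_USD;
    }

    public static void main(String[] args) {
        System.out.printf("average rate:        %.1f requests/s%n", requestsPerSecond());
        System.out.printf("average duration:    %.3f s/request%n", avgSecondsPerRequest());
        System.out.printf("Lambda cost:         $%.2f per million requests%n", lambdaCostPerMillionRequests());
        System.out.printf("EC2 / Lambda cost:   %.2fx%n", ec2ToLambdaCostRatio());
    }
}
```

The same few lines are a convenient template for the periodic cost spot checks recommended later in the deck.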
  • 23. AWS Lambda Where we went serverless instead of NiFi
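  • 24. Lambda code example
```java
// Excerpt from a handler class using aws-lambda-java-events (SNSEvent) and the
// AWS SDK for Java 1.x (S3, DynamoDB Document API, SQS). The fields s3Client,
// dynamoDB, amazonSqs, bucketName, s3Bucket, dynamoDbTable, and queueUrl, and the
// parsing of `message` into request, imageUrl, fileName, and newImageJson, are
// not shown on the slide.
public void handleRequest(SNSEvent snsEvent, Context context) throws Exception {
    // Get the JSON payload published to Amazon SNS
    String message = snsEvent.getRecords().get(0).getSNS().getMessage();

    // Parse JSON (omitted); after retrieving the URL, download the image
    BufferedImage image = ImageIO.read(imageUrl);
    if (image == null) {
        throw new IllegalArgumentException("No ImageIO reader for " + imageUrl);
    }

    // Convert to JPEG bytes
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    if (!ImageIO.write(image, "jpeg", byteArrayOutputStream)) {
        throw new IllegalStateException("No JPEG writer available");
    }
    byte[] bytes = byteArrayOutputStream.toByteArray();

    // Save to S3; declaring the content length lets the SDK stream the upload
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(bytes.length);
    metadata.setContentType("image/jpeg");
    InputStream inputStream = new ByteArrayInputStream(bytes);
    s3Client.putObject(new PutObjectRequest(bucketName, fileName, inputStream, metadata));

    // Write metadata to DynamoDB
    Table table = dynamoDB.getTable(dynamoDbTable);
    Item item = new Item()
            .withString("url_hash", request.getHash())
            .withString("url", request.getUrl())
            .withString("s3_bucket", s3Bucket);
    table.putItem(item);

    // Create and send a notification to Amazon SQS
    SendMessageRequest sendNewImageCachedMsg = new SendMessageRequest()
            .withQueueUrl(queueUrl)
            .withMessageBody(newImageJson);
    amazonSqs.sendMessage(sendNewImageCachedMsg);
}
```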
  • 25. Collaboration: how research and engineering collaborated on the effort
  • 26. Lessons learned: where we find benefit and why we use it • Fantastic for scaling • The obvious choice • Very performant when functions are loaded (warm) • API is easy to use • Just Java • Used for two key situations • Low-cost development / pilot efforts • High volume / throughput
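Lambda's Java runtime creates one instance of the handler class per execution environment and reuses it across warm invocations, so a common way to benefit from warm functions is to build expensive objects (SDK clients, models, connection pools) once, outside the per-request path. The sketch below simulates that in plain Java without the Lambda runtime; `ExpensiveClient` and the `INIT_COUNT` counter are hypothetical stand-ins used only to make the reuse visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simulates Lambda's handler-instance reuse in plain Java; no Lambda runtime involved.
class WarmReuseHandler {

    // Counts how many times the expensive setup ran (for demonstration only).
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical stand-in for an SDK client, model, or connection pool.
    static final class ExpensiveClient {
        ExpensiveClient() {
            INIT_COUNT.incrementAndGet(); // imagine seconds of setup here
        }

        String process(String input) {
            return "processed:" + input;
        }
    }

    // Initialized once per handler instance, i.e., once per execution
    // environment (cold start), then reused by warm invocations.
    private final ExpensiveClient client = new ExpensiveClient();

    public String handleRequest(String input) {
        return client.process(input); // no per-request setup
    }

    public static void main(String[] args) {
        WarmReuseHandler handler = new WarmReuseHandler(); // "cold start"
        for (int i = 0; i < 3; i++) {
            System.out.println(handler.handleRequest("request-" + i)); // "warm" calls
        }
        System.out.println("initializations: " + INIT_COUNT.get()); // 1
    }
}
```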
  • 27. Lessons learned (continued): where we find benefit and why we use it • Cold start performance • 30 s (cold) versus 400 ms (warm) • Legacy code vs. new development • Limits on JAR sizes • Message size on Amazon SNS • 256 KB limit • Combine functionality in a single Lambda function! • Easier and cheaper to manage • AWS Step Functions were too expensive for our use cases
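Because the Amazon SNS limit is 256 KB of message data, it helps to check the encoded byte length (not `String.length()`) before publishing; a payload that is too large can be written to S3 and replaced by a small pointer message (a "claim check" pattern). This is a hypothetical sketch: the method names and the pointer-message fields are assumptions, it checks only the message body, and the S3 upload itself is left to the caller.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard for the Amazon SNS 256 KB message-size limit.
class SnsPayloadGuard {

    // 256 KB = 262,144 bytes.
    static final int SNS_MAX_BYTES = 256 * 1024;

    // The limit is on bytes, so measure the UTF-8 encoding, not String.length().
    static boolean fitsInSns(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length <= SNS_MAX_BYTES;
    }

    // Claim-check pointer: a small message naming where the full payload lives in S3.
    static String pointerMessage(String bucket, String key) {
        return String.format("{\"s3_bucket\":\"%s\",\"s3_key\":\"%s\"}", bucket, key);
    }

    // Send the payload itself if it fits; otherwise send a pointer
    // (the caller is assumed to have uploaded the payload to bucket/key).
    static String choosePayload(String message, String bucket, String key) {
        return fitsInSns(message) ? message : pointerMessage(bucket, key);
    }

    public static void main(String[] args) {
        String small = "{\"url\":\"http://somewebsite.com/koalas.jpeg\"}";
        String large = "x".repeat(SNS_MAX_BYTES + 1);
        System.out.println(choosePayload(small, "image-classifier-test", "payloads/1.json"));
        System.out.println(choosePayload(large, "image-classifier-test", "payloads/2.json"));
    }
}
```

Multi-byte characters are why the byte check matters: 131,073 copies of "é" are only 131,073 characters but 262,146 UTF-8 bytes.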
  • 28. EC2 vs. Lambda: EC2-based solution
  • 29. EC2 vs. Lambda: Lambda-based solution
  • 30. Exploring Serverless
  • 31. Amazon Athena: where else can we use serverless? • Great for exploring data in Amazon S3 • HiveQL (DDL) / SQL (queries) support • Partition support • Use AWS Glue crawlers • Complements a Hadoop cluster
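Athena and AWS Glue crawlers recognize Hive-style `key=value` prefixes in S3 object keys as partitions, so a query that filters on the partition column (for example `WHERE dt = '2017-11-28'`) can skip every other prefix once the partitions are registered, e.g., by a crawler or `MSCK REPAIR TABLE`. A minimal sketch of building such keys; the `dt` column name, the `images` dataset, and the helper names are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Builds Hive-style (key=value) S3 keys so Athena and Glue crawlers can treat
// the date as a partition column. Dataset and column names are hypothetical.
class PartitionKeys {

    // e.g. objectKey("images", 2017-11-28, "koalas.jpeg")
    //   -> "images/dt=2017-11-28/koalas.jpeg"
    static String objectKey(String dataset, LocalDate date, String fileName) {
        return partitionPrefix(dataset, date) + fileName;
    }

    // Prefix covering one day's partition, useful for listing or replaying a day.
    static String partitionPrefix(String dataset, LocalDate date) {
        // ISO_LOCAL_DATE renders zero-padded yyyy-MM-dd, so keys sort by date.
        return dataset + "/dt=" + date.format(DateTimeFormatter.ISO_LOCAL_DATE) + "/";
    }

    public static void main(String[] args) {
        System.out.println(objectKey("images", LocalDate.of(2017, 11, 28), "koalas.jpeg"));
    }
}
```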
  • 32. Moving Forward with Serverless
  • 33. Supporting research: evaluation of capabilities requires infrastructure (AWS Glue)
  • 34. Lessons Learned
  • 35. In Summary • More and more, we lean on AWS serverless services • We don't have the resources for operations and maintenance • Government customers we support prefer serverless solutions • Serverless makes it easier to provide researchers and engineers with flexible blueprints for their implementations • Focus on solving problems, not setting up infrastructure • What are your technical needs? Do we already have something similar? • Leverage the AWS environment to provide easy access to data, services, tools, and resources • Pleased with performance • We can “brute force” solutions if we have to • Most performance tuning is trivial • Find the most cost-effective use cases for your needs • We have been able to strike a balance between serverless and managed services • Periodically do spot checks on cost; upfront calculations may have been incorrect
  • 36. In Summary (continued) • Go-to tech stack: Apache NiFi, Amazon S3, AWS Lambda, Amazon SQS, Amazon SNS, Amazon DynamoDB, Amazon RDS, others as needed • Take advantage of built-in events / triggers when you can • Most of the time, S3 + events are good enough • “Free” capability • We have abandoned Kafka in favor of Apache NiFi site-to-site or Amazon SQS • Apache Kafka is great; we just don't have the administrative resources to support it. Use AWS alternatives when possible. • Most “streaming” requests from our customers don't really require streaming • Ask researchers and engineers to catalog their data and try to follow basic data lake practices • Keep raw and enriched / augmented data separate • Add metadata to known events and important time frames • Enable start / stop and replay to improve evaluation
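Keeping raw and enriched data under separate prefixes, each partitioned by date, is what makes start/stop and replay cheap: a re-run simply re-reads the raw objects for a chosen window. A minimal sketch under stated assumptions: the `raw/` and `enriched/` zone prefixes, the `dt=YYYY-MM-DD` partition naming, and the class and method names are all hypothetical, and listing the keys from S3 is left out (they are passed in as a list).

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical data-lake layout: raw and enriched data under separate prefixes,
// each partitioned by dt=YYYY-MM-DD, so raw inputs can be replayed by date range.
class DataLakeReplay {

    enum Zone { RAW, ENRICHED }

    private static final Pattern DT_PARTITION =
            Pattern.compile("/dt=(\\d{4}-\\d{2}-\\d{2})/");

    // e.g. prefix(Zone.RAW, "images") -> "raw/images/"
    static String prefix(Zone zone, String dataset) {
        return zone.name().toLowerCase(Locale.ROOT) + "/" + dataset + "/";
    }

    // Raw object keys for `dataset` whose dt partition lies in [start, end].
    // Enriched outputs are never selected, so a replay always starts from raw data.
    static List<String> replayKeys(List<String> keys, String dataset,
                                   LocalDate start, LocalDate end) {
        String rawPrefix = prefix(Zone.RAW, dataset);
        return keys.stream()
                .filter(key -> key.startsWith(rawPrefix))
                .filter(key -> {
                    Matcher m = DT_PARTITION.matcher(key);
                    if (!m.find()) {
                        return false; // not partitioned by date; skip
                    }
                    LocalDate day = LocalDate.parse(m.group(1));
                    return !day.isBefore(start) && !day.isAfter(end);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "raw/images/dt=2017-11-27/a.json",
                "raw/images/dt=2017-11-28/b.json",
                "enriched/images/dt=2017-11-28/b.json",
                "raw/images/dt=2017-12-01/c.json");
        System.out.println(replayKeys(keys, "images",
                LocalDate.of(2017, 11, 28), LocalDate.of(2017, 11, 30)));
    }
}
```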
  • 38. Thank you!