The document discusses Netflix's cloud architecture on Amazon Web Services (AWS). The architecture aims to be faster, more scalable, more available, and to let developers work more productively. Key points include moving from a central SQL database to distributed NoSQL stores, replacing sticky in-memory sessions with a shared cache, and favoring latency-tolerant protocols over chatty ones. The architecture also favors layered service interfaces over tangled code and instruments services rather than code.
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
Presentation given in October 2011 at the High Performance Transaction Systems Workshop http://hpts.ws - describes how Netflix used AWS to run a set of highly scalable Cassandra benchmarks on hundreds of instances in only a few hours.
This is the meat of the presentation: it describes in detail how to use anti-architecture to define what gets done, then discusses patterns, type systems, PaaS frameworks, services and components. There is a detailed explanation of Cassandra as a data store and of the open source components.
Join AWS at this session to understand how to architect an infrastructure to handle going from zero to millions of users. From leveraging highly scalable AWS services to making smart decisions on building out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud.
Speakers:
Andreas Chatzakis, AWS Solutions Architect
Pete Mounce, Senior Developer, JustEat
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We’ll cover how each service might help support your application, how much each service costs, and how to get started.
Day 2 - Amazon RDS - Letting AWS run your Low Admin, High Performance DatabaseAmazon Web Services
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and re-sizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. In this webinar we review the different types of Amazon RDS available and how to move your existing databases to Amazon RDS with minimum disruption.
Reasons to attend:
- Learn how Amazon RDS can reduce the overhead of running high performance mission critical databases.
- Learn how to migrate your existing database workloads into Amazon RDS running on the AWS Cloud.
- Learn how to scale up and scale down your Amazon RDS instance and save money with reserved instances.
Are you challenged today with getting non-digital information into a digital format? Are you trying to find the most cost-effective storage solutions for your digital content? Do you want to share your library's rich information with a global audience? Attend this webinar to learn how to digitize, store and share your information quickly, efficiently and at the lowest cost possible.
Slides from QConSF Nov 19th, 2011 focusing this time on describing the globally distributed and scaled industrial strength Java Platform as a Service that Netflix has built and run on top of AWS and Cassandra. Parts of that platform are being released as open source - Curator, Priam and Astyanax.
[Full slides now also available at http://www.slideshare.net/adrianco/netflix-on-cloud-combined-slides-for-dev-and-ops]
Short summary of why Netflix is running on the Amazon cloud, what is running there, what we have learned and where this is taking us.
This is the introduction section to a series of public presentations that will go into much more detail. The Silicon Valley Cloud Computing Meetup was on Oct 14th, QCon San Francisco November 3rd.
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
Architecture overview of Netflix Cloud Architecture with a focus on the Open Source components that Netflix has put and is planning to release on http://netflix.github.com
A recap of some of the most interesting things learned from the AWS re:Invent 2013 Conference. Easily the most intense and educational conference I've ever attended.
Presentation by Kin Wilms, AWS Solutions Architect, to the Production & Post-Production track at the Media & Entertainment Cloud Symposium on Nov 4, 2016
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...Amazon Web Services
Every day, the computing power of high-performance computing (HPC) clusters helps scientists make breakthroughs, such as proving the existence of gravitational waves and screening new compounds for new drugs. Yet building HPC clusters is out of reach for most organizations, due to the upfront hardware costs and ongoing operational expenses. Now the speed of innovation is only bound by your imagination, not your budget. Researchers can run one cluster for 10,000 hours or 10,000 clusters for one hour anytime, from anywhere, and both cost the same in the cloud. And with the availability of Public Data Sets in Amazon S3, petabyte scale data is instantly accessible in the cloud. Attend and learn how to build HPC clusters on the fly, leverage Amazon’s Spot market pricing to minimize the cost of HPC jobs, and scale HPC jobs on a small budget, using all the same tools you use today, and a few new ones too.
The latest version of the Netflix Cloud Architecture story was given at Gluecon May 23rd 2012. Gluecon rocks, and lots of Van Halen references were added for the occasion. The tradeoff between a developer-driven, high-functionality, AWS-based PaaS and an operations-driven, low-cost, portable PaaS is discussed. The three sections cover the developer view, the operator view and the builder view.
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
High performance computing in the cloud is enabling high scale compute- and graphics-intensive workloads across industries, ranging from aerospace, automotive, and manufacturing to life sciences, financial services, and energy. AWS provides application developers and end users with unprecedented computational power for massively parallel applications, in areas such as large-scale fluid and materials simulations, 3D content rendering, financial computing, and deep learning. This session provides an overview of HPC capabilities on AWS, describes the newest generations of accelerated computing instances (including P2), and highlights customer and partner use cases across industries.
Attendees learn about best practices for running HPC workflows in the cloud, including graphical pre- and post-processing, workflow automation, and optimization. Attendees also learn about new and emerging HPC use cases: in particular, deep learning training and inference, large-scale simulations, and high performance data analytics.
Session presented at the 2nd IndicThreads.com Conference on Cloud Computing held in Pune, India on 3-4 June 2011.
http://CloudComputing.IndicThreads.com
Abstract: “With increasing demand, ever-growing datasets, unpredictable traffic patterns and the need for faster response times, ‘scalable architecture’ has become a necessity. Here, we will see how the traditional concepts and best practices for scalability have to be adapted for the cloud. Further, we will go through the unique advantages that the Amazon AWS cloud offers for architecting scalable applications. As an architect, you need to identify the components and bottlenecks in your architecture and modify your application to leverage the underlying scalability.
We will cover the following topics:
Scalability principles for the cloud
Leveraging AWS services for application components
Shared nothing architecture
Asynchronous work queues for loosely coupled applications
Database scalability
Tools, connectors and enablers to help build, deploy and monitor your cloud environment
Scalability using Platform-as-a-Service offerings on top of AWS
An example of a horizontally scalable architecture for an enterprise application on Amazon AWS
This talk will act as a primer for a cloud architect to achieve an auto-scalable, highly available, fully-monitored edge-cached application.”
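The “asynchronous work queues for loosely coupled applications” item in the abstract above is the pattern services like Amazon SQS implement. A minimal sketch of the idea, using Python's standard library in place of a managed queue (all names here are illustrative):

```python
import queue
import threading

# A stand-in for a managed message queue such as Amazon SQS.
work_queue = queue.Queue()
results = []

def worker():
    # Consumers pull messages at their own pace; producers never block
    # on slow consumers, which is the loose coupling the talk describes.
    while True:
        job = work_queue.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.append(job * 2)  # placeholder for real processing
        work_queue.task_done()

t = threading.Thread(target=worker)
t.start()
for n in range(5):               # producer enqueues work and moves on
    work_queue.put(n)
work_queue.put(None)
t.join()
```

With a real queue service the producer and consumer would be separate processes on separate instances, which is what lets each side scale independently.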
Speaker:
Kalpak Shah is the Founder & CEO of Clogeny Technologies Pvt. Ltd. and guides the overall strategic direction of the company. Clogeny is focused on niche software and product development in cloud computing and scalable applications domains. He is passionate about the ground-breaking economics and technology afforded by the cloud computing platforms. He has been leading and architecting cutting-edge product development across the cloud stack including IaaS, PaaS and SaaS vendors.
He has previously worked at organizations like Sun Microsystems and Symantec in the storage domain, primarily on distributed and disk filesystems. Kalpak has a Bachelor of Engineering degree in computer engineering from PICT, University of Pune.
AWS Evangelist, Ryan Shuttleworth, explores the extended features of AWS S3 in this Masterclass webinar.
AWS S3 hosts over 1.3 trillion objects and is used for storing a wide range of data: system backups, web site assets, and digital media. In this webinar we will explain the features of S3, from static website hosting through server side encryption to Glacier integration. We'll dive deep into the feature sets of S3 to give a rounded overview of its capabilities, looking at common use cases, APIs, and best practices.
To see the recording and demonstration for this webinar on YouTube, please click on the following links:
Masterclass Webinar: Amazon S3 Recording - http://www.youtube.com/watch?v=HHuRJZChCYQ
Masterclass Webinar: Amazon S3 Demonstration - http://www.youtube.com/watch?v=JuffWMBeJkw
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...Rustem Feyzkhanov
One of the main issues with ML and DL deployment is finding the right way to train and operationalize the model within the company. A serverless approach to deep learning provides a simple, scalable, affordable, yet reliable architecture. The challenge of this approach is to work within certain CPU, GPU, and RAM limitations while organizing training and inference of your model.
My presentation will show how to utilize services like Amazon SageMaker, AWS Batch, AWS Fargate, AWS Lambda and AWS Step Functions to organize deep learning workflows.
For people who are starting to create a cloud service, it is really important to know how to build one that scales to accommodate future workload growth. In this session, we introduce how to design a scalable cloud service, including an introduction to AWS services and best practices.
O'Reilly Webcast: Architecting Applications For The CloudO'Reilly Media
This presentation analyzes aspects of the Amazon EC2 IaaS cloud environment that differ from a traditional data center and introduces general best practices for ensuring data privacy, storage persistence, and reliable DBMS backup. Presented by Jorge Noa, CTO of Hyperstratus
Amazon Web Services (AWS) can make hosting scalable, highly available websites and web applications easier and less expensive for Enterprise Education customers. Join us for an informative webinar on the tools AWS provides to elastically scale your architecture, avoid underutilized resources, and reduce complexity with templates, partners, and tools that do much of the heavy lifting of creating and running a website for you.
source: http://www.sfbayacm.org/?p=1394
The specifics of a cloud’s computing architecture may have an impact on application design. This is particularly important in Infrastructure as a Service (IaaS) cloud environments.
This presentation analyzes aspects of the Amazon EC2 IaaS cloud environment that differ from a traditional datacenter and introduces general best practices for ensuring data privacy, storage persistence, and reliable DBMS backup. Best practices for application robustness and scalability on demand are reviewed and are especially significant in leveraging the full potential of an IaaS cloud. The need for a cloud application management and configuration system is briefly reviewed and two alternate approaches to cloud application management are described (RightScale and Kaavo).
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to Amazon EMR in order to save costs, increase availability, and improve performance. Amazon EMR is a managed service that lets you process and analyze extremely large data sets using the latest versions of over 15 open-source frameworks in the Apache Hadoop and Spark ecosystems. This session will focus on identifying the components and workflows in your current environment and providing the best practices to migrate these workloads to Amazon EMR. We will explain how to move from HDFS to Amazon S3 as a durable storage layer, and how to lower costs with Amazon EC2 Spot instances and Auto Scaling. Additionally, we will go over common security recommendations and tuning tips to accelerate the time to production.
Learn about the patterns and techniques a business should be using in building their infrastructure on Amazon Web Services to be able to handle rapid growth and success in the early days. From leveraging highly scalable AWS services to adopting proven architectural patterns, there are a number of smart choices you can make early on to help you overcome typical infrastructure issues.
Presenter: Chris Munns,Solutions Architect, Amazon Web Services
AWS re:Invent 2016: High Performance Cinematic Production in the Cloud (MAE304)Amazon Web Services
The process of making a film is highly complex and comprises multiple workflows across story development, pre-production, production, post-production and final distribution. Given the size and amount of media and assets associated with each stage, high performance infrastructure is often essential to meeting deadlines.
In this session we will take a deeper dive into running a full cinematic production in the cloud, with a focus on solutions for each of the production stages. We will also look at best practices around design, optimization, performance, scheduling, scalability and low latency, utilizing AWS technologies such as EC2, Lambda, Snowball, Direct Connect, and partner solutions.
1. Netflix Cloud Architecture
Qcon Beijing, April 9, 2011
Adrian Cockcroft
@adrianco #netflixcloud
http://slideshare.net/adrianco
acockcroft@netflix.com
2. (Continuing from Keynote Talk)
Who, Why, What
Netflix in the Cloud
Cloud Challenges and Learnings
Systems and Operations
Architecture
3. Amazon Cloud Terminology
See http://aws.amazon.com/ for details
This is not a full list of Amazon Web Service features
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
  – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
  – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
  – Reserved Instances – pre-paid to reduce cost for long term usage
  – Availability Zone – datacenter with own power and cooling hosting cloud instances
  – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
• RDB – Relational Data Base (managed MySQL master and slaves)
• SDB – Simple Data Base (hosted http based NoSQL data store)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
• IAM – Identity and Access Management (fine grain role based security keys)
4. Netflix Deployed on AWS
[Diagram: Netflix services on AWS, spanning Content, Logs, Play, WWW, API and CDN — video masters and movie content in S3, DRM, CDN routing, device config, TV and mobile/iPhone movie choosing, search, movie metadata, similars, bookmarks, ratings, content delivery over the CDN network, and logging into S3 with EMR/Hadoop/Hive for business intelligence on EC2.]
6. Product Trade-off
[Diagram: a consistent user experience and low latency are traded off against implementation complexity and operational complexity, balanced through the development experience.]
7. Synopsis
• The Goals
  – Faster, Scalable, Available and Productive
• Anti-patterns and Cloud Architecture
  – The things we wanted to change and why
• Capacity Planning and Monitoring
• Next Steps
8. Netflix Cloud Goals
• Faster
  – Lower latency than the equivalent datacenter web pages and API calls
  – Measured as mean and 99th percentile
  – For both first hit (e.g. home page) and in-session hits for the same user
• Scalable
  – Avoid needing any more datacenter capacity as subscriber count increases
  – No central vertically scaled databases
  – Leverage AWS elastic capacity effectively
• Available
  – Substantially higher robustness and availability than datacenter services
  – Leverage multiple AWS availability zones
  – No scheduled down time, no central database schema to change
• Productive
  – Optimize agility of a large development team with automation and tools
  – Leave behind complex tangled datacenter code base (~8 year old architecture)
  – Enforce clean layered interfaces and re-usable components
9. Old Datacenter vs. New Cloud Arch
Central SQL Database → Distributed Key/Value NoSQL
Sticky In-Memory Session → Shared Memcached Session
Chatty Protocols → Latency Tolerant Protocols
Tangled Service Interfaces → Layered Service Interfaces
Instrumented Code → Instrumented Service Patterns
Fat Complex Objects → Lightweight Serializable Objects
Components as Jar Files → Components as Services
10. The Central SQL Database
• Datacenter has a central database
  – Everything in one place is convenient until it fails
  – Customers, movies, history, configuration
• Schema changes require downtime
This Anti-pattern impacts scalability, availability
11. The Distributed Key-Value Store
• Cloud has many key-value data stores
  – More complex to keep track of, do backups etc.
  – Each store is much simpler to administer
  – Joins take place in java code
• No schema to change, no scheduled downtime
• Latency for Memcached vs. Oracle vs. SimpleDB
  – Memcached is dominated by network latency <1ms
  – Oracle for simple queries is a few milliseconds
  – SimpleDB has replication and REST overheads >10ms
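The slide's point that "joins take place in java code" can be sketched briefly; this illustration uses Python dicts standing in for two key-value stores (the data and names are invented for the example):

```python
# Two key-value "stores" (dicts stand in for SimpleDB domains or memcached).
customers = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}
ratings = {"r1": {"customer_id": "c1", "movie": "Heat", "stars": 5},
           "r2": {"customer_id": "c1", "movie": "Ran", "stars": 4},
           "r3": {"customer_id": "c2", "movie": "Heat", "stars": 3}}

def ratings_with_names():
    # The join the SQL database used to do now happens in application
    # code: read rows from one store, look up the related key in the other.
    joined = []
    for rating in ratings.values():
        customer = customers.get(rating["customer_id"], {})
        joined.append({**rating, "name": customer.get("name")})
    return joined

rows = ratings_with_names()
```

The trade the slide describes: each store stays simple to administer, at the cost of the application owning relational logic the database once handled.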
12. Database Migration
• Why SimpleDB?
  – No DBA's in the cloud, Amazon hosted service
  – Work started two years ago, fewer viable options
  – Worked with Amazon to speed up and scale SimpleDB
• Alternatives?
  – Now rolling out Cassandra as “upgrade” from SimpleDB
  – Need several options to match use cases well
• Detailed NoSQL and SimpleDB Advice
  – Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
  – Blog - http://practicalcloudcomputing.com/
  – Download Paper PDF - http://bit.ly/bhOTLu
13. Oracle to SimpleDB (See Sid’s paper for details)
• SimpleDB Domains
  – De-normalize multiple tables into a single domain
  – Work around size limits (10GB per domain, 1KB per key)
  – Shard data across domains to scale
  – Key – Use distributed sequence generator, GUID or natural unique key such as customer-id
  – Implement a schema validator to catch bad attributes
• Application layer support
  – Do GROUP BY and JOIN operations in the application
  – Compose relations in the application layer
  – Check constraints on read, and repair data as a side effect
• Do without triggers, PL/SQL, clock operations
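Two of the techniques above — sharding across domains by a natural key, and doing GROUP BY in the application — can be sketched as follows (shard count, field names and data are illustrative, not from the slides):

```python
import hashlib
from collections import defaultdict

NUM_DOMAINS = 4  # illustrative; real shard counts follow from the 10GB domain limit

def domain_for(key: str) -> int:
    # Shard rows across domains by hashing a natural unique key
    # (e.g. customer-id), as the slide suggests.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_DOMAINS

def group_by_sum(rows, group_key, value_key):
    # GROUP BY moves out of SQL and into the application layer.
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

rows = [{"customer": "c1", "minutes": 30},
        {"customer": "c2", "minutes": 10},
        {"customer": "c1", "minutes": 15}]
totals = group_by_sum(rows, "customer", "minutes")
```

Hashing keeps each domain under its size limit as data grows; the aggregation code is trivial but must now be written, tested and versioned by the application team rather than expressed in SQL.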
14. The Sticky Session
• Datacenter Sticky Load Balancing
  – Efficient caching for low latency
  – Tricky session handling code
  – Middle tier load balancer has issues in practice
• Encourages concentrated functionality – one service that does everything
This Anti-pattern impacts productivity, availability
15. The Shared Session
• Cloud Uses Round-Robin Load Balancing
  – Simple request-based code
  – External shared caching with memcached
• More flexible fine grain services
  – Works better with auto-scaled instance counts
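The shared-session idea above can be sketched in a few lines: session state lives in an external cache rather than instance memory, so any instance can serve any request (a dict stands in for memcached; the session fields are invented for the example):

```python
import uuid

# A dict stands in for a shared memcached cluster; any app instance
# can read any session, so the load balancer needs no stickiness.
session_cache = {}

def create_session(user: str) -> str:
    sid = str(uuid.uuid4())
    session_cache[sid] = {"user": user, "cart": []}
    return sid

def handle_request(instance_name: str, sid: str) -> str:
    # Round-robin may send each request to a different instance;
    # the session comes from the shared cache either way.
    session = session_cache[sid]
    return f"{instance_name} served {session['user']}"

sid = create_session("alice")
a = handle_request("instance-1", sid)
b = handle_request("instance-2", sid)
```

Because no instance holds unique state, auto-scaling can add or remove instances without losing sessions — the property the slide credits to this pattern.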
16. Chatty Opaque and Brittle Protocols
• Datacenter service protocols
  – Assumed low latency for many simple requests
• Based on serializing existing java objects
  – Inefficient formats
  – Incompatible when definitions change
This Anti-pattern causes productivity, latency and availability issues
17. Robust and Flexible Protocols
• Cloud service protocols
  – JSR311/Jersey is used for REST/HTTP service calls
  – Custom client code includes service discovery
  – Support complex data types in a single request
• Apache Avro
  – Evolved from Protocol Buffers and Thrift
  – Includes JSON header defining key/value protocol
  – Avro serialization is half the size and several times faster than Java serialization, more work to code
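The zigzag encoding mentioned on the next slide as part of Avro's compactness is a standard mapping of signed integers to unsigned ones, so small magnitudes (positive or negative) encode to few varint bytes. A minimal Python version of just the mapping (not a full varint writer):

```python
def zigzag_encode(n: int) -> int:
    # Avro's zigzag mapping: 0->0, -1->1, 1->2, -2->3, 2->4, ...
    # Python's arbitrary-precision ints use arithmetic right shift,
    # so n >> 63 yields 0 for non-negative n and -1 for negative n.
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z: int) -> int:
    # Inverse mapping: recover the sign from the low bit.
    return (z >> 1) ^ -(z & 1)
```

After this mapping, a variable-length integer encoding only needs as many bytes as the magnitude requires, which is where the "half the size" claim for small values comes from.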
18. Persisted Protocols
• Persist Avro in Memcached
– Save space/latency (zigzag encoding, half the size)
– Less brittle across versions
– New keys are ignored
– Missing keys are handled cleanly
• Avro protocol definitions
– Can be written in JSON or generated from POJOs
– It's hard, needs better tooling
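The zigzag encoding mentioned above (used by Avro, and by Protocol Buffers before it) maps integers of small magnitude — positive or negative — to small unsigned values, so the subsequent varint encoding stays compact. The mapping itself is two expressions:

```java
// Zigzag encoding as used in Avro's varint integers: values of small
// magnitude (positive or negative) map to small unsigned values, so
// they serialize in few bytes. This is the 32-bit variant.
public class ZigZag {
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);   // 0->0, -1->1, 1->2, -2->3, 2->4, ...
    }

    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);   // exact inverse of encode
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, -1, 1, -2, 2, 150, -150}) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

Plain two's-complement storage wastes space here because small negatives like -1 have all high bits set; zigzag folds the sign into the low bit instead.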
19. Tangled Service Interfaces
• Datacenter implementation is exposed
– Oracle SQL queries mixed into business logic
• Tangled code
– Deep dependencies, false sharing
• Data providers with sideways dependencies
– Everything depends on everything else
This Anti-pattern affects productivity, availability
20. Untangled Service Interfaces
• New Cloud Code With Strict Layering
– Compile against interface jar
– Can use Spring runtime binding to enforce
• Service interface is the service
– Implementation is completely hidden
– Can be implemented locally or remotely
– Implementation can evolve independently
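The "service interface is the service" idea can be sketched as a plain Java interface that clients compile against (shipped as its own jar), with local and remote implementations swapped behind it. All names below are illustrative; in the real system the binding is wired via Spring and the remote path goes through Jersey:

```java
// Sketch: clients compile only against this interface, shipped in its own jar.
// Whether the implementation is in-process or a remote REST call is hidden.
interface RatingsService {
    double averageRating(String videoId);
}

// Local implementation, e.g. for tests or co-located deployment
class LocalRatingsService implements RatingsService {
    public double averageRating(String videoId) { return 4.2; }
}

// A remote implementation would make a REST/HTTP call through the SAL;
// stubbed here so the sketch stays self-contained.
class RemoteRatingsService implements RatingsService {
    public double averageRating(String videoId) {
        // real code: Jersey client call + response deserialization
        return 4.2;
    }
}

public class UntangledClient {
    // The caller never names a concrete class; binding happens at wiring time
    static double show(RatingsService svc, String videoId) {
        return svc.averageRating(videoId);
    }

    public static void main(String[] args) {
        System.out.println(show(new LocalRatingsService(), "v1"));
        System.out.println(show(new RemoteRatingsService(), "v1"));
    }
}
```

Because callers see only the interface jar, the provider team can re-implement, relocate, or version the service without forcing consumers to recompile against implementation internals.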
21. Untangled Service Interfaces
Two layers:
• SAL - Service Access Library
– Basic serialization and error handling
– REST or POJOs defined by data provider
• ESL - Extended Service Library
– Caching, conveniences
– Can combine several SALs
– Exposes faceted type system (described later)
– Interface defined by data consumer in many cases
23. Service Architecture Patterns
• Internal Interfaces Between Services
– Common patterns as templates
– Highly instrumented, observable, analytics
– Service Level Agreements – SLAs
• Library templates for generic features
– Instrumented Netflix Base Servlet template
– Instrumented generic client interface template
– Instrumented S3, SimpleDB, Memcached clients
24. Instruments Every Step in the Call
[Diagram: timestamps captured at every stage of a client–service call — client request start, client outbound serialize start/end, client network send, service network receive, service inbound deserialize start/end, service execute request start/end, service outbound serialize start/end, service network send, client network receive, client inbound deserialize start/end, client request end.]
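The instrumentation points in the diagram can be sketched as a timestamp map filled in at each stage of a call. The stage names follow the slide; the recording mechanism below (a plain map) is a stand-in for what the base servlet and client templates actually do, which includes shipping the data to a metrics store:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: record a timestamp at every step of a service call, matching the
// instrumentation points on the slide. Real code lives in the instrumented
// servlet/client templates and publishes these measurements.
public class InstrumentedCall {
    final Map<String, Long> stamps = new LinkedHashMap<>();

    void mark(String stage) {
        stamps.put(stage, System.nanoTime());
    }

    public static void main(String[] args) {
        InstrumentedCall call = new InstrumentedCall();
        // Client side, outbound
        call.mark("client.request.start");
        call.mark("client.serialize.start");
        call.mark("client.serialize.end");
        call.mark("client.network.send");
        // Service side
        call.mark("service.network.receive");
        call.mark("service.deserialize.start");
        call.mark("service.deserialize.end");
        call.mark("service.execute.start");
        call.mark("service.execute.end");
        call.mark("service.serialize.start");
        call.mark("service.serialize.end");
        call.mark("service.network.send");
        // Client side, inbound
        call.mark("client.network.receive");
        call.mark("client.deserialize.start");
        call.mark("client.deserialize.end");
        call.mark("client.request.end");
        long total = call.stamps.get("client.request.end")
                   - call.stamps.get("client.request.start");
        System.out.println("stages=" + call.stamps.size() + " totalNanos=" + total);
    }
}
```

Differencing adjacent stamps isolates where time went — serialization, network, or service execution — which is what makes SLA analysis per call step possible.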
25. Boundary Interfaces
• Isolate teams from external dependencies
– Fake SAL built by cloud team
– Real SAL provided by data provider team later
– ESL built by cloud team using faceted objects
• Fake data sources allow development to start
– e.g. Fake Identity SAL for a test set of customers
– Development solidifies dependencies early
– Helps external team provide the right interface
26. One Object That Does Everything
• Datacenter uses a few big complex objects
– Movie and Customer objects are the foundation
– Good choice for a small team and one instance
– Problematic for large teams and many instances
• False sharing causes tangled dependencies
– Unproductive re-integration work
Anti-pattern impacting productivity and availability
27. An Interface For Each Component
• Cloud uses faceted Video and Visitor
– Basic types hold only the identifier
– Facets scope the interface you actually need
– Each component can define its own facets
• No false-sharing and dependency chains
– Type manager converts between facets as needed
– video.asA(PresentationVideo) for www
– video.asA(MerchableVideo) for middle tier
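The faceted type idea — a basic `Video` holding only an identifier, with each component asking for the facet it needs — can be sketched like this. The `asA` call and the facet names come from the slide; the registry mechanics and facet method bodies are invented for illustration:

```java
import java.util.Map;
import java.util.function.Function;

// Sketch of the faceted type system: the basic type holds only the id,
// and each component requests the interface slice (facet) it needs.
// The registry wiring is invented; only asA() comes from the slide.
public class Video {
    final String id;
    Video(String id) { this.id = id; }

    interface PresentationVideo { String boxArtUrl(); }   // facet for www
    interface MerchableVideo { int popularityRank(); }    // facet for middle tier

    // Map each facet class to a factory; a real type manager converts between
    // facets as needed instead of using a hardcoded map.
    static final Map<Class<?>, Function<Video, Object>> facets = Map.of(
        PresentationVideo.class, v -> (PresentationVideo) () -> "/art/" + v.id + ".jpg",
        MerchableVideo.class,    v -> (MerchableVideo) () -> 1);

    <T> T asA(Class<T> facet) {
        return facet.cast(facets.get(facet).apply(this));
    }

    public static void main(String[] args) {
        Video video = new Video("v123");
        System.out.println(video.asA(PresentationVideo.class).boxArtUrl());
        System.out.println(video.asA(MerchableVideo.class).popularityRank());
    }
}
```

Because the www tier depends only on `PresentationVideo` and the middle tier only on `MerchableVideo`, neither inherits the other's dependency chain — the false sharing of the monolithic Movie object goes away.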
28. Software Architecture Patterns
• Object Models
– Basic and derived types, facets, serializable
– Pass by reference within a service
– Pass by value between services
• Computation and I/O Models
– Service Execution using Best Effort
– Common thread pool management
29. Cloud Operations
Model Driven Architecture
Capacity Planning & Monitoring
30. Tools and Automation
• Developer and Build Tools
– Jira, Eclipse, Jeeves, Ivy, Artifactory
– Builds, creates .war file, .rpm, bakes AMI and launches
• Custom Netflix Application Console
– AWS Features at Enterprise Scale (hide the AWS security keys!)
– Auto Scaler Group is unit of deployment to production
• Open Source + Support
– Apache, Tomcat, Cassandra, Hadoop, OpenJDK/SunJDK, CentOS/AmazonLinux
• Monitoring Tools
– Keynote – service monitoring and alerting
– AppDynamics – Developer focus for cloud http://appdynamics.com
– EpicNMS – flexible data collection and plots http://epicnms.com
– Nimsoft NMS – ITOps focus for Datacenter + Cloud alerting
31. Model Driven Architecture
• Datacenter Practices
– Lots of unique hand-tweaked systems
– Hard to enforce patterns
• Model Driven Cloud Architecture
– Perforce/Ivy/Jeeves based builds for everything
– Every production instance is a pre-baked AMI
– Every application is managed by an Autoscaler
No exceptions, every change is a new AMI
32. Model Driven Implications
• Automated "Least Privilege" Security
– Tightly specified security groups
– Fine grain IAM keys to access AWS resources
– Performance tools security and integration
• Model Driven Performance Monitoring
– Hundreds of instances appear in a few minutes…
– Tools have to "garbage collect" dead instances
36. Capacity Planning in Clouds (a few things have changed…)
• Capacity is expensive
• Capacity takes time to buy and provision
• Capacity only increases, can't be shrunk easily
• Capacity comes in big chunks, paid up front
• Planning errors can cause big problems
• Systems are clearly defined assets
• Systems can be instrumented in detail
• Depreciate assets over 3 years (reservations!)
37. Monitoring Issues
• Problem
– Too many tools, each with a good reason to exist
– Hard to get an integrated view of a problem
– Too much manual work building dashboards
– Tools are not discoverable, views are not filtered
• Solution
– Get vendors to add deep linking URLs and APIs
– Integration "portal" ties everything together
– Underlying dependency database
– Dynamic portal generation, relevant data, all tools
38. Data Sources
External Testing
• External URL availability and latency alerts and reports – Keynote
• Stress testing - SOASTA
Request Trace Logging
• Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
• Generic HTTP – AppDynamics service tier aggregation, end to end tracking
Application logging
• Tracers and counters – log4j, tracer central, Chukwa to DataOven
• Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference
JMX Metrics
• Application specific real time – Nimsoft, AppDynamics, Epic
• Service and SLA percentiles – Nimsoft, AppDynamics, Epic, logged to DataOven
Tomcat and Apache logs
• Stdout logs – S3 – DataOven, Nimsoft alerting
• Standard format Access and Error logs – S3 – DataOven, Nimsoft alerting
JVM
• Garbage Collection – Nimsoft, AppDynamics
• Memory usage, call stacks, resource/call - AppDynamics
Linux
• System CPU/Net/RAM/Disk metrics – AppDynamics, Epic, Nimsoft alerting
• SNMP metrics – Epic; Network flows - Fastip
AWS
• Load balancer traffic – Amazon CloudWatch, SimpleDB usage stats
• System configuration - CPU count/speed and RAM size, overall usage - AWS
40. Dashboards Architecture
• Integrated Dashboard View
– Single web page containing content from many tools
– Filtered to highlight most "interesting" data
• Relevance Controller
– Drill in, add and remove content interactively
– Given an application, alert or problem area, dynamically build a dashboard relevant to your role and needs
• Dependency and Incident Model
– Model Driven - Interrogates tools and AWS APIs
– Document store to capture dependency tree and states
42. AppDynamics
How to look deep inside your cloud applications
• Automatic Monitoring
– Base AMI includes all monitoring tools
– Outbound calls only – no discovery/polling issues
– Inactive instances removed after a few days
• Incident Alarms (deviation from baseline)
– Business Transaction latency and error rate
– Alarm thresholds discover their own baseline
– Email contains URL to Incident Workbench UI
45. Monitoring Summary
• Broken datacenter-oriented tools are a big problem
• Integrating many different tools
– They are not designed to be integrated
– We have "persuaded" vendors to add APIs
• If you can't see deep inside your app, you're ☹
47. Next Few Years…
• "System of Record" moves to Cloud (now)
– Master copies of data live only in the cloud, with backups
– Cut the datacenter to cloud replication link
• International Expansion – Global Clouds (later in 2011)
– Rapid deployments to new markets
• Cloud Standardization?
– Cloud features and APIs should be a commodity, not a differentiator
– Differentiate on scale and quality of service
– Competition also drives cost down
– Higher resilience and scalability
We would prefer to be an insignificant customer in a giant cloud
48. Takeaway
Netflix is path-finding the use of public AWS cloud to replace in-house IT for non-trivial applications with hundreds of developers and thousands of systems.
acockcroft@netflix.com
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud