Effective Lessons for Building SOA from Amazon, Google and Lucidchart

Effective SOA
Lessons from Amazon, Google, and Lucidchart
By Derrick Isaacson

Can I get that
without the
bacon?
Said no one
ever
http://www.food.com/photo-finder/all/bacon?photog=1072593

http://baconipsum.com/?paras=1&type=all-meat&start-with-lorem=1

http://www.someecards.com/usercards/viewcard/MjAxMi03YWZiMjJiMTg3NDFhYTUy

Simplicity of Single Component Services
• I can’t remember if that getter function takes 100ns or 100ms. - Said no
engineer ever
• Should I try to model this server request as a “remote procedure call”?
• 6 orders of magnitude difference!
• My front-side bus fails for only 1 second every 17 minutes! - Said no
engineer ever
• 99.9% availability
• Our internet only supports .NET. - Said no engineer ever
• Do we need an SDK?

"A distributed system is at best a
necessary evil, evil because of the extra
complexity...
An application is rarely, if ever, intrinsically
distributed. Distribution is just the lesser of
the many evils, or perhaps better put, a
sensible engineering decision given the
trade-offs involved."
-David Cheriton, Distributed Systems Lecture Notes, ch. 1

Distributed System Architectures
Does it have to be “Service-oriented”?

http://upload.wikimedia.org/wikipedia/commons/d/da/KL_CoreMemory.jpg
Distributed Memory

RPC
<I’m>
<not>
<making>
<a>
<service>
<request>
<I’m>
<just>
<calling>
<a>
<procedure>

Distributed File System
mount -t nfs -o proto=tcp,port=2049 nfs-server:/ /mnt

Distributed Data Stores
• Replated MySQL
• Mongo
• S3
• RDS
• BigTable
• Cassandra
…

Service-oriented Architectures
Social Bookmarking App

GET /profiles/123
GET /users/123
Calculate something
GET /users/123/permissions
If user can’t view profile
send 403
POST /eventFeed {new profile view}
GET /users/123/friends
GET /bookmarks?userId=123
GET /catalog/books?ids=1,3,10
Calculate something else
GET /bookmarks/trending
Send response

Lucidchart.com by Status Code
96.5%
2xx or
3xx

Lucidchart.com 1s+ Latencies
10.8%
> 1s

What Happened?!?
I though SOA was supposed to make my app better!

Simple SOA Availability
<98.7%
99.5%
99.8%
99.6%
.995 * .998 * .998 * .996 = 0.987

A distributed system is at
best a necessary evil
<98.7%
99.5%
99.8%
99.6%

The CAP Theorem
http://learnyousomeerlang.com/distribunomicon

The CAP Theorem1
• Safety – nothing bad ever happens
• Liveness – good things happen
• Unreliability – network dis-connectivity,
crash failures, message loss, Byzantine
failures, slowdown, etc.
• Consistency – every response sent to a
client is correct
• Availability – every request gets a
response
• Partition tolerance – operating in the
face of arbitrary failures

Consistency: Nothing Bad Happens

Assumption: Failures Happen
Availability Consistency

ResponseHandler<User> handler = new ResponseHandler<User>()
{
public User handleResponse(
final HttpResponse response) {
int status = response.getStatusLine().getStatusCode();
if (status >= 200 && status < 300) {
HttpEntity entity = response.getEntity();
return entity != null ? Parser.parse(entity) : null;
} else {
…
}
}
};
HttpGet userGet = new HttpGet("http://example.com/users/123");
User user = httpclient.execute(userGet, handler);
https://hc.apache.org/httpcomponents-client-4.3.x/examples.html
Works great to calculate a user!

Best Effort Availability -
Euphemism for not always available

Best Effort Consistency -
Euphemism for not always consistent

Google File System: relaxed consistency model
Throughput
Latency

Amazon Checkout
http://highscalability.com/amazon-architecture

“WOW
I really regret
sacrificing
consistency for
availability”
-said no amazon ever That’s $74 Billion

Hang Consistency!
Add
• Caching
• Timeouts
• Retries
• Guessing
• Anything!

Tip 1:
HTTP
Caching
Availability/Performance Consistency

Tip 2: HTTP Caching as Fallback

Tip 3: Retries
• Exponential backoffs & max retries

Tip 3: HTTP Caching Technologies
• Apache HttpComponents – HttpClient Cache
• Ehcache
• Redis
• Memcached
• CloudFront
• Akamai
• Berkeley DB
• AWS SNS (for notifying caches components of changes)

Segmenting Consistency and Availability
1. Data Partitioning
Shopping Cart
Warehouse Inventory DB

Segmenting
2. Operation Partitioning
Reads
Writes
Dynamo PNUTS&

Segmenting
3. Functional partitioning
User Service, Document Snapshots
Document Service

Segmenting
4. Hierarchical Partitioning
Leaves
Root

http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/

Stop Guessing and Just Calculate It
• Max I/O wait time = # of threads * (CONNECT_TIMEOUT +
READ_TIMEOUT)
• 9 front end servers received 1900 requests in 60 seconds and 300
for Flickr resources (16%).
• 35 requests per server per minute
• Max 100 threads, => 6,000 thread seconds in one minute
• Goal: ensure < 10% of thread seconds spent blocked on Flickr I/O
• 600 < 35 requests * (CONNECT_TIMEOUT + READ_TIMEOUT)
• CONNECT_TIMEOUT + READ_TIMEOUT < 17 seconds
TCP Connect
Send
Request Block on socket read Read response
CONNECT_TIMEOU
T
READ_TIMEOUT

Best Effort Consistency System
99.9%
99.5%
99.8%
99.6%

Wow, my
pizza has too
much cheese
and toppings
Said no one
ever
http://upload.wikimedia.org/wikipedia/commons/6/60/Pizza_Hut_Meat_Lover's_pizza_3.JPG

“WOW
My system has
too much
caching,
timeouts, and
availability.”
-said no one ever

Questions?
golucid.co
http://www.slideshare.net/DerrickIsaacson

References
1. Perspectives on the CAP Theorem
2. Bacon Ipsum
3. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant
Web
4. The Google File System
5. Big Table
6. Amazon Architecture References
7. Apache HttpComponents
8. Apache HttpClient Cache
9. Ehcache

Effective Lessons for Building SOA from Amazon, Google and Lucidchart

Recommended

Recommended

More Related Content

Similar to Effective Lessons for Building SOA from Amazon, Google and Lucidchart

Similar to Effective Lessons for Building SOA from Amazon, Google and Lucidchart (20)

More from Derrick Isaacson

More from Derrick Isaacson (6)

Recently uploaded

Recently uploaded (20)

Effective Lessons for Building SOA from Amazon, Google and Lucidchart

Editor's Notes