
Java Colombo: Developing Highly Scalable Apps

  1. Developing highly scalable applications. By Afkham Azeez. twitter.com/afkham_azeez, blog.afkham.org
  2. About Me • PMC member, Apache Axis, Apache Stratos; Committer, Synapse & Web Services • Member, Apache Software Foundation • Co-author, Axis2 Web Services • Director of Architecture, WSO2 Inc • Electronics, Arduino, 4x4 hobbyist • Blog: http://blog.afkham.org • Twitter: afkham_azeez
  3. Agenda • Some core concepts • Vertical/horizontal scaling techniques • Capacity planning • Show me the code! • Q&A, Discussion
  4. Scalability: "The ability of a system to continue to operate correctly even when it is scaled to a larger size"
  5. Availability
  6. Availability
  7. High Availability: A system that is designed for continuous operation in the event of a failure of one or more components. The system may display some degradation of service, but will continue to perform correctly. The proportion of time during which the service is accessible with reasonable response times should be close to 100%.
  8. How to decide required scale (capacity) & availability? • Average throughput (TPS) • Max throughput (TPS) • Monetary value of a transaction • Average loss & max loss per second of downtime • Decide on how much to invest based on cost vs. benefit tradeoff
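The cost vs. benefit tradeoff on slide 8 can be made concrete with a little arithmetic. The sketch below is illustrative only: the class name, the method names, and the idea of multiplying expected downtime by an average loss per second are assumptions for this example, not something taken from the slides.

```java
// Illustrative sketch: names and the linear loss model are assumptions, not from the slides.
final class DowntimeCost {
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60;

    /** Expected seconds of downtime per year for an availability such as 0.999. */
    static double downtimeSecondsPerYear(double availability) {
        return (1.0 - availability) * SECONDS_PER_YEAR;
    }

    /** Expected yearly loss, given the average loss per second of downtime. */
    static double expectedYearlyLoss(double availability, double lossPerSecond) {
        return downtimeSecondsPerYear(availability) * lossPerSecond;
    }
}
```

Comparing `expectedYearlyLoss` for two availability targets against the extra infrastructure cost of the higher target gives a rough basis for the investment decision.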
  9. Vertical Scaling • Get the maximum out of each allocated JVM or resource • Increase CPU size • Increase memory
  10. Horizontal Scaling
  11. Load Balancing • Load balancing algorithms • Round robin • Weighted • Response based • Health check • Failover-only
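As a rough illustration of two items on slide 11, here is a minimal round-robin selector that skips nodes failing a health check. The class `RoundRobinBalancer`, its `pick` method, and the use of plain strings as node names are hypothetical; a production load balancer would be considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch of round robin combined with a health check.
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the next healthy node in rotation, or null if no node is healthy. */
    String pick(Predicate<String> isHealthy) {
        for (int i = 0; i < nodes.size(); i++) {
            // floorMod keeps the index valid even after the counter overflows
            int index = Math.floorMod(next.getAndIncrement(), nodes.size());
            String candidate = nodes.get(index);
            if (isHealthy.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

A weighted variant could be approximated by listing a node more than once, though that is only one of several possible designs.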
  12. Clustering for scalability
  13. Clustering for availability: Group Communication Channel/State replication
  14. ReentrantReadWriteLock
  15. ReentrantReadWriteLock
  16. ReentrantReadWriteLock
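The slide images for ReentrantReadWriteLock are not part of this transcript, so the following is only a minimal sketch of typical usage, assuming a read-mostly map. The class name `ReadMostlyRegistry` is hypothetical; `ReentrantReadWriteLock`, `readLock()` and `writeLock()` are the standard `java.util.concurrent.locks` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example: many concurrent readers, occasional exclusive writers.
final class ReadMostlyRegistry {
    private final Map<String, String> entries = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String get(String key) {
        lock.readLock().lock(); // shared: readers do not block each other
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(String key, String value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Releasing each lock in a `finally` block keeps the lock from being held forever if the guarded code throws.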
  17. REENTRANT READWRITE LOCK EXERCISE: http://bit.ly/1a8uu7n
  18. Thread Pooling
  19. ThreadLocal
  20. THREADLOCAL EXERCISE: http://bit.ly/1a8uu7n
  21. ThreadPooling + ThreadLocal. CAUTION: Make sure that ThreadLocal variables are reset prior to Threads being returned to the pool
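One common way to follow the caution on slide 21 is to clear the ThreadLocal in a `finally` block, so the pooled thread is clean before it picks up the next task. The sketch below assumes a hypothetical `RequestContext` class holding the current user; only `ThreadLocal`, `ExecutorService` and `Executors` are standard JDK API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical per-thread context, reset before the thread returns to the pool.
final class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static String currentUser() {
        return CURRENT_USER.get();
    }

    /** Runs the task with the user bound to the calling thread, then always clears it. */
    static void runAs(String user, Runnable task) {
        CURRENT_USER.set(user);
        try {
            task.run();
        } finally {
            CURRENT_USER.remove(); // reset so the next task on this thread sees nothing
        }
    }

    public static void main(String[] args) throws Exception {
        // A single-thread pool guarantees both tasks run on the same thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> runAs("alice", () -> System.out.println("user = " + currentUser()));
            pool.submit(first).get();
            Callable<String> second = RequestContext::currentUser;
            System.out.println("after reset = " + pool.submit(second).get());
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the `remove()` call, the second task would still see the value left behind by the first, because both run on the same pooled thread.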
  22. Object Pooling • Some objects would be expensive to create • e.g. Connection objects • Reuse objects • State stored in those objects needs to be cleared before being returned to the pool
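A minimal object pool along the lines of slide 22 might look like the sketch below. It pools `StringBuilder` buffers rather than Connection objects purely to stay self-contained; the class name `BufferPool` and its methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool; StringBuilder stands in for an expensive-to-create object.
final class BufferPool {
    private final BlockingQueue<StringBuilder> idle;

    BufferPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new StringBuilder(1024));
        }
    }

    /** Takes a pooled buffer, or creates a fresh one when the pool is empty. */
    StringBuilder borrow() {
        StringBuilder buffer = idle.poll();
        return buffer != null ? buffer : new StringBuilder(1024);
    }

    /** Clears the buffer's state, then returns it; extras are dropped if the pool is full. */
    void release(StringBuilder buffer) {
        buffer.setLength(0);
        idle.offer(buffer);
    }
}
```

Clearing state in `release` rather than in `borrow` means a forgotten reset cannot leak one caller's data to the next.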
  23. Lazy Loading: 1. Lazy Initialization. A null field indicates that there is no data. When the value is requested, a null check is performed to see if the actual data needs to be loaded. if (this.value == null) this.value = loadValue(); return this.value;
  24. Lazy Loading: 2. Virtual Proxy. The virtual proxy implements the same interface as the real object, and when called for the very first time, it loads the real object & delegates to that object. class VirtualProxy implements Interface { private RealObject real; private RealObject getReal() { if (real == null) real = new RealObject(); return real; } public void justDoIt() { getReal().justDoIt(); } } class RealObject implements Interface { public void justDoIt() { System.out.println("DONE!"); } }
  25. Lazy Loading: 3. Value Holder. A value holder is an object with a getValue method, which the clients will invoke in order to obtain a reference to the real object. Note that the method may not necessarily be named getValue. private ValueHolder<Widget> valueHolder; public Widget getWidget() { return valueHolder.getValue(); }
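The slide uses a `ValueHolder` type without showing it. One possible shape, assuming the loader is passed in as a `Supplier` and that loading should happen at most once, is sketched here; this is an illustration, not the author's implementation.

```java
import java.util.function.Supplier;

// One possible ValueHolder: loads the real object on first access, then caches it.
final class ValueHolder<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    ValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the real object, invoking the loader only on the first call. */
    synchronized T getValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true; // a separate flag lets a null result count as loaded
        }
        return value;
    }
}
```

The `synchronized` keyword keeps the sketch safe under concurrent access at the cost of some contention on every call.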
  26. Lazy Loading: 4. Ghost. The real object without any data. The data is loaded as and when required. UUID id = UUID.randomUUID(); String name; Connection dbConn; … public Connection getDbConnection() { if (dbConn == null) dbConn = loadDatabaseConnection(); return dbConn; }
  27. Lazy Loading - performance
  28. Asynchrony • Callback • Dual channel • Queuing
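For the callback style listed on slide 28, a small sketch using the JDK's `CompletableFuture` is shown below. The class `AsyncCallbackDemo`, the method `priceAsync`, and the toy price calculation are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical demo of callback-style asynchrony with CompletableFuture.
final class AsyncCallbackDemo {
    /** Starts the (toy) computation on another thread and returns immediately. */
    static CompletableFuture<Integer> priceAsync(String item) {
        return CompletableFuture.supplyAsync(() -> item.length() * 10);
    }

    public static void main(String[] args) {
        // The callback runs once the result is ready; the caller is not blocked meanwhile.
        CompletableFuture<Void> done =
                priceAsync("widget").thenAccept(price -> System.out.println("price = " + price));
        done.join(); // only so the demo does not exit before the callback has run
    }
}
```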
  29. Queuing: Producer - Consumer
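The producer-consumer arrangement named on slide 29 can be sketched with a bounded `BlockingQueue`. The class name, the sentinel value used to signal completion, and the summing workload are all hypothetical choices for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo: one producer, one consumer, decoupled by a bounded queue.
final class ProducerConsumerDemo {
    private static final int POISON = -1; // sentinel telling the consumer to stop

    /** Produces 1..count on one thread, sums the items on another, and returns the sum. */
    static long run(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        long[] sum = {0};
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int item = queue.take(); item != POISON; item = queue.take()) {
                    sum[0] += item; // take() blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }
}
```

The bounded capacity provides back-pressure: a fast producer is slowed to the consumer's pace instead of filling memory.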
  30. Capacity Planning. Source: http://srinathsview.blogspot.com/2012/05/how-to-measure-performance-of-server.html
  31. Capacity Planning: Throughput = number of completed requests / time to complete the requests. No. of servers = (projected max load * 1.3) / max throughput of one server, where max throughput of one server = max throughput of that server for the slowest scenario in the set of use cases.
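The two formulas on slide 31 translate directly into code. In the sketch below the class and method names are hypothetical, and rounding the server count up to a whole number is an assumption, since the slide does not say how to treat fractions.

```java
// Direct transcription of the slide's formulas; names and rounding are assumptions.
final class CapacityPlan {
    /** Throughput = number of completed requests / time to complete the requests. */
    static double throughput(long completedRequests, double seconds) {
        return completedRequests / seconds;
    }

    /**
     * No. of servers = (projected max load * 1.3) / max throughput of one server,
     * using the per-server throughput measured for the slowest scenario.
     */
    static int serversNeeded(double projectedMaxLoadTps, double slowestScenarioTpsPerServer) {
        return (int) Math.ceil(projectedMaxLoadTps * 1.3 / slowestScenarioTpsPerServer);
    }
}
```

For example, a projected peak of 1000 TPS with servers that sustain 400 TPS in the slowest scenario gives 1300 / 400 = 3.25, which rounds up to 4 servers.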
  32. Static membership • Predefined members • Other (non-predefined) nodes cannot join [Diagram: static group with members M1, M2, M3, M4, and a node N]
  33. Dynamic membership • No predefined members • Nodes can join & leave [Diagram: dynamic group with members M1, M2, M3, M4; node N sends Join]
  34. Hybrid membership • Some predefined (well-known) members, and some dynamic members • Nodes can join & leave • Membership revolves around the static members [Diagram: hybrid group with static members M1, M2, M3, M4 and dynamic members M5, M6, M7; node N sends Join (IP, Port)]
  35. Multicast based membership management [Diagram: members M1, M2, M3, M4; node N sends Join (IP, Port)]
  36. Well-known Address (WKA) based membership management [Diagram: hybrid group with static members WK1, WK2, WK3, WK4 and dynamic members M5, M6, M7; node N sends Join (IP, Port); Notify]
  37. Multicast vs. WKA
      Multicast: All nodes should be in the same subnet • All nodes should be in the same multicast domain • Multicasting should not be blocked • No fixed IP addresses or hosts required • Failure of any member does not affect membership discovery • Does not work on IaaSs such as Amazon EC2
      WKA: Nodes can be in different networks • No multicasting requirement • At least one well-known IP address or host required • New members can join with some WKA nodes down, but not if all WKA nodes are down • IaaS-friendly • Requires keepalived, elastic IPs or some other mechanism for remapping IP addresses of WK members in cases of failure
  38. Auto-scaling • Scale-out when load increases • Scale-in when load decreases • Always use the optimum amount of resources • Try out: AWS ELB, Apache Stratos Load Balancer, WSO2 ELB
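The scale-out and scale-in rules on slide 38 can be pictured as a simple threshold decision. The sketch below is a deliberately naive illustration: the class `AutoScaler`, the one-node-at-a-time step, and all threshold parameters are assumptions, and it does not reflect how AWS ELB, Apache Stratos or WSO2 ELB actually decide.

```java
// Naive threshold-based sketch; every name and parameter here is hypothetical.
final class AutoScaler {
    /**
     * Returns the node count to aim for, given the average load per node.
     * Scales out by one above the high threshold, in by one below the low threshold.
     */
    static int desiredNodes(int current, double avgLoadPerNode,
                            double highThreshold, double lowThreshold,
                            int minNodes, int maxNodes) {
        if (avgLoadPerNode > highThreshold) {
            return Math.min(current + 1, maxNodes); // scale out, capped at the maximum
        }
        if (avgLoadPerNode < lowThreshold) {
            return Math.max(current - 1, minNodes); // scale in, never below the minimum
        }
        return current; // steady load: keep the current size
    }
}
```

Keeping a gap between the two thresholds reduces the chance of the cluster flapping between sizes when the load hovers near a single cut-off.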
  39. Auto-scaling – steady load
  40. Auto-scaling – load increasing
  41. Auto-scaling – load increasing
  42. Auto-scaling – steady load
  43. Auto-scaling – decreased load
  44. Auto-scaling – decreased load
  45. Auto-scaling – steady load
  46. Autoscaling - Analysis & Results
  47. Autoscaling - Analysis & Results
  48. Single node [Diagram: Cost and Availability, each on a LOWEST to HIGHEST scale]
  49. Primary-secondary [Diagram: Primary and Secondary nodes; Cost and Availability, each on a LOWEST to HIGHEST scale]
  50. Primary-secondary, multiple LB [Diagram: keepalived, Primary and Secondary nodes; Cost and Availability, each on a LOWEST to HIGHEST scale]
  51. Active cluster, multiple LB [Diagram: keepalived, three Active nodes; Cost and Availability, each on a LOWEST to HIGHEST scale]
  52. Multi-zone or multi-datacenter Deployment [Diagram: Cloud Controller, Zone 1 and Zone 2 in Region X; Cost and Availability, each on a LOWEST to HIGHEST scale]
  53. Multi-region deployment [Diagram: Zone 1 and Zone 2 in Region X, Zone 1 and Zone 2 in Region Y; Cost and Availability, each on a LOWEST to HIGHEST scale]
  54. Multiple IaaS (hybrid) Deployment [Diagram: Zone 1 and Zone 2 in each of Private cloud (data center), Amazon EC2 and Rackspace Cloud; Cost and Availability, each on a LOWEST to HIGHEST scale]
  55. Cost of Availability [Diagram with the levels: Single Node; Primary-Secondary, single LB; Primary-Secondary, with multiple LBs; Multi-node active cluster, single zone; Multi-zone; Multi-region; Multi-IaaS]
  56. Hazelcast: Clustering and highly scalable data distribution platform for Java, which is: Lightning-fast; thousands of operations/sec. Fail-safe; no losing data after crashes. Dynamically scales as new servers are added. Open source.
  57. Hazelcast – In-memory data grid
  58. Hazelcast – Management Center
  59. HAZELCAST EXERCISE: http://bit.ly/1a8uu7n
  60. JSR 107 – JCache (javax.cache.*): CacheManager cacheManager = Caching.getCacheManagerFactory().getCacheManager("manger"); Cache<String, Integer> cache = cacheManager.getCache("sampleCache"); int value1 = 9876; cache.put(key, value1); int value = cache.get(key).intValue();
  61. JSR 107 – JCache (javax.cache.*): CacheManager cacheManager = Caching.getCacheManagerFactory().getCacheManager("test"); String cacheName = "cacheXXX"; Cache<String, Integer> cache = cacheManager.<String, Integer>createCacheBuilder(cacheName).setExpiry(CacheConfiguration.ExpiryType.MODIFIED, new CacheConfiguration.Duration(TimeUnit.SECONDS, 10)).setStoreByValue(false).build(); int value = 9876; cache.put(key, value); assertEquals(cache.get(key).intValue(), value);
  62. JCACHE EXERCISE: http://bit.ly/1a8uu7n
  63. DISCUSSION
  64. Thanks!
