Writing Scalable Software in Java
From multi-core to grid-computing
  • Transcript

    • 1. Writing Scalable Software in Java From multi-core to grid-computing
    • 2. Me • Ruben Badaró • Dev Expert at Changingworlds/Amdocs • PT.JUG Leader • http://www.zonaj.org
    • 3. What this talk is not about • Sales pitch • Cloud Computing • Service Oriented Architectures • Java EE • How to write multi-threaded code
    • 4. Summary • Define Performance and Scalability • Vertical Scalability - scaling up • Horizontal Scalability - scaling out • Q&A
    • 5. Performance != Scalability
    • 6. Performance Amount of useful work accomplished by a computer system compared to the time and resources used
    • 7. Scalability Capability of a system to increase the amount of useful work as resources and load are added to the system
    • 8. Scalability • A system that performs fast with 10 users might not do so with 1000 - it doesn’t scale • Designing for scalability always decreases performance
    • 9. Linear Scalability (chart: throughput grows linearly with resources)
    • 10. Reality is sub-linear (chart: throughput vs. resources)
    • 11. Amdahl’s Law
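    Amdahl's Law bounds the speedup of a program by its serial fraction: speedup = 1 / (s + (1 - s) / n), where s is the fraction of work that must run serially and n is the number of processors. A minimal sketch of that formula (class and method names are illustrative, not from the talk):

    ```java
    // Amdahl's Law: speedup = 1 / (s + (1 - s) / n)
    // s = serial fraction of the work, n = number of processors
    class Amdahl {
        static double speedup(double serialFraction, int processors) {
            return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
        }

        public static void main(String[] args) {
            // Even with only 5% serial work, 16 cores give well under 16x,
            // and no number of cores can exceed 1/0.05 = 20x
            System.out.println(speedup(0.05, 16));
        }
    }
    ```

    Note how quickly the bound bites: with 5% serial work, 16 cores yield roughly a 9x speedup, and even unlimited cores cannot pass 20x.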
    • 12. Scalability is about parallelizing • Parallel decomposition allows division of work • Parallelizing might mean more work • There’s almost always a part of serial computation
    • 13. Vertical Scalability
    • 14. Vertical Scalability Somewhat hard
    • 15. Vertical Scalability Scale Up • Bigger, meaner machines - More cores (and more powerful) - More memory - Faster local storage • Limited - Technical constraints - Cost - big machines get exponentially expensive
    • 16. Shared State • Need to use those cores • Java - shared-state concurrency - Mutable state protected with locks - Hard to get right - Most developers don’t have experience writing multithreaded code
    • 17. This is what it looks like:

          public static synchronized SomeObject getInstance() {
              return instance;
          }

          public SomeObject doConcurrentThingy() {
              synchronized (this) {
                  // ...
              }
              return ...;
          }
    • 18. Single vs Multi-threaded • Single-threaded - No scheduling cost - No synchronization cost • Multi-threaded - Context Switching (high cost) - Memory Synchronization (memory barriers) - Blocking
    • 19. Lock Contention: Little's Law - the average number of customers in a stable system is equal to their average arrival rate multiplied by their average time in the system
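    Applied to locks, Little's Law (L = λ·W) estimates how many threads are queued on a lock on average: arrival rate at the lock times the average time the lock is held. A small sketch (names are illustrative):

    ```java
    // Little's Law: L = lambda * W
    // L = average number in the system, lambda = arrival rate,
    // W = average time spent in the system
    class Littles {
        static double avgInSystem(double arrivalRatePerSec, double avgTimeSec) {
            return arrivalRatePerSec * avgTimeSec;
        }

        public static void main(String[] args) {
            // 200 requests/s, each holding a lock for 10 ms:
            // on average 2 threads are at the lock at any moment
            System.out.println(avgInSystem(200, 0.010));
        }
    }
    ```

    This is why the next slide's advice works: shrinking lock duration (W) or request frequency (λ) directly shrinks the average queue at the lock.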
    • 20. Reducing Contention • Reduce lock duration • Reduce frequency with which locks are requested (lock striping) • Replace exclusive locks with other mechanisms - Concurrent Collections - ReadWriteLocks - Atomic Variables - Immutable Objects
    • 21. Concurrent Collections • Use lock striping • Include putIfAbsent() and replace() methods • ConcurrentHashMap has 16 separate locks by default • Don’t reinvent the wheel
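    The putIfAbsent() idiom the slide mentions turns a racy check-then-act into an atomic operation without an explicit lock. A minimal sketch (the cache and the compute() stand-in are hypothetical, not from the talk):

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class MemoCache {
        private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

        // Atomic check-then-act: if another thread inserted first,
        // putIfAbsent returns that thread's value and we use it,
        // so all callers see a single canonical entry.
        String lookup(String key) {
            String cached = cache.get(key);
            if (cached != null) return cached;
            String computed = compute(key);
            String prior = cache.putIfAbsent(key, computed);
            return prior != null ? prior : computed;
        }

        // Stand-in for an expensive computation (hypothetical)
        private String compute(String key) {
            return key.toUpperCase();
        }
    }
    ```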
    • 22. ReadWriteLocks • Pair of locks • Read lock can be held by multiple threads if there are no writers • Write lock is exclusive • Good improvement if the object has few writers
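    The read/write pair from the slide can be sketched with java.util.concurrent's ReentrantReadWriteLock guarding a plain map (the registry class itself is illustrative):

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class ReadMostlyRegistry {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<String, String> data = new HashMap<>();

        // Many readers may hold the read lock at once
        String get(String key) {
            lock.readLock().lock();
            try {
                return data.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        // Writers get exclusive access
        void put(String key, String value) {
            lock.writeLock().lock();
            try {
                data.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }
    ```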
    • 23. Atomic Variables • Allow check-then-update operations to be performed atomically • Without locks - use low-level CPU instructions • It’s volatile on steroids (visibility + atomicity)
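    A check-then-update without locks is typically a compare-and-set retry loop. A sketch tracking a running maximum (the class is illustrative):

    ```java
    import java.util.concurrent.atomic.AtomicLong;

    class PeakTracker {
        private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

        // Lock-free check-then-update: retry the CAS until it wins,
        // or give up once the candidate is no longer larger.
        boolean recordIfLarger(long candidate) {
            while (true) {
                long current = max.get();
                if (candidate <= current) return false;
                if (max.compareAndSet(current, candidate)) return true;
            }
        }

        long peak() {
            return max.get();
        }
    }
    ```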
    • 24. Immutable Objects • Immutability makes concurrency simple - thread- safety guaranteed • An immutable object is: - final - fields are final and private - Constructor constructs the object completely - No state changing methods - Copy internal mutable objects when receiving or returning
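    The checklist above, in code: a sketch of an immutable class that defensively copies the mutable list it receives and exposes only an unmodifiable view (the class and its fields are illustrative):

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // final class: no subclass can add mutable state
    final class Itinerary {
        private final String name;        // private final fields
        private final List<String> stops;

        // the constructor constructs the object completely
        Itinerary(String name, List<String> stops) {
            this.name = name;
            // defensive copy: later changes to the caller's list can't leak in
            this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
        }

        String name() { return name; }

        // the returned view is unmodifiable, so internal state can't leak out
        List<String> stops() { return stops; }
    }
    ```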
    • 25. JVM issues • Caching is useful - storing stuff in memory • Larger JVM heap size means longer garbage collection times • Not acceptable to have long pauses • Solutions - Maximum size for heap 2GB/4GB - Multiple JVMs per machine - Better garbage collectors: G1 might help
    • 26. Scaling Up: Other Approaches • Change the paradigm - Actors (Erlang and Scala) - Dataflow programming (GParallelizer) - Software Transactional Memory (Pastrami) - Functional languages, such as Clojure
    • 27. Scaling Up: Other Approaches • Dedicated JVM-friendly hardware - Azul Systems is amazing - Hundreds of cores - Enormous heap sizes with negligible gc pauses - HTM included - Built-in lock elision mechanism
    • 28. Horizontal Scalability
    • 29. Horizontal Scalability The hard part
    • 30. Horizontal Scalability Scale Out • Big machines are expensive - 1 x 32 core normally much more expensive than 4 x 8 core • Increase throughput by adding more machines • Distributed Systems research revisited - not new
    • 31. Requirements • Scalability • Availability • Reliability • Performance
    • 32. Typical Server Architecture
    • 33. ... # of users increases
    • 34. ... and increases
    • 35. ... too much load
    • 36. ... and we lose availability
    • 37. ... so we add servers
    • 38. ... and a load balancer
    • 39. ... and another one rides the bus
    • 40. ... we create a DB cluster
    • 41. ... and we cache wherever we can
    • 42. Challenges • How do we route requests to servers? • How do we distribute data between servers? • How do we handle failures? • How do we keep our cache consistent? • How do we handle load peaks?
    • 43. Technique #1: Partitioning (diagram: users split alphabetically across servers - A-E, F-J, K-O, P-T, U-Z)
    • 44. Technique #1: Partitioning • Each server handles a subset of data • Improves scalability by parallelizing • Requires predictable routing • Introduces problems with locality • Move work to where the data is!
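    The "predictable routing" requirement means the same key must always land on the same server. A minimal sketch using a modulo scheme (illustrative; real systems often prefer consistent hashing so that adding servers moves less data):

    ```java
    // Deterministic routing: the same key always maps to the same server.
    class Partitioner {
        static int serverFor(String key, int serverCount) {
            // floorMod keeps the result in [0, serverCount) even when
            // hashCode() is negative
            return Math.floorMod(key.hashCode(), serverCount);
        }
    }
    ```

    The drawback the slide names follows directly: once data for a key lives on one server, work touching that data should be routed there too.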
    • 45. Technique #2: Replication (diagram: active server with backup)
    • 46. Technique #2: Replication • Keep copies of data/state in multiple servers • Used for fail-over - increases availability • Requires more cold hardware • Overhead of replicating might reduce performance
    • 47. Technique #3: Messaging
    • 48. Technique #3: Messaging • Use message passing, queues and pub/sub models - JMS • Improves reliability easily • Helps deal with peaks - The queue keeps filling - If it gets too big, extra requests are rejected
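    The queue-absorbs-peaks behavior described above can be sketched in-process with java.util.concurrent (not JMS itself; a bounded-queue analogue of the same idea, with illustrative names):

    ```java
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class RequestBuffer {
        private final BlockingQueue<String> queue;

        RequestBuffer(int capacity) {
            this.queue = new ArrayBlockingQueue<>(capacity);
        }

        // offer() returns false when the queue is full, so extra requests
        // during a load peak are rejected instead of blocking the producer
        boolean submit(String request) {
            return queue.offer(request);
        }

        // consumers drain the queue at their own pace
        String next() throws InterruptedException {
            return queue.take();
        }
    }
    ```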
    • 49. Solution #1: Denormalize DB • Faster queries • Additional work to generate tables • Less space efficiency • Harder to maintain consistency
    • 50. Solution #2: Non-SQL Database • Why not remove the relational part altogether? • Bad for complex queries • Berkeley DB is a prime example
    • 51. Solution #3: Distributed Key/Value Stores • Highly scalable - used in the largest websites in the world, based on Amazon’s Dynamo and Google’s BigTable • Mostly open source • Partitioned • Replicated • Versioned • No SPOF • Voldemort (LinkedIn), Cassandra (Facebook) and HBase are written in Java
    • 52-61. Solution #4: MapReduce (animation: divide work, map, compute, then return and aggregate in the reduce step)
    • 62. Solution #4: MapReduce • Google’s algorithm to split work, process it and reduce to an answer • Used for offline processing of large amounts of data • Hadoop is used everywhere! Other options such as GridGain exist
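    The split/process/reduce shape can be sketched on a single machine with Java 8 parallel streams as an analogue of what Hadoop does across a cluster (word count is the classic example; the class name is illustrative):

    ```java
    import java.util.Arrays;
    import java.util.Map;
    import java.util.stream.Collectors;

    class WordCount {
        // "map" phase: split the input into words, processed in parallel
        // across cores; "reduce" phase: aggregate per-word counts
        static Map<String, Long> count(String text) {
            return Arrays.stream(text.toLowerCase().split("\\s+"))
                    .parallel()
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        }
    }
    ```

    In Hadoop the same two phases run on many machines, with the framework handling the division of work and the shuffling of intermediate results.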
    • 63. Solution #5: Data Grid • Data (and computations) • In-memory - low response times • Database back-end (SQL or not) • Partitioned - operations on data executed in specific partition • Replicated - handles failover automatically • Transactional
    • 64. Solution #5: Data Grid • It’s a distributed cache + computational engine • Can be used as a cache with JPA and the like • Oracle Coherence is very good. • Terracotta, Gridgain, Gemfire, Gigaspaces, Velocity (Microsoft) and Websphere extreme scale (IBM)
    • 65. Retrospective • You need to scale up and out • Write code thinking of hundreds of cores • Relational might not be the way to go • Cache whenever you can • Be aware of data locality
    • 66. Q&A Thanks for listening! Ruben Badaró http://www.zonaj.org
