Coherence: XTP Processing using SEDA

ABSTRACT

Modern telco billing systems face major challenges as they move from mostly offline (batch) processing of billing events to handling huge volumes of events in near real time. The architect of such a system must handle high-throughput processing of near-real-time events at low latency while maintaining strict transactional semantics and providing high availability and scalability of the service.

We present our extreme transaction processing solution leveraging Oracle Coherence as an in-memory data grid (IMDG), which provides reliable messaging, data storage, and asynchronous write-back to an RDBMS. The system follows the staged event-driven architecture (SEDA) design pattern, yielding a flexible and manageable design that is both highly available and scalable. We discuss lessons learned during performance tuning and profiling, plus best practices applicable to any Coherence user.

OUTLINE

0. Agenda
1. Extreme Transaction Processing systems
2. Discuss Billing Event processing as XTP
3. What are the approaches?
a. RDBMS, RDBMS + SSD
b. In Memory DB (Oracle x10)
c. RDBMS + Caching
d. IMDG
4. Leverage IMDG using a SEDA Architecture
a. Specifics of architecture
5. Performance and tuning
a. Profiling tools that we built
b. Bottlenecks we found
6. Lessons learned - best practices
a. Incubator libraries
b. Locking
c. Tips and tricks
7. Q&A

SLIDE TRANSCRIPT

    1. Coherence: XTP Processing using SEDA
       Taylor Gautier, Principal Client Architect, Grid Dynamics
       January 14, 2010
       Copyright 2010 Grid Dynamics
    2-5. About me...
       • Currently: Principal Client Architect, Grid Dynamics
       • Past:
         ‣ Terracotta - Java clustering software (Java enthusiast since 1996)
         ‣ Access - Telecomm Billing (mission critical, scalable systems)
         ‣ Scale Eight - Petabyte-scale web storage
         ‣ Excite@Home - Millions of users
    6-10. Agenda
       • Discuss the challenge - XTP
       • Introduction to SEDA
       • Project Battery
       • Results and Best Practices
    11. What's the challenge?
    12-20. XTP
       • High throughput: 1000+ TPS
       • Low latency: 35-50 ms
       • Reliable: 99.9% - 99.999%
       • Transactional
    21. Use Case - Telco Billing
       [Diagram: telecommunication services and a payment system feed events and payments into the billing system, which handles tariffing of services, billing rules, and balance management; a CRM supplies user data, rules, and service enable/disable; balance data flows back out.]
    22. Why is billing hard?
       • 20 million users
       • 10^9 objects (counters)
       • 1000 events/sec
       • 50 ms latency
       [Diagram: events flow into a billing engine (balance management, tariffing, write-off, payment) that also serves queries, balances, and batches. SCALE!!]
    23. Traditional Approaches...

    24. RDBMS
       • Pros: mainstream technology; well understood programming model
       • Cons: expensive; doesn't scale horizontally

    25. RDBMS + SSD
       • Pros: same as RDBMS; speeds up processing
       • Cons: only improves performance, not scale

    26. Dedicated Hardware
       • Pros: can speed up performance; programming model often the same
       • Cons: expensive to scale; difficult to provide H/A

    27. In-memory database
       • Pros: mature programming model; excellent latency
       • Cons: does not scale horizontally; system capacity limited
    28. What about IMDG?
       [Diagram: an IMDG with asynchronous write-back to an RDBMS]

    29. IMDG
       • Pros: low latency; horizontal scale; built-in load balancing; lower cost model
       • Cons: programming model affected

    30. What's the challenge with the programming model?

    31. Complex event processing
       • Event-driven model: one event may trigger another event, which triggers another, and so on in a cascade.

    32. First, some history...

    33. Scale Eight (2000)
       • Web-based delivery of media files
       • Basically Amazon S3
       • Total system capacity: several petabytes
       • On-site appliance for local read/write
    34-38. On-site appliance
       • Local cache of remote data
       • Exported NAS as either NFS or SMB
       • Version 1: C++, synchronous, threaded
       • Version 2: C++, asynchronous, event-based
    39-53. Lessons Learned
       Threaded Application:
       • Linear workflows easy
       • Branching workflows hard
       • Locking is hard
       • Context switching kills performance
       • Complexity means large code base is unmaintainable
       • Only experts can do it right
       Event Based Application:
       • Linear workflows complex
       • Branching workflows easy(ier)
       • Locking is easy
       • Easy to max 1 CPU, hard to max many
       • Complexity means large code base is unmaintainable
       • Only experts can do it right
    54. What we need is something in between...
    55-61. Introducing SEDA...
       • Staged Event-Driven Architecture
       • Introduced by Matt Welsh in 2001 as a research paper
       • Blends threaded and event-based models
       • Stages are completely independent and are threaded
       • Stages are connected via queues (sketched below)
       • Code branching occurs between stages
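
To make the stage-and-queue structure concrete, here is a minimal sketch of a SEDA stage in plain Java: each stage owns an inbound queue and a small, independently sized thread pool, and hands its output to the next stage's queue. The Stage class and handler signature are illustrative assumptions, not code from the talk.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// A minimal SEDA stage: its own queue, its own threads, and a handoff
// of each result to the next stage's queue.
final class Stage<I, O> {
    private final BlockingQueue<I> inbox = new LinkedBlockingQueue<>();
    private final ExecutorService workers;

    Stage(int threads, Function<I, O> handler, Stage<O, ?> next) {
        workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        I event = inbox.take();           // block until work arrives
                        O result = handler.apply(event);  // synchronous business logic
                        if (next != null) next.accept(result);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    void accept(I event) { inbox.add(event); }   // called by the upstream stage

    void shutdown() { workers.shutdownNow(); }
}
```

Branching components such as the Router and Fork on slide 85 then reduce to handlers that choose among, or copy to, several downstream stages.
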
    62-67. SEDA Examples
       [Diagrams: three example stage networks built from queues and stages: I, a four-stage pipeline (Stage 1 through Stage 4); II, six stages (Stage 1 through Stage 6); III, four stages in a third topology.]
    68-74. SEDA Benefits
       • Easy to design
       • Easy to understand
       • Easy to test
       • Easy to reuse
       • Event-driven architecture, synchronous programming model
       • Easy to scale
    75-82. Distributed SEDA
       [Diagrams: a build sequence distributing the Stage 1 through Stage 4 pipeline across multiple nodes.]
    83. Project overview...
    84. Distributed SEDA
       [Diagram: business logic units on top of a distributed SEDA framework, on top of Coherence.]
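
The talk doesn't show the framework's internals. Purely as an illustration of the idea, a distributed stage's inbound queue can be a Coherence cache with a MapListener: an event put into the grid triggers processing wherever the listener is registered, and the result is forwarded to the next stage's cache. The cache names and the TariffingStage class below are hypothetical, and a production version would need the ordering and retry guarantees that the Incubator queue pattern (slide 97) addresses.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MapListener;

// Illustrative distributed stage: the inbound queue is a Coherence cache,
// and inserts into it drive this stage's business logic.
public class TariffingStage implements MapListener {
    private final NamedCache inbox  = CacheFactory.getCache("stage.tariffing"); // assumed name
    private final NamedCache outbox = CacheFactory.getCache("stage.writeoff");  // assumed name

    public void start() {
        inbox.addMapListener(this);              // fire on inserts into this stage's queue
    }

    public void entryInserted(MapEvent evt) {
        Object key    = evt.getKey();
        Object result = process(evt.getNewValue()); // this stage's business logic
        outbox.put(key, result);                 // hand off to the next stage
        inbox.remove(key);                       // consume the inbound event
    }

    public void entryUpdated(MapEvent evt) { }   // unused in this sketch
    public void entryDeleted(MapEvent evt) { }   // unused in this sketch

    private Object process(Object event) {
        return event;                            // placeholder for tariffing rules
    }
}
```
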
    85. Components
       • Generic: generic processing unit, base for the whole hierarchy
       • Transformer: consumes an inbound message and produces an outbound message
       • Router: routes an inbound message to one of several outbound queues
       • Fork: duplicates an inbound message to several outbound queues
       • Junction: consumes several inbound messages and produces an outbound message; a synchronization point
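
A minimal sketch of how that taxonomy might look as a Java hierarchy; the names mirror the slide, but everything else is an assumption rather than the project's actual API. Junction, which must buffer several inbound messages before emitting one, is omitted for brevity.

```java
import java.util.List;
import java.util.function.Function;

// Generic: the processing unit at the base of the hierarchy.
interface Generic<I> {
    void accept(I message);
}

// Transformer: consumes an inbound message, produces an outbound one.
class Transformer<I, O> implements Generic<I> {
    private final Function<I, O> fn;
    private final Generic<O> next;
    Transformer(Function<I, O> fn, Generic<O> next) { this.fn = fn; this.next = next; }
    public void accept(I msg) { next.accept(fn.apply(msg)); }
}

// Router: sends an inbound message to exactly one of several outputs.
class Router<I> implements Generic<I> {
    private final Function<I, Generic<I>> choose;
    Router(Function<I, Generic<I>> choose) { this.choose = choose; }
    public void accept(I msg) { choose.apply(msg).accept(msg); }
}

// Fork: duplicates an inbound message to several outputs.
class Fork<I> implements Generic<I> {
    private final List<Generic<I>> outputs;
    Fork(List<Generic<I>> outputs) { this.outputs = outputs; }
    public void accept(I msg) { for (Generic<I> out : outputs) out.accept(msg); }
}
```
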
    86. Now model billing processing as a network
       [Diagram: a stage network from inbound event to outbound event: search for relevant counters, tariffing, write-off check, rules processing, and changes fixation, with asynchronous RDBMS replication.]

    87. And scale it out...
       [Diagram: horizontal scaling of the stage network.]

    88. How to implement it in Coherence...
    89-93. What do we need?
       ✓ Reliable data storage
       ✓ Reliable queue
       ✓ Route work to the data
       ✓ RDBMS connectivity (see the CacheStore sketch below)
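
For the RDBMS connectivity item, Coherence's CacheStore interface is the standard hook: configured for write-behind, the grid calls storeAll on a background thread, which is also where the batching in the best practices below comes from. A minimal sketch, assuming a balances table, key/value types, and a JDBC URL that are not from the talk:

```java
import com.tangosol.net.cache.CacheStore;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Write-behind store sketch: with a write-delay configured on the backing
// map, Coherence calls storeAll() asynchronously, so balance updates reach
// the RDBMS in batches, off the event-processing path.
public class BalanceCacheStore implements CacheStore {
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/BILLING"; // assumed

    public void store(Object key, Object value) {
        storeAll(Collections.singletonMap(key, value));
    }

    public void storeAll(Map entries) {            // the batching hook
        try (Connection con = DriverManager.getConnection(URL);
             PreparedStatement ps = con.prepareStatement(
                     "UPDATE balances SET amount = ? WHERE id = ?")) {
            for (Object o : entries.entrySet()) {
                Map.Entry e = (Map.Entry) o;
                ps.setBigDecimal(1, (java.math.BigDecimal) e.getValue()); // assumed types
                ps.setLong(2, (Long) e.getKey());
                ps.addBatch();
            }
            ps.executeBatch();                     // one round trip for the whole batch
        } catch (SQLException e) {
            throw new RuntimeException(e);         // failed entries are retried by write-behind
        }
    }

    public Object load(Object key) {
        try (Connection con = DriverManager.getConnection(URL);
             PreparedStatement ps = con.prepareStatement(
                     "SELECT amount FROM balances WHERE id = ?")) {
            ps.setLong(1, (Long) key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBigDecimal(1) : null;
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public Map loadAll(Collection keys) {
        Map result = new HashMap();
        for (Object key : keys) {
            Object value = load(key);
            if (value != null) result.put(key, value);
        }
        return result;
    }

    public void erase(Object key) { }           // balances are not deleted here
    public void eraseAll(Collection keys) { }   // not needed in this sketch
}
```

The asynchrony itself comes from configuration (a read-write backing map with a write-delay), not from this code.
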
    94. Physical Architecture

    95. Results...
       [Charts: "Data Scalability" (events/second vs. users, 0.5 to 5 million) and "Hardware Scalability" (events/second vs. servers, 2 to 5), each plotted on a 0-2000 events/second axis. Server characteristics: 2.5 GHz quad-core CPU, 32 GB RAM.]
    96. Best Practices
       • Always measure and fine-tune
       • Lock granularity: balance coarse vs. fine
       • Ordered locking: eliminate deadlocks (sketched below)
       • Optimistic locking: detect conflicts on commit
       • Batching: speed up async writes to the DB (see the CacheStore sketch above)
       • Incubator patterns
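
The ordered-locking practice in a small Coherence sketch: a transaction that must lock several counters always acquires the locks in sorted key order, so two transactions over overlapping key sets cannot deadlock each other. lock/unlock are Coherence's explicit ConcurrentMap locking API; the cache name and timeout below are assumptions.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Ordered locking: sorting the keys first gives every transaction the same
// global acquisition order, which eliminates lock-ordering deadlocks.
public class OrderedLocking {
    public static void withLocks(NamedCache cache, Collection<Long> keys, Runnable work) {
        List<Long> ordered = new ArrayList<>(keys);
        Collections.sort(ordered);                  // the global, consistent order
        List<Long> held = new ArrayList<>();
        try {
            for (Long key : ordered) {
                if (!cache.lock(key, 5000)) {       // wait up to 5 s per lock (assumed)
                    throw new IllegalStateException("lock timeout on " + key);
                }
                held.add(key);
            }
            work.run();                             // mutate the locked counters
        } finally {
            for (Long key : held) cache.unlock(key);
        }
    }

    public static void main(String[] args) {
        NamedCache counters = CacheFactory.getCache("counters"); // assumed name
        withLocks(counters, Arrays.asList(42L, 7L),
                  () -> { /* e.g. debit one counter, credit another */ });
    }
}
```
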
    97. Queue Pattern
       • Initial results: ~200 msg/s
       • Custom version: ~5,000 msg/s
       • Latest incubator: ~5,000 msg/s
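
The slide doesn't say what separated the ~200 msg/s and ~5,000 msg/s versions, but a common bottleneck in a naive grid-backed queue is the lock/get/put/unlock round trips paid per message. As a hedged illustration of the usual cure, the tail-pointer update can move into an EntryProcessor that runs atomically on the storage node; all names here are hypothetical, and the real Incubator queue pattern is considerably more involved.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Claims the next queue slot atomically where the tail counter lives,
// replacing several lock-based network round trips with one invoke().
public class NextSlotProcessor extends AbstractProcessor
        implements java.io.Serializable {            // shipped to the storage node

    public Object process(InvocableMap.Entry entry) {
        long tail = entry.isPresent() ? (Long) entry.getValue() : 0L;
        entry.setValue(tail + 1);                    // advance the tail in place
        return tail;                                 // slot assigned to this producer
    }

    public static void enqueue(Object message) {
        NamedCache meta  = CacheFactory.getCache("queue.meta");    // hypothetical
        NamedCache slots = CacheFactory.getCache("queue.entries"); // hypothetical
        long slot = (Long) meta.invoke("tail", new NextSlotProcessor());
        slots.put(slot, message);                    // consumers read slots in order
    }
}
```
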
    98. Conclusion
       • Demonstrated 5-10x faster than competing solutions
       • Highly flexible and scalable solution
       • Currently in acceptance testing

    99. Questions?
