Rebuilding the Busiest Trading Exchange in the World to Scale 10X (Manjunath Shivakumar, PaddyPowerBetfair) Kafka Summit London 2019

Betfair is the largest online betting operation in the world, at times busier than the London Stock Exchange. Our biggest customers trade at ultra-high frequencies, pouring millions of dollars into our trading systems, so low latency and reliability are key to everything we build. Customers need to view the current positions on offer on a market, place their orders accordingly and see them fulfilled reliably, all within a few milliseconds. As a global business, we are also growing steadily in the number of jurisdictions we operate in, so the number of customers operating at such frequencies is going up every single day. We need our exchange trading platform to be resilient, reliable, fast and easily scalable. On a busy Saturday afternoon when a popular football match is on, we see in excess of 200k transactions per second across our estate, with 99.9% of them served within an SLA of 10 ms. A few years ago this figure was about 40k transactions per second. To get from that point to the present day, we needed to fundamentally re-engineer our backend systems to be largely event driven, and Kafka was the perfect tool to help us solve this problem. On the back of the success of that platform, we are now rebuilding the core of our exchange platform, which accepts and matches up to 25,000 orders per second. Ordering of events is key to achieving this reliably, and again Kafka is at the centre of our solution.

This presentation details some of the key scalability, reliability and resiliency challenges that we faced in this migration and how we overcame them, to rebuild our entire exchange with Kafka at its core.
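
Kafka preserves the order of records within a single partition, and records that share a key are normally routed to the same partition. The sketch below illustrates why that matters when events for one market must be processed in sequence. It is an in-memory illustration only: the market keys, the partition count and the CRC32 hash are assumptions made for the example (Kafka's default partitioner uses a different hash), and nothing here is taken from Betfair's implementation.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, not from the talk


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 stands in for Kafka's own hash here."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions: int = NUM_PARTITIONS):
    """Append each (key, payload) event to its partition in arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


def replay(partitions, key, num_partitions: int = NUM_PARTITIONS):
    """Read back the payloads for one key from the single partition holding it."""
    log = partitions[partition_for(key, num_partitions)]
    return [payload for k, payload in log if k == key]


# Interleaved order events for two hypothetical markets.
events = [
    ("market-A", "place"),
    ("market-B", "place"),
    ("market-A", "update"),
    ("market-B", "cancel"),
    ("market-A", "cancel"),
]
partitions = route(events)
print(replay(partitions, "market-A"))
print(replay(partitions, "market-B"))
```

Because every event for a given key lands in one append-only log, a consumer of that partition sees the place, update and cancel events for a market in the order they were produced, regardless of how events for other markets interleave.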

Published in: Technology

  1. HOW EXCHANGE SCALED TO 160K RPS WITH KAFKA: Manjunath Shivakumar, Teodora Marcu
  2. WHO WE ARE • Betfair founded in 2000 • Merger of Paddy Power and Betfair in 2016 • Paddy Power: Leading betting operator in Ireland and UK • Betfair: Largest online betting exchange in the world • Flutter Inc.
  3. Betfair: BETTING EXCHANGE 101
  4. PEER-TO-PEER BETTING PLATFORM • Odds set by user • Cuts out middle man • Up to 20% better odds than market • Recreational and automated trading • Covering sports, politics, special events • Odds: probability of an outcome • Fractional/Decimal odds: 7/2 = 4.5 • Back: bet for an outcome • Lay: bet against an outcome
  5. TRADING CYCLE • Users interact via web, mobile and API interfaces • High performance API to support order management, and discovery of current state of exchange • Low Latency High Frequency Trader Bots generate 75% of traffic and revenue • Peak loads of over 200k RPS • Cycle steps: Discover Markets, Check Market State, Calculate Position, Place / Update / Cancel Orders, Check Position
  6. LOAD PROFILE • Cycle steps: Discover Markets, Check Market State, Calculate Position, Place / Update / Cancel Orders, Check Position • CATALOG API 15k • MARKET READ API 95k • TRANSACTIONAL API 3k • BET READ API 46k • Shares shown: 60%, 29%, 2%, 9%, 11%
  7. LEGACY ARCHITECTURE • Monolithic Web and API nodes • Unreliable UDP based event stream • Heavy reliance on DB • Legacy code impossible to change • Developed a new lightweight API stack • Cliff edge at 16k rps, but we needed 80k • Migration to new stack had hard deadline (diagram labels: DB, P, S, S)
  8. THANOS IS COMING
  9. TARGET • Scalable: Add consumers and scale them horizontally with minimal impact on the underlying systems • Complete Stream: Consumers need to catch up quickly under failure conditions from the stream
  10. TARGET • Fast: Minimised end to end latency • Reliable: Systems should not go down, or if they do they need to recover immediately
  11. TARGET • Efficient: Publish only updates to stream, not full images. Encode data using efficient serialization/deserialization techniques
  12. KEY CONCEPTS – DELTA + SNAPSHOT (INBAND) • SNAPSHOTS + DELTAS: Periodically published full image of Exchange market state • Stream: D D D D D S D D S D D D
  13. DELTA SNAPSHOTS EXAMPLE • Stream: S S D D D D D D D D
  14. KEY CONCEPTS – DELTA + SNAPSHOT (OUT OF BAND) • DELTA STREAM: OrderChanges – No Aggregation; diagram shows values 501 to 512, each with 730 • SNAPSHOT STREAM: OrderSnapshot – Market Aggregation, OrderChanges – MarketAccount Aggregation, EndOfSnapshot; diagram shows values 101 to 112 alongside 101 or 107, 502 or 511, and 730 • Legend: Kafka Offset (101); Snapshot Stream Dawn, Delta Stream Dawn, Stream Version (101 502 730)
  15. KEY CONCEPTS • PROTOCOL: Using protocol buffers to serialize structured data. 4-5 times smaller • CONSUMERS: Reusable consumer libraries encoding stream mechanics, bootstrapping, heart beats, conditional decoding • SUPER FAST: Fast serialization / deserialization, rapid and reliable delivery, efficient bootstrapping, fast failovers
  16. UNDERPINNED BY RELIABLE, RESILIENT, BLAZING FAST KAFKA (diagram labels: DB, S, S, S, P, S, K)
  17. SETUP ● 24 CPU, 32 GB RAM, 6 x 200 GB ● 3 Hosts ● 3 Replicas ● 3 Producers ● 5 Topics ● 100+ Consumers
  18. KAFKA ASSEMBLED
  19. KEY METRICS ● 200k Requests / sec ● Bytes In / sec ● 300 MBytes Out / sec ● 10k Messages In / sec ● 8 ms 99p response time ● <100 ms end to end lag ● 75 TB over 4.5 years
  20. MIGRATION STRATEGY
  21. MIGRATION STRATEGY ● Legacy and new stack in parallel ● Controlled throttle of traffic ● Continuous A/B evaluation ● Minimal customer impact ● Minimal DB impact ● 32 x 5k = 160k RPS (diagram labels: S, P, K, DB, S, S, P, S, S)
  22. KEY BENEFITS • Seamless Migration: Migrating from old stack to new stack was relatively smooth • Rewind Replay: State of system can be replayed for troubleshooting and analysis • Agility: It's very easy to build new features or add new data points into core streams • Efficient Scaling: It is exponentially easier today to scale our capacity
  23. THANOS SUBDUED …
  24. DESTINY STILL ARRIVES. A YEAR LATER …
  25. MORE PROBLEMS ● Simplified Order Flow ● 2500 rps ● Input Queue Limits (diagram labels: DB, K, DB, ORDER PROCESSOR, ORDER INPUT HANDLER)
  26. REDESIGNED ORDER FLOW ● Rerouted Order Flow ● 25000+ rps ● Limit yet to be hit ● Simpler recovery (diagram labels: ORDER PROCESSOR, K, K, DB, ORDER INPUT HANDLER)
  27. IT'S REDACTED! ENDGAME..?
  28. THANOS IS ALWAYS COMING …
  29. FUTURE CHALLENGES • FRAUD DETECTION: Detect service usage abuse and sanction dynamically. Prevent revenue loss. • OPERATIONAL STREAMS: Operational metrics streams. How are customers using Exchange? • PERSONALISATION: Merge to generate a personalised data stream. We think you might like this. • HISTORICAL DATA: Record business streams into market files. Play these to train your bots.
  30. THANK YOU • MULȚUMESC • بہت بہت شکریہ • Ευχαριστώ πολύ • Благодаря ти • Merci • Teşekkür ederim • ਤੁਹਾਡਾ ਧੰਨਵਾਦ • ధన్యవాదాలు • Manjunath.Shivakumar@paddypowerbetfair.com • manjusmail@gmail.com • https://www.linkedin.com/in/manjunathshivakumar
  31. Appendix
  32. SOLUTION ARCHITECTURE
  33. KEY CONCEPTS – Resiliency (same transcript text on slides 33 to 36; diagram labels: DC1, DC2, API x2, BET / MARKET STORE x2, KAFKA PRODUCER x2)
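
The snapshot and delta concepts in the slides can be sketched as a consumer bootstrap routine: load the newest full image, then apply only the deltas published after the point that image reflects. This is a minimal in-memory sketch under stated assumptions. The `Snapshot` and `Delta` shapes, the `delta_offset` field, the runner keys and the `bootstrap` function are all hypothetical names chosen for illustration; the talk does not describe Betfair's message schema or consumer library at this level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Full image of state, valid as of a delta-stream offset (hypothetical shape)."""
    delta_offset: int
    state: dict


@dataclass(frozen=True)
class Delta:
    """One incremental change (hypothetical shape). value=None removes the key."""
    offset: int
    key: str
    value: Optional[float]


def bootstrap(snapshots, deltas):
    """Rebuild current state from the newest snapshot plus the deltas after it."""
    latest = max(snapshots, key=lambda s: s.delta_offset)
    state = dict(latest.state)
    last_applied = latest.delta_offset
    for delta in sorted(deltas, key=lambda d: d.offset):
        if delta.offset <= last_applied:
            continue  # already reflected in the snapshot
        if delta.value is None:
            state.pop(delta.key, None)
        else:
            state[delta.key] = delta.value
        last_applied = delta.offset
    return state, last_applied


# Illustrative data only; offsets and keys are made up for the example.
snapshots = [
    Snapshot(502, {"runner-1": 10.0}),
    Snapshot(511, {"runner-1": 12.0, "runner-2": 5.0}),
]
deltas = [
    Delta(510, "runner-1", 12.0),
    Delta(511, "runner-2", 5.0),
    Delta(512, "runner-2", None),
    Delta(513, "runner-3", 7.5),
]
state, last_applied = bootstrap(snapshots, deltas)
print(state, last_applied)
```

Starting from a recent snapshot bounds the amount of replay a new or recovering consumer has to do, which is one way to meet the "catch up quickly under failure conditions" target while still publishing only updates on the main stream.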
