Python at Scale
Concurrency at Beeswax
Ron Rothman
[Diagram: Advertisers*, Beeswax, Ad Exchange. Flow: ad request -> Ad Exchange -> bid requests -> Beeswax -> bids -> Ad Exchange -> ad]
❕ 200 ms
❕ 2 million per second
❕ non-infinite $ budget
❕ 99.99% uptime
❕ ~optimal bids
Event Collection
[Diagram: Ad Exchange, Bidder, Event Collector, Event Log Stream; regions labeled Beeswax and Internet]
Load O(10K/sec)
Response time < 250 ms p99
Availability ♾️ Nines
1. Durably record the event
2. Update counters (database)
def handle_request(req):
    '''Process an incoming event.'''
    unmarshall(req)            # processing  (work)
    validate(req)              # processing  (work)
    # some business logic      # processing  (work)
    record_event(req)          # network i/o (wait)
    update_db_counters(req)    # network i/o (wait)
    return_response()
10 requests/sec
(100 ms per request)
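The arithmetic behind that figure can be sketched as follows. This is a back-of-the-envelope model, assuming each process handles exactly one request at a time; the function names are illustrative and not from the talk, and `target_rps=10_000` simply reuses the O(10K/sec) load figure stated earlier.

```python
import math

def max_throughput_rps(latency_ms, workers=1):
    """Requests/sec when each worker handles one request at a time."""
    return workers * 1000 / latency_ms

def workers_needed(target_rps, latency_ms):
    """Serial workers required to sustain target_rps at the given latency."""
    return math.ceil(target_rps * latency_ms / 1000)

per_process = max_throughput_rps(latency_ms=100)
processes_for_10k = workers_needed(target_rps=10_000, latency_ms=100)
print(per_process, processes_for_10k)
```

Under these assumptions a single blocking process tops out at 10 requests/sec, so a 10K/sec load would need on the order of a thousand such processes unless the time spent waiting on the network can be reclaimed.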
Layers of Concurrency
Many machines:   EC2; Autoscale groups
Many processes:  Preforking web servers
Many threads:    Greenlets; asyncio
Containers:      ECS tasks
Serverless:      Lambdas
[Diagram: event notifications (HTTP requests) -> Elastic Load Balancer -> EC2 instances in an autoscale group]
[Diagram: EC2 instance: event notifications (from LB) -> web server processes]
[Diagram: web server process: requests -> greenlets/threads]
Threads or Greenlets?
Threads                     | Greenlets
Preemptive                  | Cooperative
Requires extensive locking  | Requires no locking
Lightweight                 | Very lightweight
Leverage multiple cores*    | Single core
*Sadly, untrue for CPython
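A small sketch of why cooperative scheduling can get away without locks. It uses the standard library's asyncio as a stand-in for greenlets (both switch tasks only at explicit yield points); the `state` dictionary and `bump` coroutine are made up for illustration. The read-modify-write below is safe only because no yield point sits between the read and the write, so "requires no locking" holds as long as that remains true.

```python
import asyncio

state = {"counter": 0}

async def bump(times):
    """Increment the shared counter, yielding to other tasks between updates."""
    for _ in range(times):
        current = state["counter"]      # read
        state["counter"] = current + 1  # write: nothing can interleave before this
        await asyncio.sleep(0)          # explicit yield point, after the update is complete

async def main():
    # Ten tasks share one counter with no lock.
    await asyncio.gather(*(bump(1000) for _ in range(10)))
    return state["counter"]

total = asyncio.run(main())
print(total)
```

With preemptive threads the same read-modify-write could be interrupted between the two statements, which is why the threaded version would need a lock.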
Gevent
● Create & manipulate greenlets
● Allows you to do non-blocking i/o
● Makes (i/o) libraries that you use non-blocking!
○ "monkey patching"
def handle_request(req):
    '''Process an incoming event.'''
    unmarshall(req)            # processing  (work)
    validate(req)              # processing  (work)
    # some business logic      # processing  (work)
    record_event(req)          # network i/o (yields)
    update_db_counters(req)    # network i/o (yields)
    return_response()
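The effect of turning waits into yields can be sketched with the standard library. This example uses asyncio rather than gevent, so the yields are written explicitly with `await`, whereas gevent with monkey patching makes them implicit. The 20 ms latency and the sleep-based bodies of `record_event` and `update_db_counters` are assumptions standing in for real network calls.

```python
import asyncio
import time

IO_WAIT_S = 0.02  # assumed latency of each simulated network call

async def record_event(req):
    await asyncio.sleep(IO_WAIT_S)   # stand-in for a non-blocking network write

async def update_db_counters(req):
    await asyncio.sleep(IO_WAIT_S)   # stand-in for a non-blocking DB update

async def handle_request(req):
    await record_event(req)          # yields while waiting
    await update_db_counters(req)    # yields while waiting
    return req

async def serve(n_requests):
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i) for i in range(n_requests)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(serve(10))
serial_estimate = 10 * 2 * IO_WAIT_S
print(f"{elapsed:.3f}s concurrent vs ~{serial_estimate:.3f}s if handled one at a time")
```

Because every wait is a yield, the ten requests overlap their I/O inside a single process and finish in roughly the time of one request.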
Happily Ever After?
Reality
def handle_request(req):
    '''Process an incoming event.'''
    unmarshall(req)
    decrypt(req)
    validate(req)
    check_whether_duplicate(req)            # BLOCKING i/o
    # some business logic
    # more business logic
    # even more business logic
    record_event(req)                       # nonblocking i/o
    update_db_counters(req)                 # BLOCKING i/o
    update_other_db_counters(req)           # BLOCKING i/o
    update_other_other_db_counters(req)     # BLOCKING i/o
    return_response()
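What a single blocking call costs under cooperative scheduling can be shown with a minimal sketch. Again asyncio stands in for gevent, and `slow_handler` and `fast_handler` are hypothetical; `time.sleep` plays the part of an i/o call that blocks without yielding.

```python
import asyncio
import time

async def slow_handler(log, blocking):
    if blocking:
        time.sleep(0.05)            # blocks the whole process: nobody else can run
    else:
        await asyncio.sleep(0.05)   # yields, so other handlers run during the wait
    log.append("slow")

async def fast_handler(log):
    await asyncio.sleep(0)          # a request with almost nothing to do
    log.append("fast")

async def serve(blocking):
    log = []
    await asyncio.gather(slow_handler(log, blocking), fast_handler(log))
    return log

order_with_blocking_call = asyncio.run(serve(blocking=True))
order_with_yielding_call = asyncio.run(serve(blocking=False))
print(order_with_blocking_call, order_with_yielding_call)
```

When the slow handler blocks, the fast request cannot finish until the blocking call returns; when it yields, the fast request completes first.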
Which DB Libraries Can Be Monkey Patched?
Redis client:      ✅ pure python
DynamoDB client:   ✅ pure python
Aerospike client:  ⛔ wrapped C code
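The mechanism behind the pure-Python distinction can be sketched without any third-party library. Monkey patching swaps an attribute on a module at runtime, and Python code that looks the attribute up at call time picks up the replacement. Here `time.sleep` stands in for a blocking i/o primitive, and `cooperative_sleep` and `pure_python_client` are hypothetical names; this illustrates the general technique, not gevent's actual implementation.

```python
import time

intercepted = []

def cooperative_sleep(seconds):
    """Hypothetical replacement: record the call instead of blocking."""
    intercepted.append(seconds)

def pure_python_client():
    """Stand-in for a pure-Python DB client: looks up time.sleep when called."""
    time.sleep(0.05)
    return "ok"

original_sleep = time.sleep
time.sleep = cooperative_sleep      # the monkey patch
try:
    result = pure_python_client()   # unchanged client code now hits the replacement
finally:
    time.sleep = original_sleep     # always restore the original

print(result, intercepted)
```

A client that does its network i/o inside compiled C code never performs this Python-level lookup, so replacing the Python attribute has no effect on it.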
Too Much Processing? Yield Often.
def handle_request(req):
    '''Process an incoming event.'''
    unmarshall(req)
    decrypt(req)
    greenlet_yield()
    validate(req)
    check_whether_duplicate(req)    # BLOCKING i/o
    # some business logic
    # more business logic
    greenlet_yield()
    # even more business logic
    record_event(req)               # nonblocking i/o
    ...
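A runnable sketch of the same idea, with asyncio standing in for gevent: `await asyncio.sleep(0)` plays the role of the slide's `greenlet_yield()`. The `busy_handler` coroutine and its chunk of CPU work are invented for illustration.

```python
import asyncio

async def busy_handler(name, chunks, yield_between, log):
    """Do several chunks of CPU-bound work, optionally yielding after each one."""
    for _ in range(chunks):
        sum(range(10_000))          # stand-in for a chunk of processing
        log.append(name)
        if yield_between:
            await asyncio.sleep(0)  # let other handlers take a turn

async def run(yield_between):
    log = []
    await asyncio.gather(
        busy_handler("a", 3, yield_between, log),
        busy_handler("b", 3, yield_between, log),
    )
    return log

without_yields = asyncio.run(run(yield_between=False))
with_yields = asyncio.run(run(yield_between=True))
print(without_yields)
print(with_yields)
```

Without yields, handler "a" monopolizes the process until it is done; with yields, the two handlers alternate, which keeps any one long request from stalling the others.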
Blocking C Code? Batch & Timeouts.
def handle_request(req):
    '''Process an incoming event.'''
    unmarshall(req)
    decrypt(req)
    validate(req)
    # BLOCKING i/o
    check_whether_duplicate(req, timeout_ms=5, max_tries=3)
    # business logic
    record_event(req)                 # nonblocking i/o
    update_counters_in_memory(req)    # occasional i/o
    return_response()
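One way the `timeout_ms` and `max_tries` parameters might be honored is a bounded retry loop. This is a sketch under assumptions: `LookupTimeout`, `call_with_retries`, and the fake lookup are hypothetical, and it presumes the underlying client actually enforces the timeout it is given.

```python
class LookupTimeout(Exception):
    """Hypothetical error raised when a lookup exceeds its timeout."""

def call_with_retries(operation, timeout_ms=5, max_tries=3):
    """Call operation(timeout_ms) up to max_tries times; re-raise the last timeout."""
    for attempt in range(1, max_tries + 1):
        try:
            return operation(timeout_ms)
        except LookupTimeout:
            if attempt == max_tries:
                raise

# Demo: a fake duplicate check that times out twice, then answers.
attempts = []

def flaky_duplicate_check(timeout_ms):
    attempts.append(timeout_ms)
    if len(attempts) < 3:
        raise LookupTimeout(f"no reply within {timeout_ms} ms")
    return False  # not a duplicate

is_duplicate = call_with_retries(flaky_duplicate_check)
worst_case_block_ms = 5 * 3  # timeout_ms * max_tries
print(is_duplicate, attempts, worst_case_block_ms)
```

The point of the short timeout is that the time a blocking call can hold the process is capped, here at 15 ms in the worst case, instead of being at the mercy of a slow database response.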
● Blocking I/O calls waste CPU
■ Gevent + monkey patch
● You're not as I/O bound as you hope
■ Buffer & batch
● C extensions play by different rules
■ Short timeouts w/retries
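The "buffer & batch" remedy can be sketched as an in-memory counter buffer that touches the database only once per batch. `BufferedCounters`, `flush_fn`, and `max_pending` are hypothetical names; a real service would presumably also flush on a timer and on shutdown, and would have to accept that buffered counts are lost if the process dies before a flush.

```python
from collections import Counter

class BufferedCounters:
    """Accumulate counter increments in memory and write them out in batches."""

    def __init__(self, flush_fn, max_pending=100):
        self._pending = Counter()
        self._events = 0
        self._flush_fn = flush_fn        # performs the single (blocking) DB write
        self._max_pending = max_pending

    def incr(self, key, amount=1):
        self._pending[key] += amount
        self._events += 1
        if self._events >= self._max_pending:
            self.flush()                 # the only place i/o happens

    def flush(self):
        if not self._pending:
            return
        batch = dict(self._pending)
        self._pending = Counter()
        self._events = 0
        self._flush_fn(batch)

# Demo: seven events result in three batched writes instead of seven.
writes = []
counters = BufferedCounters(flush_fn=writes.append, max_pending=3)
for key in ["c1", "c1", "c2", "c1", "c2", "c2", "c1"]:
    counters.incr(key)
counters.flush()
print(writes)
```

Most requests then pay only for a dictionary update, and the blocking write is amortized across the whole batch.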
Thank You 🙏
ron {at} beeswax.com
References
● Gevent
● Falcon
● Bottle
● The Sharp Corners of Gevent
● Beeswax

Concurrent Python at Beeswax - Ron Rothman - NYC Python Meetup 2020


Editor's Notes

  • #30 Design for framework flexibility Buffer blocking i/o Short timeouts w/retries