In this talk, we share our strategy for adopting asyncio and the tools we built: a common helper library for asyncio testing, debugging, and profiling; static analysis and profiling tools for identifying call stacks; bug fixes and optimizations for the asyncio module; design patterns for asyncio; and more. These lessons were learned from a large-scale project: the Instagram Django service.
3. ABOUT ME - JIMMY LAI
• Software Engineer in Instagram Infrastructure
• I like Python
• Recent interests: Python efficiency
• profiling
• Cython
• asyncio
4. INSTAGRAM BACKEND
• Python + Django
• Serving with uwsgi
• Data fetching from backends
• Number of processes > number of CPUs
[Diagram: a server running uwsgi with several Django processes on the CPU and shared memory; backends include memcached, cassandra, thrift services, ... Source: https://instagram-engineering.com/]
5. BLOCKING I/O PROBLEMS
• Slow API: APIs take longer to finish, which is a bad user experience.
• CPU idle: Context switches between processes come with overhead.
• Harakiri: Long-running requests get their process terminated (uwsgi Harakiri). Restarting a process has high overhead.
6. WHAT'S ASYNCIO
• Asynchronous I/O
• Running I/O concurrently
• Blocking IO mode
• Async IO mode
• Simultaneous Exhibition (photo: https://rarehistoricalphotos.com/samuel-reshevsky-age-8-france-1920/)
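[Diagram: timeline. In blocking I/O mode, CPU and I/O phases run one after another; in async I/O mode, the I/O phases of several tasks overlap in time.]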
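The difference between the two modes can be sketched in a few lines. This is a minimal illustration, not Instagram's code: `fetch` is a hypothetical stand-in for a backend call, and `asyncio.sleep` plays the role of network I/O.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a network call to a hypothetical backend.
    await asyncio.sleep(delay)
    return name


async def sequential(delay: float) -> list:
    # Blocking style: each call finishes before the next one starts.
    return [await fetch(name, delay) for name in ("a", "b", "c")]


async def concurrent(delay: float) -> list:
    # Async style: the three waits overlap on a single thread.
    return await asyncio.gather(*(fetch(name, delay) for name in ("a", "b", "c")))


def timed(coro) -> tuple:
    # Run a coroutine to completion and report (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

Calling `timed(sequential(0.05))` and `timed(concurrent(0.05))` should return the same list, with the concurrent version taking roughly one delay instead of three.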
7. ASYNCIO AS SOLUTION
• Slow API: APIs run faster and users get a better experience.
• CPU idle: In-thread context switch vs. process context switch.
• Harakiri: Just cancel the pending async call. No need to kill the process.
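One way to picture the Harakiri point is a per-request timeout that cancels only the pending call. The sketch below assumes a hypothetical `slow_backend_call` and uses the standard `asyncio.wait_for`; how Instagram actually wires request timeouts is not shown in the slides.

```python
import asyncio


async def slow_backend_call() -> str:
    # Hypothetical slow dependency; sleeps far longer than the time budget.
    await asyncio.sleep(10)
    return "data"


async def handle_request(timeout: float) -> str:
    try:
        # wait_for cancels the inner coroutine when the timeout expires.
        return await asyncio.wait_for(slow_backend_call(), timeout=timeout)
    except asyncio.TimeoutError:
        # The process keeps running; only this request is abandoned.
        return "timed out"
```

For example, `asyncio.run(handle_request(0.01))` gives up on the slow call after 10 ms while the process stays alive for the next request.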
8. MYTHS ABOUT ASYNCIO
1. asyncio is multi-process or parallel computing. In fact, it is single-threaded.
• Only one function can execute at a time.
• Only I/O can run concurrently.
2. asyncio is always faster in terms of CPU and latency.
• The overhead of the event loop and context switches can be significant.
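The single-threaded claim is easy to check. In this small sketch (the `worker` and `main` names are illustrative only), several coroutines record the thread they run on before and after yielding to the event loop.

```python
import asyncio
import threading


async def worker(thread_ids: list) -> None:
    thread_ids.append(threading.get_ident())
    await asyncio.sleep(0)  # yield control back to the event loop
    thread_ids.append(threading.get_ident())


async def main() -> set:
    thread_ids = []
    # Five coroutines interleave, but all of them share one thread.
    await asyncio.gather(*(worker(thread_ids) for _ in range(5)))
    return set(thread_ids)
```

`asyncio.run(main())` should return a set with a single thread id: the coroutines take turns, they do not run in parallel.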
9. CPYTHON ASYNCIO
• asyncio module became available starting in CPython 3.4
• Instagram used version 2.7 for a long time and migrated to 3.5 in 2017
16. ASYNCIO ADOPTION IN INSTAGRAM IS JUST LIKE
decorating some trees in a forest
Instagram started using Django and launched in 2010.
Large repo and many developers.
17. ASYNCIO ADOPTION CHALLENGES
• scale: collaboration in large code repo with a lot of developers
• usability: asyncio utilities and bug fixes
• prioritization: too many blocking calls to migrate
• automation: reduce repeated manual effort
• efficiency: asyncio CPU overhead is very high
18. BACKEND CLIENT LIBRARIES ASYNCIO SUPPORT
• Thrift
• fbthrift py3 and py.asyncio namespaces
• HTTP
• aiohttp replaces requests
• Other backends
• https://github.com/aio-libs
27. GATHER DESIGN PATTERN
• To achieve maximum concurrency
import asyncio

# With gather: f(), g(), and h() run concurrently.
# f, g, h, a, and b are placeholders defined elsewhere.
async def identity(value):
    return value

async def run():
    awaitables = [
        f(),
        g() if a is True else identity(None),
        h() if b is True else identity(None),
    ]
    _, var1, var2 = await asyncio.gather(*awaitables)

# Without gather: each call waits for the previous one to finish.
async def run():
    await f()
    var1 = None
    if a is True:
        var1 = await g()

    var2 = None
    if b is True:
        var2 = await h()
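To check that the two forms are interchangeable, here is a runnable sketch with concrete stubs. The bodies of `f`, `g`, and `h` are assumptions made for illustration (a short sleep standing in for I/O); only `identity` and the overall shape come from the slide.

```python
import asyncio


async def identity(value):
    # Placeholder awaitable so gather always receives three items.
    return value


async def f():
    await asyncio.sleep(0.01)  # stand-in for I/O
    return "f"


async def g():
    await asyncio.sleep(0.01)
    return "g"


async def h():
    await asyncio.sleep(0.01)
    return "h"


async def run_sequential(a: bool, b: bool):
    await f()
    var1 = await g() if a else None
    var2 = await h() if b else None
    return var1, var2


async def run_gathered(a: bool, b: bool):
    _, var1, var2 = await asyncio.gather(
        f(),
        g() if a else identity(None),
        h() if b else identity(None),
    )
    return var1, var2
```

Both functions return the same `(var1, var2)` pair for every combination of `a` and `b`; the gathered version simply overlaps the waits. The `identity(None)` placeholder keeps the unpacking positions fixed when a branch is skipped.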
28. LINT
Provides guidance for writing better asyncio code
• Rules:
1. an async function should be named with an async_ prefix
• e.g. async_func( ) vs func( )
2. use gather instead of awaiting in a loop
3. warn when new blocking calls are added
• implemented with ast + flake8
for data in data_list:
    await async_func(data)

# use gather to run faster
await asyncio.gather(*[async_func(data) for data in data_list])
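Rules 1 and 2 can be approximated with the standard `ast` module. The sketch below is an assumption about how such checks might look, not the actual Instagram linter; the class name, messages, and `lint` helper are hypothetical, and the flake8 plugin wiring is left out.

```python
import ast


class AsyncioLintVisitor(ast.NodeVisitor):
    """Collects (line, message) pairs for the naming and await-in-loop rules."""

    def __init__(self) -> None:
        self.problems = set()  # a set avoids duplicates from nested loops

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        # Rule 1: async functions should carry the async_ prefix.
        if not node.name.startswith("async_"):
            self.problems.add(
                (node.lineno, f"async def '{node.name}' lacks async_ prefix")
            )
        self.generic_visit(node)

    def visit_For(self, node: ast.For) -> None:
        # Rule 2: an await inside a for-loop body is a candidate for gather.
        for child in node.body:
            for inner in ast.walk(child):
                if isinstance(inner, ast.Await):
                    self.problems.add(
                        (inner.lineno, "await in loop; consider asyncio.gather")
                    )
        self.generic_visit(node)


def lint(source: str) -> list:
    # Parse the source and return the findings sorted by line number.
    visitor = AsyncioLintVisitor()
    visitor.visit(ast.parse(source))
    return sorted(visitor.problems)
```

Running `lint` on a function named `fetch` that awaits inside a `for` loop reports both problems, while a correctly named function with no loop produces an empty list. A real rule would also need to decide whether the loop iterations are independent, which this sketch does not attempt.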
29. AUTOMATION
• Many asyncio changes are simple and repetitive
• smart code modifier for asyncio adoption:
• collect caller-callee pairs from runtime profiling and offline pyan static analysis
• modify the source code AST
• change blocking calls to async calls
• add await
• auto-format code using isort and black
[Diagram: source code -> AST -> code modifier -> change set -> pull request]
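The AST rewriting step could look roughly like the sketch below. It is a simplified assumption, not the tool from the talk: `ASYNC_REPLACEMENTS`, `AsyncModifier`, and `modify` are hypothetical names, the caller-callee collection and the isort/black formatting are omitted, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Hypothetical mapping from blocking functions to their async replacements.
ASYNC_REPLACEMENTS = {"fetch_user": "async_fetch_user"}


class AsyncModifier(ast.NodeTransformer):
    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)  # rewrite calls in the body first
        has_await = any(isinstance(n, ast.Await) for n in ast.walk(node))
        if not has_await:
            return node
        # A function that now awaits must itself become async.
        fields = {name: getattr(node, name) for name in node._fields}
        return ast.copy_location(ast.AsyncFunctionDef(**fields), node)

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id in ASYNC_REPLACEMENTS:
            # Change the blocking call to the async call and add await.
            node.func.id = ASYNC_REPLACEMENTS[node.func.id]
            return ast.Await(value=node)
        return node


def modify(source: str) -> str:
    tree = AsyncModifier().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # Python 3.9+
```

Given a function that calls `fetch_user(uid)`, `modify` returns source in which the function is `async def` and the call reads `await async_fetch_user(uid)`. A production tool would also rename the converted function, update its callers using the caller-callee data, and handle nested functions, none of which this sketch covers.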
30. CPU OVERHEAD
• Adopting asyncio could cost ~20% more CPU instructions on Instagram servers.
• CPython asyncio was slow because the event loop and helpers were implemented in Python.
• Optimization strategies:
• simplify the code and remove redundant computation
• Cython
• C API
• Available optimizations:
• uvloop: libuv + Cython binding for event loop
• CPython 3.6 implements Future and Task in C
• CPython 3.7 implements get_event_loop( ) in C. Future and gather( ) also become faster.
31. CUSTOM OPTIMIZATION
• Example: gather( ) -> ensure_future( ) -> isfuture/iscoroutine/isawaitable
• Reorder: check iscoroutine first
• gather( ) deduplicates coroutines using a dict. Remove that assumption.
• Implement all helper functions with the C API
• Optimization result: reduced the overall asyncio CPU overhead by 2x (10%)
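The first two ideas can be illustrated in pure Python. This is only a sketch of the reasoning: the talk's optimization was written with the C API, and both function names below are hypothetical. It assumes coroutines are the most common input, so that check comes first, and it skips duplicate detection entirely.

```python
import asyncio
import inspect


def ensure_future_coroutine_first(obj, loop: asyncio.AbstractEventLoop):
    # Hypothetical helper, not CPython's ensure_future.
    # Coroutines are assumed to be the common case, so test for them first.
    if asyncio.iscoroutine(obj):
        return loop.create_task(obj)
    if asyncio.isfuture(obj):
        return obj
    if inspect.isawaitable(obj):
        async def _wrap():
            return await obj
        return loop.create_task(_wrap())
    raise TypeError("a coroutine, a Future, or an awaitable is required")


async def gather_without_dedup(*awaitables):
    # Hypothetical simplified gather: no dict-based deduplication.
    loop = asyncio.get_running_loop()
    # Schedule everything up front so the awaitables run concurrently.
    futures = [ensure_future_coroutine_first(a, loop) for a in awaitables]
    # Collect results in argument order.
    return [await fut for fut in futures]
```

Unlike `asyncio.gather`, this sketch has no `return_exceptions` option and does not cancel siblings, so it shows the check ordering and the dropped deduplication rather than a drop-in replacement.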
32. CURRENT RESULTS
• API latency improved by 30% on the server side
• Better user engagement
• more media views
• more time spent
• Next Steps
• 100% asyncio
• concurrent request handling