
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes

The talk covers typical mistakes that a Python developer without much experience in high load systems can make, along with the resulting issues and preventive actions. Expected audience: developers who are new to an existing highly loaded service, or who are building a system from scratch. Everything is based on the speaker's own production experience.

Published in: Software

  1. 1. Maintaining a high load Python project: typical mistakes Viacheslav Kakovskyi PyCon Poland 2016
  2. 2. Me!
      ● @kakovskyi
      ● Python Software Engineer at SoftServe
      ● Contributor to Atlassian HipChat: Python 2, Twisted
      ● Maintainer of KPIdata: Python 3, asyncio
  3. 3. Agenda
      1. What is a high load project?
      2. High load Python projects from my experience
      3. Typical mistakes
      4. Load testing of Python applications
      5. Practical example of load testing
      6. Summary
      7. Further reading
  4. 4. What is a high load project?
  5. 5. What is a high load project?
      ● 2+ nodes?
      ● 10 000 connections?
      ● 200 000 RPS?
      ● 5 000 000 daily active users?
      ● monitoring?
      ● scalability?
      ● Multi-AZ?
      ● redundancy?
      ● fault tolerance?
      ● high availability?
      ● disaster recovery?
  6. 6. What is a high load project?
      A project where an inefficient solution or a tiny bug has a huge impact on your business:
      ● it increases costs $$$ (due to a lack of resources)
      ● or it damages your reputation (due to performance degradation)
  7. 7. High load Python projects from my experience
      ● Instant messenger:
        ○ 100 000+ connected users
        ○ 100+ nodes
        ○ 100+ developers
      ● Embedded system for traffic analysis:
        ○ we can't scale or upgrade the hardware
  8. 8. Typical mistakes
      ● Using a pure Python third-party dependency instead of a C-based implementation
  9. 9. Typical mistakes: pure Python dependencies
      ● Using a pure Python third-party dependency
      ● Example: JSON parsing
      ● Note: check out C-based libraries with Python bindings
        ○ simplejson
        ○ ujson
        ○ python-rapidjson
      ● Note: run your own benchmarks
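
The "run your own benchmarks" advice can be sketched roughly as follows. This is a minimal harness, not a rigorous benchmark: it times a `dumps`/`loads` round trip with the standard library `json` module and with whichever of the libraries named on the slide happen to be installed. The `SAMPLE` payload and the helper names are made up for illustration; real measurements should use payloads captured from your own service.

```python
import json
import timeit


def load_candidates():
    """Return {name: (dumps, loads)} for every JSON library that can be imported."""
    candidates = {"json": (json.dumps, json.loads)}
    # python-rapidjson is imported under the module name `rapidjson`.
    for name in ("simplejson", "ujson", "rapidjson"):
        try:
            module = __import__(name)
        except ImportError:
            continue  # not installed, so it is simply left out of the comparison
        candidates[name] = (module.dumps, module.loads)
    return candidates


def benchmark(payload, number=200):
    """Time `number` dumps/loads round trips of `payload` per library, in seconds."""
    results = {}
    for name, (dumps, loads) in load_candidates().items():
        results[name] = timeit.timeit(lambda: loads(dumps(payload)), number=number)
    return results


# Hypothetical payload; replace it with a message shaped like your real traffic.
SAMPLE = {"user": "alice", "room": 42, "tags": ["a", "b"], "body": "hello " * 50}

if __name__ == "__main__":
    for name, seconds in sorted(benchmark(SAMPLE).items(), key=lambda kv: kv[1]):
        print("{:<12} {:.4f} s".format(name, seconds))
```

The ranking can change with payload size and shape, which is why the slide asks for your own numbers rather than published ones.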
  10. 10. Typical mistakes
      ● Using JSON as the default serialization format
  11. 11. Typical mistakes: JSON all the things
      ● Using JSON as the default serialization format
      ● Note: check out faster formats
        ○ MessagePack
        ○ Protocol Buffers
        ○ Apache Thrift
      ● Note: run benchmarks, again!
      ● Note: try using YAML for configuration files
  12. 12. Typical mistakes
      ● Writing your high load Python project in Python only
  13. 13. Typical mistakes: pure Python codebase
      ● Writing your high load Python project in Python only
      ● Note: use a multi-language approach
        ○ Go
        ○ Node.js
        ○ Rust
      ● Note: fine-tune the performance of Python
        ○ Cython
        ○ CPython C/C++ extensions
        ○ PyPy and CFFI
  14. 14. Typical mistakes
      ● Using synchronous Python frameworks for networking
  15. 15. Typical mistakes: synchronous Python
      ● A large number of concurrent connections
      ● A multithreaded approach isn't efficient due to per-thread overhead
      ● Serving them efficiently requires a select-style I/O multiplexing implementation on the backend:
        ○ poll
        ○ epoll
        ○ kqueue
  16. 16. Typical mistakes: synchronous Python
      ● Note: use an asynchronous framework for highly loaded solutions
        ○ Tornado
  17. 17. The answer: asyncio & aiohttp
  18. 18. Typical mistakes: synchronous Python
      ● Note: learn asyncio
      ● Note: check out the aio-libs
        ○ aiohttp_admin
        ○ aiomcache
        ○ aiocouchdb
        ○ aiomeasures
        ○ aiobotocore
        ○ aiozmq
        ○ aioodbc
        ○ aiokafka
        ○ aioes
        ○ aiolocust
        ○ aiohttp
        ○ aiopg
        ○ aiomysql
        ○ aioredis
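
To illustrate why an event loop copes with many concurrent connections in a single thread, here is a minimal sketch using only the standard library. `fake_request` is a stand-in for a network call and the numbers are arbitrary; a real service would await sockets through something like aiohttp instead of `asyncio.sleep`.

```python
import asyncio
import time


async def fake_request(delay):
    """Stand-in for a network call; awaiting the sleep yields control to the event loop."""
    await asyncio.sleep(delay)
    return delay


async def handle_many(connections, delay):
    """Wait on `connections` simulated requests concurrently in one thread."""
    return await asyncio.gather(*(fake_request(delay) for _ in range(connections)))


def run_demo(connections=1000, delay=0.05):
    """Return (number of completed requests, wall-clock seconds)."""
    started = time.perf_counter()
    results = asyncio.run(handle_many(connections, delay))
    return len(results), time.perf_counter() - started


if __name__ == "__main__":
    count, elapsed = run_demo()
    print("{} requests finished in {:.2f} s".format(count, elapsed))
```

Handled one after another, 1000 waits of 0.05 s would take about 50 s. Because the waits overlap, the total stays close to a single delay plus scheduling overhead.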
  19. 19. Typical mistakes
      ● Not using threads and processes in the project's code
  20. 20. Typical mistakes: no threads and processes usage
      ● Note: use threads to split different streams of work for IO-bound tasks
        ○ Flask
      ● Note: use processes to scale your IO-bound application inside one node
        ○ gunicorn + aiohttp
      ● Note: use threads or processes to delegate blocking jobs and CPU-bound tasks
        ○ ThreadPoolExecutor, ProcessPoolExecutor
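
Delegating blocking work from an asyncio application can look roughly like the sketch below. `blocking_io` and `cpu_bound` are hypothetical placeholders for a synchronous driver call and a heavy computation. The example uses `ThreadPoolExecutor`; for pure Python CPU-bound code a `ProcessPoolExecutor` can be passed to `run_in_executor` instead, provided the functions are picklable and the script has a `__main__` guard.

```python
import asyncio
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(seconds):
    """Hypothetical blocking call, e.g. a synchronous database driver."""
    time.sleep(seconds)
    return seconds


def cpu_bound(data):
    """Hypothetical CPU-heavy job."""
    return hashlib.sha256(data).hexdigest()


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Both jobs run in worker threads; the event loop stays free
        # to serve other coroutines while it awaits their results.
        io_future = loop.run_in_executor(pool, blocking_io, 0.1)
        cpu_future = loop.run_in_executor(pool, cpu_bound, b"x" * 1000000)
        return await asyncio.gather(io_future, cpu_future)


if __name__ == "__main__":
    print(asyncio.run(main()))
```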
  21. 21. Typical mistake: deploying a new feature without load testing
  22. 22. Load testing of Python applications
      ● Purpose: predict when production will break
      ● Must-have for high load projects
      ● Helps to prevent reputation losses
      ● Almost nobody does it
  23. 23. Load testing 101
      ● Identify how the load might grow
        ○ More users
        ○ More data
        ○ More operations
        ○ Fewer servers
        ○ Unexpected edge cases
  24. 24. Load testing 101
      ● Define the heaviest and most frequent operations
        ○ Insertions into data storages
          ■ PostgreSQL
          ■ ElasticSearch
          ■ Redis
        ○ Calculations and other CPU-bound tasks
        ○ Calls to external services
          ■ S3, etc.
  25. 25. Load testing 101
      ● Identify how to trigger the operations from a user's perspective
        ○ REST API endpoints
        ○ Periodic processing of collected data
  26. 26. Load testing 101
      ● Collect product metrics related to the operations
        ○ Application metrics with StatsD
          ■ Counters
          ■ Timers
        ○ Per-node metrics with collectd
          ■ CPU
          ■ RAM
          ■ IO
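
For readers new to StatsD, counters and timers are plain text lines sent over UDP. The sketch below shows that wire format with the standard library only. The function names are illustrative, and a real project would more likely use an existing StatsD client library than hand-rolled helpers like these.

```python
import socket


def format_counter(name, value=1):
    """StatsD counter line: <name>:<value>|c"""
    return "{}:{}|c".format(name, value)


def format_timer(name, milliseconds):
    """StatsD timer line: <name>:<value>|ms"""
    return "{}:{}|ms".format(name, milliseconds)


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the default StatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    print(format_counter("status_code.200"))
    print(format_timer("request.faculty.fpm", 12))
```

Because the transport is UDP, emitting a metric does not block the application on the metrics backend, which matters when the service itself is under load.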
  27. 27. Load testing 101
      ● Create a tool that behaves like a gazillion users
        ○ Establish network connections
        ○ Make HTTP requests
          ■ Send some data
          ■ Retrieve information from our server
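
The idea of a tool that behaves like many users can be sketched with the standard library alone. The example below is an assumption-laden toy, not the tool used in the talk: it starts a tiny stand-in server on localhost, then runs a number of simulated users that each open connections, send a GET request, and record the status code and latency. The path `/faculty/fpm` is borrowed from the later slides; everything else is hypothetical.

```python
import asyncio
import time


async def handle_client(reader, writer):
    """Tiny stand-in server: read the request head and answer 200 OK."""
    await reader.readuntil(b"\r\n\r\n")
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n"
                 b"Connection: close\r\n\r\nok")
    await writer.drain()
    writer.close()


async def simulated_user(host, port, path, requests):
    """One user: a new connection per request, recording (status, seconds)."""
    samples = []
    for _ in range(requests):
        started = time.perf_counter()
        reader, writer = await asyncio.open_connection(host, port)
        head = "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n"
        writer.write(head.format(path, host).encode("ascii"))
        await writer.drain()
        status_line = await reader.readline()
        await reader.read()  # consume the rest until the server closes
        writer.close()
        status = int(status_line.decode("ascii").split()[1])
        samples.append((status, time.perf_counter() - started))
    return samples


async def run_load(users=50, requests=5):
    """Run `users` simulated users concurrently against the local server."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port chosen by the OS
    try:
        per_user = await asyncio.gather(
            *(simulated_user("127.0.0.1", port, "/faculty/fpm", requests)
              for _ in range(users)))
    finally:
        server.close()
        await server.wait_closed()
    return [sample for samples in per_user for sample in samples]


if __name__ == "__main__":
    samples = asyncio.run(run_load())
    print("{} requests, {} OK".format(
        len(samples), sum(1 for status, _ in samples if status == 200)))
```

In a real test the simulated users would target a staging copy of the service rather than an in-process server.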
  28. 28. Load testing in practice
  29. 29. Load testing in practice
      ● KPIdata is an asyncio-based pet project for assessing the quality of higher education
      ● KPI stands for Kyiv Polytechnic Institute
      ● Students and alumni use the website like Yelp, to choose faculties, chairs, and specialities to study
      ● Check it out on kpidata.org
  30. 30. Load testing in practice
      ● LocustIO is a load testing tool written in Python
      ● Simulates millions of simultaneous users
      ● Runs load tests distributed over multiple hosts
      ● Supports HTTP, XMPP, XML-RPC
      ● Check it out on locust.io
      ● Note: it uses gevent under the hood
  31. 31. Load testing in practice: key features of KPIdata
  32. 32. Load testing in practice: key features of KPIdata
  33. 33. Load testing in practice: key features of KPIdata
  34. 34. Load testing in practice: key features of KPIdata
  35. 35. Load testing in practice: identify the load
      ● More users
        ○ The admission campaign starts
        ○ More schoolchildren will learn about the site
      ● More data
        ○ The semester ends and we will receive a lot of feedback
        ○ New universities will be involved
  36. 36. Load testing in practice: define frequent operations
      ● Add feedback for
        ○ faculty/semester/subject
      ● Retrieve statistics for
        ○ faculty/chair/group
      ● Search for feedback
      ● Calculate ratings in the background
  37. 37. Load testing in practice: identify the triggers
      GET
      ● /feedback
      ● /faculty/{code}
      ● /chair/{id}
      ● /group/{name}
      ● /rating/{entity}
      POST
      ● /feedback
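  38. 38. Load testing in practice: add application metrics

          from aiohttp import web

          # `statsd` is assumed to be a module-level StatsD client created elsewhere,
          # for example statsd.StatsClient() from the `statsd` package.

          async def collect_web_handler_metrics(app, handler):
              """Middleware factory: reports a timer and a status counter per request."""
              async def middleware_handler(request):
                  # '/faculty/fpm' becomes '.faculty.fpm'
                  path = request.path.replace('/', '.')
                  with statsd.timer('request' + path):
                      try:
                          response = await handler(request)
                      except web.HTTPNotFound:
                          response = web.Response(status=404)
                      except Exception:
                          # Any other error is reported to the client as 503
                          response = web.Response(status=503)
                  statsd.incr('status_code.{}'.format(response.status))
                  return response
              return middleware_handler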
  39. 39. Load testing in practice: create dashboards
      ● Graphite
      ● Grafana
      ● DataDog
  40. 40. Load testing in practice: create testing tool
      ● Create a locustfile.py module for execution
      ● Define a TaskSet of test functions
      ● Define an HttpLocust for spawning the tests
  41. 41. Load testing in practice: create testing tool

          from locust import TaskSet, task

          class KPIdataTaskSet(TaskSet):
              """Set of tasks that a Locust user will execute"""

              @task
              def test_get_faculty(self):
                  with self.client.get('/faculty/fpm', catch_response=True) as response:
                      if response.status_code == 200:
                          response.success()
                      else:
                          response.failure(
                              'Wrong status code. Received: {}. Expected: 200.'
                              .format(response.status_code))
  42. 42. Load testing in practice: create testing tool

          from locust import HttpLocust

          class KPIdataLocust(HttpLocust):
              """Represents an HTTP user which attacks the KPIdata website"""
              task_set = KPIdataTaskSet
              # Each simulated user waits 50 to 100 ms between tasks
              min_wait = 50
              max_wait = 100
              host = 'http://kpidata.org'
  43. 43. Load testing in practice: before running tests
      ● Infrastructure: CPU utilization
  44. 44. Load testing in practice: before running tests
      ● Random node: CPU utilization and Load Average
  45. 45. Load testing in practice: testing in progress
  46. 46. Load testing in practice: after testing
      ● Infrastructure: 50% CPU utilization
  47. 47. Load testing in practice: after testing
      ● Random node:
        ○ 53% CPU utilization
        ○ 3.5 Load Average
  48. 48. Results of load testing
      ● We know how many RPS we can serve with the environment
      ● We know what happens when the limit is exceeded
      ● We know the bottlenecks of our platform
      ● We know whether we can scale some part of the system
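
Turning raw measurements into the "how many RPS" answer can be as simple as the sketch below. It assumes you already have a list of per-request durations and the wall-clock length of the test run. The function name and the choice of nearest-rank percentiles are illustrative, not something taken from the talk.

```python
import math


def summarize(durations, wall_clock_seconds):
    """Return throughput and latency percentiles for a finished test run."""
    ordered = sorted(durations)

    def percentile(p):
        # Nearest-rank method: the smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "rps": len(ordered) / wall_clock_seconds,
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical run: 100 requests with latencies of 1..100 ms over 10 seconds.
    print(summarize(list(range(1, 101)), wall_clock_seconds=10))
```

Comparing these numbers between runs with increasing user counts shows where throughput stops growing and latency starts to climb.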
  50. 50. Summary
      ● Try to find C-based alternatives to third-party dependencies
      ● Check out serialization formats which are faster than JSON
      ● Fine-tune your Python project with C extensions, Cython or PyPy
      ● Write some services in languages other than Python
  51. 51. Summary
      ● Use asyncio and aiohttp for networking applications
      ● Use ThreadPoolExecutor for blocking operations
      ● Use processes for scaling inside a node
      ● Perform load testing for new features before pushing them to production
  52. 52. Further reading
      ● @kakovskyi: Maintaining a high load Python project for newcomers
      ● @kakovskyi: Instant messenger with Python. Back-end development
      ● Asyncio-stack for web development
      ● PEP8 is not enough
      ● How HipChat Stores and Indexes Billions of Messages Using ElasticSearch
      ● A guide to analyzing Python performance
      ● Why Leading Companies Dark Launch - LaunchDarkly Blog
      ● What Is Async, How Does It Work, And When Should I Use It?
  53. 53. Questions?
      ● @kakovskyi
      ● viach.kakovskyi@gmail.com