Maintaining a high load Python project:
typical mistakes
Viacheslav Kakovskyi
PyCon Poland 2016
Me!
@kakovskyi
Python Software Engineer at SoftServe
Contributor of Atlassian HipChat — Python 2, Twisted
Maintainer of KPIdata — Python 3, asyncio
Agenda
1. What is a high load project?
2. High load Python projects from my experience
3. Typical mistakes
4. Load testing of Python applications
5. Practical example of load testing
6. Summary
7. Further reading
What is a high load project?
● 2+ nodes?
● 10 000 connections?
● 200 000 RPS?
● 5 000 000 daily active users?
● monitoring?
● scalability?
● Multi-AZ?
● redundancy?
● fault tolerance?
● high availability?
● disaster recovery?
What is a high load project?
a project where an inefficient solution or a tiny bug
has a huge impact on your business
(due to a lack of resources)
→ causes an increase of costs $$$ or loss of reputation
(due to performance degradation)
High load Python projects from my experience
● Instant messenger:
○ 100 000+ connected users
○ 100+ nodes
○ 100+ developers
● Embedded system for traffic analysis:
○ we can't scale and upgrade hardware
Typical mistakes
● Usage of a pure Python third-party dependency
instead of C-based implementation
Typical mistakes: Pure Python dependencies
● Usage of a pure Python third-party dependency
● Example: JSON parsing
● Note: check out C-based libraries with Python bindings
○ simplejson
○ ujson
○ python-rapidjson
● Note: run your own benchmarks
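A minimal sketch of such a benchmark, using only `timeit` from the standard library. It times the built-in `json` module and whichever of the libraries above happen to be installed. The payload is a made-up message, not data from the talk, so replace it with something representative of your own traffic before drawing conclusions.

```python
import json
import timeit

# Candidate parsers: the standard library plus any alternative
# libraries that happen to be installed in this environment.
parsers = {'json': json.loads}
for name in ('simplejson', 'ujson', 'rapidjson'):
    try:
        parsers[name] = __import__(name).loads
    except ImportError:
        pass  # not installed, skip it

# Hypothetical payload; substitute a message from your own system.
payload = json.dumps({
    'room': 'pycon-pl',
    'sender': 'kakovskyi',
    'mentions': ['alice', 'bob'],
    'body': 'hello ' * 50,
})


def benchmark(number=10000):
    """Return seconds spent parsing `payload` `number` times per library."""
    return {
        name: timeit.timeit(lambda: loads(payload), number=number)
        for name, loads in parsers.items()
    }


if __name__ == '__main__':
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print('{:<12} {:.4f} s'.format(name, seconds))
```

Numbers depend heavily on payload shape and library versions, which is why the benchmark should be run on your own data.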
Typical mistakes
● Usage of JSON as a serialization format by default
Typical mistakes: JSON all the things
● Usage of JSON as a serialization format by default
● Note: Check out faster formats
○ MessagePack
○ Protocol Buffers
○ Apache Thrift
● Note: Run benchmarks, again!
● Note: Try using YAML for configuration files
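As a rough illustration of comparing formats, the sketch below measures the encoded size of one message in JSON and, if the third-party `msgpack` package is installed, in MessagePack. The message itself is hypothetical, and size is only one dimension; encode and decode time should be benchmarked as well.

```python
import json

try:
    import msgpack  # third-party MessagePack bindings; optional here
except ImportError:
    msgpack = None

# Hypothetical message, not taken from the talk.
message = {
    'id': 12345,
    'user': 'kakovskyi',
    'tags': ['python', 'asyncio'],
    'scores': [4.5, 3.0, 5.0],
}


def encoded_sizes(obj):
    """Return the encoded size in bytes for each available format."""
    as_json = json.dumps(obj, separators=(',', ':')).encode('utf-8')
    sizes = {'json': len(as_json)}
    if msgpack is not None:
        packed = msgpack.packb(obj, use_bin_type=True)
        # Round-trip check: decoding must give back the same object.
        assert msgpack.unpackb(packed, raw=False) == obj
        sizes['msgpack'] = len(packed)
    return sizes


if __name__ == '__main__':
    for fmt, size in encoded_sizes(message).items():
        print('{:<8} {} bytes'.format(fmt, size))
```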
Typical mistakes
● Coding your high load Python project only with Python
Typical mistakes: Pure Python codebase
● Coding your high load Python project only with Python
● Note: use a multi-language approach
○ Golang
○ NodeJS
○ Rust
● Note: fine-tune the performance of Python
○ Cython
○ CPython C/C++ extensions
○ PyPy and CFFI
Typical mistakes
● Usage of synchronous Python frameworks for networking
Typical mistakes: synchronous Python
● High number of concurrent connections
● A multithreaded approach isn't efficient due to overhead
● Requires a select-style implementation on the backend:
○ poll
○ epoll
○ kqueue
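For a feel of what these mechanisms do, here is a small sketch using the standard library `selectors` module, which wraps select, poll, epoll and kqueue behind one interface. A connected socket pair stands in for a real client connection, and the `'client-1'` label is only an illustrative name.

```python
import selectors
import socket

# DefaultSelector picks the most efficient mechanism the OS offers
# (for example epoll on Linux or kqueue on BSD/macOS).
selector = selectors.DefaultSelector()

# A connected socket pair stands in for a client connection.
client, server_side = socket.socketpair()
server_side.setblocking(False)
selector.register(server_side, selectors.EVENT_READ, data='client-1')

client.sendall(b'ping')

received = []
# One iteration of an event loop: wait until a registered socket is
# readable, then read from it without blocking the whole process.
for key, events in selector.select(timeout=1.0):
    chunk = key.fileobj.recv(1024)
    received.append((key.data, chunk))

selector.unregister(server_side)
client.close()
server_side.close()
selector.close()
```

A single thread can register thousands of sockets this way and only touch the ones that are ready, which is the building block asynchronous frameworks rely on.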
Typical mistakes: synchronous Python
● Note: use an asynchronous framework for high load
solutions
Tornado
The answer: asyncio & aiohttp
Typical mistakes: synchronous Python
● Note: learn asyncio
● Note: check out the aio-libs
○ aiohttp_admin
○ aiomcache
○ aiocouchdb
○ aiomeasures
○ aiobotocore
○ aiozmq
○ aioodbc
○ aioes
○ aiolocust
○ aiohttp
○ aiopg
○ aiomysql
○ aioredis
○ aiokafka
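A minimal asyncio sketch of the idea: many connections that mostly wait on I/O can be served from a single thread. Here `asyncio.sleep` stands in for a socket read or a database query, and the function names are illustrative only.

```python
import asyncio
import time


async def handle_connection(conn_id, delay=0.1):
    """Simulate one connection that spends its time waiting on I/O."""
    await asyncio.sleep(delay)  # stands in for a socket read or DB query
    return conn_id


async def serve_all(n_connections):
    # All coroutines share one thread; the event loop switches between
    # them whenever one of them is waiting.
    return await asyncio.gather(
        *(handle_connection(i) for i in range(n_connections)))


def run(n_connections=1000):
    """Return (number of handled connections, elapsed seconds)."""
    started = time.monotonic()
    results = asyncio.run(serve_all(n_connections))
    return len(results), time.monotonic() - started


if __name__ == '__main__':
    count, elapsed = run()
    print('{} connections in {:.2f} s'.format(count, elapsed))
```

Because the waits overlap, the total time should stay close to a single delay rather than growing with the number of connections.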
Typical mistakes
● No usage of threads and processes in project's code
Typical mistakes: no threads and processes usage
● Note: use threads to split different streams of work for
IO-bound tasks
○ Flask
● Note: use processes to scale your IO-bound application inside
one node
○ gunicorn + aiohttp
● Note: use threads or processes to delegate blocking jobs for
CPU-bound tasks
○ ThreadPoolExecutor, ProcessPoolExecutor
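One way to delegate a blocking job from asyncio code is `loop.run_in_executor`. The sketch below assumes a hypothetical `blocking_job` function (hashing is just a stand-in) and uses a `ThreadPoolExecutor`.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def blocking_job(data):
    """A blocking call that would otherwise stall the event loop."""
    return hashlib.sha256(data).hexdigest()


async def handle_request(executor, data):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the pool and await its result, so the
    # event loop stays free to serve other requests in the meantime.
    return await loop.run_in_executor(executor, blocking_job, data)


async def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        return await asyncio.gather(
            *(handle_request(executor, b'payload-%d' % i) for i in range(8)))


digests = asyncio.run(main())
```

For CPU-bound pure Python code, a `ProcessPoolExecutor` can be passed in place of the thread pool, since threads in CPython share one interpreter lock.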
Typical mistake: deployment of a new
feature without load testing
Load testing of Python applications
● Purpose: predict when we fu*k production
● Must have for high load projects
● Helps to prevent reputation losses
● Almost nobody does that
Load testing 101
● Identify how the load might grow
○ More users
○ More data
○ More operations
○ Fewer servers
○ Unexpected edge cases
Load testing 101
● Identify the heaviest and most frequent operations
○ Insertions into data storages
■ PostgreSQL
■ ElasticSearch
■ Redis
○ Calculations and other CPU-bound tasks
○ Calls to external services
■ S3, etc.
Load testing 101
● Identify how to trigger the operations from a user's
perspective
○ REST API endpoints
○ Periodic processing of collected data
Load testing 101
● Collect product metrics related to the operations
○ Application metrics with StatsD
■ Counters
■ Timers
○ Per node metrics with collectd
■ CPU
■ RAM
■ IO
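To show what counters and timers look like on the wire, here is a small sketch of the plain-text StatsD protocol sent over UDP. The helper names and metric names are illustrative; in a real project a StatsD client library would normally do this for you.

```python
import socket


def counter(name, value=1):
    """Format a StatsD counter increment."""
    return '{}:{}|c'.format(name, value)


def timer(name, milliseconds):
    """Format a StatsD timing sample in milliseconds."""
    return '{}:{}|ms'.format(name, milliseconds)


def send(metric, host='127.0.0.1', port=8125):
    """Send one metric as a UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(metric.encode('ascii'), (host, port))
    finally:
        sock.close()


if __name__ == '__main__':
    # Assumes a StatsD daemon listening on the default local port.
    send(counter('status_code.200'))
    send(timer('request.feedback', 42))
```

UDP keeps the overhead on the application side low: if the metrics daemon is down, the datagram is simply lost and the request is not slowed down.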
Load testing 101
● Create a tool that behaves like a gazillion users
○ Establish network connections
○ Make HTTP requests
■ Send some data
■ Retrieve information from our server
Load testing in practice
Load testing in practice
● KPIdata is an asyncio-based pet project for assessing the
quality of higher education
● KPI means Kyiv Polytechnic Institute
● Students and alumni use the website like Yelp for
choosing faculties, chairs, and specialities to study
● Check it out on kpidata.org
Load testing in practice
● LocustIO is a load testing tool written in Python
● Simulates millions of simultaneous users
● Runs load tests distributed over multiple hosts
● Supports HTTP, XMPP, XML-RPC
● Check it out on locust.io
● Note: it uses gevent under the hood
Load testing in practice: key features of KPIdata
Load testing in practice: identify the load
● More users
○ Admission campaign starts
○ More schoolchildren will learn about the site
● More data
○ Semester ends and we will receive a lot of
feedback
○ New universities will be involved
Load testing in practice: define frequent operations
● Add feedback for
○ faculty/semester/subject
● Retrieve statistics for
○ faculty/chair/group
● Search for feedback
● Calculate ratings in background
Load testing in practice: identify the triggers
GET
● /feedback
● /faculty/{code}
● /chair/{id}
● /group/{name}
● /rating/{entity}
POST
● /feedback
Load testing in practice: add application metrics
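from aiohttp.web import HTTPException, Response

# `statsd` is assumed to be an already configured StatsD client.
async def collect_web_handler_metrics(app, handler):
    """aiohttp middleware factory that reports timings and status codes."""
    async def middleware_handler(request):
        path = request.path.strip('/').replace('/', '.')
        with statsd.timer('request.' + path):
            try:
                response = await handler(request)
            except HTTPException as exc:
                # aiohttp raises HTTP errors such as HTTPNotFound
                statsd.incr('status_code.{}'.format(exc.status))
                raise
            except Exception:
                # Report any unexpected failure as 503
                response = Response(status=503)
            statsd.incr('status_code.{}'.format(response.status))
            return response
    return middleware_handler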
Load testing in practice: create dashboards
● Graphite
● Grafana
● DataDog
Load testing in practice: create testing tool
● Create locustfile.py module for execution
● Define TaskSet of test functions
● Define HttpLocust for spawning the tests
Load testing in practice: create testing tool
from locust import TaskSet, task

class KPIdataTaskSet(TaskSet):
    """Set of tasks that a Locust user will execute"""

    @task
    def test_get_faculty(self):
        # catch_response=True lets the test decide what counts as a failure
        with self.client.get('/faculty/fpm',
                             catch_response=True) as response:
            if response.status_code == 200:
                response.success()
            else:
                response.failure(
                    'Wrong status code. Received: {}. Expected: 200.'
                    .format(response.status_code))
Load testing in practice: create testing tool
from locust import HttpLocust

class KPIdataLocust(HttpLocust):
    """Represents HTTP user which attacks KPIdata web-site"""
    task_set = KPIdataTaskSet
    # Wait between tasks, in milliseconds
    min_wait = 50
    max_wait = 100
    host = 'http://kpidata.org'
Load testing in practice: before running tests
● Infrastructure: CPU utilization
Load testing in practice: before running tests
● Random node: CPU utilization and Load Average
Load testing in practice: testing in progress
Load testing in practice: after testing
● Infrastructure: 50% CPU utilization
Load testing in practice: after testing
● Random node:
○ 53% CPU utilization
○ 3.5 Load Average
Results of load testing
● We know how many RPS we can serve with the environment
● We know what happens when the limit is exceeded
● We know the bottlenecks of our platform
● We know if we can scale some part of the system
Summary
● Try to find C-based analogs of 3rd party dependencies
● Check out serialization formats which are faster than JSON
● Fine-tune your Python project with C extensions, Cython, or
PyPy
● Write some services in languages other than Python
Summary
● Use asyncio and aiohttp for networking applications
● Use ThreadPoolExecutor for blocking operations
● Use processes for scaling inside a node
● Perform load testing for new features before pushing
them to production
Further reading
● @kakovskyi: Maintaining a high load Python project for newcomers
● @kakovskyi: Instant messenger with Python. Back-end development
● Asyncio-stack for web development
● PEP8 is not enough
● How HipChat Stores and Indexes Billions of Messages Using
ElasticSearch
● A guide to analyzing Python performance
● Why Leading Companies Dark Launch - LaunchDarkly Blog
● What Is Async, How Does It Work, And When Should I Use It?
@kakovskyi
viach.kakovskyi@gmail.com
Questions?

More Related Content

What's hot

Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019Rafał Leszko
 
Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?mortardata
 
Decision making - for loop , nested loop ,if-else statements , switch in goph...
Decision making - for loop , nested loop ,if-else statements , switch in goph...Decision making - for loop , nested loop ,if-else statements , switch in goph...
Decision making - for loop , nested loop ,if-else statements , switch in goph...sangam biradar
 
Types - slice, map, new, make, struct - Gopherlabs
Types - slice, map, new, make, struct - Gopherlabs Types - slice, map, new, make, struct - Gopherlabs
Types - slice, map, new, make, struct - Gopherlabs sangam biradar
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageAsankhaya Sharma
 
Translating Qt Applications
Translating Qt ApplicationsTranslating Qt Applications
Translating Qt Applicationsaccount inactive
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Taiwan User Group
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Erik Bernhardsson
 
202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUPRonald Hsu
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data ScienceErik Bernhardsson
 
Network programming with Qt (C++)
Network programming with Qt (C++)Network programming with Qt (C++)
Network programming with Qt (C++)Manohar Kuse
 
Open Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object StorageOpen Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object StorageSammy Fung
 
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Flink Taiwan User Group
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools Yulia Shcherbachova
 
Building your First gRPC Service
Building your First gRPC ServiceBuilding your First gRPC Service
Building your First gRPC ServiceJessie Barnett
 
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward
 

What's hot (20)

gRPC: Beyond REST
gRPC: Beyond RESTgRPC: Beyond REST
gRPC: Beyond REST
 
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019
 
Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?Jonathan Coveney: Why Pig?
Jonathan Coveney: Why Pig?
 
Decision making - for loop , nested loop ,if-else statements , switch in goph...
Decision making - for loop , nested loop ,if-else statements , switch in goph...Decision making - for loop , nested loop ,if-else statements , switch in goph...
Decision making - for loop , nested loop ,if-else statements , switch in goph...
 
Types - slice, map, new, make, struct - Gopherlabs
Types - slice, map, new, make, struct - Gopherlabs Types - slice, map, new, make, struct - Gopherlabs
Types - slice, map, new, make, struct - Gopherlabs
 
C++ Coroutines
C++ CoroutinesC++ Coroutines
C++ Coroutines
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph Language
 
Translating Qt Applications
Translating Qt ApplicationsTranslating Qt Applications
Translating Qt Applications
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013Luigi Presentation at OSCON 2013
Luigi Presentation at OSCON 2013
 
202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP202107 - Orion introduction - COSCUP
202107 - Orion introduction - COSCUP
 
Luigi presentation NYC Data Science
Luigi presentation NYC Data ScienceLuigi presentation NYC Data Science
Luigi presentation NYC Data Science
 
Introduction to gRPC
Introduction to gRPCIntroduction to gRPC
Introduction to gRPC
 
Network programming with Qt (C++)
Network programming with Qt (C++)Network programming with Qt (C++)
Network programming with Qt (C++)
 
Open Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object StorageOpen Source Weather Information Project with OpenStack Object Storage
Open Source Weather Information Project with OpenStack Object Storage
 
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
Apache Software Foundation: How To Contribute, with Apache Flink as Example (...
 
Declarative Infrastructure Tools
Declarative Infrastructure Tools Declarative Infrastructure Tools
Declarative Infrastructure Tools
 
Building your First gRPC Service
Building your First gRPC ServiceBuilding your First gRPC Service
Building your First gRPC Service
 
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
 

Viewers also liked

WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end developmentWebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end developmentViach Kakovskyi
 
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...it-people
 
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and React
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and ReactSend Balls Into Orbit with Python3, AsyncIO, WebSockets and React
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and ReactTaras Lyapun
 
Прямая выгода BigData для бизнеса
Прямая выгода BigData для бизнесаПрямая выгода BigData для бизнеса
Прямая выгода BigData для бизнесаAlexey Lustin
 
Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Pat Hermens
 
Designing Invisible Software at x.ai
Designing Invisible Software at x.aiDesigning Invisible Software at x.ai
Designing Invisible Software at x.aiAlex Poon
 
Автоматизация анализа логов на базе Elasticsearch
Автоматизация анализа логов на базе ElasticsearchАвтоматизация анализа логов на базе Elasticsearch
Автоматизация анализа логов на базе ElasticsearchPositive Hack Days
 
Shadow Fight 2: архитектура системы аналитики для миллиарда событий
Shadow Fight 2: архитектура системы аналитики для миллиарда событийShadow Fight 2: архитектура системы аналитики для миллиарда событий
Shadow Fight 2: архитектура системы аналитики для миллиарда событийVyacheslav Nikulin
 
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision API
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision APIUsing Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision API
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision APIVMware Tanzu
 
Optimizing Data Architecture for Natural Language Processing
Optimizing Data Architecture for Natural Language ProcessingOptimizing Data Architecture for Natural Language Processing
Optimizing Data Architecture for Natural Language ProcessingAlex Poon
 
2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch
2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch
2013-02-02 03 Голушко. Полнотекстовый поиск с ElasticsearchОмские ИТ-субботники
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Pythonanntp
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringAhmed Magdy Ezzeldin, MSc.
 
Cloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud PlatformCloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud PlatformMichal Brys
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in SolrTommaso Teofili
 
Elastic Stackにハマった話
Elastic Stackにハマった話Elastic Stackにハマった話
Elastic Stackにハマった話Kazuhiro Kosaka
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 

Viewers also liked (20)

WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end developmentWebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
 
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...
DUMP-2012 - Только хардкор! - "Архитектура и запуск облачного сервиса в Amazo...
 
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and React
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and ReactSend Balls Into Orbit with Python3, AsyncIO, WebSockets and React
Send Balls Into Orbit with Python3, AsyncIO, WebSockets and React
 
Прямая выгода BigData для бизнеса
Прямая выгода BigData для бизнесаПрямая выгода BigData для бизнеса
Прямая выгода BigData для бизнеса
 
Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017Behind the Scenes at Coolblue - Feb 2017
Behind the Scenes at Coolblue - Feb 2017
 
Designing Invisible Software at x.ai
Designing Invisible Software at x.aiDesigning Invisible Software at x.ai
Designing Invisible Software at x.ai
 
Pycon UA 2016
Pycon UA 2016Pycon UA 2016
Pycon UA 2016
 
Автоматизация анализа логов на базе Elasticsearch
Автоматизация анализа логов на базе ElasticsearchАвтоматизация анализа логов на базе Elasticsearch
Автоматизация анализа логов на базе Elasticsearch
 
Shadow Fight 2: архитектура системы аналитики для миллиарда событий
Shadow Fight 2: архитектура системы аналитики для миллиарда событийShadow Fight 2: архитектура системы аналитики для миллиарда событий
Shadow Fight 2: архитектура системы аналитики для миллиарда событий
 
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision API
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision APIUsing Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision API
Using Pivotal Cloud Foundry with Google’s BigQuery and Cloud Vision API
 
Optimizing Data Architecture for Natural Language Processing
Optimizing Data Architecture for Natural Language ProcessingOptimizing Data Architecture for Natural Language Processing
Optimizing Data Architecture for Natural Language Processing
 
2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch
2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch
2013-02-02 03 Голушко. Полнотекстовый поиск с Elasticsearch
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Python
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text Engineering
 
Cloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud PlatformCloud Machine Learning with Google Cloud Platform
Cloud Machine Learning with Google Cloud Platform
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Elastic Stackにハマった話
Elastic Stackにハマった話Elastic Stackにハマった話
Elastic Stackにハマった話
 
Machine learning with Google machine learning APIs - Puppy or Muffin?
Machine learning with Google machine learning APIs - Puppy or Muffin?Machine learning with Google machine learning APIs - Puppy or Muffin?
Machine learning with Google machine learning APIs - Puppy or Muffin?
 
Webscraping with asyncio
Webscraping with asyncioWebscraping with asyncio
Webscraping with asyncio
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 

Similar to PyCon Poland 2016: Maintaining a high load Python project: typical mistakes

WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...WebCamp
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka ReplicatorMichael Hongliang Xu
 
DevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and ProjectsDevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and ProjectsFedir RYKHTIK
 
Socket Programming with Python
Socket Programming with PythonSocket Programming with Python
Socket Programming with PythonGLC Networks
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 reviewManageIQ
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
OpenStack Cinder On-Boarding Room - Vancouver Summit 2018
OpenStack Cinder On-Boarding Room - Vancouver Summit 2018OpenStack Cinder On-Boarding Room - Vancouver Summit 2018
OpenStack Cinder On-Boarding Room - Vancouver Summit 2018Jay Bryant
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Nelson Calero
 
Cinder On-boarding Room - Berlin (11-13-2018)
Cinder On-boarding Room - Berlin (11-13-2018)Cinder On-boarding Room - Berlin (11-13-2018)
Cinder On-boarding Room - Berlin (11-13-2018)Jay Bryant
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in ProductionRobert Sanders
 
Cinder Project On-Boarding - OpenInfra Summit Denver 2019
Cinder Project On-Boarding - OpenInfra Summit Denver 2019Cinder Project On-Boarding - OpenInfra Summit Denver 2019
Cinder Project On-Boarding - OpenInfra Summit Denver 2019Jay Bryant
 
OpenStack Cinder On-Boarding Education - Boston Summit - 2017
OpenStack Cinder On-Boarding Education - Boston Summit - 2017OpenStack Cinder On-Boarding Education - Boston Summit - 2017
OpenStack Cinder On-Boarding Education - Boston Summit - 2017Jay Bryant
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...StormForge .io
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Powering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonPowering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonTatiana Al-Chueyr
 
OpenTelemetry For Architects
OpenTelemetry For ArchitectsOpenTelemetry For Architects
OpenTelemetry For ArchitectsKevin Brockhoff
 
Upleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy ChenUpleveling Analytics with Kafka with Amy Chen
Upleveling Analytics with Kafka with Amy ChenHostedbyConfluent
 
Glowing bear
Glowing bear Glowing bear
Glowing bear thehyve
 

Similar to PyCon Poland 2016: Maintaining a high load Python project: typical mistakes (20)

WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
 
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable,  Robust Kafka ReplicatoruReplicator: Uber Engineering’s Scalable,  Robust Kafka Replicator
uReplicator: Uber Engineering’s Scalable, Robust Kafka Replicator
 
DevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and ProjectsDevOps for TYPO3 Teams and Projects
DevOps for TYPO3 Teams and Projects
 
Socket Programming with Python
Socket Programming with PythonSocket Programming with Python
Socket Programming with Python
 
CollegeDiveIn presentation
CollegeDiveIn presentationCollegeDiveIn presentation
CollegeDiveIn presentation
 
Sprint 44 review
Sprint 44 reviewSprint 44 review
Sprint 44 review
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
OpenStack Cinder On-Boarding Room - Vancouver Summit 2018
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes

  • 1. Maintaining a high load Python project: typical mistakes Viacheslav Kakovskyi PyCon Poland 2016
  • 2. Me! @kakovskyi Python Software Engineer at SoftServe. Contributor to Atlassian HipChat (Python 2, Twisted). Maintainer of KPIdata (Python 3, asyncio)
  • 3. Agenda 1. What is a high load project? 2. High load Python projects from my experience 3. Typical mistakes 4. Load testing of Python applications 5. Practical example of load testing 6. Summary 7. Further reading
  • 4. What is a high load project?
  • 5. What is a high load project? ● 2+ nodes? ● 10 000 connections? ● 200 000 RPS? ● 5 000 000 daily active users? ● monitoring? ● scalability? ● Multi-AZ? ● redundancy? ● fault tolerance? ● high availability? ● disaster recovery?
  • 6. What is a high load project? A project where an inefficient solution or a tiny bug has a huge impact on your business (due to a lack of resources) → causes an increase of costs $$$ or loss of reputation (due to performance degradation)
  • 7. High load Python projects from my experience ● Instant messenger: ○ 100 000+ connected users ○ 100+ nodes ○ 100+ developers ● Embedded system for traffic analysis: ○ we can't scale and upgrade hardware
  • 8. Typical mistakes ● Usage of a pure Python third-party dependency instead of a C-based implementation
  • 9. Typical mistakes: Pure Python dependencies ● Usage of a pure Python third-party dependency ● Example: JSON parsing ● Note: check out C-based libraries with Python bindings ○ simplejson ○ ujson ○ python-rapidjson ● Note: run your own benchmarks
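The "run your own benchmarks" note can be sketched roughly as follows. This is a minimal illustration, not the author's benchmark: the `SAMPLE` payload and the helper names `load_available` and `benchmark` are made up for this example, and only the standard `json` module is assumed to be installed. The other libraries are timed only if they can be imported.

```python
import importlib
import timeit

# Import names of the libraries from the slide; python-rapidjson imports as `rapidjson`.
CANDIDATES = ["json", "simplejson", "ujson", "rapidjson"]

# Hypothetical payload; replace it with a message typical for your service.
SAMPLE = {"user_id": 42, "tags": ["a", "b", "c"] * 50, "body": "x" * 1000}


def load_available(names):
    """Import each module by name, skipping those that are not installed."""
    modules = {}
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ImportError:
            continue
    return modules


def benchmark(modules, payload, repeat=200):
    """Return {module_name: seconds spent on `repeat` dumps+loads round trips}."""
    results = {}
    for name, module in modules.items():
        results[name] = timeit.timeit(
            lambda: module.loads(module.dumps(payload)), number=repeat
        )
    return results


if __name__ == "__main__":
    timings = benchmark(load_available(CANDIDATES), SAMPLE)
    for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
        print(f"{name:12s} {seconds:.4f}s")
```

The numbers depend heavily on payload shape and size, so it is worth feeding the benchmark real messages rather than a synthetic dictionary.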
  • 10. Typical mistakes ● Usage of JSON as a serialization format by default
  • 11. Typical mistakes: JSON all the things ● Usage of JSON as a serialization format by default ● Note: Check out faster formats ○ MessagePack ○ Protocol Buffers ○ Apache Thrift ● Note: Run benchmarks, again! ● Note: Try using YAML for configuration files
  • 12. Typical mistakes ● Coding your high load Python project only with Python
  • 13. Typical mistakes: Pure Python codebase ● Coding your high load Python project only with Python ● Note: use a multi-language approach ○ Golang ○ NodeJS ○ Rust ● Note: fine-tune the performance of Python ○ Cython ○ CPython C/C++ extensions ○ PyPy and CFFI
  • 14. Typical mistakes ● Usage of synchronous Python frameworks for networking
  • 15. Typical mistakes: synchronous Python ● High number of concurrent connections ● Multithreaded approach isn't efficient due to overhead ● Requires usage of a select implementation on the backend: ○ poll ○ epoll ○ kqueue
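To illustrate what "a select implementation on the backend" means, here is a small sketch built on the standard `selectors` module, which picks epoll, kqueue, poll or select depending on the platform. The function name `echo_once` and the use of local socket pairs instead of real network clients are assumptions made to keep the example self-contained.

```python
import selectors
import socket


def echo_once(messages):
    """Echo each message over its own socket pair using one thread and one selector."""
    sel = selectors.DefaultSelector()  # epoll, kqueue, poll or select
    pairs = []
    for msg in messages:
        client, server = socket.socketpair()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)
        client.sendall(msg)  # the "client" sends its request
        pairs.append((client, server))

    pending = len(pairs)
    while pending:
        # One call reports every connection that is ready to be read.
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            conn.sendall(conn.recv(4096))  # echo the data back
            sel.unregister(conn)
            pending -= 1

    replies = [client.recv(4096) for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    sel.close()
    return replies


if __name__ == "__main__":
    print(echo_once([b"ping", b"pong"]))
```

A single thread serves all connections here. Asynchronous frameworks wrap this same readiness loop behind callbacks or coroutines.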
  • 16. Typical mistakes: synchronous Python ● Note: use an asynchronous framework for high load solutions, e.g. Tornado
  • 18. Typical mistakes: synchronous Python ● Note: learn asyncio ● Note: check out the aio-libs ○ aiohttp_admin ○ aiomcache ○ aiocouchdb ○ aiomeasures ○ aiobotocore ○ aiozmq ○ aioodbc ○ aiokafka ○ aioes ○ aiolocust ○ aiohttp ○ aiopg ○ aiomysql ○ aioredis
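As a first step into asyncio, the sketch below shows many concurrent waits sharing one thread. `fake_request` is a hypothetical stand-in for a network call and only sleeps, so the example needs nothing beyond the standard library.

```python
import asyncio
import time


async def fake_request(delay):
    """Stand-in for network I/O: yields to the event loop while 'waiting'."""
    await asyncio.sleep(delay)
    return delay


async def main(n=100, delay=0.1):
    """Run `n` fake requests concurrently; return (count, elapsed_seconds)."""
    started = time.perf_counter()
    results = await asyncio.gather(*(fake_request(delay) for _ in range(n)))
    return len(results), time.perf_counter() - started


if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    print(f"{count} requests finished in {elapsed:.2f}s")
```

Run sequentially, the same waits would take roughly `n * delay` seconds. Run concurrently, the total stays close to a single `delay`.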
  • 19. Typical mistakes ● No usage of threads and processes in the project's code
  • 20. Typical mistakes: no threads and processes usage ● Note: use threads to split different streams of work for IO-bound tasks ○ Flask ● Note: use processes to scale your IO-bound application inside one node ○ gunicorn + aiohttp ● Note: use threads or processes to delegate blocking jobs for CPU-bound tasks ○ ThreadPoolExecutor, ProcessPoolExecutor
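Delegating a blocking job from an asyncio application to an executor might look like the sketch below. `blocking_job` is a hypothetical stand-in (repeated hashing) for whatever blocking call a real project has, and the pool size of 4 is arbitrary.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def blocking_job(data: bytes) -> str:
    """Blocking work that should not run on the event loop thread."""
    digest = data
    for _ in range(10_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def handle(data: bytes, executor) -> str:
    """Hand the blocking job to the executor and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_job, data)


async def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        jobs = (handle(bytes([i]), executor) for i in range(8))
        return await asyncio.gather(*jobs)


if __name__ == "__main__":
    for digest in asyncio.run(main()):
        print(digest[:16])
```

For pure-Python CPU-bound work, a `ProcessPoolExecutor` can be passed to `run_in_executor` in the same way to avoid contention on the GIL. In that case the job function and its arguments must be picklable.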
  • 21. Typical mistake: deployment of a new feature without load testing
  • 22. Load testing of Python applications ● Purpose: predict when we fu*k production ● Must-have for high load projects ● Helps to prevent reputation losses ● Almost nobody does that
  • 23. Load testing 101 ● Identify how the load might grow ○ More users ○ More data ○ More operations ○ Fewer servers ○ Unexpected edge cases
  • 24. Load testing 101 ● Define the heaviest and most frequent operations ○ Insertions into data storages ■ PostgreSQL ■ ElasticSearch ■ Redis ○ Calculations and other CPU-bound tasks ○ Calls to external services ■ S3, etc.
  • 25. Load testing 101 ● Identify how to trigger the operations from a user's perspective ○ REST API endpoints ○ Periodic processing of collected data
  • 26. Load testing 101 ● Collect product metrics related to the operations ○ Application metrics with StatsD ■ Counters ■ Timers ○ Per-node metrics with collectd ■ CPU ■ RAM ■ IO
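For a feel of what StatsD counters and timers are on the wire, here is a minimal sketch of a client that sends the plain-text StatsD datagrams (`name:value|c` for counters, `name:value|ms` for timers) over UDP. `TinyStatsd` and the two formatting helpers are hypothetical names for this illustration. A real project would more likely use an existing client library. Port 8125 is the conventional StatsD default.

```python
import socket
import time
from contextlib import contextmanager


def format_counter(name, value=1):
    """StatsD counter line, e.g. 'status_code.200:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name, millis):
    """StatsD timer line, e.g. 'request.faculty:320|ms'."""
    return f"{name}:{millis}|ms"


class TinyStatsd:
    """Fire-and-forget UDP sender for StatsD counters and timers."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line):
        self._sock.sendto(line.encode("ascii"), self._addr)

    def incr(self, name, value=1):
        self._send(format_counter(name, value))

    @contextmanager
    def timer(self, name):
        started = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = int((time.perf_counter() - started) * 1000)
            self._send(format_timer(name, elapsed_ms))
```

Because the transport is UDP, emitting a metric does not wait for the metrics server, which keeps the overhead on the request path small.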
  • 27. Load testing 101 ● Create a tool that behaves like a gazillion users ○ Establish network connections ○ Make HTTP requests ■ Send some data ■ Retrieve information from our server
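The idea of such a tool can be sketched with asyncio alone. Everything here is an assumption made for illustration: the toy server in `handle` stands in for the system under test, `one_user` plays a single simulated user, and `run_load` starts both on the local machine. It is not a replacement for a real load testing tool.

```python
import asyncio
import time


async def handle(reader, writer):
    """Toy HTTP server standing in for the system under test."""
    await reader.readuntil(b"\r\n\r\n")  # read the request headers
    writer.write(
        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
    )
    await writer.drain()
    writer.close()


async def one_user(host, port, path):
    """Open a connection, send one GET, return (status_code, latency_seconds)."""
    started = time.perf_counter()
    reader, writer = await asyncio.open_connection(host, port)
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    writer.write(request.encode("ascii"))
    await writer.drain()
    status_line = await reader.readline()
    await reader.read()  # consume the rest until the server closes
    writer.close()
    return int(status_line.split()[1]), time.perf_counter() - started


async def run_load(users=50, path="/faculty/fpm"):
    """Start the toy server, hit it with `users` concurrent users, return (statuses, rps)."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        started = time.perf_counter()
        results = await asyncio.gather(
            *(one_user("127.0.0.1", port, path) for _ in range(users))
        )
        elapsed = time.perf_counter() - started
    return [status for status, _latency in results], len(results) / elapsed


if __name__ == "__main__":
    statuses, rps = asyncio.run(run_load())
    print(f"{statuses.count(200)} OK responses, about {rps:.0f} requests per second")
```

Pointing `one_user` at a staging host instead of the toy server gives a crude load generator.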
  • 28. Load testing in practice
  • 29. Load testing in practice ● KPIdata is an asyncio-based pet project for assessing the quality of higher education ● KPI means Kyiv Polytechnic Institute ● Students and alumni use the website like Yelp for choosing faculties, chairs, and specialities to study ● Check it out on kpidata.org
  • 30. Load testing in practice ● LocustIO is a load testing tool written in Python ● Simulates millions of simultaneous users ● Runs load tests distributed over multiple hosts ● Supports HTTP, XMPP, XML-RPC ● Check it out on locust.io ● Note: it uses gevent under the hood
  • 31. Load testing in practice: key features of KPIdata
  • 32. Load testing in practice: key features of KPIdata
  • 33. Load testing in practice: key features of KPIdata
  • 34. Load testing in practice: key features of KPIdata
  • 35. Load testing in practice: identify the load ● More users ○ Admission campaign starts ○ More schoolchildren will know about the site ● More data ○ Semester ends and we will receive a lot of feedback ○ New universities will be involved
  • 36. Load testing in practice: define frequent operations ● Add feedback for ○ faculty/semester/subject ● Retrieve statistics for ○ faculty/chair/group ● Search for feedback ● Calculate ratings in background
  • 37. Load testing in practice: identify the triggers ● GET: /feedback, /faculty/{code}, /chair/{id}, /group/{name}, /rating/{entity} ● POST: /feedback
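  • 38. Load testing in practice: add application metrics

        from aiohttp import web

        # `statsd` is assumed to be a StatsD client configured elsewhere.
        async def collect_web_handler_metrics(app, handler):
            """Middleware factory: time each request and count status codes."""
            async def middleware_handler(request):
                # '/faculty/fpm' -> '.faculty.fpm', so no extra dot after 'request'
                path = request.path.replace('/', '.')
                status_code = 503
                with statsd.timer('request' + path):
                    try:
                        response = await handler(request)
                        status_code = response.status
                    except web.HTTPNotFound:
                        status_code = 404
                        raise  # let aiohttp render the 404 response
                    except Exception:
                        # report unexpected failures as 503, as on the slide
                        response = web.Response(status=503)
                    finally:
                        statsd.incr('status_code.{}'.format(status_code))
                return response
            return middleware_handler
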
  • 39. Load testing in practice: create dashboards ● Graphite ● Grafana ● DataDog
  • 40. Load testing in practice: create testing tool ● Create locustfile.py module for execution ● Define TaskSet of test functions ● Define HttpLocust for spawning the tests
  • 41. Load testing in practice: create testing tool

        from locust import TaskSet, task

        class KPIdataTaskSet(TaskSet):
            """Set of tasks that a Locust user will execute"""

            @task
            def test_get_faculty(self):
                with self.client.get('/faculty/fpm', catch_response=True) as response:
                    if response.status_code == 200:
                        response.success()
                    else:
                        # mark any other status as a failure in the Locust report
                        response.failure(
                            'Wrong status code. Received: {}. Expected: 200.'
                            .format(response.status_code))

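  • 42. Load testing in practice: create testing tool

        # Pre-1.0 Locust API, as used in the talk; HttpLocust became HttpUser in Locust 1.0.
        from locust import HttpLocust

        class KPIdataLocust(HttpLocust):
            """Represents an HTTP user which attacks the KPIdata website"""
            task_set = KPIdataTaskSet
            min_wait = 50   # minimum wait between tasks, in milliseconds
            max_wait = 100  # maximum wait between tasks, in milliseconds
            host = 'http://kpidata.org'
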
  • 43. Load testing in practice: before running tests ● Infrastructure: CPU utilization
  • 44. Load testing in practice: before running tests ● Random node: CPU utilization and Load Average
  • 45. Load testing in practice: testing in progress
  • 46. Load testing in practice: after testing ● Infrastructure: 50% CPU utilization
  • 47. Load testing in practice: after testing ● Random node: ○ 53% CPU utilization ○ 3.5 Load Average
  • 48. Results of load testing ● We know how many RPS we can serve with the environment ● We know what's going on when the limit is exceeded ● We know the bottlenecks of our platform ● We know if we can scale some part of the system
  • 50. Summary ● Try to find C-based analogs of third-party dependencies ● Check out serialization formats which are faster than JSON ● Fine-tune your Python project with C extensions, Cython or PyPy ● Write some services not in Python
  • 51. Summary ● Use asyncio and aiohttp for networking applications ● Use ThreadPoolExecutor for blocking operations ● Use processes for scaling inside a node ● Perform load testing for new features before pushing them to production
  • 52. Further reading ● @kakovskyi: Maintaining a high load Python project for newcomers ● @kakovskyi: Instant messenger with Python. Back-end development ● Asyncio-stack for web development ● PEP8 is not enough ● How HipChat Stores and Indexes Billions of Messages Using ElasticSearch ● A guide to analyzing Python performance ● Why Leading Companies Dark Launch - LaunchDarkly Blog ● What Is Async, How Does It Work, And When Should I Use It?