This document discusses Python web application development. It summarizes popular packages used alongside Flask, including SQLAlchemy, Celery, and TensorFlow Model Server; provides best practices for Flask, Celery, and Docker deployment; and discusses profiling Python applications and handling signals in Docker containers.
2. Web Development packages
• Flask - Backend framework
• SQLAlchemy - ORM
• Celery - Task Queue
• Pydantic - Validation
• dashboard-support-py - Internal Authentication
• uWSGI - WSGI-compliant web server
• tensorflow_model_server - Deploy models in production
• Redis - In-memory key–value database
• Docker - Containerization
3. Flask (name inspired by Bottle)
• WSGI-compliant microframework with no batteries included; gives the developer utmost flexibility.
• Less suitable for large codebases, where an MVC structure makes organization easier.
• Includes a development server for quick iteration, which should not be used in production.
• Does not come with an ORM; it is most commonly used with SQLAlchemy.
• Good Practices:
• Use application factory and blueprints for organizing feature level code.
• Use existing exception classes and status codes instead of rewriting them.
• Add logging to all exceptions.
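The factory-plus-blueprints layout above can be sketched as follows, assuming Flask is installed; the blueprint name, URL prefix, and endpoint are illustrative, not from the original:

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical feature-level blueprint; in a real project this would
# live in its own module, e.g. app/api/routes.py.
api_bp = Blueprint("api", __name__, url_prefix="/api")

@api_bp.route("/ping")
def ping():
    return jsonify(status="ok")

def create_app(config=None):
    """Application factory: build and configure a fresh app per call."""
    app = Flask(__name__)
    app.config.update(config or {})
    app.register_blueprint(api_bp)
    return app
```

The factory also makes testing easier, since each test can build its own isolated app: `create_app().test_client().get("/api/ping")`.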
• Application and Request Contexts
• The application context and request context are pushed, in that order, for each request.
• Flask has several context-local variables, like current_app (a proxy to the application context), request (a proxy to the request context), and g, which can hold a reference to any object, such as a database connection, for the entire request-response cycle.
• A context must always be pushed before accessing current_app or request, or you end up with the common RuntimeError: Working outside of request context.
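A small sketch of the context rule, assuming Flask is installed (the database connection on g is a stand-in object, not a real connection):

```python
from flask import Flask, current_app, g

app = Flask(__name__)

def get_db_connection():
    # g lives for one app/request context; stash arbitrary objects on it.
    if "db" not in g:
        g.db = object()  # stand-in for a real database connection
    return g.db

# Outside any context, the proxies are unbound:
try:
    current_app.name
except RuntimeError as exc:
    print(exc)  # "Working outside of application context. ..."

# Pushing an application context makes current_app and g usable:
with app.app_context():
    assert current_app.name == app.name
    conn = get_db_connection()
    assert get_db_connection() is conn  # same object for the whole context
```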
4. Celery
• Commonly used task queue library in Python, with support for many brokers: RabbitMQ, Redis, SQS, etc.
• Task names with their arguments are serialized (JSON and pickle are supported) and sent over the network to the broker.
• Supports multiple execution pools, like prefork (multiprocessing), solo (single process), eventlet, and gevent.
• Has signal handling mechanisms for SIGTERM
• Redis as a Broker
• Task messages for a queue are stored as a Redis list; the number of pending tasks can be found by issuing LLEN from redis-cli or a client library.
• New tasks are pushed with LPUSH by the producer.
• Tasks are prefetched by workers using BRPOP.
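The LPUSH/BRPOP pairing yields first-in, first-out delivery. A stdlib sketch of the same semantics (no real Redis involved, names are illustrative):

```python
from collections import deque

queue = deque()  # stands in for the Redis list backing a Celery queue

def lpush(message):
    queue.appendleft(message)  # producer: push at the head, like LPUSH

def brpop():
    return queue.pop()  # worker: pop from the tail, like BRPOP

def llen():
    return len(queue)  # pending-task count, like LLEN

lpush("task-1")
lpush("task-2")
assert llen() == 2
assert brpop() == "task-1"  # oldest task is delivered first (FIFO)
```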
5. Celery Good Practices
• Keep tasks idempotent as much as possible (a retry should always produce the same result).
• Common exceptions, like network errors, should be caught and retried with exponential backoff.
• Create a subclass of BaseTask and add handlers for task success, failure, return, and retries.
• Use send_task or signature methods instead of importing the task and calling its apply_async or delay methods.
• Use signals for decoupling application logic from tracking/cleanup logic. Signals are executed either in the producer process or in the worker MainProcess/ForkProcess, depending on the execution pool.
• Useful signals include after_task_publish, task_prerun, celeryd_init, worker_process_init.
• Don't use result_backend in production (when the calling process doesn't wait on the result); skipping it improves performance.
• Good Configuration to start with
• worker_prefetch_multiplier = 1
• task_acks_late = True
• task_reject_on_worker_lost = False
• worker_max_tasks_per_child = 2000
• task_ignore_result = True
• task_soft_time_limit = 120
• task_time_limit = 180
7. Deployment
• Flask
• uWSGI
• WSGI-compliant, production-grade web server with limited community support.
• Has many configuration options and legacy defaults, which are difficult to tune for specific use cases.
• mod_wsgi with Apache
• Model inference
• Tensorflow model server
• High-performance model server with batching support for better utilization of hardware resources. Natively exposes gRPC and REST APIs; the gRPC API is recommended.
• gRPC
• Language-agnostic implementation with a similar API across languages.
• Serializes the request body using protocol buffers, resulting in a small network payload.
• Uses the faster HTTP/2 protocol underneath, which has its own advantages over HTTP/1.1.
8. Profiling
• Profile your applications to find where they spend time and optimise those areas.
• Sampling profilers work by periodically collecting samples of the process's current execution point (its call stack).
• py-spy is one such profiler, specific to Python. It produces nice flame graphs and stack traces.
9. Docker - PID 1 case
• Scaling up/down mechanisms for load balancing should be designed carefully for containers.
• Docker sends a SIGTERM signal (whose handlers inside the application are defined by the application developers) to running containers for graceful shutdown while scaling down. If the container does not shut down within the predefined duration (10-30 seconds) after cleaning up, Docker sends SIGKILL.
• Docker runs the process defined in ENTRYPOINT as PID 1 and passes all signals to this process. This has certain limitations:
• If the ENTRYPOINT is defined as /bin/bash -c "celery -A celery_app worker" or an entrypoint.sh script, the shell gets PID 1, with celery as its child process (a different PID).
• A shell behaves differently when run as the init process: it ignores SIGTERM/SIGINT signals.
• So the child process (celery) doesn't receive any signal and continues to pick up more tasks from the broker. Finally, when SIGKILL is sent, the worker exits instantaneously, leading to failures of in-flight tasks.
• Solution:
• Use exec to run the command inside entrypoint.sh, e.g. exec celery -A celery_app worker. This replaces the bash process, so celery takes PID 1.
• If required, a lightweight init system like tini (which takes PID 1) can also be used on top of this, as it forwards signals to child processes.
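A minimal sketch of the exec fix (base image, package list, and module name are placeholders):

```dockerfile
FROM python:3.11-slim
RUN pip install celery redis
COPY entrypoint.sh /entrypoint.sh
# Exec-form ENTRYPOINT: no implicit /bin/sh -c wrapper around the script.
ENTRYPOINT ["/entrypoint.sh"]
# Alternatively, with tini as a signal-forwarding init:
# ENTRYPOINT ["tini", "--", "/entrypoint.sh"]
```

with entrypoint.sh:

```sh
#!/bin/sh
# Any setup (migrations, env checks) can run here first.
# exec replaces the shell, so celery becomes PID 1 and receives SIGTERM.
exec celery -A celery_app worker
```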