3. Intelligent Security Automation hexadite.com
Who We Are
Max Braitmaiere max@hexadite.com
VP of Engineering
Evgeniy (Jenya) Privalov jenya@hexadite.com
Software Developer
4. Intelligent Security Automation hexadite.com
Agenda
• Why Django
• Why Django ORM (for backend)
• Django DB connection handling (backend and frontend)
• Django + Postgres tweaks & tricks
• Running uTests with DB
6. Intelligent Security Automation hexadite.com
Advantages
• Written in Python.
• Comprehensive solution for web apps
• Secure
• Batteries Included
• Predefined structures
• Community
• Good documentation
11. Intelligent Security Automation hexadite.com
Thread Local
local() holds all connection references in a thread-local store
Connections == [processes] * [threads]
"""
from django/db/utils.py
"""
...
class ConnectionHandler(object):
    def __init__(self, databases=None):
        """
        databases is an optional dictionary of database definitions (structured
        like settings.DATABASES).
        """
        self._databases = databases
        self._connections = local()
...
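The "Connections == [processes] * [threads]" point can be illustrated with the standard library alone. This is a minimal sketch, not Django code: FakeConnection, get_connection and the `created` list are hypothetical names used only to show that a thread-local store hands each thread its own connection object.

```python
import threading


class FakeConnection(object):
    """Hypothetical stand-in for a DB connection; only identity matters."""


_store = threading.local()      # one attribute namespace per thread
created = []                    # every connection opened, across all threads
created_lock = threading.Lock()


def get_connection():
    """Return this thread's connection, opening one on first use."""
    conn = getattr(_store, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _store.conn = conn
        with created_lock:
            created.append(conn)
    return conn


def worker():
    # Repeated access inside one thread reuses the same connection object.
    get_connection()
    get_connection()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created))  # one connection per thread that touched the store
```

Four threads touched the store, so four connections exist, and none of them is released when its thread stops needing it.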
12. Intelligent Security Automation hexadite.com
Web Request Lifecycle
close_if_unusable_or_obsolete() closes the connection:
• if unrecoverable errors have occurred
• if it has outlived its maximum age (CONN_MAX_AGE)
It is called on request started / finished
"""
from django/db/__init__.py
"""
...
# Register an event to reset transaction state and close connections past
# their lifetime.
def close_old_connections(**kwargs):
    for conn in connections.all():
        conn.close_if_unusable_or_obsolete()
signals.request_started.connect(close_old_connections)
signals.request_finished.connect(close_old_connections)
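The two closing conditions can be modelled without Django. The sketch below is a deliberately simplified, hypothetical FakeConnection (Django's real method also tries to recover a connection after errors); it only shows how a maximum age and an error flag decide whether the connection is closed.

```python
import time


class FakeConnection(object):
    """Hypothetical, simplified model of a persistent connection."""

    def __init__(self, max_age):
        # max_age plays the role of CONN_MAX_AGE: seconds, or None for unlimited.
        self.close_at = None if max_age is None else time.time() + max_age
        self.errors_occurred = False
        self.is_open = True

    def close(self):
        self.is_open = False

    def close_if_unusable_or_obsolete(self, now=None):
        now = time.time() if now is None else now
        if not self.is_open:
            return
        if self.errors_occurred:
            # Simplification: any recorded error makes the connection unusable.
            self.close()
        elif self.close_at is not None and now >= self.close_at:
            # The connection outlived its maximum age.
            self.close()


fresh = FakeConnection(max_age=300)
fresh.close_if_unusable_or_obsolete()

old = FakeConnection(max_age=300)
old.close_if_unusable_or_obsolete(now=time.time() + 301)

broken = FakeConnection(max_age=None)
broken.errors_occurred = True
broken.close_if_unusable_or_obsolete()

print(fresh.is_open, old.is_open, broken.is_open)
```

Only the fresh connection stays open. The check runs only when something calls it, and in a web app the request signals do that.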
13. Intelligent Security Automation hexadite.com
Regular Backend Implications
• Any use of the ORM to work with data will create a thread-local connection
• Long-running services don’t operate within a “request” == no call to close_if_unusable_or_obsolete()
14. Intelligent Security Automation hexadite.com
The Transaction Block
from django.db import transaction
with transaction.atomic():
    # query and/or change data...
    with transaction.atomic():
        # runs an “inner” transaction (a savepoint)
        pass
Used to define the beginning and end of a
transaction.
We can leverage this to define a clear “scope
of work” in code.
NOTE: With async, don’t yield within a
transaction block
15. Intelligent Security Automation hexadite.com
Enter ClosingAtomic
"""
from (our) transaction.py
"""
import django.db.transaction

class _ClosingAtomic(object):
    def __init__(self, base_atomic):
        self._atomic = base_atomic

    def __enter__(self):
        self._atomic.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        # end actual transaction
        self._atomic.__exit__(exc_type, exc_val, exc_tb)
        # get connection
        conn = django.db.transaction.get_connection(self._atomic.using)
        # if db connection is not closed, and this is the last transaction - close connection
        if conn.connection is not None and not conn.in_atomic_block:
            conn.close()
16. Intelligent Security Automation hexadite.com
Our Own “Atomic”
"""
from (our) transaction.py
"""
class _ClosingAtomic(object):
    """
    Opens/closes an atomic transaction; when the last (outermost) transaction closes, terminates the DB connection.
    """
def closing_atomic(using=None, savepoint=True):
    """
    ...
    """
    return _ClosingAtomic(django.db.transaction.atomic(using=using, savepoint=savepoint))
# default atomic transaction with closing
atomic = closing_atomic
17. Intelligent Security Automation hexadite.com
Enter Pooling
• Django middleware (connection pooling within a process)
• External Connection pooling
18. Intelligent Security Automation hexadite.com
PgBouncer (https://pgbouncer.github.io)
“pool_mode” Specifies when a server connection can be reused by other clients.
• session: Server is released back to pool after client disconnects. Default.
• transaction: Server is released back to pool after transaction finishes.
• statement: Server is released back to pool after query finishes.
[Diagram: Apps (hundreds of connections) -> PgBouncer (fixed pool, ~40 connections) -> PostgreSQL]
19. Intelligent Security Automation hexadite.com
Enter CleanupAtomic
"""
from (our) transaction.py
"""
class _CleanupAtomic(object):
    ...
    def __enter__(self):
        conn = django.db.transaction.get_connection(self._atomic.using)
        # if this is the first transaction block - cleanup connection if unusable
        if not conn.in_atomic_block:
            conn.close_if_unusable_or_obsolete()
        self._atomic.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._atomic.__exit__(exc_type, exc_val, exc_tb)
        conn = django.db.transaction.get_connection(self._atomic.using)
        # if this is the last transaction block - cleanup connection if unusable
        if not conn.in_atomic_block:
            conn.close_if_unusable_or_obsolete()
20. Intelligent Security Automation hexadite.com
Putting It All Together
def cleanup_atomic(using=None, savepoint=True):
    return _CleanupAtomic(django.db.transaction.atomic(...))
# default atomic transaction with cleanup
atomic = cleanup_atomic
...
with atomic():
    # exiting this (outer) context will trigger the “cleanup”
    with atomic():
        # exiting this context only ends the inner transaction (no “cleanup”)
        pass
• PgBouncer can “multiplex” many
connections passing through only X
transactions
• No constant connection closing
• Uses the CONN_MAX_AGE if set
• Usage is applicable for request handling
and long running services
• Default Django autocommit is not
destructive
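The nesting behaviour can be checked with a small stand-alone model. Everything here is hypothetical scaffolding (FakeConnection, FakeAtomic and this CleanupAtomic are not the Django or Hexadite classes); the sketch only mirrors the enter/exit logic shown in the slides, to confirm that the cleanup hook fires on the outermost block alone.

```python
class FakeConnection(object):
    """Hypothetical connection that tracks transaction nesting depth."""

    def __init__(self):
        self.depth = 0
        self.cleanup_calls = 0

    @property
    def in_atomic_block(self):
        return self.depth > 0

    def close_if_unusable_or_obsolete(self):
        self.cleanup_calls += 1


class FakeAtomic(object):
    """Hypothetical stand-in for transaction.atomic()."""

    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        self.conn.depth += 1

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.depth -= 1
        return False


class CleanupAtomic(object):
    """Same enter/exit logic as the slide, applied to the fake connection."""

    def __init__(self, conn):
        self._conn = conn
        self._atomic = FakeAtomic(conn)

    def __enter__(self):
        # first (outermost) block: clean up before starting
        if not self._conn.in_atomic_block:
            self._conn.close_if_unusable_or_obsolete()
        self._atomic.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._atomic.__exit__(exc_type, exc_val, exc_tb)
        # last (outermost) block: clean up after finishing
        if not self._conn.in_atomic_block:
            self._conn.close_if_unusable_or_obsolete()
        return False


conn = FakeConnection()
with CleanupAtomic(conn):
    with CleanupAtomic(conn):
        pass
    calls_after_inner = conn.cleanup_calls

print(calls_after_inner, conn.cleanup_calls)
```

The inner block adds no cleanup calls; the outer block contributes one on entry and one on exit.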
22. Intelligent Security Automation hexadite.com
Text Indexing (PostgreSQL)
"""
from django.db.backends.postgresql.schema
"""
class DatabaseSchemaEditor(BaseDatabaseSchemaEditor):
    def _create_like_index_sql(self, model, field):
        ...
        if db_type.startswith('varchar'):
            return self._create_index_sql(..., suffix='_like')
        elif db_type.startswith('text'):
            return self._create_index_sql(..., suffix='_like')
        ...

    def _alter_field(self, model, old_field, new_field, old_type, new_type,
                     old_db_params, new_db_params, strict=False):
        ...
        self._create_like_index_sql(model, new_field)
        ...
• Fields with database column types varchar and text need a second index that specifies their operator class, which is needed to perform LIKE queries correctly outside the C locale
23. Intelligent Security Automation hexadite.com
Text Indexing (Custom Fields)
"""
from (our) fields.py
"""
from django.db.models import CharField, TextField

class TextFieldIndexControlled(TextField):
    ...

class CharFieldIndexControlled(CharField):
    def __init__(self, *args, **kwargs):
        # custom option: set like_index=False to skip the extra "_like" index
        self.like_index = kwargs.pop('like_index', True)
        super(CharFieldIndexControlled, self).__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super(CharFieldIndexControlled, self).deconstruct()
        # Only include 'like_index' if it's not the default
        if self.like_index is not True:
            kwargs['like_index'] = self.like_index
        return name, path, args, kwargs
24. Intelligent Security Automation hexadite.com
Text Indexing (SchemaEditor)
"""
from (our) schema.py
"""
class DatabaseSchemaEditor(schema.DatabaseSchemaEditor):
    def _create_like_index_sql(self, model, field):
        if not getattr(field, 'like_index', True):
            # if like_index is explicitly False, don't create the index
            return None
        return super(DatabaseSchemaEditor, self)._create_like_index_sql(model, field)

    def _alter_field(self, model, old_field, new_field, old_type, new_type,
                     old_db_params, new_db_params, strict=False):
        ...
        # ADD YOUR LOGIC HERE
        _create_like_index_sql(..)
        ...
• SchemaEditor class is
responsible for emitting
schema-changing
statements to the
database
25. Intelligent Security Automation hexadite.com
Text Indexing (DatabaseWrapper)
"""
from (our) base.py
"""
from django.db.backends.postgresql_psycopg2 import base
from schema import DatabaseSchemaEditor
class DatabaseWrapper(base.DatabaseWrapper):
    SchemaEditorClass = DatabaseSchemaEditor  # our schema editor
DatabaseWrapper is the class that
represents the DB connection in Django
27. Intelligent Security Automation hexadite.com
Bulk delete
Django needs to fetch objects into memory to send signals and handle cascades. However, if there are no cascades and no signals,
then Django may take a fast-path and delete objects without fetching into memory. For large deletes this can result in significantly
reduced memory usage.
https://docs.djangoproject.com/en/1.10/ref/models/querysets/#delete
28. Intelligent Security Automation hexadite.com
QuerySets Iterator
"""
from (our) iterators.py
"""
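import gc

def querysets_bulk_iterator(qs, batchsize=500, gc_collect=True, nullify_each_iteration=False):
    start = 0
    end = batchsize
    last_element = max(qs.count(), batchsize)
    while start <= last_element:
        if nullify_each_iteration:
            # the caller deletes each yielded batch, so always take the first rows
            yield qs.order_by('pk')[:batchsize]
        else:
            yield qs.order_by('pk')[start:end]
        start += batchsize
        end += batchsize
        if gc_collect:
            gc.collect()
    # the generator simply ends here; an explicit "raise StopIteration" is
    # unnecessary and becomes a RuntimeError on Python 3.7+ (PEP 479)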
• gc.collect() explicitly tells the garbage collector to run a collection.
• When done with each yielded queryset, call del on it
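The slicing pattern itself can be tried without a database. In this sketch FakeQuerySet is a hypothetical list-backed stand-in exposing only count(), order_by() and slicing, and iter_batches is a simplified variant of the iterator (it stops at count() and omits the gc and nullify options), so it illustrates the idea rather than reproducing the function above.

```python
class FakeQuerySet(object):
    """Hypothetical list-backed stand-in for a Django QuerySet."""

    def __init__(self, rows):
        self._rows = list(rows)

    def count(self):
        return len(self._rows)

    def order_by(self, *fields):
        # A stable ordering is what makes offset-based slicing safe.
        return FakeQuerySet(sorted(self._rows))

    def __getitem__(self, item):
        return self._rows[item]


def iter_batches(qs, batchsize=500):
    """Yield consecutive slices of at most `batchsize` rows."""
    start = 0
    total = qs.count()
    while start < total:
        yield qs.order_by('pk')[start:start + batchsize]
        start += batchsize


sizes = [len(batch) for batch in iter_batches(FakeQuerySet(range(1200)), batchsize=500)]
print(sizes)  # [500, 500, 200]
```

Each batch is evaluated separately, so only one slice of rows needs to be held in memory at a time.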
29. Intelligent Security Automation hexadite.com
Foreign Keys
class User(models.Model):
    name = models.CharField()
class Order(models.Model):
    user = models.ForeignKey(User, related_name="orders", db_constraint=False, db_index=False)
• Foreign key constraints maintain integrity and consistency of data
• They can dramatically slow down writes
• Django has an out-of-the-box solution: db_constraint=False
• Indexing may also not be required: db_index=False
30. Intelligent Security Automation hexadite.com
PostgreSQL Tuning
• models.PositiveIntegerField (adds a constraint in the DB)
• Find useless, unused indexes and delete them.
• Use “ANALYZE” and “EXPLAIN” to optimize PostgreSQL queries.
• Vacuum database.
• Analyse database.
• Tune PostgreSQL server configuration.
32. Intelligent Security Automation hexadite.com
Best Practice
• Regular unit tests: mock the data calls for best performance
• Django unit tests: use in-memory SQLite, which performs all DB operations much faster.
What if our DAL/BL uses some custom capabilities?
• Array Fields
• JSON Field query
• etc
33. Intelligent Security Automation hexadite.com
Test DB
Use a predefined suffix:
• Allows isolation
• Multiple tests can run on
same machine
"""
from (test) settings.py
"""
import os
TEST_DB_SUFFIX = os.getenv('OUR_TEST_DB_SUFFIX', '')
DATABASES = {
    'default': {
        'ENGINE': 'our.backends.postgresql_psycopg2',
        'HOST': '127.0.0.1',
        'PORT': '5432',
        'NAME': 'db' + TEST_DB_SUFFIX,
        'TEST': {'NAME': 'test_db' + TEST_DB_SUFFIX},
        'USER': 'user',
        'PASSWORD': 'password',
        'CONN_MAX_AGE': 300,  # persistent connections for 5 minutes
    },
}
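To see how the suffix isolates parallel test runs, the naming rule can be pulled into a small helper. This is only a sketch: database_names and the "_job_a" / "_job_b" suffix values are hypothetical, and the environment is passed in as a plain dict so the example does not depend on the real process environment.

```python
import os


def database_names(environ=None, base='db'):
    """Build the regular and test DB names from OUR_TEST_DB_SUFFIX."""
    environ = os.environ if environ is None else environ
    suffix = environ.get('OUR_TEST_DB_SUFFIX', '')
    return {
        'NAME': base + suffix,
        'TEST': {'NAME': 'test_' + base + suffix},
    }


# Two hypothetical test jobs on the same machine get separate databases.
job_a = database_names({'OUR_TEST_DB_SUFFIX': '_job_a'})
job_b = database_names({'OUR_TEST_DB_SUFFIX': '_job_b'})
unset = database_names({})

print(job_a['TEST']['NAME'], job_b['TEST']['NAME'], unset['TEST']['NAME'])
```

With distinct suffixes the jobs never create or drop each other's test database.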
37. Intelligent Security Automation hexadite.com
Data Migration
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import migrations
from django.db.models import F
from django.db.models.functions import Upper

def update_usernames(apps, schema_editor):
    # use the historical model, not a direct import
    User = apps.get_model('app', 'User')
    User.objects.update(name=Upper(F('name')))

class Migration(migrations.Migration):
    dependencies = [
        ('app', '0001_initial'),
    ]
    operations = [
        migrations.RunPython(update_usernames),
    ]
• Data migration: changing or loading data. Must be written manually.
• For large data sets it can take a long time.
38. Intelligent Security Automation hexadite.com
Delayed Migration
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import migrations
from django.db.models import F
from django.db.models.functions import Upper

def update_usernames_delayed(apps, schema_editor):
    User = apps.get_model('app', 'User')
    User.objects.update(name=Upper(F('name')))

class Migration(migrations.Migration):
    dependencies = [
        ('app', '0001_initial'),
    ]
    operations = [
        # REGULAR DJANGO MIGRATION OPERATIONS
    ]
    # custom attribute, ignored by Django's own migrate command
    delayed_operations = [
        migrations.RunPython(update_usernames_delayed),
    ]
• delayed_operations won’t
run with regular operations
39. Intelligent Security Automation hexadite.com
Delayed Migration (Migrate Command)
import json
import os

# CommandMigration is not defined on the slide; its import is omitted here as well
MIGRATIONS_DUMPS_PATH = os.path.join(os.path.dirname(__file__), 'migrations')
MIGRATIONS_PATH = os.path.join(os.path.dirname(__file__), 'migrations',
                               'delayed_migrations.json')

class Command(CommandMigration):
    def handle(self, *args, **options):
        self.delayed_migrations_to_run = []
        super(Command, self).handle(*args, **options)
        if self.delayed_migrations_to_run:
            # text mode works with json.dump on both Python 2 and 3
            with open(MIGRATIONS_PATH, 'w') as f:
                json.dump(self.delayed_migrations_to_run, f)

    def migration_progress_callback(self, action, migration=None, fake=False):
        if action == 'apply_success' and not fake:
            if hasattr(migration, 'delayed_operations'):
                self.delayed_migrations_to_run.append("%s.%s" % (migration.app_label, migration.name))
        return super(Command, self).migration_progress_callback(action, migration, fake)
• Add a custom migrate command that records the migrations that have delayed_operations
• Run all your migrations with this command
40. Intelligent Security Automation hexadite.com
Delayed Migration (Command)
class Command(CommandMigration):
    def handle(self, *args, **options):
        self.executor = MigrationExecutor(self.connection, self.migration_progress_callback)
        state = ProjectState(real_apps=list(self.loader.unmigrated_apps))
        states = self.build_migration_states(plan, state)
        try:
            # iterate over a copy: the plan is modified inside the loop
            for migration in list(plan):
                self.apply_migration(states[migration], migration)
                plan.remove(migration)
        finally:
            # persist whatever is still pending, even after a failure
            self.update_migrations_state(plan)
• Run the custom “delayed”
migrations when you need them
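The bookkeeping behind the two commands (save the pending list, apply entries in order, write back what is left even on failure) can be sketched with the standard library only. The helpers save_delayed, load_delayed and run_delayed and the migration names are hypothetical; the real commands work with Django's MigrationExecutor, which is not modelled here.

```python
import json
import os
import tempfile


def save_delayed(path, names):
    """Persist the list of pending delayed migrations as JSON."""
    with open(path, 'w') as f:
        json.dump(names, f)


def load_delayed(path):
    """Load the pending list; no file means nothing is pending."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def run_delayed(path, apply_one):
    """Apply pending migrations in order, always saving what remains."""
    pending = load_delayed(path)
    try:
        while pending:
            apply_one(pending[0])
            pending.pop(0)
    finally:
        save_delayed(path, pending)


path = os.path.join(tempfile.mkdtemp(), 'delayed_migrations.json')
save_delayed(path, ['app.0002_upper_names', 'app.0003_backfill'])

applied = []


def flaky_apply(name):
    # Simulate a failure in the second delayed migration.
    if name == 'app.0003_backfill':
        raise RuntimeError('simulated failure')
    applied.append(name)


try:
    run_delayed(path, flaky_apply)
except RuntimeError:
    pass

remaining = load_delayed(path)
print(applied, remaining)
```

Because the remaining plan is written in a finally block, a rerun resumes from the migration that failed instead of starting over.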
Cyber analyst thinking at the speed of automation
Modeled after the investigative and decision-making skills of top cyber analysts and driven by artificial intelligence, Hexadite Automated Incident Response Solution (AIRS™) remediates threats and compresses weeks of work into minutes. With analysts free to focus on the most advanced threats, Hexadite optimizes overtaxed security resources for increased productivity, reduced costs and stronger overall security.
3yr old startup, working mainly with Python (2.7) stack (Angular2 frontend)
Presentation is updated for Django 1.10.5 and PostgreSQL 9.5
Here we are going to give an overview of why we went with Django in general
Comprehensive solution for web apps: templates, forms, ORM, etc.
Secure: Django provides good security protection out of the box
Batteries Included: A lot of functionality provided out of the box + third party packages that can help to accelerate development
Predefined structures: urls, apps, models
Monolithic application: you always get all of Django’s apps; you cannot turn off the features you don’t need
Specific consideration to use Django ORM with web-apps and with backend services
Application design, maintainability: as with every ORM, you don’t deal with the DB directly
Code and Business-Logic reuse: using Django ORM in web apps and re-use same logic and ORM in backend services
Our issues with connection handling in service applications
Thread local has the advantage of avoiding locks (and multi-threading synchronization overhead in general)
The total number of active connections equals the total number of threads that ever accessed the DB (across all relevant processes)
CONN_MAX_AGE
Default: 0
The lifetime of a database connection, in seconds. Use 0 to close database connections at the end of each request — Django’s historical behavior — and None for unlimited persistent connections.
“Connection leak”: any thread may create a connection that stays persistent without the thread ever needing it again, or that remains in a bad state.
Our atomic wrapper for connections closing on last transaction block
When working with async libs such as asyncio/trollius yielding within a transaction block will have unexpected results as the state of the connection will be passed to an undetermined code path.
Our atomic wrapper for connections closing on last transaction block
The Good:
Defining our own “atomic” allows us to use it across our code base with an easy way to tweak and change behavior in the future
Connections are cleaned up (closed)
The Bad:
Opening and closing a connection at the end of every transaction is “expensive” and may be unnecessary
Any code that does not use transaction.atomic() may still “leak” connections
A middleware solution is not optimal if an external pooling solution is possible, as it allows a better separation of concerns where each component does “one thing”
PgBouncer allows multiplexing hundreds of connections, re-using a pre-established connection pool to the DB.
We selected the “transaction” mode, which allows us to support both actual transactions and the default autocommit (see https://docs.djangoproject.com/en/1.10/topics/db/transactions/ )
Our atomic wrapper for connections cleanup.
First and last transaction block will call close_if_unusable_or_obsolete(): reset transaction state and close connections past their lifetime.
Our atomic wrapper for connections closing on last transaction block (uses close_if_unusable_or_obsolete() )
Two indexes for each varchar and text field: more indexes == slower inserts and updates. If we don’t want them, we can drop them after creation with a simple SQL query, or prevent their creation
First stage is to create a Char/Text field definition that adds a new option “like_index”
deconstruct() is called to produce the migration files content
Change the logic in _alter_field of our custom editor: skip producing the “like” index if it is forced off (like_index == False)
To add a custom schema editor to our project we need to change the Django database engine, and to do that we need to subclass DatabaseWrapper, the class that represents the connection.
Set nullify_each_iteration to True if you delete each yielded queryset
Every time you insert or update a row with a foreign key, the database has to check that the key exists (and index the value)
The impact depends on the DB
Indexing: when we always filter by more than one field, it may be redundant to index every field, even if those are “foreign keys”
useful references
PositiveIntegerField will add a constraint in DB - in general check the actual schema that gets created (when first using a new field type)
Using a real DB has its issues, but can be a better fit when we want to test some related BL and make sure it works in production.
More optional topics
This is a problem when you want to separate schema and data migrations but still preserve the migrations mechanism
python manage.py migrate applies only operations; for delayed_operations we save the migration execution plan
As migrations progress we record the delayed execution plan in the proper order; at the end the plan is saved to a file (a different approach could be used, a database for example)
Run delayed migrations when you need them
a lot of code is missing here..