Rails data migrations

Andrey Koleshko
Back-end developer @ Toptal
GitHub/Twitter: @ka8725
Email: ka8725@gmail.com

● Code is never set in stone
● DB structure mutates
○ Columns/tables are renamed or dropped
○ One type of relationship is replaced by another (e.g. from “belongs to” to “has and belongs to many”, from “has many” to “has one”, etc.)
● Zero-downtime policy (production experience)
● Tons of data to migrate
● Public API exposed to other services
● NoSQL
The problem definition
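Changing a relationship type is a typical case where a schema change drags a data migration along with it. A minimal plain-Ruby sketch, assuming hypothetical in-memory rows in place of real tables: moving from "user belongs to role" (a `role_id` column on `users`) to "has and belongs to many" means copying every existing `role_id` into a join table before the old column can be dropped.

```ruby
# Hypothetical rows of a users table that still has a role_id column.
users = [
  { id: 1, role_id: 10 },
  { id: 2, role_id: nil }, # users without a role produce no join rows
  { id: 3, role_id: 10 }
]

# Build rows for a hypothetical roles_users join table
# (the has_and_belongs_to_many side of the new relationship).
def to_join_rows(users)
  users
    .reject { |user| user[:role_id].nil? }
    .map { |user| { user_id: user[:id], role_id: user[:role_id] } }
end

roles_users = to_join_rows(users)
```

The schema change alone (create the join table, drop `role_id`) would silently lose every existing assignment; the copy step in between is the data migration.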

● No production yet
● Production without zero-downtime policy
● Production with zero-downtime policy (the hardest case)
Different situations

Schema migrations != Data migrations
Tell things apart

class AddStatusToUser < ActiveRecord::Migration[5.2] # match your Rails version
  def up
    add_column :users, :status, :string
  end

  def down
    remove_column :users, :status
  end
end
Tell things apart: schema migrations

def up
  User.find_each do |user|
    user.status = 'active'
    user.save!
  end
end
...
Tell things apart: data migrations

● Write data migrations inside schema migrations (1)
● Write data migrations separately from schema migrations (2)
Different solutions

● Write any Rails code carelessly (a)
● Redefine models and use them in place (b)
● Call data migration code defined elsewhere (seeds, services, etc.) (c)
● Raw SQL (d)
● Rake tasks (e)
Different solutions

|{1, 2} × {a, b, c, d, e}| = 2 × 5 = 10
Different solutions
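The count is just the size of the Cartesian product of the two option lists; a tiny Ruby illustration with `Array#product` enumerates the same pairs:

```ruby
# Where the data migration lives: (1) inside schema migrations, (2) separately.
placements = [1, 2]
# How it is written: options (a) to (e) from the list above.
techniques = %w[a b c d e]

combinations = placements.product(techniques)
combinations.size     # => 10
combinations.first(2) # => [[1, "a"], [1, "b"]]
```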

● Do you need the migrations to keep working forever?
● Is the development environment more important than production?
Pick a solution by balancing these questions

● Do you need the migrations to keep working forever?
○ No, clean them up from time to time
○ Don’t run the whole migration history on a fresh setup
○ Local/staging environments load a DB dump and the final schema in one step
○ Obfuscate the dump if needed
● Is the development environment more important than production?
○ Obviously not, see the points above
My choice

def up
  User.find_each do |user|
    user.status = 'active'
    user.save!
  end
end
...
Solution #1: Ruby code inside schema migration

● Error-prone: what if someone renames the User model later?
● Not recommended
Solution #1: Ruby code inside schema migration

class User < ActiveRecord::Base; end

def up
  User.find_each { |user| user.update!(status: 'active') }
end
...
Solution #2: Redefine models inside migrations

# Both models are redefined inside the AddStatusToUser migration
class User < ActiveRecord::Base
  belongs_to :role, polymorphic: true
end

class Role < ActiveRecord::Base
  has_many :users, as: :role
end
----------------------------------------------------------
role = Role.create!(name: 'admin')
User.create!(nick: '@ka8725', role: role)
Solution #2: Redefine models inside migrations: a bug

> user = User.find_by(nick: '@ka8725')
> user.role # => nil
> user.role_type # => "AddStatusToUser::Role"
Expected:
> user.role_type == 'Role' # => true
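The root cause is plain Ruby constant nesting: a class defined inside the migration class is namespaced by it, and Rails writes the model's class name (by default the full, namespaced name) into the polymorphic `role_type` column. A sketch that runs without Rails, assuming a plain class as a stand-in for the real migration:

```ruby
# Plain-Ruby stand-in for the migration; the real one inherits from
# ActiveRecord::Migration, which does not change how nesting works.
class AddStatusToUser
  # A model "redefined inside the migration" is nested in its namespace.
  class Role; end
end

# This namespaced name is what ends up in users.role_type.
stored_type = AddStatusToUser::Role.name # => "AddStatusToUser::Role"

# The application's own model is the top-level Role, so the stored
# string does not match the name the app expects.
expected_type = 'Role'
stored_type == expected_type # => false
```

One possible workaround, offered as an assumption rather than the author's fix, is to write `role_type` explicitly (for example with a raw SQL UPDATE) instead of relying on the redefined class's name.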

● Much better than the previous one
● Error-prone: how do you handle tricky associations?
● A subtle bug with polymorphic associations
● Not recommended
Solution #2: Redefine models inside migrations

A common approach in the Rails community?

● Has all the problems of the previous solutions
● Not a better choice
● Not recommended
Solution #3: Call data migration code defined elsewhere from schema migrations

Solution #4: Raw SQL
● Pros:
○ Fast execution
○ None of the previous problems
● Cons:
○ Requires SQL knowledge
○ Takes more time to write
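A raw SQL data migration usually boils down to one set-based statement passed to the migration's `execute`. A minimal sketch, assuming a tiny `FakeMigration` stand-in (hypothetical, not part of Rails) that only records the SQL so the example runs without a database; `BackfillUserStatus` is likewise a hypothetical name:

```ruby
# Minimal stand-in for ActiveRecord::Migration (an assumption so the sketch
# runs without Rails): it only records the SQL passed to `execute`.
class FakeMigration
  attr_reader :executed_sql

  def initialize
    @executed_sql = []
  end

  def execute(sql)
    @executed_sql << sql
  end
end

# In a Rails app this would inherit from ActiveRecord::Migration[...] instead.
class BackfillUserStatus < FakeMigration
  def up
    # One statement instead of loading and saving every User.
    # For a freshly added column every row is NULL, and the WHERE clause
    # means a re-run does not touch rows that already have a status.
    execute(<<~SQL)
      UPDATE users SET status = 'active' WHERE status IS NULL
    SQL
  end
end

migration = BackfillUserStatus.new
migration.up
```

No model classes are referenced, so renaming or changing the `User` model later cannot break this migration.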

Solution #5: Rake tasks
● Define custom Rake tasks
● Run them when needed, e.g.:
rake db_migration:fix_data
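A minimal sketch of such a task, assuming an in-memory array in place of the users table so it runs without Rails; the task name matches the command above, while its body is hypothetical:

```ruby
require 'rake'

# Hypothetical in-memory stand-in for the users table.
USERS = [
  { id: 1, status: nil },
  { id: 2, status: 'banned' },
  { id: 3, status: nil }
]

namespace :db_migration do
  desc 'Backfill missing user statuses'
  # In a Rails app you would write `task fix_data: :environment do` and use
  # something like User.where(status: nil).update_all(status: 'active').
  task :fix_data do
    USERS.each { |user| user[:status] ||= 'active' }
  end
end

# Equivalent to running `rake db_migration:fix_data` from the shell.
Rake::Task['db_migration:fix_data'].invoke
```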

Solution #5: Rake tasks
● Not a bad choice
● Requires some manual work
● Can be automated
● Can be developed into a solution similar to Rails schema migrations

A decent solution to start with
● Define data migrations inside schema migrations
● But write tests for data migrations
● https://railsguides.net/change-data-in-migrations-like-a-boss/
● https://github.com/ka8725/migration_data
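One way to keep data migrations testable is to put the data-changing logic in a plain object that the migration only calls, so a test can exercise it directly. A sketch with Minitest, assuming a hypothetical `StatusBackfill` class and hash rows instead of models; this is not the API of the migration_data gem:

```ruby
require 'minitest/autorun'

# Hypothetical: the migration would call StatusBackfill; the test calls it too.
class StatusBackfill
  def call(rows)
    rows.map { |row| row[:status].nil? ? row.merge(status: 'active') : row }
  end
end

class StatusBackfillTest < Minitest::Test
  def test_fills_missing_statuses
    rows = [{ id: 1, status: nil }]
    assert_equal [{ id: 1, status: 'active' }], StatusBackfill.new.call(rows)
  end

  def test_keeps_existing_statuses
    rows = [{ id: 2, status: 'banned' }]
    assert_equal rows, StatusBackfill.new.call(rows)
  end
end
```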

● A versioned solution, similar to schema migrations
○ https://github.com/ilyakatz/data-migrate
● Write SQL
● Split schema migrations into several steps
○ https://blog.codeship.com/rails-migrations-zero-downtime/
● Split heavy data migrations (lasting hours) into several background jobs scheduled at intervals
The best choice for zero-downtime production
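The idea behind versioned data migrations can be sketched in a few lines: each data migration carries a timestamp version, and the runner remembers which versions it has applied so each one runs once, in order. This is a hypothetical in-memory `DataMigrator`, not the data-migrate gem's implementation:

```ruby
require 'set'

# Hypothetical runner in the spirit of schema migrations. Applied versions
# live in memory here; a real tool would persist them in a DB table.
class DataMigrator
  # migrations: { version (Integer) => callable }
  def initialize(migrations)
    @migrations = migrations
    @applied = Set.new
  end

  # Versions not applied yet, oldest first.
  def pending
    @migrations.keys.sort.reject { |version| @applied.include?(version) }
  end

  # Runs pending migrations in version order; returns the versions it ran.
  def migrate
    pending.each do |version|
      @migrations.fetch(version).call
      @applied << version
    end
  end
end

log = []
migrations = {
  20180102000000 => -> { log << :second },
  20180101000000 => -> { log << :first }
}
migrator = DataMigrator.new(migrations)
first_run = migrator.migrate  # => [20180101000000, 20180102000000]
second_run = migrator.migrate # => []
```

If a migration raises, the exception stops the run before its version is recorded, so it stays pending for the next attempt.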

Sort schema and data migrations together and run them in order:
for the local environment only!
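Since Rails prefixes migration file names with a timestamp version, "sort and run combined" amounts to interleaving both kinds of files by that prefix. A sketch over hypothetical file names, assuming data migrations live in a separate `db/data` folder:

```ruby
# Hypothetical file names: schema migrations in db/migrate,
# data migrations in db/data (an assumed location).
schema = %w[
  db/migrate/20180101100000_add_status_to_users.rb
  db/migrate/20180103100000_remove_legacy_state_from_users.rb
]
data = %w[
  db/data/20180102100000_backfill_user_status.rb
]

# Extract the leading timestamp version from a migration file name.
def version_of(path)
  File.basename(path)[/\A\d+/].to_i
end

# Interleave both kinds strictly by version, as for a fresh local setup.
combined = (schema + data).sort_by { |path| version_of(path) }
```

The backfill lands between adding the column and removing the legacy one, which is the order the code expects; in production the data migrations run separately, after deployment.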

● Schema migrations should be fast (<1s)
● Avoid data migrations inside schema migrations
● Data migrations run after deployment
● Follow-up actions are done in later deploys, once the data migration has run successfully
Production zero-downtime: deployment caveats

[Figure: zero-downtime deploy timeline for production code and DB: schema migrations run, then the new release is symlinked]

[Figure: the same deploy timeline, with data migrations running after the symlink]

Split into smaller jobs
[Figure: a heavy data migration split into jobs j#1 to j#8, each processing one range of records (1-1000, 1001-2000, 2001-3000, ...)]
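Splitting into smaller jobs means cutting the ID space into fixed-size ranges and giving each background job its own start delay. A plain-Ruby planning sketch; `plan_jobs`, `Job`, and the parameter values are hypothetical:

```ruby
# Hypothetical plan for a heavy data migration: one Job per ID range,
# each scheduled interval_seconds after the previous one.
Job = Struct.new(:ids, :delay_seconds)

def plan_jobs(min_id:, max_id:, batch_size:, interval_seconds:)
  (min_id..max_id).step(batch_size).each_with_index.map do |from, index|
    to = [from + batch_size - 1, max_id].min
    Job.new(from..to, index * interval_seconds)
  end
end

jobs = plan_jobs(min_id: 1, max_id: 2500, batch_size: 1000, interval_seconds: 60)
jobs.map(&:ids)           # => [1..1000, 1001..2000, 2001..2500]
jobs.map(&:delay_seconds) # => [0, 60, 120]
```

With Active Job, each entry could then be enqueued with something like `BackfillJob.set(wait: job.delay_seconds).perform_later(job.ids.first, job.ids.last)`, where `BackfillJob` is a hypothetical job class. Small, spaced-out batches keep each transaction short and leave the database capacity for live traffic.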

@ka8725
Andrey Koleshko
Remotely working veteran
Questions?
