Obevo is an open-source database deployment tool that handles complex database schemas and deployments at enterprise scale. It addresses challenges such as maintaining migration files, determining the dependency order of changes, and onboarding existing production databases. Obevo represents each database object as a file, in a folder structure that mirrors the database schema, handles stateful objects by splitting their files into multiple change sections, and uses dependency analysis and topological sorting to determine the deployment order. It also supports ORM integration and in-memory database testing through a translation layer.
6. Areas under Database Deployment
Production deployments
Non-production deployments (testing)
Code maintenance and organization
Solving existing messes 😀
7. DB Deploy Within SDLC
Deploy your database code as you would deploy your application code (DevOps / IaC)
Deploying Software: Source Code → compile → Binary → run in unit test (Unit Test) → deploy with UAT config (UAT) → deploy with Prod config (Production)
Deploying a Database: DDLs → compile → DDL Package → deploy in in-memory DB (Unit Test) → deploy with UAT config (UAT) → deploy with Prod config (Production)
8. What goes into a DB Deployment
Stateful Scripts (incremental definitions)
Stateless Scripts (full / rerunnable definitions)
9. Stateful Script Deployments
Script 1:
CREATE TABLE employee (
    id BIGINT,
    name VARCHAR(32),
    status INT,
    PRIMARY KEY (id)
)
Script 2:
ALTER TABLE employee ADD department VARCHAR(32)
Script 3:
ALTER TABLE employee ADD salary DECIMAL
Resulting table definition:
CREATE TABLE employee (
    id BIGINT,
    name VARCHAR(32),
    status INT,
    department VARCHAR(32),
    salary DECIMAL,
    PRIMARY KEY (id)
)
Note: the full table DDL is never executed in the DB
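The resulting definition is simply the incremental scripts replayed in order. As a minimal Java sketch (the class and method names are hypothetical, and each script is reduced to the column definitions it adds), the following rebuilds the full table text from the three scripts above; it only computes text and executes nothing against a database.

```java
import java.util.ArrayList;
import java.util.List;

public class StatefulTableReplay {
    // Derives the full table definition by replaying incremental changes in order.
    // Each change is modeled as the list of column definitions it adds: the CREATE TABLE
    // adds the initial columns, and each ALTER TABLE ... ADD adds one more.
    static String deriveFullDdl(String table, List<List<String>> changes, String primaryKey) {
        List<String> columns = new ArrayList<>();
        for (List<String> change : changes) {
            columns.addAll(change);
        }
        StringBuilder ddl = new StringBuilder("CREATE TABLE " + table + " (\n");
        for (String column : columns) {
            ddl.append("    ").append(column).append(",\n");
        }
        ddl.append("    PRIMARY KEY (").append(primaryKey).append(")\n)");
        return ddl.toString();
    }

    public static void main(String[] args) {
        List<List<String>> changes = List.of(
                List.of("id BIGINT", "name VARCHAR(32)", "status INT"), // script 1: CREATE TABLE
                List.of("department VARCHAR(32)"),                     // script 2: ALTER TABLE ... ADD
                List.of("salary DECIMAL"));                             // script 3: ALTER TABLE ... ADD
        System.out.println(deriveFullDdl("employee", changes, "id"));
    }
}
```

The direction of derivation is the point: the scripts are the source of truth and the full DDL is computed from them, which is why deployed stateful scripts cannot be edited afterwards.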
10. Stateless Script Deployments
View, version 1:
CREATE OR REPLACE VIEW v_emp AS SELECT * FROM employee
View, version 2:
CREATE OR REPLACE VIEW v_emp AS SELECT * FROM employee WHERE status = 0
View, version 3:
CREATE OR REPLACE VIEW v_emp AS SELECT * FROM employee WHERE status = 1
Static data (CSV), version 1:
DEPT_ID,DEPT_NAME,TYPE
1,Finance,A
2,IT,A
3,Operations,B
Static data (CSV), version 2:
DEPT_ID,DEPT_NAME,TYPE
2,IT,A
3,Operations,B
4,Tax,B
5,Research,C
Static data (CSV), version 3:
DEPT_ID,DEPT_NAME,TYPE
2,IT,A
3,Operations,C
4,Tax,B
5,Research,C
6,Engineering,D
Note: the full object definition in the DB is represented in code
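Because the CSV file always holds the full desired contents, a tool can derive the row-level changes by comparing it with what is already deployed. The Java sketch below illustrates that idea under stated assumptions: DEPT_ID is the key, the CSV has no quoted fields, the "deployed" rows are simply the previous file version, and the output is descriptive operations rather than real SQL. It is not Obevo's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaticDataDiff {
    // Parses simple CSV text (header row, no quoted fields) into
    // key column -> remaining columns, preserving row order.
    static Map<String, String> parse(String csv) {
        Map<String, String> rows = new LinkedHashMap<>();
        String[] lines = csv.trim().split("\n");
        for (int i = 1; i < lines.length; i++) { // i = 1 skips the header row
            String line = lines[i].trim();
            int comma = line.indexOf(',');
            rows.put(line.substring(0, comma), line.substring(comma + 1));
        }
        return rows;
    }

    // Lists the row operations that turn the deployed rows into the desired rows.
    static List<String> diff(Map<String, String> deployed, Map<String, String> desired) {
        List<String> ops = new ArrayList<>();
        for (Map.Entry<String, String> row : desired.entrySet()) {
            String current = deployed.get(row.getKey());
            if (current == null) {
                ops.add("INSERT " + row.getKey() + "," + row.getValue());
            } else if (!current.equals(row.getValue())) {
                ops.add("UPDATE " + row.getKey() + "," + row.getValue());
            }
        }
        for (String key : deployed.keySet()) {
            if (!desired.containsKey(key)) {
                ops.add("DELETE " + key);
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        String v1 = "DEPT_ID,DEPT_NAME,TYPE\n1,Finance,A\n2,IT,A\n3,Operations,B";
        String v2 = "DEPT_ID,DEPT_NAME,TYPE\n2,IT,A\n3,Operations,B\n4,Tax,B\n5,Research,C";
        String v3 = "DEPT_ID,DEPT_NAME,TYPE\n2,IT,A\n3,Operations,C\n4,Tax,B\n5,Research,C\n6,Engineering,D";
        System.out.println(diff(parse(v1), parse(v2))); // [INSERT 4,Tax,B, INSERT 5,Research,C, DELETE 1]
        System.out.println(diff(parse(v2), parse(v3))); // [UPDATE 3,Operations,C, INSERT 6,Engineering,D]
    }
}
```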
11. How DB Deploy Tools Work
Scripts modeled as entries in a deploy log
Tool applies changes not yet in the deploy log
Example (Source Code → Database, where the database holds the User Schema and the Deploy Log):
V1 package contains s1, s2, s3 → changeset s1, s2, s3 → apply to schema, apply to log
V2 package contains s1, s2, s3, s4, s5 → changeset s4, s5 → apply to schema, apply to log
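The deploy-log mechanism can be sketched in a few lines of Java. This is a minimal in-memory model with hypothetical names: a set stands in for the deploy log table and a list stands in for the user schema. Real tools would typically record the log entry in the same transaction as the change where the DBMS allows it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DeployLogSketch {
    // Stand-in for the deploy log: names of scripts already applied.
    private final Set<String> deployLog = new LinkedHashSet<>();
    // Stand-in for the user schema: the scripts executed so far, in order.
    private final List<String> executed = new ArrayList<>();

    // Computes the changeset (scripts in the package but not in the log),
    // then applies each one to the schema and records it in the log.
    List<String> deploy(List<String> packageScripts) {
        List<String> changeset = new ArrayList<>();
        for (String script : packageScripts) {
            if (!deployLog.contains(script)) {
                changeset.add(script);
            }
        }
        for (String script : changeset) {
            executed.add(script);  // apply to schema
            deployLog.add(script); // apply to log
        }
        return changeset;
    }

    public static void main(String[] args) {
        DeployLogSketch db = new DeployLogSketch();
        System.out.println(db.deploy(List.of("s1", "s2", "s3")));             // [s1, s2, s3]
        System.out.println(db.deploy(List.of("s1", "s2", "s3", "s4", "s5"))); // [s4, s5]
    }
}
```

Re-running the V2 package a second time yields an empty changeset, which is what makes repeated deployments of the same package safe.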
12. Migration Script Representation
DB deploy tools mostly differ in the following aspects:
▪ How to represent migrations: one migration per file? Subsections denoted within a file?
▪ How to order migrations: a naming convention on the file? A separate “command file” that lists the order?
▪ Rerunnability/mutability of scripts under certain circumstances
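As an illustration of the naming-convention option, the Java sketch below assumes a hypothetical convention in which the digit runs in a file name form the version (so version4_2.sql is version 4.2, using file names from the next slide). It sorts numerically, so version10.sql lands after version2_emp.sql, which plain string sorting gets wrong. Note the tie-break: version2_dept.sql and version2_emp.sql share a version, so the sketch falls back to alphabetical order, an implicit rule that shows why ordering by convention needs care.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MigrationOrder {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Extracts the runs of digits in a file name: "version4_2.sql" -> [4, 2].
    static List<Integer> versionParts(String fileName) {
        List<Integer> parts = new ArrayList<>();
        Matcher m = DIGITS.matcher(fileName);
        while (m.find()) {
            parts.add(Integer.parseInt(m.group()));
        }
        return parts;
    }

    // Compares version parts numerically, element by element; a shorter prefix sorts
    // first (4 before 4.1), and equal versions fall back to alphabetical file names.
    static final Comparator<String> BY_VERSION = (a, b) -> {
        List<Integer> pa = versionParts(a);
        List<Integer> pb = versionParts(b);
        for (int i = 0; i < Math.min(pa.size(), pb.size()); i++) {
            int cmp = Integer.compare(pa.get(i), pb.get(i));
            if (cmp != 0) {
                return cmp;
            }
        }
        int bySize = Integer.compare(pa.size(), pb.size());
        return bySize != 0 ? bySize : a.compareTo(b);
    };

    public static void main(String[] args) {
        List<String> files = new ArrayList<>(List.of(
                "version10.sql", "version2_emp.sql", "version1.sql", "version4_1.sql", "version2_dept.sql"));
        files.sort(BY_VERSION);
        System.out.println(files);
        // [version1.sql, version2_dept.sql, version2_emp.sql, version4_1.sql, version10.sql]
    }
}
```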
14. Migration File Maintenance
Database Schema
/table: Employee, Department, Project, Committee
/view: V_Manager, V_CommManager, V_Consultant
Java (ORM)
com.mycompany.model: Employee.java, Department.java, Project.java, Committee.java
com.mycompany.helper: ManagerView.java, CommManagerView.java, ConsultantView.java
Migrations
version1.sql, version2_emp.sql, version2_dept.sql, version3.sql, version4_1.sql, version4_2.sql, version4_3.sql, V_Manager_1.sql, V_Manager_2.sql, V_Manager_3.sql, version5.sql
Database schema and ORM code:
▪ Similar structure in database as in code
▪ Easy to find which file to edit/review/clean/deploy for a particular object (object name == file name)
Migrations:
▪ Deploy order is explicit and clear
▪ No clear mapping between migrations and objects
▪ Redundant view definitions across files
15. Change Ordering
So why not just arrange scripts by file?
Employee table:
create table (...)
add foreign key DEPARTMENT on dept_id
Department table:
create table (...)
add column dept_id
Manager view:
create view as select * from EMPLOYEE where STATUS = 1
CommManager view:
create view as select * from V_MANAGER join DEPARTMENT ...
How to order scripts?
How to discover dependencies?
How to handle multiple scripts for stateful objects?
16. At Scale and Complexity
Various object types (tables, SPs, views, packages, ...)
Many developers, many releases
Hundreds to thousands of objects
20. Obevo Approach
Let’s solve problems for all kinds of systems
New applications: tens/hundreds of objects; tables only; unit testing with in-memory databases
Long-lived systems: hundreds/thousands of objects; tables, views, procedures, and more; integration testing with regular databases
21. Technical Problems to Address
1) To facilitate editing, reviewing, and deploying objects for both simple and complex projects
2) To integrate well with unit-testing and integration-testing tools
3) To onboard existing production systems with ease
23. Overview of the Benefits
Database Schema
/table: Employee, Department, Project, Committee
/view: V_Manager, V_CommManager, V_Consultant
Obevo File Representation
/table: Employee.sql, Department.sql, Project.sql, Committee.sql
/view: V_Manager.sql, V_CommManager.sql, V_Consultant.sql
▪ Similar structure in database as in code
▪ Easy to find which file to edit/review/clean/deploy for a particular object (object name == file name)
▪ Can selectively deploy a subset of the schema for unit/integration testing
24. Revisiting the Challenges
Employee table:
create table (...)
add foreign key DEPARTMENT on dept_id
Department table:
create table (...)
add column dept_id
Manager view:
create view as select * from EMPLOYEE where STATUS = 1
CommManager view:
create view as select * from V_MANAGER join DEPARTMENT ...
How to order scripts?
How to discover dependencies?
How to handle multiple scripts for stateful objects?
25. Handling Stateful Objects
Break up stateful files into multiple sections (stateless files can remain as is)
Change Key = Object Name + Change Name
Employee table:
//// CHANGE name="create"
create table (...)
//// CHANGE name="FK"
add foreign key DEPARTMENT on dept_id
Department table:
//// CHANGE name="create"
create table (...)
//// CHANGE name="dept_id"
add column dept_id
Manager view:
create view as select * from EMPLOYEE where STATUS = 1
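Splitting a file into change keys is mostly line-oriented text processing. The Java sketch below assumes the header format exactly as shown on this slide (quotes around the name are treated as optional, since the precise syntax is an assumption here); the class and method names are hypothetical. A file without any headers, such as a stateless view, becomes a single change keyed by the object name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangeSectionParser {
    // Section header as shown on the slide, e.g. //// CHANGE name="create"
    private static final Pattern HEADER = Pattern.compile("//// CHANGE name=\"?([^\"\\s]+)\"?");

    // Splits one object file into change sections keyed by "<object>.<change name>".
    // A file with no headers becomes one change keyed by the object name.
    // Lines before the first header are discarded in this sketch.
    static Map<String, String> parse(String objectName, String fileText) {
        Map<String, String> changes = new LinkedHashMap<>();
        String currentKey = null;
        List<String> body = new ArrayList<>();
        for (String line : fileText.split("\n")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches()) {
                if (currentKey != null) {
                    changes.put(currentKey, String.join("\n", body).trim());
                }
                currentKey = objectName + "." + m.group(1);
                body.clear();
            } else {
                body.add(line);
            }
        }
        if (currentKey != null) {
            changes.put(currentKey, String.join("\n", body).trim());
        } else {
            changes.put(objectName, fileText.trim());
        }
        return changes;
    }

    public static void main(String[] args) {
        String employeeFile = "//// CHANGE name=\"create\"\n"
                + "create table EMPLOYEE (...)\n"
                + "//// CHANGE name=\"FK\"\n"
                + "add foreign key DEPARTMENT on dept_id\n";
        System.out.println(parse("EMPLOYEE", employeeFile).keySet()); // [EMPLOYEE.create, EMPLOYEE.FK]
        System.out.println(parse("V_MANAGER", "create view as select * from EMPLOYEE").keySet()); // [V_MANAGER]
    }
}
```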
26. Topological Sorting for Ordering
An algorithm for ordering the nodes of a graph so that every edge dependency is respected
Nodes: EMPLOYEE.create, EMPLOYEE.FK, DEPARTMENT.create, DEPARTMENT.dept_id, V_MANAGER, V_COMMMANAGER
Acceptable Orderings:
One example:
1. DEPARTMENT.create
2. DEPARTMENT.dept_id
3. EMPLOYEE.create
4. EMPLOYEE.FK
5. V_MANAGER
6. V_COMMMANAGER
Another example:
1. EMPLOYEE.create
2. V_MANAGER
3. DEPARTMENT.create
4. DEPARTMENT.dept_id
5. V_COMMMANAGER
6. EMPLOYEE.FK
... and many more
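One standard way to produce such an ordering is Kahn's algorithm. The Java sketch below is hand-rolled so that it is self-contained (the credits mention JGraphT, whose `TopologicalOrderIterator` offers the same capability). The edges are an assumption inferred from the pseudo-SQL on the earlier slides: the FK needs both the EMPLOYEE table and DEPARTMENT.dept_id, and each view needs the objects it selects from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangeOrdering {
    // Kahn's algorithm. 'dependencies' maps every change key to the keys that must be
    // deployed before it; every key used as a prerequisite must also be a map key.
    static List<String> topologicalOrder(Map<String, List<String>> dependencies) {
        Map<String, Integer> unmet = new LinkedHashMap<>();           // prerequisites not yet emitted
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (String node : dependencies.keySet()) {
            unmet.put(node, dependencies.get(node).size());
            dependents.put(node, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : dependencies.entrySet()) {
            for (String prerequisite : e.getValue()) {
                dependents.get(prerequisite).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String dependent : dependents.get(node)) {
                int remaining = unmet.merge(dependent, -1, Integer::sum);
                if (remaining == 0) ready.add(dependent);
            }
        }
        if (order.size() != dependencies.size()) {
            throw new IllegalStateException("Cycle detected; no valid deploy order exists");
        }
        return order;
    }

    // Edges inferred from the slide's scripts (an assumption of this sketch).
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("EMPLOYEE.create", List.of());
        deps.put("EMPLOYEE.FK", List.of("EMPLOYEE.create", "DEPARTMENT.dept_id"));
        deps.put("DEPARTMENT.create", List.of());
        deps.put("DEPARTMENT.dept_id", List.of("DEPARTMENT.create"));
        deps.put("V_MANAGER", List.of("EMPLOYEE.create"));
        deps.put("V_COMMMANAGER", List.of("V_MANAGER", "DEPARTMENT.create"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(topologicalOrder(exampleGraph()));
        // [EMPLOYEE.create, DEPARTMENT.create, V_MANAGER, DEPARTMENT.dept_id, V_COMMMANAGER, EMPLOYEE.FK]
    }
}
```

The printed order differs from both examples on the slide, yet it is equally acceptable; any order in which each change follows its prerequisites will do. A cycle (for example, a false-positive dependency) leaves nodes that never become ready, which the final size check reports.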
Great! Now how do we actually determine these dependencies?
I have to parse my DBMS syntax!
27. Dependency Discovery
Low-tech solution: text-search for object names
Allow overriding false positives via annotations
Employee table:
create table (...)
add foreign key DEPARTMENT on dept_id
Department table:
create table (...)
add column dept_id
Manager view:
create view as select * from EMPLOYEE where STATUS = 1
CommManager view:
create view as select * from V_MANAGER join DEPARTMENT ...
Object Names:
1. EMPLOYEE
2. DEPARTMENT
3. V_MANAGER
4. V_COMMMANAGER
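The text-search idea fits in a short Java sketch: for each object, look for the other known object names as whole words in its SQL. The `excludes` map is a hypothetical stand-in for the annotation-based overrides (the actual annotation syntax is not shown in this talk), and the sample SQL adds object names to the slide's pseudo-SQL. The DEPARTMENT script deliberately mentions EMPLOYEE in a comment: without the override, that false positive would create an EMPLOYEE/DEPARTMENT cycle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class TextSearchDependencies {
    // For each object, lists the other known object names that appear in its SQL text.
    // Matching is case-insensitive and whole-word, so EMPLOYEE would not match EMPLOYEE_HIST.
    static Map<String, List<String>> discover(Map<String, String> sqlByObject,
                                              Map<String, Set<String>> excludes) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        for (Map.Entry<String, String> obj : sqlByObject.entrySet()) {
            Set<String> ignored = excludes.getOrDefault(obj.getKey(), Set.of());
            List<String> found = new ArrayList<>();
            for (String candidate : sqlByObject.keySet()) {
                if (candidate.equals(obj.getKey()) || ignored.contains(candidate)) {
                    continue; // skip self-references and overridden false positives
                }
                Pattern wholeWord = Pattern.compile(
                        "\\b" + Pattern.quote(candidate) + "\\b", Pattern.CASE_INSENSITIVE);
                if (wholeWord.matcher(obj.getValue()).find()) {
                    found.add(candidate);
                }
            }
            deps.put(obj.getKey(), found);
        }
        return deps;
    }

    static Map<String, String> sampleSql() {
        Map<String, String> sql = new LinkedHashMap<>();
        sql.put("EMPLOYEE", "create table EMPLOYEE (...)\nadd foreign key DEPARTMENT on dept_id");
        sql.put("DEPARTMENT", "create table DEPARTMENT (...) -- unlike EMPLOYEE, no status\nadd column dept_id");
        sql.put("V_MANAGER", "create view V_MANAGER as select * from EMPLOYEE where STATUS = 1");
        sql.put("V_COMMMANAGER", "create view V_COMMMANAGER as select * from V_MANAGER join DEPARTMENT");
        return sql;
    }

    public static void main(String[] args) {
        System.out.println(discover(sampleSql(), Map.of("DEPARTMENT", Set.of("EMPLOYEE"))));
        // {EMPLOYEE=[DEPARTMENT], DEPARTMENT=[], V_MANAGER=[EMPLOYEE], V_COMMMANAGER=[DEPARTMENT, V_MANAGER]}
    }
}
```

These are object-level dependencies; feeding a change-level sort would still require mapping each object to its change keys.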
29. ORM Integration
ORMs can generate the latest view of the DDLs, but have more difficulty with migration scripts
Rely on Obevo for deployment and for verifying that the deployed DDLs match your ORM model
Workflow (Java domain model → full DDL definitions → Obevo → DB):
1) Generate the full DDL definitions from the Java domain model via the Hibernate SchemaExport utility
2) Convert new objects to the Obevo format (into the Obevo deploy codebase)
3) Convert all objects to a baseline (for Obevo baseline validation)
4) Verify at build time that the deploy scripts equal the baseline
5) Deploy the scripts to the DB
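Step 4 can be pictured, in its simplest possible form, as a per-object comparison of DDL text. The Java sketch below assumes both sides are already available as object name → DDL maps (how they are produced is out of scope), and its text normalization is a deliberate simplification; comparing the actual deployed schemas, for example through database metadata, would be more robust. The names are hypothetical and this is not how Obevo implements baseline validation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BaselineCheck {
    // Naive normalization so formatting differences do not count: collapse whitespace,
    // upper-case (note: this would also fold string literals), drop one trailing ';'.
    static String normalize(String ddl) {
        String s = ddl.trim().replaceAll("\\s+", " ").toUpperCase();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    // Names of objects whose normalized definitions differ or exist on only one side.
    static List<String> mismatches(Map<String, String> fromDeployScripts, Map<String, String> baseline) {
        Set<String> names = new TreeSet<>(fromDeployScripts.keySet());
        names.addAll(baseline.keySet());
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String deployed = fromDeployScripts.get(name);
            String expected = baseline.get(name);
            if (deployed == null || expected == null || !normalize(deployed).equals(normalize(expected))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fromDeployScripts = new LinkedHashMap<>();
        fromDeployScripts.put("EMPLOYEE", "create table EMPLOYEE (\n  id bigint,\n  name varchar(32)\n)");
        Map<String, String> baseline = new LinkedHashMap<>();
        baseline.put("EMPLOYEE", "CREATE TABLE employee ( id BIGINT, name VARCHAR(32) );");
        baseline.put("DEPARTMENT", "CREATE TABLE department ( id BIGINT );");
        // A build would fail here: DEPARTMENT exists in the baseline but not in the deploy scripts.
        System.out.println(mismatches(fromDeployScripts, baseline)); // [DEPARTMENT]
    }
}
```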
30. In-memory Testing
In-memory databases have grown in popularity for unit testing
How to test table DDLs in unit tests?
▪ Separate DDLs for in-memory DBs (but the original DBs are not tested)
▪ Use a language other than SQL for migrations (but lose out on the SQL ecosystem)
▪ What about a translation layer?
31. In-memory DB Translation Layer
Focus on preserving the main object structure; avoid DBMS-specific values
Use ASTs to extract clauses from SQL; handle or ignore irrelevant sections
Limited to certain object types (e.g. tables and views, but not procedures)
Users can fall back to custom SQL if needed
Example:
create table MyTable (
    ID bigint autoincrement,
    DESCRIPTION text,
    COUNT bigint
) lock datarows
▪ "autoincrement": ignore or handle text after the column data type
▪ "text": handle via domains, e.g. create domain TEXT as LONGVARCHAR
▪ "lock datarows": ignore post-table text (usually relates to storage)
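The three handling rules can be sketched in Java with plain string and regular-expression processing. This is a swapped-in technique: the talk notes describe the idea as starting out regex-based before being refactored towards ASTs, and this sketch does not reproduce Obevo's translator. The target syntax (`GENERATED BY DEFAULT AS IDENTITY`, and the `CREATE DOMAIN` statement taken from the slide) assumes an HSQLDB-style in-memory database; reserved words such as COUNT are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class InMemoryTranslation {
    private static final Pattern TEXT_TYPE = Pattern.compile("(?i)\\btext\\b");

    // Index of the ')' that closes the first '(' (nesting-aware, so VARCHAR(32) is skipped).
    static int closingParen(String sql) {
        int depth = 0;
        for (int i = sql.indexOf('('); i >= 0 && i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("No balanced column list found");
    }

    static List<String> translate(String createTable) {
        List<String> statements = new ArrayList<>();
        // Ignore post-table text such as "lock datarows" (usually storage-related).
        String table = createTable.substring(0, closingParen(createTable) + 1);
        // Handle text after the column data type: map "autoincrement" to an identity column.
        table = table.replaceAll("(?i)\\s+autoincrement\\b", " GENERATED BY DEFAULT AS IDENTITY");
        // Handle the TEXT type via a domain rather than rewriting each column
        // (naive: a column or comment containing the word "text" also triggers this).
        if (TEXT_TYPE.matcher(table).find()) {
            statements.add("CREATE DOMAIN TEXT AS LONGVARCHAR");
        }
        statements.add(table);
        return statements;
    }

    public static void main(String[] args) {
        String ddl = "create table MyTable (\n    ID bigint autoincrement,\n"
                + "    DESCRIPTION text,\n    COUNT bigint\n) lock datarows";
        translate(ddl).forEach(System.out::println);
    }
}
```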
34. Reverse Engineering
Prefer to reverse-engineer all objects in a schema
No strong standard API is available that can generate object definitions across DBMS types
▪ JDBC metadata is inconsistently implemented
▪ The SchemaCrawler API is a good start, but not detailed enough for reverse engineering
35. Reverse Engineering - Approach
Obevo leverages vendor-provided APIs, e.g. pg_dump, DB2LOOK, Oracle DBMS_METADATA
Most of these APIs only provide text output
Obevo has text-parsing logic to convert that output into the Obevo folder structure
▪ This is an easier problem to solve than going through a Java-based metadata API
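The text-parsing step can be pictured as splitting a dump into statements and filing each CREATE statement under the folder structure. The Java sketch below works on a simplified, pg_dump-like input rather than verbatim output from any vendor tool; the class name and the /type/NAME.sql layout mirror the earlier slides but are assumptions. Real dumps also contain SET, ownership, grant, and index statements, and semicolons inside procedure bodies or string literals, all of which a real parser must handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DumpSplitter {
    // Start of a CREATE [OR REPLACE] TABLE/VIEW statement; captures the type and the
    // object name, skipping an optional schema prefix such as "public.".
    private static final Pattern CREATE = Pattern.compile(
            "(?i)^\\s*CREATE\\s+(?:OR\\s+REPLACE\\s+)?(TABLE|VIEW)\\s+(?:[\\w\"]+\\.)?\"?(\\w+)\"?");

    // Files each CREATE statement under /<type>/<NAME>.sql. Simplifications: "--" comment
    // lines are dropped, statements are split on ';', and other statements are skipped.
    static Map<String, String> split(String dumpText) {
        String withoutComments = dumpText.replaceAll("(?m)^--.*$", "");
        Map<String, String> files = new LinkedHashMap<>();
        for (String statement : withoutComments.split(";")) {
            Matcher m = CREATE.matcher(statement);
            if (m.find()) {
                String path = "/" + m.group(1).toLowerCase() + "/" + m.group(2).toUpperCase() + ".sql";
                files.put(path, statement.trim() + ";");
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String dump = "--\n-- Name: employee; Type: TABLE\n--\n"
                + "CREATE TABLE public.employee (\n    id bigint,\n    name varchar(32)\n);\n"
                + "ALTER TABLE public.employee OWNER TO app;\n"
                + "CREATE VIEW public.v_emp AS\n SELECT * FROM public.employee;\n";
        split(dump).forEach((path, ddl) -> System.out.println(path + "\n" + ddl + "\n"));
    }
}
```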
36. Other Topics (beyond scope of talk)
Centralized permission management and cleanup
Rollback
Phased deployments
Long-running deployments and index creation
DB2 REORG handling
… and more …
40. CREDITS
Special shout-outs to some useful open-source products leveraged along the way:
▪ SchemaCrawler for DB API access
▪ JGraphT for graph algorithm implementations
▪ SlidesCarnival for this presentation template
Editor's Notes
Motto: to solve database deployment problems no matter the complexity
Take a poll of the audience: see who is working on different stages of a product
New application vs. existing system
With SDLC vs. without SDLC
Let’s set the stage here for how we want to think about database deployments: as part of a regular development process.
We need not dive into this too much for the presentation, as this mindset has already grown within the industry and we need not be redundant.
We can quickly speak to the fact that this approach means not deploying databases via a custom UI. Some vendor products are like that. But we are focusing on those tools that can do this via code.
Take a poll of the audience: see whose applications use these various object types.
“Stateful” vs. “Stateless” refers to the kinds of migration scripts needed to maintain an object, and whether previously-run scripts against that object are still needed.
For Stateful object types, the end result of an object is the accumulation of incremental modification scripts on that object (akin to incremental commits to a database or to a source code repository).
Stateless object types can be recreated with a full object definition without needing to know about the previous scripts that updated the object.
(Note for presenters - this slide has animations)
Most DB Deploy tools are geared for Stateful objects, as those are the more prominent and tricky case to deal with (i.e. table DDLs).
As such, an important rule for such stateful scripts is that they cannot be modified in source code after being deployed. These scripts can thus build up over time; tools may provide utilities to clean them up, but such cleanup must be done thoughtfully.
Migrations are geared towards stateful objects. It is possible to represent stateless objects with stateful changes, but there is overhead.
We do acknowledge that it is possible to generate a “latest” view of the database after the fact, but it requires extra steps and is not possible with all tooling.
Anecdotally, for teams that haven’t onboarded to Obevo, we’ve seen some examples of teams maintaining the “database-like” structure for deployments for readability or deploying to unit tests, even though those scripts aren’t actually used for deployments. This is a redundancy that we sought to eliminate with Obevo.
Presenter’s note: this slide is animated
Note to audience: all SQL shown here is pseudo-SQL. We condense it for readability on the slide.
Presenter’s note: the last bullet point on this slide is carried over into the diagram on the next slide
The problems described on the previous slides may ultimately not be too big for teams. Many folks have used tools with those patterns and it can work for them.
But, at the scale with which firms can operate, in terms of number of people, systems, and years-in-existence of an application, we do think it is a worthwhile effort to solve those problems. And obviously, we want to solve those problems for complex customers and still be easily usable for simpler use cases.
Presenter’s note: this diagram is intended to highlight the last bullet point on the previous slide. There is an animated entrypoint to this slide.
Presenter’s note: this slide was picked up from an earlier slide
Let’s revisit our earlier slide: how do we account for the problems listed here?
Presenter notes: the next slide will show the breakup of these files into the migration scripts that the tool understands.
https://en.wikipedia.org/wiki/Topological_sorting
AST parsing is the ideal answer, but it is not practical given the number of database dialects to handle and the difficulty of actually obtaining that parsing logic.
This solution is low-tech, but it works well in practice. The largest example we had for this was a system of 800 tables and 5000 stored procedures.
Go into detail on the ORM tooling, and on why its deploys cannot match.
Side-point on Liquibase, but before diving into this, take a time-check and then a poll of the audience to see if they are familiar with it.
-------------------------------------------------
A popular tool that many folks do use.
Nice approach to abstract differences across DBMS types and to simplify the migration logic.
However, I still prefer the SQL-based approach, as the SQL ecosystem is far too large to give up: people know SQL, documentation is written for SQL, and code generation is written for SQL. I’d be averse to recommending that a whole enterprise move wholesale to a new “sublanguage”, especially when DBMS-specific features may still be needed.
That said, I do think Liquibase and Obevo could complement each other:
The Obevo deploy engine is agnostic of SQL (note the MongoDB implementation).
Thus, the Liquibase syntax could be a front-end to the Obevo implementation
We can leverage Liquibase’s benefits of representing table changes in a DBMS-agnostic manner, while leveraging Obevo’s core engine to handle objects at scale and stateless object types elegantly
I could use Liquibase to generate my test cases for Obevo across different platforms (as I do truly need to test all platforms).
However, I suspect that many applications only need to handle two dialects at most: their production DB, and their in-memory DB (if they use in-memory database testing at all)
We also leverage the translation functionality that the in-memory database systems provide; however, they cannot effectively provide translations on their own.
Talking point: I took this translation idea from another engineer on my team who used it in their homegrown tool. It was regular-expression based. From a technical perspective, I thought it was crazy that it would work, but it was very effective in practice, so we continued with it. Regular expressions proved fragile, so we refactored away from them and towards ASTs; nonetheless, the foundation for the idea was set.