Obevo: Database Deployment Tool for Enterprise Scale

Obevo
NY Java SIG
You can find me at:
Github: shantstepanian
Twitter: @shantstepanian
shant.p.stepanian@gmail.com

Obevo: At a Glance
Database Deployment Tool handling Enterprise
Scale and Complexity
Open-sourced by Goldman Sachs in 2017
http://github.com/goldmansachs/obevo
DB2, Oracle, PostgreSQL, SQL Server, Sybase,
MongoDB

Agenda: Product Overview
Overview of database deployment problem space
Overview of tooling in the market
How Obevo helps

Agenda: Hands-on Demo
Perform a simple database deployment
Explore Java integrations: ORM and in-memory
testing
Reverse-engineering existing applications

Areas under Database Deployment
Production deployments
Non-production deployments (testing)
Code maintenance and organization
Solving existing messes 😀

DB Deploy Within SDLC
Deploy your database code as you would deploy your application code (DevOps / IaC)
[Diagram: two parallel pipelines]
Deploying Software: Source Code → (compile) → Binary → (run in unit test) → Unit Test → (deploy with UAT config) → UAT → (deploy with Prod config) → Production
Deploying a Database: DDLs → (compile) → DDL Package → (deploy in in-mem DB) → Unit Test → (deploy with UAT config) → UAT → (deploy with Prod config) → Production

What goes into a DB Deployment
▪ Stateful Scripts (incremental definitions)
▪ Stateless Scripts (full / rerunnable definitions)

Stateful Script Deployments
Script 1 (initial creation):
    CREATE TABLE employee (
        id BIGINT,
        name VARCHAR(32),
        status INT,
        PRIMARY KEY (id)
    )
Script 2:
    ALTER TABLE employee
        ADD department VARCHAR(32)
Script 3:
    ALTER TABLE employee
        ADD salary BIGDECIMAL
Resulting table definition:
    CREATE TABLE employee (
        id BIGINT,
        name VARCHAR(32),
        status INT,
        department VARCHAR(32),
        salary BIGDECIMAL,
        PRIMARY KEY (id)
    )
Note: the full table DDL is never executed in the DB

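Stateless Script Deployments
View definition, edited in place across three versions:
Version 1:
    CREATE OR REPLACE VIEW v_emp AS
    SELECT * FROM employee
Version 2:
    CREATE OR REPLACE VIEW v_emp AS
    SELECT * FROM employee
    WHERE status = 0
Version 3:
    CREATE OR REPLACE VIEW v_emp AS
    SELECT * FROM employee
    WHERE status = 1
Static data file (CSV), edited in place across three versions:
Version 1:
    DEPT_ID,DEPT_NAME,TYPE
    1,Finance,A
    2,IT,A
    3,Operations,B
Version 2:
    DEPT_ID,DEPT_NAME,TYPE
    2,IT,A
    3,Operations,B
    4,Tax,B
    5,Research,C
Version 3:
    DEPT_ID,DEPT_NAME,TYPE
    2,IT,A
    3,Operations,C
    4,Tax,B
    5,Research,C
    6,Engineering,D
Note: the full object definition in the DB is represented in code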
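For stateless static data, the deploy step has to work out the delta between what is deployed and what the file now says. The following Python sketch illustrates that idea using versions 2 and 3 of the CSV shown on this slide. The function names `load` and `diff` are hypothetical, and this is only an illustration of the concept, not necessarily how Obevo computes it.

```python
import csv
import io

# Versions 2 and 3 of the static data file from the slide.
VERSION_2 = (
    "DEPT_ID,DEPT_NAME,TYPE\n"
    "2,IT,A\n"
    "3,Operations,B\n"
    "4,Tax,B\n"
    "5,Research,C\n"
)
VERSION_3 = (
    "DEPT_ID,DEPT_NAME,TYPE\n"
    "2,IT,A\n"
    "3,Operations,C\n"
    "4,Tax,B\n"
    "5,Research,C\n"
    "6,Engineering,D\n"
)


def load(csv_text, key="DEPT_ID"):
    """Index the rows of a CSV document by their key column (hypothetical helper)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff(deployed, desired):
    """Return the keys to insert, update, and delete to move deployed -> desired."""
    inserts = sorted(k for k in desired if k not in deployed)
    deletes = sorted(k for k in deployed if k not in desired)
    updates = sorted(
        k for k in desired if k in deployed and desired[k] != deployed[k]
    )
    return inserts, updates, deletes


inserts, updates, deletes = diff(load(VERSION_2), load(VERSION_3))
```

Because the file always holds the full desired content, the developer only edits rows in place; the insert, update, and delete statements are derived at deploy time.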

How DB Deploy Tools Work
Scripts modeled as entries in a deploy log
Tool applies changes not yet in the deploy log
[Diagram: Source Code holds the V1 package (changeset: s1 s2 s3) and the V2 package (changeset: s4 s5). Deploying a package applies its changeset to the User Schema and to the Deploy Log. After the V2 deployment, both the User Schema and the Deploy Log in the Database contain s1, s2, s3, s4, s5.]
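The deploy-log mechanism described on this slide can be sketched in a few lines of Python against an in-memory SQLite database. The table name `deploy_log`, the function `deploy`, and the SQL bodies of s1 through s5 are assumptions made for illustration; they are not taken from Obevo or any other tool.

```python
import sqlite3

# Hypothetical scripts; the names s1..s5 mirror the diagram, the SQL is invented.
SCRIPTS = [
    ("s1", "CREATE TABLE employee (id INTEGER PRIMARY KEY, name VARCHAR(32), status INT)"),
    ("s2", "ALTER TABLE employee ADD COLUMN department VARCHAR(32)"),
    ("s3", "ALTER TABLE employee ADD COLUMN salary DECIMAL"),
    ("s4", "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name VARCHAR(32))"),
    ("s5", "CREATE VIEW v_emp AS SELECT * FROM employee WHERE status = 1"),
]


def deploy(conn, scripts):
    """Apply every script that is not yet in the deploy log; return the names applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS deploy_log (name TEXT PRIMARY KEY)")
    already = {row[0] for row in conn.execute("SELECT name FROM deploy_log")}
    applied = []
    for name, sql in scripts:
        if name in already:
            continue  # already deployed in an earlier release
        conn.execute(sql)  # apply to schema
        conn.execute("INSERT INTO deploy_log (name) VALUES (?)", (name,))  # apply to log
        applied.append(name)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first = deploy(conn, SCRIPTS[:3])  # "V1 package": s1 s2 s3
second = deploy(conn, SCRIPTS)     # "V2 package": only s4 s5 are new
```

Running the same package twice is a no-op, which is what makes it safe to deploy the same package to unit test, UAT, and production in turn.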

Migration Script Representation
DB deploy tools mostly differ on the following aspects:
▪ How to represent migrations: one migration per file? Subsections denoted within a file?
▪ How to order migrations: use a naming convention on the file? Use a separate “command file” to list the order?
▪ Rerunnability/mutability of scripts under certain circumstances

Problems with current tooling?

Migration File Maintenance
Database Schema
    /table: Employee, Department, Project, Committee
    /view: V_Manager, V_CommManager, V_Consultant
Java (ORM)
    com.mycompany.model: Employee.java, Department.java, Project.java, Committee.java
    com.mycompany.helper: ManagerView.java, CommManagerView.java, ConsultantView.java
    ▪ Similar structure in database as in code
    ▪ Easy to find which file to edit/review/clean/deploy for a particular object (object name == file name)
Migrations
    version1.sql, version2_emp.sql, version2_dept.sql, version3.sql, version4_1.sql, version4_2.sql, version4_3.sql, V_Manager_1.sql, V_Manager_2.sql, V_Manager_3.sql, version5.sql
    ▪ No clear mapping between migrations and objects
    ▪ Redundant view definitions across files
    ▪ Deploy order is explicit and clear

Change Ordering
So why not just arrange scripts by file?
Employee table: create table (...); add foreign key DEPARTMENT on dept_id
Department table: create table (...); add column dept_id
Manager view: create view as select * from EMPLOYEE where STATUS = 1
CommManager view: create view as select * from V_MANAGER join DEPARTMENT ...
▪ How to order scripts?
▪ How to discover dependencies?
▪ How to handle multiple scripts for stateful objects?

At Scale and Complexity
Various object types (tables, SPs, views, packages, ...)
Many developers, many releases
Hundreds and thousands of objects

At Scale and Complexity
Too complex! Why bother?
Because systems live on and still need development.
Can we work through this?

Obevo Approach
Let’s solve problems for all kinds of systems
New applications                      | Long-lived systems
Tens/hundreds of objects              | Hundreds/thousands of objects
Tables only                           | Tables, views, procedures, and more
Unit testing with in-memory databases | Integration testing with regular databases

Technical Problems to Address
1) To facilitate editing, reviewing, and deploying
objects for both simple and complex projects
2) To integrate well with unit-testing and
integration-testing tools
3) To onboard existing production systems with
ease

Object-Based Code Organization

Overview of the Benefits
Database Schema
    /table: Employee, Department, Project, Committee
    /view: V_Manager, V_CommManager, V_Consultant
Obevo File Representation
    /table: Employee.sql, Department.sql, Project.sql, Committee.sql
    /view: V_Manager.sql, V_CommManager.sql, V_Consultant.sql
▪ Similar structure in database as in code
▪ Easy to find which file to edit/review/clean/deploy for a particular object (object name == file name)
▪ Can selectively deploy a subset of the schema for unit/integration testing

Revisiting the Challenges
Employee table: create table (...); add foreign key DEPARTMENT on dept_id
Department table: create table (...); add column dept_id
Manager view: create view as select * from EMPLOYEE where STATUS = 1
CommManager view: create view as select * from V_MANAGER join DEPARTMENT ...
▪ How to order scripts?
▪ How to discover dependencies?
▪ How to handle multiple scripts for stateful objects?

Handling Stateful Objects
Break up stateful files into multiple sections
(Stateless files can remain as is)
Change Key = Object Name + Change Name
Employee table:
    //// CHANGE name="create"
    create table (...)
    //// CHANGE name="FK"
    add foreign key DEPARTMENT on dept_id
Department table:
    //// CHANGE name="create"
    create table (...)
    //// CHANGE name="dept_id"
    add column dept_id
Manager view (stateless, so a single definition):
    create view as
    select * from EMPLOYEE
    where STATUS = 1
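To make the "Change Key = Object Name + Change Name" rule concrete, here is a small Python sketch that splits a stateful file into its change sections. It assumes the `//// CHANGE name="..."` header form shown on this slide; the function `parse_changes`, the dotted key format, and the sample SQL are hypothetical and are not Obevo's actual parser.

```python
import re

# Matches a section header such as: //// CHANGE name="create"
CHANGE_HEADER = re.compile(r'^////\s*CHANGE\s+name="?([^"\s]+)"?\s*$')

# Hypothetical content of an Employee table file.
EMPLOYEE_FILE = (
    '//// CHANGE name="create"\n'
    "create table employee (id bigint, dept_id bigint)\n"
    '//// CHANGE name="FK"\n'
    "alter table employee add foreign key (dept_id)\n"
    "    references department (dept_id)\n"
)


def parse_changes(object_name, text):
    """Split a stateful file into (change_key, sql) pairs, in file order."""
    changes = []
    current_name, current_lines = None, []

    def flush():
        # Emit the section collected so far, if any.
        if current_name is not None:
            key = f"{object_name}.{current_name}"
            changes.append((key, "\n".join(current_lines).strip()))

    for line in text.splitlines():
        match = CHANGE_HEADER.match(line.strip())
        if match:
            flush()
            current_name, current_lines = match.group(1), []
        elif current_name is not None:
            current_lines.append(line)
    flush()
    return changes


changes = parse_changes("EMPLOYEE", EMPLOYEE_FILE)
keys = [key for key, _ in changes]
```

Each key can then be tracked individually in the deploy log, so adding a new section to an existing file deploys only that section.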

Topological Sorting for Ordering
Algorithm for ordering the nodes in a graph so that edge dependencies are respected
Nodes: EMPLOYEE.create, EMPLOYEE.FK, DEPARTMENT.create, DEPARTMENT.dept_id, V_MANAGER, V_COMMMANAGER
Acceptable Orderings:
One example:
1. DEPARTMENT.create
2. DEPARTMENT.dept_id
3. EMPLOYEE.create
4. EMPLOYEE.FK
5. V_MANAGER
6. V_COMMMANAGER
Another example:
1. EMPLOYEE.create
2. V_MANAGER
3. DEPARTMENT.create
4. DEPARTMENT.dept_id
5. V_COMMMANAGER
6. EMPLOYEE.FK
... and many more
Great! Now how do we actually determine these dependencies?
I have to parse my DBMS syntax!
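The ordering step can be sketched with the `graphlib` module from the Python standard library (Python 3.9+). The talk credits JGraphT for graph algorithms, so this is a stand-in to show the idea rather than the tool's implementation. The edges in `DEPENDENCIES` are an assumption, chosen to be consistent with the two acceptable orderings listed on this slide.

```python
from graphlib import TopologicalSorter

# Each change maps to the set of changes it depends on (assumed edges).
DEPENDENCIES = {
    "DEPARTMENT.create": set(),
    "DEPARTMENT.dept_id": {"DEPARTMENT.create"},
    "EMPLOYEE.create": set(),
    "EMPLOYEE.FK": {"EMPLOYEE.create", "DEPARTMENT.dept_id"},
    "V_MANAGER": {"EMPLOYEE.create"},
    "V_COMMMANAGER": {"V_MANAGER", "DEPARTMENT.create"},
}


def is_valid(order, dependencies):
    """Check that every change appears after all of the changes it depends on."""
    position = {node: index for index, node in enumerate(order)}
    return all(
        position[dep] < position[node]
        for node, deps in dependencies.items()
        for dep in deps
    )


# One acceptable deployment order; several others are equally valid.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

`is_valid` accepts any ordering that respects the edges, which is why both orderings on the slide pass even though they differ.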

Dependency Discovery
Low-tech solution: text-search for object names
Allow overriding false positives via annotations
Object Names:
1. EMPLOYEE
2. DEPARTMENT
3. V_MANAGER
4. V_COMMMANAGER
Employee table: create table (...); add foreign key DEPARTMENT on dept_id (mentions DEPARTMENT)
Department table: create table (...); add column dept_id (mentions no other object)
Manager view: create view as select * from EMPLOYEE where STATUS = 1 (mentions EMPLOYEE)
CommManager view: create view as select * from V_MANAGER join DEPARTMENT ... (mentions V_MANAGER and DEPARTMENT)
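The text-search approach can be sketched as follows in Python. The SQL snippets are simplified stand-ins for the slide's examples, and the function `discover` and its `excludes` parameter (playing the role of the override annotation) are hypothetical names for illustration only.

```python
import re

# Object name -> its SQL text (simplified, hypothetical content).
OBJECTS = {
    "EMPLOYEE": "create table employee (...); "
                "alter table employee add foreign key (dept_id) references DEPARTMENT (dept_id)",
    "DEPARTMENT": "create table department (...); "
                  "alter table department add column dept_id bigint",
    "V_MANAGER": "create view v_manager as select * from EMPLOYEE where STATUS = 1",
    "V_COMMMANAGER": "create view v_commmanager as "
                     "select * from V_MANAGER join DEPARTMENT ...",
}


def discover(objects, excludes=None):
    """Map each object to the other known object names that appear in its text.

    `excludes` lets the author override false positives, per object.
    """
    excludes = excludes or {}
    dependencies = {}
    for name, sql in objects.items():
        # Compare whole identifiers so that V_MANAGER does not match inside V_COMMMANAGER.
        tokens = {t.upper() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql)}
        found = {other for other in objects if other != name and other in tokens}
        dependencies[name] = found - excludes.get(name, set())
    return dependencies


dependencies = discover(OBJECTS)
```

A comment or string literal that happens to mention another object would be picked up as a dependency, which is the false-positive case the override is meant for.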

ORM Integration
ORMs can generate the latest view of the DDLs, but have more difficulty with migration scripts
Rely on Obevo for deployment, and for verification that the deployed DDLs match your ORM model
Workflow:
1) Generate full DDL definitions from the Java domain model via the Hibernate SchemaExport utility
2) Convert new objects to Obevo format (the Obevo deploy codebase)
3) Convert all objects to a baseline (used for Obevo baseline validation)
4) Verify at build-time that deploy scripts equal baseline
5) Deploy scripts to the DB

In-memory Testing
In-memory databases have grown in popularity for unit testing
How to test table DDLs in unit tests?
▪ Separate DDLs for in-memory DBs (but the original DDLs are not tested)
▪ Use a language other than SQL for migrations (but lose out on the SQL ecosystem)
▪ What about a translation layer?

In-memory DB Translation Layer
Focus on preserving the main object structure, avoid DBMS-specific values
Use ASTs to extract clauses from SQL; handle or ignore irrelevant sections
Limited to certain object types (e.g. tables, views, but no procedures)
Users can fall back to custom SQL if needed
Example:
    create table MyTable (
        ID bigint autoincrement,
        DESCRIPTION text,
        COUNT bigint
    ) lock datarows
▪ "autoincrement": ignore or handle text after the column data type
▪ "text": handle via domains: create domain TEXT as LONGVARCHAR
▪ "lock datarows": ignore post-table text (usually relates to storage)
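As a rough illustration of the "keep the structure, drop the DBMS-specific parts" idea, the Python sketch below simplifies the example table using plain string splitting. This is deliberately not the AST-based approach the slide describes: it breaks on types with embedded commas such as DECIMAL(10,2) and on table-level constraints, which is part of why a real translation layer needs proper parsing. The function name `translate_for_in_memory` is hypothetical.

```python
# The example DDL from the slide.
SOURCE_DDL = """create table MyTable (
    ID bigint autoincrement,
    DESCRIPTION text,
    COUNT bigint
) lock datarows"""


def translate_for_in_memory(ddl):
    """Keep only column names and data types of a simple CREATE TABLE statement."""
    head, _, rest = ddl.partition("(")
    # Everything after the last ")" is post-table text (e.g. "lock datarows"): drop it.
    body, _, _post_table_text = rest.rpartition(")")
    columns = []
    for column in body.split(","):
        parts = column.split()
        name, data_type = parts[0], parts[1]
        # Anything after the data type (e.g. "autoincrement") is ignored.
        columns.append(f"{name} {data_type}")
    return f"{head.strip()} (" + ", ".join(columns) + ")"


translated = translate_for_in_memory(SOURCE_DDL)
# translated == "create table MyTable (ID bigint, DESCRIPTION text, COUNT bigint)"
```

The `text` type is left untouched here; per the slide, it would be handled separately by defining a domain in the in-memory database.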

Reverse Engineering
How to onboard this system?

Reverse Engineering
Prefer to reverse-engineer all objects in a schema
No strong standard API available that can generate
object definitions across DBMS types
▪ JDBC Metadata is inconsistently implemented
▪ SchemaCrawler API is a good start, but not at a
sufficient level for reverse engineering

Reverse Engineering - Approach
Obevo leverages vendor-provided APIs, e.g.
pg_dump, DB2LOOK, Oracle DBMS_METADATA
Most APIs only provide text output
Obevo has text-parsing logic to convert that output
to the Obevo folder structure
▪ This is an easier problem to solve than building on a Java-based metadata API
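The text-parsing step can be pictured with the Python sketch below, which routes statements from a DDL dump into an object-per-file layout. The dump content, the statement pattern, and the function `split_dump` are assumptions for illustration; real vendor output (pg_dump, DB2LOOK, DBMS_METADATA) is considerably messier, and this is not Obevo's parsing logic.

```python
import re

# Recognizes the object type and name at the start of a statement.
STATEMENT = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(TABLE|VIEW)\s+(\w+)", re.IGNORECASE
)

# Hypothetical text output from a vendor tool.
DUMP = """
CREATE TABLE Employee (id BIGINT, name VARCHAR(32), status INT);
CREATE TABLE Department (dept_id BIGINT, dept_name VARCHAR(32));
CREATE VIEW V_Manager AS SELECT * FROM Employee WHERE status = 1;
"""


def split_dump(dump):
    """Map 'type/Name.sql' paths to the statement that defines each object."""
    files = {}
    # Naive split on ";" (would break on semicolons inside procedure bodies).
    for statement in dump.split(";"):
        match = STATEMENT.match(statement)
        if not match:
            continue  # skip blanks and statements this sketch does not understand
        object_type, name = match.group(1).lower(), match.group(2)
        files[f"{object_type}/{name}.sql"] = statement.strip()
    return files


files = split_dump(DUMP)
```

The resulting paths mirror the /table and /view folder structure shown earlier in the deck.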

Other Topics (beyond scope of talk)
Centralized permission management and cleanup
Rollback
Phased deployments
Long-running deployments and index creation
DB2 REORG handling
… and more …

Onto the Kata!
https://github.com/goldmansachs/obevo-kata/

THANKS!
Any questions?
You can find me at:
@shantstepanian
shant.p.stepanian@gmail.com

CREDITS
Special shout-outs to some useful open-sourced
products leveraged along the way:
▪ SchemaCrawler for DB API access
▪ JGraphT for graph algorithm implementations
▪ SlidesCarnival for this presentation template
