The document discusses object-relational impedance mismatch that occurs when object-oriented programming languages are used to interact with relational databases. It provides examples from a Ruby on Rails application that uses an ORM called ActiveRecord to show how straightforward object mapping can lead to many database queries being executed to retrieve related data. The document argues this performance issue could be avoided by better integrating database operations directly into objects and methods.
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Henrietta Dombrovskaya - A New Approach to Resolve Object-Relational Impedance Mismatch @ Postgres Open
1. A New Approach to Resolve
Object-Relational Impedance
Mismatch
Henrietta Dombrovskaya
Enova
Chicago IL
hdombrovskaya@enova.com
2. What is Object-Relational
Impedance Mismatch and why It is
bad?
Why should we care?
We should care because we care
about..
2013 Henrietta Dombrovskaya – Enova 2
3. Database Application Performance
Everybody wants their application to function efficiently,
right?
After all – why choose to use databases in the first
place?
Because
The DBMS is specialized software designed
to manage data in the most efficient way.
Nevertheless, the most common complaint of application
developers is ….
2013 Henrietta Dombrovskaya – Enova 3
5. Let’s see…
Where do we usually start to look, if we want
to see, why the database performance is
bad?...
2013 Henrietta Dombrovskaya – Enova 5
6. Our plan of actions
Look at the most long-running queries and try to
optimize them
Look at the queries, which are executed most
often, and make sure all of them are tuned and/or
running fast enough
Everything looks good, why is the app still slow?
2013 Henrietta Dombrovskaya – Enova 6
7. Helpful reports:
2013 Henrietta Dombrovskaya – Enova 7
Our performance monitoring tool (pgBadger) produces
several classes of reports, among them:
Longest-running queries (slowest queries)
Most frequently running queries
Queries, which take up the most time
(top offenders).
8. Based on these reports - is a database
really slow?
The top 50 most offensive queries, i.e. the queries
which eat up the most of the execution time, almost
exclusively are the queries, which are very short
(fast), but executed too many times
For example
SELECT
*
FROM
loans
WHERE
id=?
is executed 8,500,000 times during the day, each
time taking several milliseconds, with total execution
time about 2.5 hours.
2013 Henrietta Dombrovskaya – Enova 8
9. You may say – that’s how many times the
screen is refreshed, but…
UK: account home controller as accessed about 50,000
times during the day, and the customer support tool even
less
However, the pgBadger log shows total almost 1.5 million
executions of different versions of the
SELECT sum_accounts
statement and similar number of
SELECT*
FROM
loans.
Statistics show that some application controllers produce
over 1,000 database calls for each screen refresh.
2013 Henrietta Dombrovskaya – Enova
10. How could this possibly be happening?!
Let’s take a step back…
2013 Henrietta Dombrovskaya – Enova 10
11. First came a program…
• Once upon a time there was a program…
• Until mid-60’s – only sequential access storage was available; a program was
considered a stable object, which may process different input data, and produce
different output
Input Output
• Direct access storage appeared in the early 70s, allowing concurrent data access and
selective updates. At the same time developers started to face rising complexity and
costs of software development, concurrency control and data integrity problems, and
a need of specialized data-management software became apparent.
• DBMS emerged as specialized programs for centralized data management
2013 Henrietta Dombrovskaya – Enova
Program
11
DBMS
12. Since Then We Have…
-Imperative programming languages, which tells, how to do
things
for
(i:=1,
i++,
n)
do
…
end;
and
- Declarative data manipulating languages, which define
what to do:
SELECT
first_name,
last_name
FROM
people
WHERE
id=101
2013 Henrietta Dombrovskaya – Enova 12
13. Database provides:
• Independent data storage
• Complex data structures
• Data integrity
• Concurrent access and data consistency
Application provides:
• User interface
• Application control flow
2013 Henrietta Dombrovskaya – Enova 13
14. So…
Now everybody is happy, right?
…not really…
2013 Henrietta Dombrovskaya – Enova 14
15. Why? Because…
Both imperative programming languages and
declarative query languages work perfectly to
accomplish the tasks the were designed to
accomplish.
The problems start, when we try to make them to work
together.
2013 Henrietta Dombrovskaya – Enova 15
16. What is Impedance Mismatch?
These tools were designed to operate with objects of different size
(granularity): variables vs. tables
And different means of identification: address vs. set of attribute values
More precisely:
• Object incompatibility - database objects are “too big” for the
imperative programming language
• This means that classes and methods for tables and sets should be
integrated with the database to work effectively (methods should be
executed by the DB engine)
• But most of the architectures do not allow such integration, which
leads to reimplementation of database operations at the application
layer
2013 Henrietta Dombrovskaya – Enova 16
17. When We Have an OO Application…
• During one screen rendering: objects may belong to
different classes with different access methods
• Object-Relational Mapping (ORM): maps a database
object to the in-memory application object:
– solves a problem of abstraction from details of data storing
– does not provide effective means of manipulation with data sets
This particular case of impedance mismatch is called ORIM
– object-relational impedance mismatch
2013 Henrietta Dombrovskaya – Enova 17
18. ORIM Definition
“A set of conceptual and technical difficulties that is
encountered when a relational database management
system (RDBMS) is being used by a program written in
an object-oriented programming language or style,
particularly apparent when objects or class definitions
are mapped in a straightforward way to database
tables or relational schema”
Ambler, S., Agile Database Techniques: Effective
Strategies for the Agile Software Developer
2013 Henrietta Dombrovskaya – Enova 18
19. Development Standards & Performance
Internal company standards and/or industry standards may
prevent from creating efficient database application.
Typical OO design recommendations - clearly separate the
following layers of classes:
• End-user interface
• Business logic
• Relationships
• Interaction with the database
The standard allows limited interaction between adjacent
levels, therefore equivalents of database operations may
be implemented within several different processes,
sometimes on different machines.
2013 Henrietta Dombrovskaya – Enova 19
20. Translation:
• As a database developer you would assume, that to get
all payments for all loans for a given customer, you need
to join a loans table with the payments table
• But for an application developer loans and payments
belong to different classes, therefore they are accessed
by different sets of methods
• Which means, that the join operation is replicated
“manually”..
Let’s look at some examples…
2013 Henrietta Dombrovskaya – Enova 20
21. What We Have at Enova?
Enova is a Ruby on Rails shop that uses the
ActiveRecord Object Relational Mapping (ORM) library
to communicate with a Postgres database.
With ActiveRecord a database table or view is mapped
into a class and an object instance is tied to a single
row in the table.
ActiveRecord is the default “model” component of the
model-view-controller web-application framework Ruby
on Rails, and is also a stand-alone ORM package for
other Ruby applications.
2013 Henrietta Dombrovskaya – Enova 21
23. What This Means for Application/
Database Interaction
Due to the lack of awareness of the underlying
database interaction on the part of the object methods,
one controller performs multiple trips to the database
For example…
2013 Henrietta Dombrovskaya – Enova 23
25. Corresponding Application Log
SELECT
*
FROM
customers
WHERE
(customers.id
=
12470535)
SELECT
*
FROM
people
WHERE
(people.id=
61657007
AND
(type
=
'CustomerPerson'))
AND
(
(people.type
=
'CustomerPerson'
)
);
SELECT
addresses.*,
people_addresses.serial_number
FROM
addresses
INNER
JOIN
people_addresses
ON
addresses.id
=
people_addresses.address_id
WHERE
(people_addresses.person_id
=
61657007
AND
(eff_end_date
is
NULL));
2013 Henrietta Dombrovskaya – Enova 25
26. But Wait, There’s More!
SELECT
*
FROM
approvals
WHERE
(customer_id
=
12470535)
ORDER
BY
processed_on
desc
LIMIT
1
SELECT
*
FROM
customers
WHERE
(customers.id
=
12470535)
SELECT
*
FROM
loans
WHERE
(loans.id
=
25563928)
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
('applied','approved','on_hold'))
ORDER
BY
funding_date
DESC
LIMIT
1
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
('applied','approved',E
'on_hold'))
ORDER
BY
funding_date
DESC
LIMIT
1
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
(‘issued','issued_pmt_proc'))
ORDER
BY
funding_date
DESC
LIMIT
1
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
('applied','approved',E'on_hold'))
ORDER
BY
funding_date
DESC
LIMIT
1
SELECT
*
FROM
customers
WHERE
(customers.person_id
=
61657007)
SELECT
count(*)
AS
count_all
FROM
loans
WHERE
(loans.customer_id
=
12470535
AND
(status_cd
in
('applied','approved','on_hold','issued','issued_pmt_proc')
and
loan_type_cd
=
'installment'))
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
('applied','approved','on_hold'))
ORDER
BY
funding_date
DESC
LIMIT
1
SELECT
*
FROM
loans
WHERE
(customer_id
=
12470535
and
status_cd
in
('applied','approved','on_hold'))
ORDER
BY
funding_date
DESC
LIMIT
1
2013 Henrietta Dombrovskaya – Enova 26
27. What We Should See Instead…
2013 Henrietta Dombrovskaya – Enova
SELECT
*
FROM
customers
WHERE
(customers."id"
=
12470535)
SELECT
*
FROM
people
WHERE
(people."id"
=
61657007
AND
(type
=
'CustomerPerson'))
AND
(
(people."type"
=
'CustomerPerson'
)
);
SELECT
addresses.*,
people_addresses.serial_number
FROM
addresses
INNER
JOIN
people_addresses
ON
addresses.id
=
people_addresses.address_id
WHERE
(people_addresses.person_id
=
61657007
AND
(eff_end_date
is
NULL));
27
28. Why Does This Happen?
… because the app is written in an imperative way:
“Fist we need to select all loans,
then for each loan we need to select…..”
2013 Henrietta Dombrovskaya – Enova 28
29. There Are Some Ways To Improve
Eager loading generates almost correct SQL:
SELECT customers."id”
,customers."created_by”
<..>,customers."person_id"
<..>,customers_extra."estimated_monthly_living_expense"
<..>,people."type"
<...>,addresses."id"<...>
,companies."id" <...>
FROM customers
LEFT OUTER JOIN customers_extra
ON customers_extra.customer_id = customers.id
LEFT OUTER JOIN people
ON people.id = customers.person_id AND people."type" = 'CustomerPerson’
LEFT OUTER JOIN people_addresses ON people_addresses.person_id =
people.id LEFT OUTER JOIN addresses ON addresses.id = <…>
WHERE (customers."id" = 17674188)
2013 Henrietta Dombrovskaya – Enova 29
30. But Then We Apply the Next Method
has_many :bank_accounts, :foreign_key => 'person_id' do
def default_for_customer(customer, reload = false)
@bank_account_cache = {} if @bank_account_cache.nil? ||
reload
@bank_account_cache[customer.brand.code] ||= find(:first,
:joins => "INNER JOIN payment_instruments.defaults pid
ON pid.payment_instrument_id =
payment_instruments.bank_accounts.payment_instrument_id",
:conditions => ["pid.customer_id = ?", customer],
:readonly => false)
end
And we execute SELECT again!
2013 Henrietta Dombrovskaya – Enova 30
31. If We Continue With Existing Frameworks
… this problem (ORIM) will never be solved, and
we will continue to loose money on timeouts!
2013 Henrietta Dombrovskaya – Enova 31
32. New Approach to Resolve Object-Relational
Impedance Mismatch
Our new approach allows drastically reduce the
number of calls to the database, and get
exactly the data we need.
An application view can be rendered using one
or two calls to the database instead of 500-900
call in the old app. In addition, it allows to
optimize queries execution without any
changes to the application and/or models.
2013 Henrietta Dombrovskaya – Enova 32
33. The old methods are unaware of the data
which is already available – make them
data-aware!
Isn’t this contrary to the standard OO
approach?
Yes, but…
This is the only way to improve the App/DB
interaction.
2013 Henrietta Dombrovskaya – Enova 33
34. Logic Split Methodology
Disassemble method into atomic steps,
Identify ones, which require data retrieval
Using the knowledge about database
objects relationships, construct a single
query
Execute
Use retrieved data in other steps
2013 Henrietta Dombrovskaya – Enova 34
37. Under the Hood: Database Calls
SELECT
vl.value
AS
account
,SUM(CASE
vl.value
WHEN
pt.debit_account_cd
THEN
pt.amount
ELSE
0
END)
-‐
SUM(CASE
vl.value
WHEN
pt.credit_account_cd
THEN
pt.amount
ELSE
0
END)
AS
sum
FROM
payment_transactions
pt
JOIN
valuelists
vl
ON
vl.type_cd
='transaction_account'
AND
vl.value
IN
(pt.debit_account_cd,
pt.credit_account_cd)
AND
loan_id=?
… and then a value for specific account is selected.
2013 Henrietta Dombrovskaya – Enova 37
38. Drawbacks
The method itself would allow retrieving all the information
related to one loan “in one shot”.
However, because the application developers are unaware
of the underlying layers, there appears to be no difference:
– Whether we obtain the values of all account balances one-by-
one by following normal Object-Oriented method logic, in an
imperative way
– Or if we obtain them all, simultaneously
Now for some results:
2013 Henrietta Dombrovskaya – Enova 38
41. Under the Hood: Database Calls
SELECT
loan_id
,
sum(CASE
WHEN
debit_account_cd
=
'uncollected_principal'
THEN
pt.amount
ELSE
0
END
-‐CASE
WHEN
credit_account_cd='uncollected_principal'
THEN
pt.amount
ELSE
0
END)
AS
uncollected_principal
<...>
,
sum(CASE
WHEN
debit_account_cd
=
'uncollected_nsf_fees'
THEN
pt.amount
ELSE
0
END
-‐
CASE
WHEN
credit_account_cd
=
'uncollected_nsf_fees'
THEN
pt.amount
ELSE
0
END)
AS
uncollected_nsf_fees
,
sum(CASE
WHEN
debit_account_cd
=
'installment_principal'
THEN
pt.amount
ELSE
0
END
-‐
CASE
WHEN
credit_account_cd
=
'installment_principal'
THEN
pt.amount
ELSE
0
END)
AS
installment_principal
FROM
payment_transactions_committed
pt
INNER
JOIN
loans
l
ON
l.id=pt.loan_id
WHERE
customer_id={?}
GROUP
BY
loan_id
2013 Henrietta Dombrovskaya – Enova 41
44. So… now everybody happy?...
…not really…
What do application developers say at
this point?....
Wait! What about the business logic?!
2013 Henrietta Dombrovskaya – Enova 44
45. What About the Business Logic?
We need some business logic to know how
to execute joins and selects – just enough
to minimize the number of db calls.
Selected results transformations and
manipulations do not have to be executed
on the database side.
2013 Henrietta Dombrovskaya – Enova 45
46. One More Time
ü Disassemble method into atomic steps,
ü Identify ones which require data retrieval
ü Using knowledge about database objects
relationships, construct a single query
ü Execute
ü Use retrieved data in other steps
2013 Henrietta Dombrovskaya – Enova 46
47. Let’s review another example – account
balance calculation for OEC loans.
2013 Henrietta Dombrovskaya – Enova 47
48. Account_Balance Method for OEC
Ø Obtain account principal balance
Ø Obtain outstanding fees and interest as of
next payment due date
Ø Calculate the interest credit (unearned
interest) for the number of days left before
the payment due date
Ø Obtain existing customer balance
Ø Calculate the total account balance using the
values obtained on steps 1-4.
2013 Henrietta Dombrovskaya – Enova 48
49. Traditional Object-Oriented Approach
Create Account_Balance method which would call the
following methods:
Ø Principal_Balance,
Ø Interest_Amount,
Ø Fees_Amount,
Ø Customer_Balance
Ø Interest_Credit
Each of these methods would interact with a database
independently.
2013 Henrietta Dombrovskaya – Enova 49
50. Drilling Down Into Each Of the Steps
1. Principal balance as described for account_outstanding requires a
single database call.
2. Outstanding interest and fees require one database call each.
3. Interest credit calculation:
3.1. Obtain the daily interest rate for this customer
3.2. Obtain base amount, which is used to calculate the total interest
3.3. Obtain the number of days, for which the interest should be
credited:
3.3.1. Obtain the next payment due date
3.3.2. Calculate number of days based on obtained date and todays’ date
3.4. Calculate amount of credit, based on results from the previous
three steps
4. Customer balance can be obtained using one database call, same
as steps 1-3.
2013 Henrietta Dombrovskaya – Enova 50
51. Combining Steps With Data Retrieval
Those steps are: 1,2,3, 3.1, 3.2, 3.3.1, 4,
which means that the database accessing
method should execute the following task:
For a given loan, retrieve payment
transactions, which show principal balance,
current interest, fees and customer balance,
also retrieve loan’s daily interest rate and
next payment due date.
2013 Henrietta Dombrovskaya – Enova 51
52. SELECT statement
SELECT l.id AS loan_id
,sum( CASE WHEN debit_account_cd = ‘principal’
AND t.acct_date<= v_current_date
THEN t.amount ELSE 0 END
- CASE WHEN credit_account_cd
=’principal’
AND t.acct_date<= v_current_date
THEN t.amount ELSE 0 END ) AS amount_payable
,sum (CASE WHEN
t.debit_account_cd=’fees_provisional’
THEN t.amount ELSE 0 END
- CASE WHEN t.credit_account_cd= ‘fees_provisional’
THEN t.amount ELSE 0 END )
AS fees_provisional
,sum (CASE WHEN t.debit_account_cd=’
interest_provisional’
THEN t.amount ELSE 0 END
- CASE WHEN t.credit_account_cd =’interest_provisional’
THEN t.amount ELSE 0 END )
AS interest_provisional
,st.end_date AS next_closing_date
,l.daily_rate AS interest_rate
,sum (CASE WHEN t.debit_account_cd
='customer_balance'
THEN -t.amount ELSE 0 END
- CASE WHEN t.credit_account_cd ='customer_balance'
THEN -t.amount ELSE 0 END )
AS customer_balance
FROM loans l
LEFT OUTER JOIN payment_transactions_committed
t ON
l.id=t.loan_id
LEFT OUTER JOIN statements st ON
l.id=st.loan_id
WHERE l.id={?}
GROUP BY l.id
,l.daily_rate
,st.end_date
2013 Henrietta Dombrovskaya – Enova 52
53. Ruby Method: account_balance
def account_balance
amount_payable = amount_outstanding + amount_charged_off
amt = amount_payable - customer_balance
provisional_interest = interest_provisional
provisional_fees = fees_provisional
if provisional_interest > 0.0
amt = amt + provisional_fees + [ provisional_interest - [unearned_interest,
0].max, 0 ].max
else
amt = amt + provisional_fees
end
amt = amt.round_near
return [0, amt].max
end
2013 Henrietta Dombrovskaya – Enova 53
55. How It Works
This methods is used as a callback in a more complex method,
which returns Loan_Summary where most of the values can be
selected directly from the database.
For compliance reasons all calculations are executed in the
Ruby model.
We need a “virtual column” as one of the fields of the data
output; the Ruby callback can be inserted into the data set here.
The callback produces required value, and the whole data set is
passed to the upper-level method for future processing and then
to the web application.
2013 Henrietta Dombrovskaya – Enova 55
56. 2013 Henrietta Dombrovskaya – Enova 56
LOAN_ID,
FUNDING_DATE<
FUNDING_AMOUNT
FEE
STATE
APR
LOAN_STATUS
AMOUNT_DUE
ACCOUNT_BALANCE
PRINCIPAL_AMOUNT
CUSTOMER_BALANCE
INTEREST_PROVISIONAL
<…>
LANGUAGE_CD
ACCOUNT_BALANCE
METHOD
Note, that all data elements can still
be retrieved with a single select
statement, which won’t be possible
within the standard ORM
framework.
57. On a Larger Scale
In our new application a single Postgres
function retrieves summaries for all
customer loans.
Here is some execution statistics:
2013 Henrietta Dombrovskaya – Enova Financial 57
58. Number of Database Calls/Customer
2013 Henrietta Dombrovskaya – Enova Financial 58
1 2 3 4 5 6 7 8 9
Old App avg_calls 121 196 373 449 632 765 819 923 1129
New App avg_calls 4 4 4 4 4 4 4 4 4
0
200
400
600
800
1000
1200
Old App avg_calls New App avg_calls
60. Now
• We save both time
• And money!
2013 Henrietta Dombrovskaya – Enova 60
61. Were we the first to notice the problem?
Definitely NOT!
2013 Henrietta Dombrovskaya – Enova 61
62. Related Work
The problem of object-relational impedance
mismatch is a constant discussion topic,
when it comes to developing an efficient
application.
In recent years multiple attempts were made
to try to resolve this issue with no significant
outcome.
2013 Henrietta Dombrovskaya – Enova Financial 62
63. First Came HORA
HORA (Hybrid Object-Relational Architecture)
approach was first introduced in 1993 and became
a foundation of multiple ORM-based systems, such
as:
• Java Persistence
• ActiveJDBC
• ADO.NET
• Ruby on Rails ActiveRecord
• Many others.
All implementations have the same ORIM issues.
2013 Henrietta Dombrovskaya – Enova Financial 63
64. Hybernate
Hybernate is a high-performance Object/Relational
persistence and query service, considered one of the
most flexible and powerful Object/Relational
solutions on the market.
• Mapping from Java classes to database tables
and from Java data types to SQL data types.
• Significantly reducing development time
• Allowing the application developers concentrate
on the business side of the project.
2013 Henrietta Dombrovskaya – Enova Financial 64
65. Hybernate Developers Claim
• In contrast to other solutions it does not hide
“the power of SQL” from developers.
• This claim is true in some sense, since the
solution, indeed allows us to write queries
similar to SQL queries.
2013 Henrietta Dombrovskaya – Enova Financial 65
66. The Caveat
• Creation of complex queries using Hybernate is
not an easy task
• Similar to other ORM systems, Hybernate
prompts for solutions, which seem more natural
for application developers
2013 Henrietta Dombrovskaya – Enova Financial 66
67. Newer Version of Active Record
• Allows Eager loading and some customer
queries, but has the same limitations as
Hybernate
• Eager loading may cause an excessive
application memory usage
2013 Henrietta Dombrovskaya – Enova Financial 67
68. Agile Technology
• acknowledges the existence of ORIM ;
• not only acknowledges the technical impedance
mismatch, but also a cultural impedance
mismatch:
“The object-oriented paradigm is based on proven software
engineering principles. The relational paradigm, however, is
based on proven mathematical principles”.
• while raising awareness of the problem, the
solutions, proposed by Agile data technology
refers mostly to the database schema changes
and/or more careful design and data refactoring,
but it offers little in terms of dealing with inefficient
queries.
2013 Henrietta Dombrovskaya – Enova Financial 68
70. Future Work
Rewriting a whole application using our new
methodology of splitting logic between an application and
a database; we are in the middle of the process.
Problems:
– Large amount of the legacy code, where most of the
business logic is embedded in Ruby classes.
– Virtually no business specifications, which means that we
have to extract the business logic from the existing code.
– The legacy application is evolving and existing models are
being modified all the time.
– The human factor
Because of this….
2013 Henrietta Dombrovskaya – Enova Financial 70
71. Moving In Two Directions:
ü Continue to rewrite the larger parts of our
applications, shooting for having only a
couple of database queries per screen
rendering;
ü While doing this, clarifying our technology,
making it more transparent and easier to
use by application developers.
2013 Henrietta Dombrovskaya – Enova Financial 71
72. Aknowledgements
Many thanks to:
• Enova CIO/CTO Fred Lee
• Team Raven, and everybody who worked on the Aperture
project during it’s lifetime:
• Sheldon Strauch and the entire DB Dev team
• Chad Slaughter and Kurt Stephens
• Ben Heilman and Donny Jekels
2013 Henrietta Dombrovskaya – Enova Financial 72
– Cheetana Yogeesh
– Allan Zoppa
– Ana Lebon
– Kevin Glowacz
– Neha Bhardwaj
– Jef Jonjevic
– Marc Groulx
– Richard Lee
– Preeti Dhiman
– Luda Baklanova