Introduction
The intent of this paper is to provide an in-depth analysis of the software engineering merits
of incorporating an Object Relational Mapping (ORM) framework. The discussion will
be based around answering the following question: in a persistent, object oriented
application, is an ORM framework the optimal method of implementing a persistence layer?
The paper progresses in the manner I expect a developer advocating the use of an ORM
framework would present ideas to a software architect. First, an overview of persistence in
an object oriented context, including a justification of relational databases as a persistence
back-end. Second, potential problems associated with the paradigm mismatch between
in-memory objects and relational database tables. Third, a demonstration of the methods by
which ORM frameworks address these problems. Fourth, a discussion of the process of
integrating an ORM framework into an existing software project. Finally, two case studies
which follow real projects through the transition to ORM software.
The intended audience is experienced programmers who are considering options for making
an application persistent. It is written with the expectation that the reader has some formal
knowledge of computer science, such as the difference between the heap and the stack.
This paper does not assume familiarity with relational database theory.
My goal is not to establish a single, brief answer to the above question, but rather to build a
base of contextual evidence which will allow the reader to address the usefulness of an ORM
framework in his or her own projects.
Why did I choose this topic?
I have had the opportunity to work on two projects which were both in the process of
transitioning to an ORM framework. In both cases, I joined the project after the decision
had been made to use an ORM framework. This paper is an investigation into the
justifications for the use of an ORM framework. By exploring this topic, I hope to gain both a
better understanding of ORM technology, and the decision making process used when
making significant design changes in a software project.
Object Relational Mapping: SE Perspective
Introduction
Why did I choose this topic?
Part 1: Understanding Object Oriented Persistence
Persistence: a simple example:
Adding a relation
Why do we need databases?
Java with a relational database
Part 1 Conclusions
Part 2: The Object/Relational Mismatch
Problem 1: References Between Classes
Problem 2: Sub/Super Class Relationships
Problem 3: Managing Identity
Problem 4: Developer Expertise
Problem 5: Managing object state
Problem 6: Performance
Problem 7: Changing Database Engine
Part 3: What is an Object Relational Mapping framework, and how can it help?
How does ORM fit into a design?
Problem 1: References Between Classes
Problem 2: Sub/Super class relationships
1. Table per Concrete Class - Implicit polymorphism
2. Table per Concrete Class With Unions
3. Table per Class Hierarchy
4. Inheritance Relationships as Foreign Keys
Django Example:
Problem 3: Managing Identity
Problem 4: Developer Expertise
Problem 5: Managing Object State
Problem 6: Performance
Problem 7: Changing database engines
Part 4: Integrating ORM into the development process:
Design basics
Integration
Top Down
Bottom Up
Middle Out
Meet in the Middle
Integrating ORM in Practice
Part 5: Case Studies
Case Study One: Carlson School of Management Help Desk
Motivations
Design
Implementation
Problems
Improvements
Conclusions
Case Study Two: General Dynamics Six-Delta
Motivations
Selecting an ORM solution
Integration
Problems
Improvements
Developer Reaction
Data Metrics
Looking Back
Part 6: Conclusions
Future Research/Open Questions
Object oriented databases
Inferred mapping
Sources
Special Thanks to:
Appendix A: Source Code
SQL Performance Test
Part 1: Understanding Object Oriented Persistence
The following sections give an overview of the basic application of various methods of
persistence in an object oriented environment, and provide an explanation of the use of a
database in a large scale persistent application. If you are already comfortable with these
topics, feel free to skim this section (the examples will be referenced in part 2). I will
provide examples using pseudo code, Java, Python and UML.
Object oriented programming is now an industry standard for large scale applications. Some
programming languages, such as Java, are entirely object oriented. Others, such as C++,
support object oriented principles alongside traditional functionality. Many large and small
scale software projects now rely on objects to encapsulate data and functional elements of
software.
What is an object?
In the simplest sense, an object consists of a collection of data and methods which operate
on that data. The structure of this data and the functionality of the methods are defined by
a class. When a class is instantiated, an object is created. The term object oriented
programming refers to the design paradigms which guide the creation of an application
based on objects.
What is persistence?
In an object oriented application, objects exist in memory during the execution of the
application. A persistent object is one whose state can survive a shutdown of the
system. Persistence is the process by which an object's state is saved and later restored. In
the following example, I will outline some common persistence methods.
Persistence: a simple example:
Consider a simple class called Car. The Car class defines a set of data values relevant to a
Car: color, engine type, and number of doors. By instantiating this class, we can create a
Car object which lives in memory.
/****************************************************************
* Class Car V.1
* Tyler Smith
*****************/
class Car{
    String color;
    String engine;
    int doors;
}
If we wish to make this object persistent, the easiest method is to simply write the data
fields of the Car to a file. Files are stored in the file system and can persist through system
shutdowns. To make our Car object persistent, we could create a file consisting of tuples
which store our data in a key-value pair. A key-value pair is a set of two elements: a key,
which specifies the related object field, and a value, which is the value of that field for a
given object.
#####################################################
# Car File V.1
# Tyler Smith
#############
color = "red"
engine = "V6"
doors = "4"
In a simple application, this strategy could be very effective. It would also be very simple to
implement, as it only requires two new methods, a writeToFile() and a readFromFile().
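As a concrete illustration, here is a minimal sketch of what such a pair of methods might look like, written in Python (the function names and parsing details are my own; the original example only names writeToFile() and readFromFile()):

```python
# Hypothetical sketch of key-value persistence for the Car example.
# The Car is represented as a plain dictionary of field names to values.

def write_to_file(path, car):
    # Write each field as a 'key = "value"' pair, one per line,
    # matching the Car File format shown above.
    with open(path, "w") as f:
        for key, value in car.items():
            f.write(f'{key} = "{value}"\n')

def read_from_file(path):
    # Rebuild the dictionary by parsing the key-value pairs.
    car = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(" = ")
            car[key.strip()] = value.strip().strip('"')
    return car
```

Saving a Car and reading it back reproduces the original field values, which is all the persistence this simple application needs.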
Adding a relation
Consider making a modification to the car class. We want to add a variable to store the
owner of the car. We will manage owners with a new class, called Owner:
/****************************************************************
* Class Owner V.1
* Tyler Smith
*****************/
class Owner{
    String name;
}
We must now consider the addition of a relation. A relation is a connection or association
between multiple data sets. As we need to reference an Owner, our Car objects cannot exist
in isolation; they must contain a data field providing a connection to the owner. In the
object oriented environment, implementing this field is trivial: we can add an Owner
reference field to the Car class definition.
/****************************************************************
* Class Car V.2
* Tyler Smith
*****************/
class Car{
    String color;
    String engine;
    int doors;
    Owner owner; // This field holds a reference to an Owner object
}
In object oriented programming, this addition has a minimal impact on overall complexity.
The owner field simply contains a reference to the in-memory location of an Owner object.
However, when considered in the scope of persistent data, this reference introduces a
complexity which will garner much more discussion later in this paper: the problem of object
identity. During program execution, the identity of an object is simply its location in
memory. Every unique object has a unique memory location, which means a given
memory address can only refer to a single object. This allows our Car object to trivially store
a reference to a unique Owner.
If we were to simply add a key-value pair for this memory location value, our software
would fail. It would fail because we have no guarantee that the memory location of the
object in one execution will be the same as in a later execution. In fact, it will almost
certainly be different.
We need to save some other value which will guarantee that when we read the Car back
into the application from a file the same Owner will be referenced. We could save the name
of the Owner, but this would not allow us to alter the name of an Owner after it was
assigned to a Car. A better choice is to add an identifier field to the owner, the sole task of
which is to manage the identity of the Owner.
That means our Owner class will look like this:
/****************************************************************
* Class Owner V.2
* Tyler Smith
*****************/
class Owner{
    String name;
    int owner_id;
    // For the purpose of this example, I will not discuss how this id
    // is made to be unique.
}
I will also add an ID to the Car class for the same reason - to make the identity of a Car
object constant throughout transitions to and from a persistent state.
/****************************************************************
* Class Car V.2
* Tyler Smith
*****************/
class Car{
    String color;
    String engine;
    int doors;
    Owner owner; // This field holds a reference to an Owner object
    int car_id;
}
Our key-value Car file will now look like this:
#####################################################
# Car File V.2
# Tyler Smith
#############
color = "red"
engine = "V6"
doors = "4"
owner_id = "1"
car_id = "1"
The Owner file will look like this:
#####################################################
# Owner File V.1
# Tyler Smith
#############
name = "Tyler"
owner_id = "1"
So far, we have established a simple method for creating a persistent data type. We are
able to create and manipulate Car objects, and save and restore them from a persistent
state. We are able to maintain a persistent relationship between a Car and an Owner. In
the next section, I will give an example which demonstrates why databases are critical to
persistent systems.
Why do we need databases?
So far we've only considered a small number of Car objects. What if we were to try to store
100,000 Car objects? As we're currently storing one Car per file, this would require
100,000 files. Most modern file systems are not optimized to manage that many small files.
To combat this problem, we could merge the data into a single file, using a Comma
Separated Value format. In this manner we could consolidate the values into a single file
and reduce the overhead:
#####################################################
# Car File V.3
# Tyler Smith
#############
Color, Engine, Doors, Owner_Id, Car_Id #Headers
red, V6, 4, 1, 1
blue, V8, 2, 2, 2
This will allow us to reduce the overhead required to store hundreds of thousands of files.
Now, consider the problem of loading a specific Car from the file. We would have to either
search through the entire file, or load every Car into memory, and then search our in
memory objects. Both methods are O(n), and do not scale well.
Further, in the simple state presented, we have no method for maintaining the integrity of
the data. For example, we have no way to control whether two Cars can have the same
owner.
Relational databases were created to solve these problems. A relational database is a
specialized file system for persistent data designed to handle these complexities. A
relational database provides a layer of specialized control between the low level file
representation (similar to the above CSV example) and the program needing access to the
data.
Contemporary relational databases also provide a standard method of communicating with
the database: SQL, or Structured Query Language, is the industry standard method of
performing operations on the database.
A relational database provides quick access to data, along with the ability to apply
constraints which force the data to conform to specific rules. For example, constraints could
be used to enforce the uniqueness of Cars or Owners, or to require that Cars could only
reference existing Owner objects. Relational databases are now the de facto standard in
persistent applications, object oriented and otherwise.
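As a hedged sketch of these ideas (using SQLite through Python's built-in sqlite3 module; the schema is my own rendering of the Car/Owner example, not part of the original):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Uniqueness of Owners is enforced by the primary key.
conn.execute("""CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
)""")

# Cars may only reference existing Owners, via a foreign key constraint.
conn.execute("""CREATE TABLE car (
    car_id   INTEGER PRIMARY KEY,
    color    TEXT,
    engine   TEXT,
    doors    INTEGER,
    owner_id INTEGER REFERENCES owner(owner_id)
)""")

conn.execute("INSERT INTO owner VALUES (1, 'Tyler')")
conn.execute("INSERT INTO car VALUES (1, 'red', 'V6', 4, 1)")  # accepted

try:
    conn.execute("INSERT INTO car VALUES (2, 'blue', 'V8', 2, 99)")  # Owner 99 does not exist
except sqlite3.IntegrityError:
    pass  # the database rejects the inconsistent row
```

The indexed primary key also answers the earlier scaling complaint: looking up a Car by id no longer requires an O(n) scan of a flat file.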
Java with a relational database
We can now expand our Car class to connect to a database by adding read and write
methods which access the database:
/****************************************************************
* Class Car V.3
* Tyler Smith
*****************/
class Car{
    String color;
    String engine;
    int doors;
    Owner owner;
    int car_id;

    public Car readCarFromDatabase(){
        connection = getConnection();
        connection.runQuery("SELECT * FROM car WHERE id = ...");
        // SQL statement to retrieve a Car from the database
    }

    public void writeCarToDatabase(){
        connection = getConnection();
        connection.runQuery("INSERT INTO car ....");
        // SQL statement to add a car to the database
    }
}
We can now store Car objects between executions of our program, store a relationship
between Cars and Owners, and manage many persistent Cars in an efficient manner.
Part 1 Conclusions
For a large scale, persistent, object oriented application, it is necessary to use some manner
of advanced data storage system. Typically, this system is a relational database. Relational
databases are the industry standard for large scale data management, and provide a
reasonable back-end for object persistence. The Car/Owner example demonstrates the
basics of persistence, and provides a justification for the use of relational databases.
However, it is a very simple example. As the data model increases in complexity, the
difficulty of managing its data grows significantly. I will address these complexities in the
following section.
The primary goal of an Object Relational Mapping framework is not to remove this
complexity, but rather to encapsulate it within a well vetted tool, allowing developers
working at the object level to leave the low level details of data management to the
ORM tool and focus on the intent of the overall application. In part two, I outline some
examples of complexities addressed by ORM tools.
Part 2: The Object/Relational Mismatch
There is a fundamental disparity between the way data is stored in a relational database
and the way data is stored in objects. Object oriented programming provides a framework
with which software can be built to reflect real world analogues. Relational databases
provide a fast, structured and reliable method of saving and restoring data. In a system
which requires both a database and object oriented development, there must be some
resolution whereby data that exists in the form of abstract objects can be preserved in a
relational database. Thus we need some method of correctly mapping object data to
database tables even in complex situations. It is from this need that Object Relational
Mapping tools were born.
In the following section, I will present a set of problems which demonstrate this paradigm
mismatch. I will follow this with a discussion of how ORM tools address these issues. Note
that I do not assume the existence of a home brewed ORM persistence layer in these
discussions.
Problem 1: References Between Classes
In a relational database, everything is stored in tables. In a manner very similar to the CSV
file example above, all of the instances of a class are stored in rows. In an object, local data
is stored in a linear manner, but references to other objects are not. As seen in the car
example, an object can have a reference to another object. References can exist both as an
aggregation relationship, where an object references but does not own another object, or as
a composition relationship, where an object owns other objects. [Fowler 68] These
relationships do not have a simple or implicit mapping to a table structure. This is the
fundamental problem which spurred the development of ORM frameworks.
In the car example, I provided a system with a fairly natural mapping to a persistent state.
The most basic elements of an object can be trivially mapped to rows in a table.
Simple Car Table (Matches Car V.1)
Color Engine Doors Car_Id
Red V8 4 1
However, if we consider the addition of the reference to an Owner, the mapping becomes
less direct. As shown in part 1, one method is to add an Id to the owner, and have each car
save the Id of its owner.
Car Table (Version 2)
Color Engine Doors Car_Id Owner_Id
Red V8 4 1 1
Owner Table
Name Owner_Id
Ted 1
In a relational database, this type of relationship is called a foreign key reference. Notice
however that there is some ambiguity in this design. For example, we could create a
functionally equivalent relationship by giving each Car an Id, and having each Owner store
the Id of a Car.
Car Table (Version 3)
Color Engine Doors Car_Id
Red V8 4 1
Owner Table
Name Owner_Id Car_Id
Ted 1 1
Neither of these methods is wrong. Both will allow us to load the objects from the database,
and recreate the relationship between them. However, there is not necessarily a database
mapping implicit from the class definitions. A relationship which is straightforward in object
oriented terms is not necessarily as straightforward when applied to a database.
The choice of relationship mapping also has effects on multiplicity. Multiplicity is the number
of objects associated with each part of a relationship. For example, a single Owner could
own multiple Cars. If we store the objects in the manner described by the Car table version
2, a Car can only have one owner, but an owner can have multiple cars. Similarly, in Car
table version 3, a Car can have many Owners, but an Owner can only have one car. We call
these relationships One-to-Many relationships.
Suppose we wanted to allow Owners to have multiple Cars, and Cars to have multiple
Owners. In object oriented programming, this is simple to implement. Each Car object can
contain a variable length set of references to Owner objects, and each Owner object
contains a variable length set of references to Car objects. These are called Many-to-Many
relationships.
The limitations of a relational database do not allow us to have variable length fields in a
database table. This means we need some other method of mapping these relationships; we
need a join table. A join table resolves the connection in a Many-to-Many relationship. A join
table for this relationship between Cars and Owners might look like this:
Car/Owner Join Table (Version 1)
Car Id Owner Id
1 2
2 2
1 3
In this table, Car 1 is owned by Owners 2 and 3, and Owner 2 owns both cars 1 and 2.
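The same join table can be built and queried in a few lines of SQL (shown here through Python's sqlite3 module; the table name car_owner is my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_owner (car_id INTEGER, owner_id INTEGER)")

# The three rows from the join table above.
conn.executemany("INSERT INTO car_owner VALUES (?, ?)",
                 [(1, 2), (2, 2), (1, 3)])

# Resolving either side of the Many-to-Many relationship is a simple query:
owners_of_car_1 = [r[0] for r in conn.execute(
    "SELECT owner_id FROM car_owner WHERE car_id = 1 ORDER BY owner_id")]
cars_of_owner_2 = [r[0] for r in conn.execute(
    "SELECT car_id FROM car_owner WHERE owner_id = 2 ORDER BY car_id")]
```

Each direction of the relationship costs one query against the join table, with no variable length fields anywhere in the schema.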
In object oriented programming, we would not necessarily need to define a datatype to
manage this relationship. In object definitions, variable length sets are easy to manage, as
are bi-directional inter-object relationships. In the simplest implementation, Cars and
Owners can just contain a set of references to each other. However, to handle the transition
to a relational database, we had to create a new table with the sole purpose of managing
the connection between Car and Owner objects. This demonstrates that relationships
defined in object oriented terms do not necessarily have a one to one mapping to their
representation in the database.
The Car/Owner relationship above is an aggregation - the Car and Owner exist as
independent objects, and the deletion of one would not imply the deletion of the other. Now
suppose we wished to regard a Car object as a composition of various parts. For example,
suppose we added an Engine class:
/****************************************************************
* Class Engine V.1
* Tyler Smith
*****************/
class Engine{
    String name;
    float liters;
    int engine_id;
}
The updated Car class:
/****************************************************************
* Class Car V.4
* Tyler Smith
*****************/
class Car{
    String color;
    Engine engine; // note we replaced the String engine with an Engine object
    int doors;
    Owner owner;
    int car_id;
    // Database access methods left out for clarity
}
Once again, there is no direct mapping of this object oriented relationship to a
database. We could choose to give Engine its own table in the database:
Engine Table (Version 1)
Name Liters Engine_Id
V8 5.7 1
Car Table (Version 4)
Color Engine_Id Doors Car_Id
Red 1 4 1
As the Car/Engine relationship is a composition, this means that deleting the Car would
mean also deleting the Engine. As the two are connected directly, it could be more efficient
to implement everything in the Car table:
Car Table (Version 5)
Color Doors Car_Id Engine_Name Liters Engine_Id
Red 4 1 V8 5.7 1
Again, neither implementation is strictly correct or incorrect. Using two tables provides a
direct mapping from object definition to table definition, while using one table increases
efficiency and enforces the intended nature of the relationship.
The problem of inter-object relationships demonstrates the first paradigm mismatch between
object oriented programming and relational databases. Objects manage relationships in a
very different manner from relational databases. There is not necessarily a one-to-one
mapping between class definitions and table definitions. This means that in an object
oriented persistent context, steps must be taken to account for this disparity during the
object transition from in-memory to persistent.
Problem 2: Sub/Super Class Relationships
The object oriented principle of inheritance presents another situation for which there is no
implicit relationship between objects and database tables. Suppose we wish to create
another class called Truck, and a superclass Vehicle of which both Car and Truck are
subclasses. The Vehicle superclass will allow us to manage data shared by Car and Truck in
one place, while allowing type-specific data to be managed at the subclass level.
Once again, this presents an ambiguity. We could create a mapping which ignores the
superclass entirely by creating a Car table and a Truck table which duplicate the fields in the
superclass. Or we could create a superclass table which contains the superclass fields, and
references to subclass tables. [Bauer 193]
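A third option, listed in the outline as "Table per Class Hierarchy", stores every Vehicle in a single table with a discriminator column identifying the concrete subclass. A sketch (SQLite; the column names and the bed_length field are my own assumptions):

```python
import sqlite3

# Sketch of the "Table per Class Hierarchy" strategy: one table holds every
# Vehicle, and a discriminator column records the concrete subclass.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,  -- discriminator: 'car' or 'truck'
    color      TEXT,           -- shared Vehicle field
    doors      INTEGER,        -- Car-specific (NULL for trucks)
    bed_length REAL            -- Truck-specific (NULL for cars)
)""")
conn.execute("INSERT INTO vehicle VALUES (1, 'car', 'red', 4, NULL)")
conn.execute("INSERT INTO vehicle VALUES (2, 'truck', 'blue', NULL, 6.5)")

# Loading all Cars is a filter on the discriminator; no joins are needed.
cars = conn.execute(
    "SELECT vehicle_id, color, doors FROM vehicle WHERE type = 'car'").fetchall()
```

The trade-off is sparse columns: subclass-specific fields must be nullable, which is one reason no single strategy is universally correct.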
These problems are further complicated if we consider the possibility of an Owner containing
a reference to a Generic Vehicle. How should the relationship be mapped? Once again, there
is no single correct method.
Problem 3: Managing Identity
As mentioned in part 1, in a purely object oriented environment, object identity is trivial*.
Each object has a location in memory, so determining if two references point to the same
object is clear. However, when we transfer an object to and from a persistent state, we
can't trust the reference location to demonstrate identity. Suppose we create two car
objects, each with all of the same values. Without adding some additional structure, the
identity of any given object which has been re-loaded from a persistent state is ambiguous.
For example, using class Car V.1, suppose I create a Car with color = red, doors = 4 and
engine = v6. If (in the same program execution context) I create another Car with color =
red, doors = 4, and engine = v6, they are clearly not the same Car - they will have different
memory locations. However, if I save them and reload them in a new execution of the
program, I cannot trust the memory locations. How can I tell whether they are different? The
program needs some additional structures and/or functions to manage identity in a
persistent context.
*There are situations where identity is non-trivial in object oriented programs, but in most
cases the memory-location comparison is sufficient.
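The ID-based identity scheme described above can be sketched in Python (a hypothetical rendering of Car V.2; the original classes do not define comparison methods):

```python
# Sketch: persistent identity is decided by car_id, not by memory location
# and not by field-by-field comparison.
class Car:
    def __init__(self, color, engine, doors, car_id):
        self.color = color
        self.engine = engine
        self.doors = doors
        self.car_id = car_id

    def __eq__(self, other):
        # Two Cars are "the same Car" exactly when their ids match.
        return isinstance(other, Car) and self.car_id == other.car_id

a = Car("red", "V6", 4, car_id=1)
b = Car("red", "V6", 4, car_id=2)   # identical field values, but a different Car

# Pretend this object was saved and reloaded in a later execution:
reloaded = Car("red", "V6", 4, car_id=1)
```

In memory, a and b are distinct objects and compare unequal despite identical fields; after a reload, a == reloaded still holds even though the memory addresses differ.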
Problem 4: Developer Expertise
The difference between object data and tabular data from a database is a critical conceptual
distinction. For developers working in an object oriented, database persisted application, it
is critical that they understand both possible representations of the data. For example, this
means understanding that a Car object containing a set of Owners will not have those
owners directly mapped to its row in the database - a row in the Owner table would contain
a reference to the car, or there might be a row establishing their relationship in a join
table.
This requirement of a wider breadth of knowledge will make it harder for developers to
become specialized in a given subset of the program, and will mean more training before
new developers are comfortable with the system.
Further, consider the problem of developer training. A project which is primarily functional,
but requires some persistence, would require developers to be familiar with both database
functionality and the necessary functionality of the program. This extra requirement adds
significant overhead to the project.
Problem 5: Managing object state
In object oriented programming, objects typically have two states, live and dead. If a
process modifies a live object (provided concurrency is properly managed) we can be
confident that the change will take effect.
Objects in a persistent application can exist in four states, transient, persistent, detached,
and removed* [Bauer 386]. A transient object is an object without a database identity - it
has been created but has not yet been saved to the database. A persistent object is an
object with a database identity whose state is managed by the persistence context. A
detached object is an object with a database identity that is no longer associated with an
active persistence context; its contents are not necessarily consistent with what is stored in
the database, as the two are no longer synchronized. A removed object is an object
scheduled for deletion: it still exists in memory, but will be deleted from the database when
pending changes are flushed. We need infrastructure to manage the state of the objects, to
make sure edits to an object are not lost or overwritten.
Any persistent application needs to manage these states. For example, consider the
problem of concurrent updates to a persistent object. In a solely object oriented program,
concurrency can be managed at the execution level, using a semaphore or other control
structure. If two processes (or machines) read an object into memory from a persistent
state, and each makes a different edit to the object, we have a potential race condition as
the changes are saved to the database. This can happen even if the accesses to the object
are offset by a significant time delay; one process can simply overwrite the changes of the
other (by default, it has no way to know that the object it has in memory is stale). There
needs to be some infrastructure to manage concurrency at the data access level.
*Note that these are the states defined by Hibernate, other ORM solutions may have
different states.
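One common way to provide that data-access-level infrastructure is optimistic locking: each row carries a version number, and an update only succeeds if the row has not changed since it was read. This is a hypothetical sketch of the idea (SQLite; table and function names are my own), not the implementation used by any particular ORM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, color TEXT, version INTEGER)")
conn.execute("INSERT INTO car VALUES (1, 'red', 0)")

def update_color(car_id, new_color, version_read):
    # The UPDATE only matches if the version is still the one we read;
    # a stale writer modifies zero rows and is told the update failed.
    cur = conn.execute(
        "UPDATE car SET color = ?, version = version + 1 "
        "WHERE car_id = ? AND version = ?",
        (new_color, car_id, version_read))
    return cur.rowcount == 1

# Two processes both read the Car at version 0, then write concurrently:
first_write = update_color(1, "blue", 0)    # succeeds, version becomes 1
second_write = update_color(1, "green", 0)  # fails instead of silently overwriting
```

The losing writer learns its copy is stale and can re-read and retry, rather than overwriting the other process's change.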
Problem 6: Performance
Invoking methods on an in-memory object is typically very fast. This is because memory
access has very low overhead compared to accessing a database. Working with a database,
every query run by the program has a significant time cost. A design which does not use
smart query ordering and structuring can severely degrade performance.
To test this impact, I wrote a simple time test program (see appendix A). My program
demonstrates the difference between executing a new query every time a new value is
needed and executing a single query to get all of the needed values at the same time. Even
testing on a local database (low network overhead), the single query was on average three
times faster than running a new query for each needed value. Any persistence
implementation needs to account for the potential performance issues associated with query
ordering and structuring.
Managing the contents of an object can be very important in performance as well. Consider
an object which owns a large set of other objects. Suppose we only need to make a small
edit to the object, unrelated to the large set of objects. A 'dumb' system would simply load
everything in the object, needed and otherwise. We need some way to load a subset of the
object and make our change without loading all of the data into memory.
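The usual answer is lazy loading: the expensive collection is fetched only on first access. A hypothetical Python sketch of the mechanism (real ORM frameworks typically achieve this with generated proxy objects; all names here are my own):

```python
# Sketch of lazy loading: the owners collection is fetched from storage only
# on first access, so a small unrelated edit to the Car never triggers it.
class LazyCar:
    def __init__(self, car_id, load_owners):
        self.car_id = car_id
        self._load_owners = load_owners  # stands in for a database query
        self._owners = None              # not loaded yet

    @property
    def owners(self):
        if self._owners is None:         # first access: fetch now
            self._owners = self._load_owners(self.car_id)
        return self._owners              # later accesses reuse the cache

calls = []
def fake_loader(car_id):
    calls.append(car_id)                 # record each simulated query
    return ["owner-%d" % car_id]

car = LazyCar(1, fake_loader)
car.color = "blue"   # small edit: the owners set is never loaded
first = car.owners   # the one and only "query" runs here
second = car.owners  # cached; no second query
```

The small edit touches no owner data at all, and repeated reads of the collection cost exactly one fetch.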
Consider the problem of making a series of subsequent updates to an object. In a raw SQL
system (without significant optimization structure), each update would need its own
UPDATE statement. This potentially adds a lot of overhead (especially if the entire object
must be loaded into memory each time). We need some way to intelligently manage queries
such that they are executed in a logical manner, and with a minimal number of connections
to the database.
Problem 7: Changing Database Engine
In the lifespan of a product, it is likely that some components will change. For example,
suppose you built a database product based on SQL Server 2008. Then, some time later,
management decides the product can no longer use SQL Server 2008, and you need to move
to an open source database engine. This means several things: you must find out if and
where differences exist between the two database engines, and search for any places in
your source code where those changes might affect the program.
For example, suppose you have a column defined as VARCHAR(MAX) in one of your table
definitions. There is no VARCHAR(MAX) in MySQL. Thus to be compatible with MySQL, you
will need to search your source code for places VARCHAR(MAX) is used, and replace them
with a valid MySQL field definition.
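An abstraction layer can remove the need for that search-and-replace: column types are declared once in engine-neutral terms, and a per-engine dialect renders them. The type names below are real (VARCHAR(MAX) for SQL Server, LONGTEXT for MySQL), but the dialect table itself is my own illustration, not any framework's API:

```python
# Sketch of a dialect layer: declare column types once, render per engine.
DIALECTS = {
    # Unbounded text columns differ by engine:
    "sqlserver": {"long_text": "VARCHAR(MAX)"},
    "mysql":     {"long_text": "LONGTEXT"},
}

def render_column(name, abstract_type, dialect):
    # The application only ever mentions the abstract type; the engine-specific
    # spelling is looked up in one place.
    return f"{name} {DIALECTS[dialect][abstract_type]}"

# The same declaration produces engine-specific DDL:
mssql_col = render_column("notes", "long_text", "sqlserver")
mysql_col = render_column("notes", "long_text", "mysql")
```

Switching engines then means swapping a dialect entry, not grepping the source code for every type name.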
This problem is even worse if you have custom database functions or datatypes, or if you
need to support multiple database types simultaneously.
Part 3: What is an Object Relational Mapping framework,
and how can it help?
Object relational mapping is the process of transferring data stored in objects to relational
database tables. An ORM framework is a tool which manages this transfer. Any persistent
application already has some semblance of ORM, typically in the form of methods such as
those seen in the above examples - methods which perform operations on the persistent
data. An ORM framework is a method of abstracting the details of these Create, Read,
Update, and Delete operations (commonly referred to as CRUD) such that the high level
implementation does not need to know the details.
From Java Persistence with Hibernate: "In a nutshell, object/relational mapping is the
automated (and transparent) persistence of objects in a Java application to the tables in a
relational database, using metadata that describes the mapping between the objects and
the database." [Bauer 25]
Object relational mapping software started to gain prominence in the late 1990s with the
production of tools such as Oracle's TopLink. ORM was originally most popular with Java,
but in recent years support has expanded to most contemporary object oriented languages.
How does ORM fit into a design?
In the above Car example, we had a simple two tier system: a Java implementation and a
database back-end.
In a two tier system, the implementation of both the application and its data persistence
is stored together, in this case in the Car object. If we add an ORM tool to this application,
we can hide the details of the persistence functionality.
Now the Car class does not need to know how the data is being retrieved, or how to map
object data to and from tabular data - the ORM framework handles all of these details. This
means that the complexities addressed in part 2 can be hidden from the high level Car
implementation.
I will now outline how an ORM framework can handle the problems outlined in part 2. Note
that not all ORM frameworks implement solutions to all of these problems. I won't provide
extremely low level implementation details, as they can differ between implementations.
Rather, I will give a design-level explanation of how ORM solutions can assist in these
issues. My explanation will be based primarily on Hibernate, the ORM solution employed by
my project at General Dynamics, and Django, the ORM tool I used at the Carlson School of
Management.
It is important to note that an ORM framework does not necessarily need to be created by a
third party. Everything I outline here can be done by hand. Many of the advantages can be
achieved by establishing a multi tier architecture and hiding the data implementation layer.
However, my discussion is based on the use of a third party tool, as most project leads
likely do not have the time or budget to implement an entire ORM solution.
Problem 1: References Between Classes
References between classes in object oriented software present a problem, as the mapping
of these relationships to a relational database is ambiguous. ORM tools handle these
ambiguities by employing user provided annotations or mapping files to clarify complex
relationships.
In Hibernate, mapping data provided by the user tells the tool specifically how some
objects relate to others. Alongside the data fields in a class, the user provides
annotations which specify the nature of the relationship. For example, let's add a
Hibernate annotation to our Car class:
/****************************************************************
 * Class Car V.4 (Java)
 * Tyler Smith
 *****************/
import javax.persistence.ManyToOne;

class Car {
    //Other fields hidden for clarity.

    //Hibernate annotation to map the relationship.
    //In this case, many Cars may share a single Owner.
    @ManyToOne(targetEntity = Owner.class)
    Owner owner;
}
This annotation tells Hibernate exactly how the relationship should be mapped to the
database. Note that the developer still needs to understand how relationships work in a
database - the ORM solution won't allow developers to blindly trust that the objects will be
mapped correctly.
In Django, the functionality is very similar. The user provides information to the ORM to
specify the nature of the relationship:
#################################################################
# Class Car V.4 (Python)
# Tyler Smith
#################################################################
class Car(models.Model):
    #Other fields hidden for clarity.
    #In this example, we have a many to many relationship between Car and Owner.
    owner = models.ManyToManyField(Owner)
Note that the ORM tools are not able to make complex design assumptions, such as
establishing whether a relationship is an aggregation or a composition. Instead, the ORM
framework provides methods of formalizing such relationships within the class definition, as
seen above. Thus the burden of resolving the mapping ambiguity still falls on the developer,
but after the mapping is decided, its nature is clear and formalized.
When relationships are formalized in an object definition, the ORM framework can employ
this definition whenever operations are requested. A developer does not need to explicitly
consider the nature of each relationship held by an object to perform operations elsewhere
in the code. The ORM framework can keep track of the relationships associated with a given
object, and manage the data accordingly.
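As a rough, hypothetical sketch of this idea in plain Python (not Django or Hibernate code): relationship metadata declared once on the class lets a generic save routine cascade to related objects without the caller spelling out each relationship. The `relations` dictionary and `save` function are illustrative inventions:

```python
# Hypothetical sketch: relationship metadata declared on the class drives
# a generic persistence routine, so callers never enumerate relationships.
class Owner:
    relations = {}  # Owner has no outgoing relationships in this sketch

    def __init__(self, name):
        self.name = name

class Car:
    # Formalized once in the class definition, as with Hibernate annotations.
    relations = {"owner": "many-to-one"}

    def __init__(self, model, owner):
        self.model = model
        self.owner = owner

def save(obj, saved=None):
    """Generic save: consults the metadata to cascade to related objects."""
    saved = saved if saved is not None else []
    for field in getattr(obj, "relations", {}):
        related = getattr(obj, field)
        if related is not None and related not in saved:
            save(related, saved)   # persist the related object first
    saved.append(obj)              # stand-in for the actual INSERT/UPDATE
    return saved

order = save(Car("coupe", Owner("Tyler")))
print([type(o).__name__ for o in order])  # ['Owner', 'Car']
```

The point is only the division of labor: the developer states the relationship once, and the framework consults that statement on every operation.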
Problem 2: Sub/Super class relationships
This difficulty is not automatically solved by all ORM frameworks; the user needs
to make some design decisions as to how inheritance is handled. The Hibernate
documentation provides four possible mapping schemes for inheritance in persistent
classes [Bauer 191]:
1. Table per Concrete Class - Implicit polymorphism
In this method, we tell Hibernate to map one table for each concrete (non-abstract)
class. All properties, inherited and local, are mapped to columns. Hibernate can
automatically generate queries for polymorphic method calls against the superclass. This
solution is effective for models with very little inheritance, but presents risks. For example,
if Car and Truck both inherit from an abstract superclass Vehicle, and a User has a reference
to a Vehicle, how do we map this relationship in the database? We can't have a correctly
constrained yet generic reference to one of two tables. [Bauer 193]
2. Table per Concrete Class With Unions
In this method, each concrete class is again mapped to its own table containing all
properties, inherited and local, but the mapping is made explicit to Hibernate. Hibernate
can then use a UNION to combine the results of queries against these tables into a single
polymorphic result. This solution also solves the problem of associations with inherited
classes, as "Hibernate can use a UNION query to simulate a single table as the target of
the association mapping" [Bauer 199]
3. Table per Class Hierarchy
In this method, an entire hierarchy of classes is mapped to a single table, so shared data
is stored once, with no need for unions or multiple queries. However, this method presents
some potentially critical problems. Subclass columns must be nullable, as they are not
populated in every row. This means that extra caution must be used when accessing the
data manually, as many columns could be null for a given row. [Bauer 200]
4. Inheritance Relationships as Foreign Keys
In this method, every class which defines its own properties is mapped to its own table. This
implementation requires more tables and queries, but is normalized. It is simple to
understand, but can result in unnecessary complexity. It involves treating every is-a
relationship as a has-a relationship in terms of the schema. This means abstract and
concrete classes, and even interfaces, can have their own tables. [Bauer 203]
Django Example:
I couldn't find an explanation in the Django documentation of how inheritance is
implemented. To find out, I ran a simple test using the following Python classes. Note that
all classes are concrete.
from django.db import models

class Vehicle(models.Model):
    name = models.CharField(max_length=200)
    color = models.CharField(max_length=200)

class Car(Vehicle):
    trunk = models.BooleanField()

class Truck(Vehicle):
    bedLength = models.IntegerField(blank=True, default=0)
Django mapped each class to its own table, as follows:

Vehicle:
    Field   Type
    id      int
    name    varchar
    color   varchar

Car:
    Field           Type
    vehicle_ptr_id  int
    trunk           tinyint

Truck:
    Field           Type
    vehicle_ptr_id  int
    bedLength       int
Adding a Car, we see:

Vehicle:
    id  name         color
    1   Tyler's Car  Red

Car:
    vehicle_ptr_id  trunk
    1               1
When an object is added, its subclass and superclass properties are split up and added to
the relevant tables. The tables are then connected via a pointer stored in the subclass
table. Note that the subclass does not have its own id field; the superclass stores the id,
and two such ids would be redundant. This mapping is an example of the fourth scheme
above - inheritance relationships as foreign keys, with each class joined to its parent
through its own table.
Problem 3: Managing Identity
This problem has not been completely addressed by ORM frameworks as of this
writing. The complexity lies in establishing the identifying feature of an object. The
memory location fails, as it will change as soon as the current process completes. Using
the database identifier (primary key) works, but leaves us with an identity-less object until
the object is made persistent - in the transient state, the object has no identity, and thus
two transient objects of the same type could incorrectly be found equal (null == null).
The accepted solution to this has been to use a business key: an identifier similar to the
database primary key, but which can be set before the creation of the relevant row in the
database. [Bauer 397] Use of this method requires domain classes which appropriately
manage equality (typically by overriding the default equals method).
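A small sketch of business-key equality in Python, the equivalent of overriding equals in Java. The choice of a VIN as the business key is illustrative; the point is that the key is a natural, immutable attribute known before the row exists:

```python
# Equality based on a business key rather than the database id or memory
# address, so two transient objects compare correctly before being saved.
class Car:
    def __init__(self, vin):
        self.vin = vin   # business key: known before any row is created
        self.id = None   # database identifier: assigned only on first save

    def __eq__(self, other):
        return isinstance(other, Car) and self.vin == other.vin

    def __hash__(self):
        return hash(self.vin)

a = Car("VIN-0001")
b = Car("VIN-0001")   # both transient: neither has a database id yet
print(a == b)         # True, even though a.id and b.id are both None
```

With identity defined this way, an object's equality behavior does not change when it moves between the transient and persistent states.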
Problem 4: Developer Expertise
The data layer abstraction provided by an ORM framework gives developers working on
high level functional code an interface which allows them access to the data, but shields
them from low level data details. While it is still important that they have an idea of how
the data layer works, developers do not need an advanced understanding in order to
perform data operations.
Note that this encapsulation is not limited to third party software - home brewed data
access layers can provide the same benefit.
Problem 5: Managing Object State
ORM tools provide structured methods for managing object state to keep track of object
identity and transience. For example, Hibernate can maintain a version field in each row of
the database. Each time the row is updated, Hibernate automatically checks the version of
the in-memory object against the version stored in the database. If they do not match, the
in-memory object is stale and must be refreshed before any data can be written. This is a
form of optimistic offline locking.
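The version check described above can be sketched in a few lines of plain Python. The dictionary standing in for a table row and the exception name are illustrative; real frameworks perform this check transparently against a version column:

```python
# Simplified optimistic-locking sketch: a write succeeds only if the caller
# holds the current version; otherwise the in-memory copy is stale.
class StaleObjectError(Exception):
    pass

database = {"car:1": {"color": "red", "version": 1}}  # stand-in for a row

def update(key, fields, expected_version):
    row = database[key]
    if row["version"] != expected_version:
        raise StaleObjectError("in-memory copy is stale; refresh before writing")
    row.update(fields)
    row["version"] += 1   # every successful write bumps the version

# Two sessions load version 1; only the first write succeeds.
update("car:1", {"color": "blue"}, expected_version=1)
try:
    update("car:1", {"color": "green"}, expected_version=1)
except StaleObjectError as e:
    print(e)
```

The second writer is forced to reload the row before retrying, which is exactly the behavior that prevents lost updates under concurrent access.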
Problem 6: Performance
An ORM framework affords many options for performance improvement. These include
methods such as lazy or eager loading, transaction based database operations, and a
variety of caching options.
Earlier, I discussed the problem of making a small update to a large object. Loading the
entire object (and its children) into memory requires a lot of overhead. Lazy loading is a
method of loading only certain parts of an object when needed, as opposed to loading the
entire object into memory. For a large object with many children, loading every object in
the set could mean thousands of unnecessary database reads, and even more if we consider
any sets contained within those objects. Lazy loading allows us to load the parent object
without loading the children. [Bauer 571] Hibernate implements lazy loading by creating
proxy objects in the place of sets, and only instantiating the sets when they are explicitly
requested by the user.
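The proxy idea can be sketched in Python as a collection that defers its one "database read" until first access. The LazyList class and the fetch_children loader are illustrative, not part of any framework's API:

```python
# Lazy-loading sketch: the proxy stands in for a collection and only runs
# the loader (the deferred "database read") on first access.
class LazyList:
    def __init__(self, loader):
        self._loader = loader
        self._items = None          # nothing fetched yet

    def _load(self):
        if self._items is None:
            self._items = self._loader()   # one deferred query
        return self._items

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())

loads = []
def fetch_children():
    loads.append("SELECT * FROM part WHERE car_id = 1")  # pretend query
    return ["engine", "wheel"]

children = LazyList(fetch_children)   # constructing the parent is cheap
print(len(loads))                     # 0: no query has run yet
print(list(children), len(loads))     # first access triggers the single query
```

Repeated iteration reuses the cached result, so the parent object can be loaded and passed around freely without touching the children's tables.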
In a substantial persistent system, we also need to consider the concept of a transaction: a
series of operations treated as a unit. For performance reasons, it is often optimal to
combine a series of modifications to an object or set of objects into a single statement, or
series of statements, rather than issuing each one individually. In Hibernate, transactions
are optimized to do minimal work on the database. This means updates may not necessarily
be executed in the order the requests appear in the source code.
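The benefit of grouping operations can be seen with the standard sqlite3 module, used here purely for illustration: writes committed one at a time pay commit overhead on every statement, while a transaction batches the same work into a single commit:

```python
import sqlite3

# Why batching matters: each standalone commit pays per-write overhead,
# while a transaction commits all of its statements at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT)")

# One commit per operation:
for color in ["red", "green"]:
    conn.execute("INSERT INTO car (color) VALUES (?)", (color,))
    conn.commit()                  # overhead paid on every single write

# The same kind of work batched into a single transaction:
with conn:                         # commits once, on successful exit
    conn.executemany("INSERT INTO car (color) VALUES (?)",
                     [("blue",), ("black",)])

print(conn.execute("SELECT COUNT(*) FROM car").fetchone()[0])  # 4
```

On a real database with durable storage, the difference between the two styles grows with the number of writes, which is why ORM frameworks queue changes and flush them together.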
Problem 7: Changing database engines
Most ORM frameworks are written and tested against multiple database back-ends. Typically
this allows developers to simply tell the framework which type of back-end is being used,
and the low level differences are handled within the tool. This means that if the
database back-end changes, the product does not need to change drastically, as the
flexibility is already built into the ORM layer.
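A hypothetical sketch of the dialect layer such frameworks ship with: application code issues one high level request, and only the back-end specific dialect object changes the SQL that is emitted. The class names and the LIMIT/TOP difference shown are illustrative, not taken from any particular tool:

```python
# Hypothetical dialect layer: the same high level call produces the SQL
# variant each back-end expects, so swapping back-ends is a config change.
class MySQLDialect:
    def limit(self, sql, n):
        return f"{sql} LIMIT {n}"          # MySQL-style row limiting

class SQLServerDialect:
    def limit(self, sql, n):
        return sql.replace("SELECT", f"SELECT TOP {n}", 1)  # SQL Server style

def first_cars(dialect, n):
    # Application code never mentions the back-end; only configuration does.
    return dialect.limit("SELECT * FROM car", n)

print(first_cars(MySQLDialect(), 5))      # SELECT * FROM car LIMIT 5
print(first_cars(SQLServerDialect(), 5))  # SELECT TOP 5 * FROM car
```

Changing database engines then means swapping which dialect object is constructed, not rewriting application queries.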
Part 4: Integrating ORM into the development process
The integration of an ORM framework can take place in many different stages in
development. It may be as early as the design phase, or as late as an update to an already
complete product. The decision of when and how to integrate an ORM framework is
non-trivial, as the persistence layer of any application is critical to its success.
Design basics
Object oriented persistent projects typically share some key design elements related to
persistence. First, most are built on a domain model (though they may not call it that). A
domain model is a design that has classes corresponding to distinct persistent elements.
The Car example above used a domain model - the class Car is a domain class. [Bauer 107]
Second, many projects also use the Data Access Object (DAO) design pattern. In this
pattern, data access objects live above the domain classes in the class structure, and
provide high level access to data. In projects using the Data Access Object pattern, domain
level classes typically have little or no implementation detail, only data fields and mapping
data.
Integration
In Java Persistence with Hibernate the authors discuss methods for integrating Hibernate
into a project: [Bauer 40]
Top Down
In a top down development model, an ORM framework is integrated into an existing domain
model. This means the program already has a set of classes defining the persistent data
types, and needs to integrate a method for mapping the persistent classes to the database.
Note that it is assumed that there is no existing database schema.
Bottom Up
In a bottom up development model, an existing database schema is used as the basis for
integration of an ORM framework, and the development of code to operate on the data. This
would likely occur if a program was required to work on a legacy database - the new system
must be made to work with the existing database schema.
Middle Out
In a middle out model, developers start with a model of the mapping between objects and
tables, and design both the code and the schema based on the mapping. This means that
before the classes are written or the database schema is decided, the specifics of the
mapping between objects and the database are deduced.
Meet in the Middle
In this model, developers start with an existing schema and existing code base, and
integrate an ORM framework. The authors of Java Persistence with Hibernate consider this
to be the most difficult method of integrating Hibernate into a project, as it often requires
significant refactoring to get the domain classes and database schema to agree: "This can
be an incredibly painful scenario, and it is, fortunately, exceedingly rare."
Integrating ORM in Practice
In the next section, I describe the process taken by two software projects for integrating an
ORM framework. These case studies serve as real world examples of the processes
described above. At the Carlson School of Management, we used the bottom up strategy,
integrating an ORM tool with an existing database schema. At General Dynamics, we used
the meet in the middle strategy, integrating an ORM framework with an existing schema
and domain model.
Part 5: Case Studies
To provide real world context to the discussion of ORM frameworks, I did two case studies.
The goal of both studies was to investigate whether ORM tools can successfully address the
problems described in part two in a production environment. In both projects I studied, I
focused on the following questions:
1. What was the existing implementation before the integration of an ORM framework?
2. What were the problems/concerns with that implementation?
3. What was the primary motivation for switching to an ORM framework?
4. Why was the ORM framework in question chosen?
5. Was the integration as difficult as anticipated?
6. Has it been successful?
7. What would you do differently?
I gathered this data through my own experience (I worked on both projects), through
interviews with project members, and through code analysis tools.
Case Study One: Carlson School of Management Help Desk
Project Overview
At the Carlson School of Management, I worked on the Laptop Management System. This
system is a web based program which manages laptop repairs and equipment checkouts.
The system manages over one thousand students and hundreds of laptops, and is used by
approximately 15 staff members.
Motivations
The program was originally written in PHP, using a MySQL database. In early 2009, the
decision was made to switch to Django. There were three primary reasons for the switch.
First, Django allows very fast web page development, which would allow us to easily write
report-generation pages. Second, Django has HTML template options which allow more
flexibility in page design. Finally, it was hoped that Django would be easier to maintain, as
many aspects of the PHP based version were fairly old and disconnected.
Design
The original project was implemented with PHP (non-object oriented) and MySQL. The
second version was written in Python, using the ORM tool Django. As we needed to keep
track of legacy data, the database schema remained largely the same.
We used a domain model, implementing each persistent type as a class based on our
existing schema. Django handles all of the communication with the database. We did not
need to use any type of advanced data access structure. Instead, we used built in methods
provided by Django for operations such as selecting a large set of objects. To retrieve a set
of objects from the database, the developer needs only to execute a filter operation, which
acts as a SELECT query. To save an object, the developer calls a save() method.
For example, if I want to retrieve all repair tickets with non-null laptops, I can use this line
of Python:
all_tickets = Ticket.objects.filter(laptop__isnull=False)
All persistent classes extend the Model superclass. This superclass provides access to
methods such as filter, which returns a subset of the instances of that class saved to the
database. The above line uses a filter to get all Ticket objects from the database where the
laptop field is non-null. The corresponding raw-SQL implementation would require executing
a SELECT statement, followed by line by line parsing of the returned rows, creating an
object for each row.
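For comparison, here is a sketch of that raw-SQL equivalent, using the standard sqlite3 module for illustration. The Ticket class and the table contents are hypothetical stand-ins for the real system's schema:

```python
import sqlite3

# The select-then-build-objects boilerplate that the one-line Django
# filter call replaces. Ticket and the sample rows are illustrative.
class Ticket:
    def __init__(self, ticket_id, laptop_id):
        self.id = ticket_id
        self.laptop_id = laptop_id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, laptop_id INTEGER)")
conn.executemany("INSERT INTO ticket VALUES (?, ?)",
                 [(1, 10), (2, None), (3, 11)])

# SELECT, then construct an object by hand for each returned row:
rows = conn.execute("SELECT id, laptop_id FROM ticket "
                    "WHERE laptop_id IS NOT NULL ORDER BY id").fetchall()
all_tickets = [Ticket(ticket_id, laptop_id) for ticket_id, laptop_id in rows]
print([t.id for t in all_tickets])  # [1, 3]
```

Multiplied across every query in an application, this row-to-object plumbing is the bulk of what the ORM's Model superclass eliminates.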
Django also handles all transaction level details - we did not need to worry about whether
an object was persistent or transient, Django managed those details.
Implementation
The implementation took approximately 12 months. Much of this development time was due
to the time taken to re-implement the system, as we converted from conventional PHP to
object oriented Python. We also had to write SQL scripts to transfer data from the old
database schema to the new, Django based schema.
The implementation was done in stages. Carl Allen implemented the ticket management
portion first, and then I implemented the equipment management section and report
generation tools. Some of the higher level functionality (forms, etc) were complicated, but
the data implementation was fairly quick. Typically, adding a class and the associated table
took 2-4 hours.
Problems
We had many tricky problems originating from our use of legacy data in the database. In
many cases, we had poorly enforced constraints in the legacy data which transferred to the
new database. This meant that the assumptions made by Django regarding data integrity
were often incorrect. For example, Django assumes that if a foreign key reference is
non-null, then the associated object must exist. We often had tickets referencing laptops
which had since been deleted. This resulted in verbose, hard to fix errors. For example, the
following code caused errors when a ticket referenced a non-existent laptop:
all_tickets = Ticket.objects.filter(laptop__isnull=False).order_by('-pk')
for ticket in all_tickets:
    laptop = ticket.laptop
This code parsed all of the tickets in our system with non-null laptop fields, and assigned a
local laptop variable to the laptop associated with each ticket. The following error was
thrown when one of the laptops referenced by a ticket did not exist:
It is clear from this error that a laptop is missing. However, we're given no information as to
which laptop is missing. Carlson has thousands of laptops and thousands of tickets.
Therefore, to solve this problem I had to bypass Django altogether, and execute SQL directly
against the database to find bad laptop references. These error traces are often lengthy to
read, as most of the lines in the stack trace are within Django source code.
We also had issues stemming from the large degree of control Django has over the
application. The most complex of these was a data validation error which would cause the
system to silently ignore requests. We think this problem has to do with Django's caching
policy, but it has yet to be resolved.
In the PHP implementation, the code to be run with each page request is very
straightforward. In Django, many things happen in the background, hidden from the user
and the developer. While this decreases the amount of code needed to accomplish a given
task, it makes searching for the source of an error much more complex.
Improvements
Using Django allowed us to rapidly add elements to the application. The primary motivation
for the switch was to allow report generation. With Django's built in commands for data
access, we could easily gather and manage a lot of data without lots of low level SQL. We
could just request the data and work with it.
Using Django also allowed us to decrease the size of our code base. We went from 13,000
lines of PHP to about 2,000 lines of Python (Reported by CLOC). There are some minor
functionality discrepancies, as Django handled some user management functions which
were handled manually in the older version, and the older version of the software did not
have reporting tools.
Conclusions
Overall, users and management have been pleased with the Django based implementation.
However, while our code size and reporting ability have been dramatically improved, errors
have become very hard to trace. Finding the source of an error typically means searching
online for the meaning of the error code, and then manually parsing the database data to
find the row causing the problem. Some errors, such as the silent request rejection error
discussed above, have yet to be resolved. The time spent resolving these errors has
lessened the efficiency we hoped we could gain from Django.
Case Study Two: General Dynamics Six-Delta
Project Overview
Over the past year, I've had the opportunity to do an internship at General Dynamics
Advanced Information Systems in Bloomington, Minnesota. I work on the Six-Delta project,
which is a persistent, Java based application.
Six-Delta uses SQL Server as a back-end. The project has about 12 developers, and about
500,000 lines of code. The program has about 75 persistent classes - classes that are saved
to the database. Six-Delta is based on a domain model of persistence. In the domain model,
the business logic of the system is separated from the data implementation through domain
objects - objects which have corresponding tables in the database.
Motivations
In late 2008, the project moved from a raw SQL based implementation to an Object
Relational Mapping framework. I interviewed Six-Delta chief architect Paul Hed and
database lead Paul Wehlage to discuss this change.
From an architectural perspective, the motivation to switch to ORM came as part of a
general drive to move to an n-tier architecture. Before ORM, Six-Delta was a 2-tier
application using custom, direct SQL to manage persistent data. The program was
growing steadily, and continuing to add components would have meant more persistent
classes. In a 2-tier architecture, all of the complexity of the data access layer is visible in
the functional logic of the program. The goal was to insert an ORM framework as a third
layer to manage some of this complexity.
From a data-access perspective, the motivation for ORM came from a desire for better
performance and integrity. The project already had a hand-coded ORM implementation. The
implementation could do some of the specialized operations discussed earlier in this
document, such as lazy loading. However, it was lacking in several key areas. It did not
have a method of offline version checking, which meant that concurrent data access could
result in race conditions and lost changes. It also did not have any concept of a transaction.
Updates to an object were executed field by field, and were very slow.
From a general design perspective, the motivation for ORM came from a desire to
decrease complexity in the data access classes. Maintaining the data layer of the
implementation was very complex - each persistent class required 5 data maintenance
classes. Maintaining the database schema took a lot of work. Even though some support
classes could be auto-generated, the system was complex enough that only certain
developers had the expertise to make schema changes. As the number of tables was
increasing dramatically (from 15 to close to 50), the complexity and overhead associated
with maintaining the home brew data access layer was growing too great.
Selecting an ORM solution
The team chose an open source ORM framework called Hibernate. Hibernate was considered
alongside two other ORM solutions, Enterprise Java Beans and Java Data Objects. Hibernate
was chosen as it is the most popular open source solution for Java, and is fairly mature.
Enterprise Java Beans was considered too large and complex, and Java Data Objects was
too new.
Using annotations in persistent classes, Hibernate generates SQL code and executes it in
transactions. To save an object, the developer uses a transaction: a series of changes to
persistent objects, managed by Hibernate. During a transaction, Hibernate generates SQL
statements, which it executes against the database.
Integration
The planned integration time was 3 weeks. While the lead Hibernate developers advocate
against inserting ORM into an existing project with an existing schema, the team was
confident that it could be added quickly, as the data access layer of Six-Delta was
functionally very similar to Hibernate's implementation - both used a domain model with
Data Access Objects. Data access objects are objects which provide an interface to the
persistent domain classes.
Integration was more complex than expected, primarily because of the difficulty of
integrating transactions into all of the Six-Delta data dependent code. While the base layer
of Hibernate's data access design was consistent with the pre-Hibernate Six-Delta
implementation, Six-Delta had no transactions, and significant refactoring was required
before transactions were successfully implemented. The team also chose to integrate Spring
with Hibernate. Spring provides tools for tighter management of transactions, along with
connection management tools. The integration ultimately took approximately 16 months
before a stable release was available (note that many other software changes took place in
this time; Hibernate was not the only item requiring development time).
Problems
As time passed, other elements of Hibernate surfaced which required additional work.
Hibernate does not implement an efficient delete operation - it loads the entire object into
memory before deleting it from the database. The team had to implement SQL based delete
operations to allow for efficient deleting. Hibernate does not allow for column defaults in
SQL Server, which meant that Hibernate could not be used to auto-generate a database
schema. Hibernate is slow to initialize, which means that for very small sub-application
elements, Hibernate is too slow to be useful. Hibernate is also incapable of managing two
schemas at the same time, which makes database upgrade testing complex. Hibernate also
has limited debug options: it can print the parameterized form of the SQL statements it
executes, but not the bound values (statements look like INSERT INTO car (id, name)
VALUES (?, ?)), so problems can be hard to trace.
While none of these problems were critical, they required more low level coding than would
be ideal in an ORM framework managed data access layer.
Improvements
Hibernate succeeded in its primary goals. In the current state, it is much easier for
developers to make schema changes without special implementation knowledge. In general
(with the exception of deletion) Hibernate has made database operations much faster.
Finally, Hibernate has dramatically decreased the amount of time required to implement a
schema change.
Developer Reaction
When I first inquired about how Spring and Hibernate were integrated into the program, I
was simply told: magic. Obviously this was hyperbole on the part of the developer I was
speaking to, but the developer mentality implied by this reply was clear: developers were
not familiar with the design of the data access layer. Hibernate errors can be verbose and
confusing, and the integration into the system is not very straightforward. This meant that
even experienced developers could have trouble diagnosing errors that arose in the data
layer, as the layer of abstraction had allowed them to ignore the implementation details.
Data Metrics
I was given permission by Paul Hed to use historical defect data in this report. The following
is a plot of problem and regression reports in the two years before the switch to Hibernate.
Problems and Regressions Before Hibernate
As the current version of the software is larger than the pre-Hibernate version, I also plotted
the regressions and problems relative to the number of lines of code in the last
non-Hibernate release (I don't have access to LOC data for all previous releases, so these
values are all relative to the same release).
Problems and Regressions per KLOC
Overall, we see an average of 47.68 total regressions and problems per month, with an
average of 0.098 total regressions and problems per KLOC. Interestingly, there is a
dramatic spike in problem reports approximately one year before the transition to
Hibernate.
Problems and Regressions, Post Hibernate
There is also a significant spike in December of 2008. Paul Hed noted that this spike was
likely due to complications regarding the transition to Hibernate. The team had to maintain
a non-Hibernate baseline while implementing transaction management tools in the
Hibernate baseline. This resulted in a lot of code changes.
The spike in April 2010 was due to a major release testing process, which simply found
more bugs - there wasn't a significant change in development, just an increase in testing.
Again, the below values are all relative to a single KLOC value, as I don't have access to
historical KLOC data.
Problems and Regressions per KLOC
After the Hibernate release, the average number of regressions and problems per month
dropped to 42.21, and average regressions and problems per KLOC dropped significantly to
0.067. This corresponds to an approximately 12 percent drop in total regressions and
problems, and an approximately 32 percent drop in regressions and problems per KLOC
(noting that KLOC values are not strict).
Despite this improvement, Paul Hed asserted that the pre-Hibernate data access layer was
significantly more stable than the current implementation, largely due to the overall
simplicity of the design. In the Hibernate implementation, the data goes through more steps
and tools as it goes from user to database. This added complexity makes it harder to be
confident in the code.
Note: Lines of code metrics are from the Coverity Prevent static analysis tool.
Looking Back
I asked Paul Wehlage what he would do differently if he could start over with the transition.
He was generally satisfied with Hibernate's performance, but said that spending more time
with it, learning the nuances and tricky spots, would have been beneficial before attempting
integration. He also said more work should have been done to educate developers on
Hibernate.
Part 6: Conclusions
There is unquestionably a significant paradigm mismatch between object oriented
development and relational databases. Object relational mapping frameworks provide a
compelling solution to this problem.
At the beginning of this document, I posed the question "In a persistent, object oriented
application, is an ORM framework an advantageous method of implementing the persistent
layer?". Through the examples of ORM application, I demonstrated a variety of ways an
ORM framework can assist in implementing complex translations between the OO world and
relational databases. In the case studies, we saw examples of ORM solutions in action.
While both projects I studied were ultimately successful, both suffered from the same type
of problem: the inclusion of a black-box framework trusted to handle a significant layer of
an application is very dangerous. When things work as expected, the ORM framework can
be ignored. However, it is critical that developers working with the system have a solid
understanding of the implementation, so that when errors inevitably arise, developers
charged with finding the solution are not faced with opening the black box for the first time.
Thus the answer to my question is: an ORM framework can be an immensely helpful tool for
improving the speed and quality of the persistence layer. However, the implementation of
an ORM framework (or any major framework) cannot be a black-box operation; any
developers interacting with it must understand the system. If the system is treated as magic,
eventually the developers will be called to debug a problem, and it's very hard to debug
magic.
From the perspective of a software architect or project manager, the implication is that the
costs of integrating an ORM framework stretch beyond the initial coding costs. Successfully
integrating an ORM framework requires a commitment to train all of the developers on a
project in the use of the framework. This is not to say that integrating an ORM solution is a
waste of resources; rather, management needs to understand the long term commitment of
including such a complex tool.
Future Research/Open Questions
Object oriented databases
Relational databases have long been the standard for data management. This
paper assumes the reader is planning to use a relational database. However, there are
specialized object oriented databases designed to circumvent the problems caused by the
object relational mismatch. As ORM frameworks become increasingly popular, it would be
interesting to compare the performance of a relational database and ORM framework to an
object oriented database. Performance could be compared both in terms of speed and bug
occurrences.
Inferred mapping
Hibernate allows the developer to map relationships both with XML mapping files and with
in-code annotations. I would like to develop a tool which could analyze user specified
domain classes and infer annotations or mapping files.
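To make the idea concrete, the sketch below shows the two mapping styles such a tool would translate between. The `President` class and its column names are hypothetical; the annotations are the standard JPA annotations that Hibernate supports, and the XML is the equivalent `.hbm.xml` mapping the tool could generate.

```java
// Hypothetical domain class mapped with JPA annotations, which Hibernate
// reads directly:
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "presidents")
public class President {
    @Id
    private int id;

    @Column(name = "name")
    private String name;

    // getters and setters omitted
}

/*
 * The equivalent XML mapping file (President.hbm.xml) that an inference
 * tool could produce from the class above:
 *
 * <hibernate-mapping>
 *   <class name="President" table="presidents">
 *     <id name="id"/>
 *     <property name="name" column="name"/>
 *   </class>
 * </hibernate-mapping>
 */
```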
Sources
Bauer, Christian, and Gavin King. Java Persistence with Hibernate. Greenwich, Conn.:
Manning, 2007. Print.
Beighley, Lynn. Head First SQL. Beijing: O'Reilly Media, 2007. Print.
Fowler, Martin. UML Distilled: A Brief Guide to the Standard Object Modeling Language.
Boston: Addison-Wesley, 2009. Print.
Johnson, Rod. "J2EE Development Frameworks." Computer, vol. 38, no. 1, Jan. 2005, pp.
107-110. doi:10.1109/MC.2005.22.
"Version 2.0 (English)." The Django Book. Web. 13 June 2010.
<http://www.djangobook.com/en/2.0/>.
Special Thanks to:
Phil Barry - Primary advisor
Eric Van Wyk and Mats Heimdahl - Secondary advisors.
Paul Wehlage and Paul Hed - General Dynamics engineers who did interviews.
Matt Maloney and Garreth McMaster - Carlson School of Management managers who did
interviews.
Doug Smith - Reader and database advisor.
Appendix A: Source Code
SQL Performance Test
import java.sql.DriverManager;
import java.sql.Statement;
/**
 * TimeTest
 * @author Tyler Smith
 *
 * This class is designed to test a series of queries against a database.
 * The queries accomplish the same thing, but one is optimally structured,
 * and the other is not. The idea is to show the importance of optimal
 * query structuring.
 *
 * Obviously this test is very simplified. However, it does reflect the
 * importance of considering sequential access when performing CRUD
 * operations on object data.
 */
public class TimeTest {

    private static String bulkQuery = "SELECT * FROM [AccessTest].[dbo].[presidents]";
    private static String query1 = "SELECT name FROM [AccessTest].[dbo].[presidents]";
    private static String query2 = "SELECT id FROM [AccessTest].[dbo].[presidents]";
    private static String query3 = "SELECT birthday FROM [AccessTest].[dbo].[presidents]";
    private static String query4 = "SELECT gender FROM [AccessTest].[dbo].[presidents]";

    public static void main(String[] args) {
        java.sql.Connection con = null;
        try {
            // Load the SQL Server JDBC driver
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver").newInstance();
            String url = "jdbc:sqlserver://localhost:1433;" +
                    "user=test.user;password=test.password;" +
                    "databaseName=AccessTest";
            con = DriverManager.getConnection(url);
            Statement st = con.createStatement();

            // Test a single bulk statement
            long start_SQL1 = System.nanoTime();
            for (int i = 0; i < 10000; i++) {
                st.executeQuery(bulkQuery);
            }
            long finish_SQL1 = System.nanoTime();
            long net_SQL1 = finish_SQL1 - start_SQL1;
            System.out.println("Net time, single queries = " + net_SQL1);

            // Test multiple per-column statements
            long start_SQL2 = System.nanoTime();
            for (int i = 0; i < 10000; i++) {
                st.executeQuery(query1);
                st.executeQuery(query2);
                st.executeQuery(query3);
                st.executeQuery(query4);
            }
            long finish_SQL2 = System.nanoTime();