Defense Against the Dark Arts: Protecting Your Data Against ORMs
Object-Relational Mappers
"An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data."
Fundamental Perspective Shift => Inevitably, something will be lost
  • Enables Rapid Development
  • Simplifies Application Code
  • Standardizes Relationships
  • Standardizes Data Structures
  • Sometimes Sucks
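To make the definition above concrete, here is a minimal plain-Ruby sketch of the Active Record *pattern* (not the Rails library, and not how any real ORM is implemented): one object wraps one row, renders the SQL for that row, and carries a bit of domain logic. The `Account` class, its columns, and the `overdrawn?` rule are hypothetical.

```ruby
# Toy Active Record (the pattern only): one object per row, with data access
# and domain logic together. Table "accounts" and its columns are hypothetical.
class Account
  TABLE = "accounts"

  def initialize(row)
    @row = row.dup # column name => value, as read from the table
  end

  def id
    @row["id"]
  end

  def balance
    @row["balance"]
  end

  # Domain logic sits on the same object as the data.
  def overdrawn?
    balance.negative?
  end

  # Data access: the object renders the UPDATE for its own row.
  def update_sql(changes)
    sets = changes.map { |col, val| %("#{col}" = #{quote(val)}) }.join(", ")
    %(UPDATE "#{TABLE}" SET #{sets} WHERE "id" = #{id})
  end

  private

  # Minimal literal quoting for illustration; a real ORM uses bind parameters.
  def quote(val)
    val.is_a?(Numeric) ? val.to_s : "'#{val.to_s.gsub("'", "''")}'"
  end
end

acct = Account.new("id" => 2, "name" => "Gringotts", "balance" => -5)
puts acct.overdrawn?                 # true
puts acct.update_sql("balance" => 0) # UPDATE "accounts" SET "balance" = 0 WHERE "id" = 2
```

The convenience and the "perspective shift" come from the same place: the unit of work is the whole object, so the row is always handled as one piece.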
Defense Against the Dark Arts: Overview
  • Relationships
  • Inheritance
  • Data Types
  • Memory Usage
  • Data Integrity
  • Version Control
  • Connections
Relationships
Object Relationships
  • Maintained in application code (transparent to developers)
  • Simplified validation (easier for developers to remember)
  • Cheap, but viable alternative to foreign key constraints
Examples
  • 1 : ∞ | one-to-many = A has_many B
  • 1 : ∞ | one-to-many = A has_many B through A_B
  • 1 : 1 | one-to-one = A has_one C
  • ∞ : 1 | many-to-one = B belongs_to A
  • 1 : 1 | one-to-one = C belongs_to A
  • ∞ : ∞ | many-to-many = D has_and_belongs_to_many A
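Because `belongs_to` and `has_many` live only in application code, nothing stops raw SQL, a bulk import, or a second application from leaving a child row that points at a parent that no longer exists. Below is a minimal plain-Ruby sketch of the check a foreign key constraint would otherwise enforce at write time; the `wizards`/`wands` rows and the `wizard_id` column are hypothetical. In SQL the same check is a LEFT JOIN from child to parent filtered on a missing parent.

```ruby
require "set"

# Return child rows whose foreign-key column points at a parent id that does not exist.
# A NULL foreign key is skipped, since a plain FOREIGN KEY constraint also allows it.
def orphans(parents, children, fk)
  parent_ids = parents.map { |p| p["id"] }.to_set
  children.reject { |c| c[fk].nil? || parent_ids.include?(c[fk]) }
end

# Hypothetical rows, as they might come back from the two tables.
wizards = [{ "id" => 1, "name" => "Harry" }, { "id" => 2, "name" => "Hermione" }]
wands = [
  { "id" => 10, "wizard_id" => 1 },
  { "id" => 11, "wizard_id" => 3 },  # wizard 3 was deleted behind the model's back
  { "id" => 12, "wizard_id" => nil } # no owner yet
]
p orphans(wizards, wands, "wizard_id").map { |w| w["id"] } # [11]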
Inheritance
Concrete vs. Single Table Inheritance
Caution: These are TOTALLY DIFFERENT. And confusing.
  • Use PG Table Inheritance with an abstract parent class (or partitioning)
      Parent structure allows code reuse and some helpful queries
      Child tables are physically separate, so they have their own performance metadata: indexes, keys, etc.
  • Use Single Table Inheritance for small data sets
      All child classes are physically in the parent table with a "type" attribute
      This sucks for a lot of data, and is hard to maintain & extend
Inheritance
Postgres Table Inheritance with Abstract Parent Class
  • Parent structure is really for app code reuse, not giant tables
  • Child tables have their own performance metadata (indexes, etc.)

    class Weapon < ActiveRecord::Base
      self.abstract_class = true
    end

    class Wand < Weapon
    end

    CREATE TABLE weapons (id int, name text);
    CREATE TABLE wands (wood text, length int, core text) INHERITS (weapons);

    SELECT * FROM weapons; -- wands and other weapons
    SELECT * FROM wands;   -- only wands
Inheritance
Single Table Inheritance for Small Data Sets
  • All child classes are physically in the parent table with a "type" attribute
  • This sucks for a lot of data, and is hard to maintain & extend

    class Weapon < ActiveRecord::Base
    end

    class Wand < Weapon
    end

    CREATE TABLE weapons (id int, name text, wood text, length int, core text, type text);

    SELECT * FROM weapons WHERE type = 'Wand';
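In ActiveRecord, Single Table Inheritance works because the `type` column holds the subclass name and the ORM instantiates that class when it loads the row. Here is a plain-Ruby sketch of that dispatch (an illustration, not ActiveRecord's code). It also makes the cost visible: every row carries every subclass's columns, and for non-wand rows they are all NULL. The rows below are hypothetical.

```ruby
# Plain-Ruby sketch of Single Table Inheritance dispatch (not ActiveRecord's code):
# all rows live in one "weapons" table, and the "type" column picks the class.
class Weapon
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Build the class named in the row's "type" column, or Weapon when it is NULL.
  def self.instantiate(row)
    klass = row["type"] ? Object.const_get(row["type"]) : Weapon
    raise ArgumentError, "#{klass} is not a Weapon" unless klass <= Weapon
    klass.new(row)
  end
end

class Wand < Weapon
  def core
    attributes["core"]
  end
end

# Hypothetical rows: the non-wand row still carries the wand columns, as NULLs.
rows = [
  { "id" => 1, "name" => "Elder Wand", "wood" => "elder", "length" => 15, "core" => "thestral hair", "type" => "Wand" },
  { "id" => 2, "name" => "Sword of Gryffindor", "wood" => nil, "length" => nil, "core" => nil, "type" => nil }
]
p rows.map { |r| Weapon.instantiate(r).class } # [Wand, Weapon]
```

Each new subclass adds its columns to every row of the shared table, which is why this approach is best kept to small data sets.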
Data Types
Standard Data Types
  • Port to new DBMS easily
  • Developers don't have to learn new data types
  • Use tools written for any DBMS without modification
But standard types also:
  • Miss out on Postgres awesomeness
  • Waste space & memory
  • Compromise data integrity
Examples
  • Custom data types: INTERVAL, smallint, floating point
  • Size limitation: zip code, phone number, email address
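To illustrate the size-limitation point: when a generic ORM string type is used for zip codes, phone numbers, and email addresses alike (often something like `varchar(255)`), the column accepts any short string. This plain-Ruby sketch compares that generic rule with a tighter one for US ZIP codes. The regex is an assumption about your data. In Postgres, the tighter rule would belong on the column itself (for example a CHECK constraint or a DOMAIN), not only in the application.

```ruby
# Two hypothetical column rules for a zip code field.
GENERIC_STRING = ->(v) { v.is_a?(String) && v.length <= 255 }              # roughly what varchar(255) enforces
US_ZIP         = ->(v) { v.is_a?(String) && v.match?(/\A\d{5}(-\d{4})?\z/) } # ZIP or ZIP+4 only

samples = ["10011", "10011-1234", "n/a", "", "Diagon Alley"]
samples.each do |s|
  puts format("%-14s generic=%-5s zip=%s", s.inspect, GENERIC_STRING.call(s), US_ZIP.call(s))
end
```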
Memory Usage
RDBMSs store rows; ORMs retrieve objects.
Every time you use any piece of that object's (row's) data, you get back everything you ever added on to that model (table).

    > Account.find(2)
    SELECT * FROM "accounts" WHERE ("accounts"."id" = 2)
    > Account.find(2).updated_at
    SELECT * FROM "accounts" WHERE ("accounts"."id" = 2)

  • The number and data types of attributes per table DO matter
  • Watch out for large fields, TOAST data especially
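Here is a rough plain-Ruby sketch of "you get back everything." It counts the bytes a whole-row lookup drags back when the caller only wanted `updated_at`, versus fetching that one column. The row, its large `bio` field, and the byte counts are illustrative assumptions. The counts are string lengths, not Postgres wire or TOAST sizes.

```ruby
# Illustration only: a hypothetical accounts row with one large text field.
# Byte counts are string lengths of the values, not real wire or TOAST sizes.
ROW = {
  "id" => "2",
  "name" => "Gringotts",
  "bio" => "x" * 50_000, # hypothetical large field, the kind Postgres would TOAST
  "updated_at" => "2011-05-19 12:00:00"
}.freeze

# Bytes returned when only the given columns are fetched.
def fetched_bytes(row, columns)
  row.slice(*columns).values.sum { |v| v.to_s.bytesize }
end

whole_row = fetched_bytes(ROW, ROW.keys)       # like SELECT * ... WHERE id = 2
one_col   = fetched_bytes(ROW, ["updated_at"]) # like SELECT updated_at ... WHERE id = 2
puts "SELECT *: #{whole_row} bytes, SELECT updated_at: #{one_col} bytes"
```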
Data Integrity
Safeguards & Dark Arts Trickery
  • Foreign key / relationship enforcement
  • Standardized validation in the model (& thus across the application)
  • NULL vs. empty string
      Defense: Look at data created in all scenarios. The slightest application code difference can mean different data. Varies by ORM.
  • Object-to-row updates can nullify an entire row
      Defense: Add NULL constraints to the database. Specify if and how fields can be updated (e.g. keys can't be set to NULL).
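One defense against the NULL vs. empty-string split is to normalize blank strings in one place before anything is written, so every code path produces the same data. Below is a plain-Ruby sketch of such a normalizer. Hooking it into a Rails model (for example from a `before_validation` callback) is an assumption about how you would wire it up. The `keep_blank` list exists because an empty string can be a legitimate value distinct from NULL. NOT NULL constraints in the database remain the backstop for columns that must always have a value.

```ruby
# Turn whitespace-only strings into nil so every code path writes the same thing.
# keep_blank names hypothetical columns where "" is meaningful and must be preserved.
def normalize_blanks(attributes, keep_blank: [])
  attributes.to_h do |col, val|
    blank = val.is_a?(String) && val.strip.empty?
    [col, (blank && !keep_blank.include?(col)) ? nil : val]
  end
end

# Hypothetical incoming attributes, e.g. from a form or a government data feed.
row = { "name" => "  ", "nickname" => "", "uom_code" => "KG", "quantity" => " " }
p normalize_blanks(row, keep_blank: ["nickname"])
```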
Version Control
Schema Management
  • Ideally correlated with application changes
  • Rails db migrations stay in the branch with dependent code
  • Migration scripts include up & down to reverse effects
Data Migrations
  • YMMV: find the right tool for the job
  • Iterative or set-based? How much time do I have at run-time? How will this impact the production site?
  • Small dynamic migrations stay with the schema change logic
  • Adding/updating custom data should be separate
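The up & down discipline can be checked mechanically: applying `up` and then `down` should leave the schema exactly where it started. Below is a plain-Ruby sketch using an in-memory schema. The `AddCoreToWands` migration and the Hash-based schema are hypothetical stand-ins for a real Rails migration and database.

```ruby
# Hypothetical migration against an in-memory schema: table name => column names.
class AddCoreToWands
  def up(schema)
    schema["wands"] << "core"
  end

  def down(schema)
    schema["wands"].delete("core")
  end
end

# Round-trip check: up must change something, and down must restore the original.
def reversible?(migration, schema)
  before = Marshal.load(Marshal.dump(schema)) # deep copy of the starting state
  work = Marshal.load(Marshal.dump(schema))
  migration.up(work)
  changed = work != before
  migration.down(work)
  changed && work == before
end

schema = { "wands" => %w[id wood length] }
puts reversible?(AddCoreToWands.new, schema) # true
```

A check like this could run in CI for every new migration, so a missing or incomplete `down` is caught before it is needed in production.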
Connections
Connection Persistence
  • Configuration is only evaluated at deploy time
  • Expense of creating & dropping connections is limited
Every call gets wrapped in a transaction
  • Very important to remember for migrations and callback-style background processes, which are sometimes naively launched in parallel
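Wrapping each call in its own transaction does not make a read in one call and a write in a later call atomic together. Below is a deterministic plain-Ruby simulation of two background workers naively launched in parallel, each doing read-then-write on the same counter. The `Store` class and the chosen interleaving are hypothetical. In Postgres, the lost update shown here is what a row lock (`SELECT ... FOR UPDATE`) or a single set-based `UPDATE ... SET n = n + 1` would prevent.

```ruby
# Each method call stands for one ORM call wrapped in its own transaction.
class Store
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def read
    @value
  end

  def write(v)
    @value = v
  end

  # Set-based alternative: the read and the write happen in one statement.
  def increment!
    @value += 1
  end
end

store = Store.new(0)
# One possible interleaving: workers A and B both read before either writes.
a_seen = store.read
b_seen = store.read
store.write(a_seen + 1)
store.write(b_seen + 1)
puts store.value # 1, not 2: worker A's increment was lost

safe = Store.new(0)
2.times { safe.increment! }
puts safe.value # 2
```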
Strategy: Know Your Data
"Trust is for people with poor surveillance." – Col. James R. Trahan, USMC
Don't be at the mercy of your application code
  • Run bad data checks
  • Cron job/Rake task to run stored checks & email results

    CREATE TABLE data_checks (id int, name text, description text, check_sql text, fix_sql text);

Don't guess what's happening, find out
  • Monitor logs with PgFouine to find problem queries
  • System tables (index usage, pg_stat, pg_stat_io, null fill)
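The stored-checks idea could be driven by a small runner that the cron job or Rake task calls. Below is a plain-Ruby sketch under stated assumptions. Each check is a Hash shaped like a row of the `data_checks` table above. `run_sql` is a hypothetical callable that executes SQL and returns an array of rows (in a real app it might wrap your database connection). A check fails when its `check_sql` returns any rows. Emailing the report is left out.

```ruby
# Run each stored check and build a plain-text report of the ones that found bad rows.
# run_sql is a hypothetical callable: SQL string in, array of result rows out.
def run_data_checks(checks, run_sql)
  failures = checks.filter_map do |check|
    rows = run_sql.call(check["check_sql"])
    next if rows.empty?

    "#{check['name']}: #{rows.size} bad row(s). #{check['description']}\n" \
      "  suggested fix: #{check['fix_sql'] || '(none)'}"
  end
  failures.empty? ? "All #{checks.size} data checks passed." : failures.join("\n")
end

# Hypothetical stored checks and a fake executor standing in for a real connection.
checks = [
  { "id" => 1, "name" => "orphaned_wands",
    "description" => "Wands whose wizard no longer exists.",
    "check_sql" => "SELECT w.id FROM wands w LEFT JOIN wizards z ON z.id = w.wizard_id " \
                   "WHERE w.wizard_id IS NOT NULL AND z.id IS NULL",
    "fix_sql" => nil },
  { "id" => 2, "name" => "blank_names",
    "description" => "Names stored as empty strings instead of NULL.",
    "check_sql" => "SELECT id FROM wizards WHERE name = ''",
    "fix_sql" => "UPDATE wizards SET name = NULL WHERE name = ''" }
]
fake_db = ->(sql) { sql.include?("LEFT JOIN") ? [{ "id" => 11 }] : [] }
puts run_data_checks(checks, fake_db)
```

Keeping `fix_sql` as a suggestion in the report, rather than running it automatically, leaves a human in the loop before data is changed.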
Never Fight Alone

Editor's Notes

  • #2 Thank you for coming. If you haven't followed the Harry Potter series in any way, I apologize for my attempt at reaching an international audience through my favorite children's fantasy series. My name is Vanessa Hurst and I am a Database and Analytics Engineer for Paperless Post, a customizable online stationery startup just down the road in Chelsea. I studied Computer Science and Systems and Information Engineering at the University of Virginia. I have experience with databases ranging from few-hundred-megabyte CMSes for non-profits to terabytes of financial data and high-traffic consumer websites. I've worked in data processing, product development, and business intelligence. I am a happy open-source convert and the lone data wrangler in a land of web developers using Ruby on Rails.
  • #3 ORM = Object-Relational Mapper. "An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data." – design pattern by Martin Fowler that inspired ActiveRecord. Object-relational gap – Rails 3, Apress. Examples: ActiveRecord (Ruby) http://ar.rubyonrails.org/ SQLAlchemy (Python) http://www.sqlalchemy.org/ Hibernate (Java) http://hibernate.org/ http://hibernate.org/about/orm.html LINQ (.Net/Microsoft) http://msdn.microsoft.com/en-us/library/cc161164.aspx RapidDataMapper (PHP) http://www.rapiddatamapper.com/ Interested in how they're implemented and why? Check out http://excoventures.com/talks/orms-good-bad-necessary.pdf
  • #4 Topics we’ll cover
  • #5 Purpose of examples: complex relationships are not impossible, and having them declared in your application (and thus transparent to developers) is often desirable.
  • #6 PG Table Inheritance = Concrete Table Inheritance => for sharing common attributes. Object inheritance = Single Table Inheritance => for entirely the same attributes.
  • #7 Postgres = Concrete Table inheritance – http://www.postgresql.org/docs/9.0/static/ddl-inherit.html
  • #8 Single Table Inheritance - http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
  • #10 The value proposition of a split table changes entirely here. Before, you split a table when you were worried that the physical pages of the data might be splitting and slowing down single-row retrieval. Since you can't limit your application by columns (without making a view with triggers or making the object read-only), you need to watch your data structure in a different way. E.g. Rails console > Account.find(2) = SELECT * FROM "accounts" WHERE ("accounts"."id" = 2); Rails console > Account.find(2).updated_at = SELECT * FROM "accounts" WHERE ("accounts"."id" = 2) // Rails code operates on the Account object
  • #11 ActiveRecord likes to create NULLs, but save empty strings. You can create triggers to prevent empty strings or clean them up to NULLs, but remember that they are not actually the same, and you may want to support empty strings in some scenarios. Refreshing cached data after set-based updates: http://stackoverflow.com/questions/670379/large-volume-database-updates-with-an-orm Example from the PostgreSQL mailing list: "Evidently RoR's ActiveRecord helpfully converts a string containing nought but spaces to nil when a numeric value is required for the column type. The problem arises with a single unit record received from the government system that has a UOM code provided but the associated decimal value field is blank. Since the default is zero in our DB I have altered our load program to coerce a value of zero for strings containing only spaces destined for numeric columns. But, it feels ugly. I would really like to be able to coerce nils to some value on a column by column basis on the DBMS side. This is not really a DEFAULT value and I do not know what I would call it if such a thing did exist. I suppose a trigger and function is called for."
  • #12 If developers need to write migrations, requiring a down method helps set the standard that code which changes saved state must come with an easy way to roll back or preserve that state.
  • #15 For any organization, especially startups, your data is everything. Your application itself may need to drastically change, but if your data isn’t properly constructed you won’t be able to reuse it (just like poorly written code must be scrapped, but even more expensive and sad). Show your team the data created by features – most often the problem is that an application developer just never sees what the code actually causes. Bad data can be more dangerous than no data.