Dare to Build Vertical Design 
with Relational Data 
Is EAV an Anti-Pattern?
About me 
 Project Manager @ 
 10 years professional experience 
 Microsoft Certified Specialist 
 ivelin.andreev@icb.bg 
 http://www.linkedin.com/in/ivelin 
 Business Interests 
 ASP.NET, AJAX, jQuery 
 SOA, Integration 
 GIS, Mapping 
 SQL optimization 
Agenda 
 The ChemXchange challenge 
 What is EAV 
 Is it an anti-pattern? 
 Alternatives 
 When EAV 
 Demo
What is ChemXchange 
 Dangerous substances 
 Strict legislation 
 Complex supply chain 
 Problems 
 No effective risk communication 
 Information lacks compliance with EU standards 
 Cultural and linguistic (seasonal workers) 
 High industry information management costs 
 2’000’000 illnesses every year cost € 75 billion/year
Material Safety Data Sheet 
 Information on the properties of a substance 
 Complexity 
 No fixed SDS schema
 6 countries, 8 languages 
 17 Sections, 1800 fields 
 Quantity (5y estimate) 
 120’000 SDS, 100’000 projects 
 36’000’000 values, 48’000’000 phrases
Behind the Scenes 
 One SDS has 
 Common values – all countries 
 e.g. chemical substances
 65 national values 
 Trade name(ethanol) / Synonym(ethyl alcohol, grain alcohol) 
 850 language dependent phrase fields 
 R45: May cause cancer. 
 R45: Può provocare il cancro. 
 R45: Kan forårsake kreft. 
 50+ National Extensions 
 Country specific schema fragments 
 Schema changed during development (2 times)
Decision Making 
 Know Why 
 Consider Alternatives 
 Relational DB model 
 NoSQL (MongoDB)
 SQL XML Data Type 
 Entity-Attribute-Value 
 Prove it will work!
Entity-Attribute-Value 
 Entity 
 The object being described 
 Attribute 
 Object properties 
 Value 
 The value of an attribute 
 Also known as 
 Vertical DB 
 Open Schema 
 Anti-pattern… 
 stores all attributes in one table 
 in one column 
 one attribute per row
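A minimal sketch of the "one attribute per row" layout, using Python's built-in sqlite3 module so it runs anywhere. The table names (entity, attribute, eav_value), the column names and the sample rows are assumptions made for illustration; the slides do not show the project's actual schema.

```python
import sqlite3

# Hypothetical names throughout: the slides do not show the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
-- One row per (entity, attribute): every value lives in the same column.
CREATE TABLE eav_value (
    entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT,
    PRIMARY KEY (entity_id, attribute_id)
);
""")
conn.executemany("INSERT INTO entity VALUES (?, ?)",
                 [(1, "SDS A"), (2, "SDS B")])
conn.executemany("INSERT INTO attribute VALUES (?, ?)",
                 [(1, "trade_name"), (2, "synonym")])
conn.executemany("INSERT INTO eav_value VALUES (?, ?, ?)",
                 [(1, 1, "ethanol"), (1, 2, "ethyl alcohol"), (2, 1, "acetone")])


def load_entity(conn, entity_id):
    """Collect all attribute rows of one entity into a dict."""
    rows = conn.execute(
        """
        SELECT a.name, v.value
        FROM eav_value AS v
        JOIN attribute AS a ON a.attribute_id = v.attribute_id
        WHERE v.entity_id = ?
        ORDER BY a.attribute_id
        """,
        (entity_id,),
    ).fetchall()
    return dict(rows)


sds_a = load_entity(conn, 1)
sds_b = load_entity(conn, 2)  # sparse: no synonym row exists at all
```

A missing attribute simply has no row, which is why the layout suits sparse data: adding a new attribute is an INSERT into attribute, not an ALTER TABLE.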
Classic EAV Model 
 Relational Data 
 EAV Tables 
 Entity 
 Entity Attribute 
 Value
The Criticism Says … 
 Top design mistake beginners make
 Difficult to rely on Attribute Names 
 Difficult to enforce Data Type integrity 
 Difficult to enforce NOT NULL attributes 
 Difficult to enforce Referential Integrity 
 Difficult to Reconstruct Entity 
 Complex Queries for trivial tasks 
Solution?
Do it all in application logic?
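Two of the criticisms above can be reproduced in a few lines. The sketch below assumes a single hypothetical table eav(entity_id, attribute, value) with invented sample rows. It shows that reconstructing an entity as one row needs a conditional-aggregation pivot, and that nothing in the database stops a non-numeric string from being stored in a "numeric" attribute.

```python
import sqlite3

# Hypothetical single-table EAV, only to illustrate the criticism.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "trade_name", "ethanol"),
    (1, "country", "NO"),
    (1, "version", "4.0"),
    (2, "trade_name", "acetone"),
    (2, "version", "not a number"),  # accepted: no data type integrity
    # entity 2 has no 'country' row: NOT NULL cannot be declared either
])

# A "trivial" task, one row per entity, needs one CASE branch per attribute.
pivot_sql = """
SELECT entity_id,
       MAX(CASE WHEN attribute = 'trade_name' THEN value END) AS trade_name,
       MAX(CASE WHEN attribute = 'country'    THEN value END) AS country,
       MAX(CASE WHEN attribute = 'version'    THEN value END) AS version
FROM eav
GROUP BY entity_id
ORDER BY entity_id
"""
rows = conn.execute(pivot_sql).fetchall()
```

With 1800 fields the same query would need 1800 branches (or as many self-joins), which is the "complex queries for trivial tasks" point.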
The Anti-Pattern Strikes Back 
 Attribute Metadata 
 Data type 
 IsRequired 
 Multiplicity 
 Validation expression 
 Strong Type columns 
 Read object-at-a-time
 Write value-at-a-time
 Schema Caching 
 Own Search Plan
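One way the metadata ideas above could fit together is sketched below. It is an illustration under stated assumptions, not the project's implementation: the attribute table carries data type, IsRequired, multiplicity (max_occurs) and a validation expression (a regular expression here), and the value table has one strongly typed column per data type. All table, column and function names are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    attribute_id     INTEGER PRIMARY KEY,
    name             TEXT NOT NULL UNIQUE,
    data_type        TEXT NOT NULL CHECK (data_type IN ('text', 'number')),
    is_required      INTEGER NOT NULL DEFAULT 0,
    max_occurs       INTEGER NOT NULL DEFAULT 1,
    validation_regex TEXT
);
CREATE TABLE eav_value (
    entity_id    INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value_text   TEXT,
    value_number REAL,
    -- exactly one of the typed columns must be filled
    CHECK ((value_text IS NULL) <> (value_number IS NULL))
);
""")
conn.executemany(
    "INSERT INTO attribute (name, data_type, is_required, max_occurs,"
    " validation_regex) VALUES (?, ?, ?, ?, ?)",
    [("trade_name", "text", 1, 1, None),
     ("phrase_code", "text", 0, 5, r"R\d+"),
     ("schema_version", "number", 1, 1, None)],
)


def write_value(conn, entity_id, attr_name, value):
    """Write one value, enforcing the attribute's metadata first."""
    row = conn.execute(
        "SELECT attribute_id, data_type, max_occurs, validation_regex"
        " FROM attribute WHERE name = ?", (attr_name,)).fetchone()
    if row is None:
        raise KeyError(f"unknown attribute: {attr_name}")
    attribute_id, data_type, max_occurs, regex = row
    if regex and not re.fullmatch(regex, str(value)):
        raise ValueError(f"{attr_name}: {value!r} fails validation")
    count = conn.execute(
        "SELECT COUNT(*) FROM eav_value WHERE entity_id = ? AND attribute_id = ?",
        (entity_id, attribute_id)).fetchone()[0]
    if count >= max_occurs:
        raise ValueError(f"{attr_name}: at most {max_occurs} value(s) allowed")
    if data_type == "number":
        # float() raises ValueError for non-numeric input
        conn.execute(
            "INSERT INTO eav_value (entity_id, attribute_id, value_number)"
            " VALUES (?, ?, ?)", (entity_id, attribute_id, float(value)))
    else:
        conn.execute(
            "INSERT INTO eav_value (entity_id, attribute_id, value_text)"
            " VALUES (?, ?, ?)", (entity_id, attribute_id, str(value)))


def missing_required(conn, entity_id):
    """Names of required attributes that have no value for this entity."""
    rows = conn.execute(
        """
        SELECT a.name FROM attribute AS a
        WHERE a.is_required = 1
          AND NOT EXISTS (SELECT 1 FROM eav_value AS v
                          WHERE v.entity_id = ?
                            AND v.attribute_id = a.attribute_id)
        ORDER BY a.attribute_id
        """, (entity_id,)).fetchall()
    return [name for (name,) in rows]


write_value(conn, 1, "trade_name", "ethanol")
write_value(conn, 1, "phrase_code", "R45")
missing = missing_required(conn, 1)
```

Because the attribute table changes rarely, it is a natural candidate for the schema caching mentioned above; this sketch reads it on every write to stay short.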
To Limit EAV Drawbacks 
 Preconditions 
 Define: Slow 
 Not for simple and static schemas 
 Numerous sparse data 
 No relations between EAV data and system data
 Identify attributes used in code 
 Schema versioning and mapping (3.5 to 4.0)
Enhanced EAV Model 
 EAV Tables 
 Entity 
 Attribute 
 Value
Why EAV 
Pros 
 Less time for requirements 
 Less time to design, develop 
 Easy to add new entities 
 Generic interface components 
 Easy to change schema 
 Requires less understanding
Cons 
 Complex SQL for simple reports 
 Complex reports are … 
 Poor performance for large data 
 Learning curve 
 Difficult to debug 
Hybrid 
 Model the “meaningful” attributes – business logic and search 
 Custom attributes are used primarily for visualization
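The hybrid approach can be sketched as follows, again with hypothetical names and sample data: attributes that drive business logic and search become ordinary indexed columns, while custom attributes sit in an EAV side table and are only read back for display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "Meaningful" attributes: real columns, typed, constrained, indexable.
CREATE TABLE sds (
    sds_id     INTEGER PRIMARY KEY,
    trade_name TEXT NOT NULL,
    country    TEXT NOT NULL
);
CREATE INDEX ix_sds_country ON sds(country);
-- Custom attributes: EAV rows, used only when rendering the document.
CREATE TABLE sds_custom (
    sds_id    INTEGER NOT NULL REFERENCES sds(sds_id),
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (sds_id, attribute)
);
""")
conn.executemany("INSERT INTO sds VALUES (?, ?, ?)",
                 [(1, "ethanol", "NO"), (2, "acetone", "IT")])
conn.executemany("INSERT INTO sds_custom VALUES (?, ?, ?)",
                 [(1, "label_colour", "red"), (1, "internal_note", "reviewed")])


def find_ids_by_country(conn, country):
    """Search touches only the relational columns."""
    cur = conn.execute(
        "SELECT sds_id FROM sds WHERE country = ? ORDER BY sds_id", (country,))
    return [row[0] for row in cur]


def load_for_display(conn, sds_id):
    """Display merges the fixed columns with whatever custom rows exist."""
    trade_name, country = conn.execute(
        "SELECT trade_name, country FROM sds WHERE sds_id = ?",
        (sds_id,)).fetchone()
    custom = dict(conn.execute(
        "SELECT attribute, value FROM sds_custom WHERE sds_id = ?"
        " ORDER BY attribute", (sds_id,)).fetchall())
    return {"trade_name": trade_name, "country": country, "custom": custom}


norwegian = find_ids_by_country(conn, "NO")
doc = load_for_display(conn, 1)
```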
Relational DB Model 
 Pros 
 RDBMS maintains data integrity and consistency 
 RDBMS used for other modules 
 Relational solution already exists 
 Cons 
 Extremely complex schema and naming 
 Requires deep schema understanding 
 Difficult changes in schema over time 
 Specific user interface (1800 fields)
MongoDB
 Pros 
 MongoDB – open, BSON document based 
 Flexible schema, flexible data types 
 Stores the whole object and its relations 
 JavaScript Shell commands 
 Designed for bigness and scalability 
 C# driver with LINQ support exists (free)
 Cons 
 Command line admin tools!!! 
 No multi-doc transactions 
 Relationships maintained by developers 
 Redundant data (names reflect storage) 
 Language – phrases 
 Country – national fields
XML and SQL Server 
 SQL Server native XML support 
 Typed vs. Untyped XML 
 XQuery 
 XML Data Type Methods 
 query('XQuery')
 value('XQuery', 'SQLType')
 exist('XQuery')
 modify('XML DML')
 nodes('XQuery') AS Table(Col)
XML SQL Data Type 
 Pros 
 Similar to EAV with much out of the box 
 Solution already exists (Oracle) 
 Functionally close to NoSQL 
 Document-XML export / import 
 Cons 
 Same problems as NoSQL 
 Poor performance 
 Query returns XML 
 Still no XML schema for the SDS object 
 Language phrases are out of the XML 
 Company went bankrupt and was sold
Proving it Will Work 
 Proof of Concept 
 Identified 250+ reusable attribute classes 
 Unit (Name, Value) 
 Phrase (PhraseID, PhraseText, SubPhrase [0:N]) 
 Edit 
• View and Edit controls with jQuery and real AJAX 
• Edit controls save JSON 
• DB generates missing parent nodes 
 Search 
• 40’000’000 property values 
• 25’000’000 phrases 
• 25 filters 
 Lazy Loading
Advanced Document Search 
 60+ search fields 
 One field - Many properties 
 PIVOT Data 
 Redundant search 
 Duplicate relational data 
 Works for 30% of fields 
 Complex maintenance 
 Substances, national fields 
 Values in different languages
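The "redundant search" idea, duplicating selected EAV values into a flat relational table, might look like the sketch below. The table names (eav, doc_search), the two pivoted fields and the full-rebuild strategy are assumptions for illustration; the slides say only that the approach covered about 30% of the fields and was complex to maintain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eav (doc_id INTEGER, attribute TEXT, value TEXT);
-- Redundant, pivoted copy of the searchable attributes.
CREATE TABLE doc_search (
    doc_id     INTEGER PRIMARY KEY,
    trade_name TEXT,
    country    TEXT
);
CREATE INDEX ix_doc_search ON doc_search(country, trade_name);
""")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "trade_name", "ethanol"), (1, "country", "NO"),
    (2, "trade_name", "acetone"), (2, "country", "IT"),
    (3, "trade_name", "ethanol"), (3, "country", "IT"),
])


def refresh_search_table(conn):
    """Rebuild the pivoted copy from the EAV rows in one transaction."""
    with conn:
        conn.execute("DELETE FROM doc_search")
        conn.execute("""
            INSERT INTO doc_search (doc_id, trade_name, country)
            SELECT doc_id,
                   MAX(CASE WHEN attribute = 'trade_name' THEN value END),
                   MAX(CASE WHEN attribute = 'country'    THEN value END)
            FROM eav
            GROUP BY doc_id
        """)


def search(conn, country, trade_name):
    """Search hits the flat table and its index, never the EAV rows."""
    cur = conn.execute(
        "SELECT doc_id FROM doc_search WHERE country = ? AND trade_name = ?"
        " ORDER BY doc_id", (country, trade_name))
    return [row[0] for row in cur]


refresh_search_table(conn)
hits = search(conn, "IT", "ethanol")
```

The cost is visible in the sketch: every searchable attribute is stored twice, and the copy must be refreshed whenever the EAV rows change.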
Is EAV Search Slow ? 
 Likely to happen… 
 Search in own organization (1000 documents) 
 Search only in the latest schema version
 Search on up to 5 filters 
 The worst case… 
 Multiple joins (30+) 
 Cannot use statistics (no optimal query plan) 
 Optimization 
 Prioritized queries 
 Report generator ready 
 Queries ordered by speed 
 Average on last 20 executions 
 Applied on a daily basis
 Interrupt if no data returned 
 Nightly index rebuild
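The prioritized-query optimization can be sketched in plain Python. The class below is a hypothetical illustration, not the project's code: each filter is a callable returning a set of document ids, filters run in order of their average duration over the last 20 executions, and the search stops as soon as the intersection is empty. One simplification to note: the slides say the ordering is applied on a daily basis, while this sketch re-sorts on every search.

```python
import time
from collections import defaultdict, deque


class FilterPlanner:
    """Orders search filters by their average time over recent executions."""

    def __init__(self, window=20):
        # Keep only the last `window` timings per filter name.
        self._timings = defaultdict(lambda: deque(maxlen=window))

    def record(self, name, seconds):
        self._timings[name].append(seconds)

    def average(self, name):
        samples = self._timings[name]
        # Filters with no history get 0.0 and are therefore tried first.
        return sum(samples) / len(samples) if samples else 0.0

    def search(self, filters):
        """Run filters fastest-first, AND their results, stop when empty.

        `filters` maps a filter name to a callable returning a set of ids.
        Returns (matching ids, names of the filters actually executed).
        """
        result = None
        executed = []
        for name in sorted(filters, key=self.average):
            started = time.perf_counter()
            ids = filters[name]()
            self.record(name, time.perf_counter() - started)
            executed.append(name)
            result = ids if result is None else result & ids
            if not result:
                break  # no document can match: skip the remaining filters
        return (result or set()), executed


planner = FilterPlanner()
planner.record("by_phrase", 2.0)      # historically slow
planner.record("by_country", 0.1)     # historically fast
planner.record("by_trade_name", 0.5)

filters = {
    "by_phrase": lambda: {1, 2, 3},
    "by_country": lambda: {2, 3},
    "by_trade_name": lambda: set(),   # matches nothing
}
matches, executed = planner.search(filters)
```

In this run the slow phrase filter is never executed, because the second filter already emptied the result set.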
Our Lessons Learned 
 Schema must be flexible 
 Numerous classes 
 Sparse data 
Cost 
 Queries are slow 
 user waits extra 0.5s 
 Queries are complex 
 We can handle them 
 It is fun 
 No row-at-a-time 
 (Not for any other reason)
Benefits 
 Save 75% of time 
 Flexible architecture 
 Dynamic UI 
 Support future schemas
DEMO 
 Advanced Search 
 PIVOT 
 XML DataType
SQL Saturday #152 Sponsors
