Dare to build vertical design with relational data (Entity-Attribute-Value)


The Entity-Attribute-Value model is often called an “anti-pattern” by its critics, and they are probably right when one fails to read the “Handle with Care” label on it. Enthusiastic but inexperienced developers can easily compromise the benefits of a relational DB. But the coin has another side: hierarchical objects with thousands of properties, an unknown schema, a need for flexibility, and millions of records. As always, we have to sacrifice one thing in order to win another; then it all comes down to priorities and the ability to make decisions. In this lecture you will not get a step-by-step manual, but ideas for how to build one for yourself. A challenge, a proof of concept, hard work and a successful project for millions: that is the story to share.

Published in: Software

  1. 1. Dare to Build Vertical Design with Relational Data Is EAV an Anti-Pattern?
  2. 2. About me Project Manager @ 10 years professional experience Microsoft Certified Specialist Business Interests ASP.NET, AJAX, jQuery SOA, Integration GIS, Mapping SQL optimization
  3. 3. Agenda The ChemXchange challenge What is EAV Is it an anti-pattern? Alternatives When EAV Demo
  4. 4. What is ChemXchange Dangerous substances Strict legislation Complex supply chain Problems No effective risk communication Information lacks compliance with EU standards Cultural and linguistic (seasonal workers) High industry information management costs 2’000’000 illnesses every year cost € 75 billion/year
  5. 5. Material Safety Data Sheet Information on the properties of a substance Complexity Not fixed SDS schema 6 countries, 8 languages 17 Sections, 1800 fields Quantity (5y estimate) 120’000 SDS, 100’000 projects 36’000’00 values, 48’000’000 phrases
  6. 6. Behind the Scenes One SDS has Common values – all countries i.e. Chemical substances 65 national values Trade name(ethanol) / Synonym(ethyl alcohol, grain alcohol) 850 language dependent phrase fields R45: May cause cancer. R45: Può provocare il cancro. R45: Kan forårsake kreft. 50+ National Extensions Country specific schema fragments Schema changed during development (2 times)
  7. 7. Decision Making Know Why Consider Alternatives Relational DB model NoSQL (Mongo DB) SQL XML Data Type Entity-Attribute-Value Prove it will work!
  8. 8. Entity-Attribute-Value Entity The object being described Attribute Object properties Value The value of an attribute Also known as Vertical DB Open Schema Anti-pattern… stores all attributes in one table in one column one attribute per row
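The "one attribute per row" layout from slide 8 can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not the project's actual schema (the talk targets SQL Server); the table names `entity`, `attribute`, `eav_value` and the helpers `set_value` and `load_entity` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE entity    (entity_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- The "vertical" table: every attribute value of every entity is one row.
    CREATE TABLE eav_value (
        entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
        value_text   TEXT,
        PRIMARY KEY (entity_id, attribute_id)
    );
""")


def set_value(conn, entity_id, attr_name, value):
    """Store one attribute value; the attribute is registered on first use (open schema)."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attr_name,))
    (attribute_id,) = conn.execute(
        "SELECT attribute_id FROM attribute WHERE name = ?", (attr_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO eav_value (entity_id, attribute_id, value_text) "
        "VALUES (?, ?, ?)",
        (entity_id, attribute_id, str(value)),
    )


def load_entity(conn, entity_id):
    """Reconstruct one entity as a dict by reading all of its value rows."""
    rows = conn.execute(
        "SELECT a.name, v.value_text FROM eav_value v "
        "JOIN attribute a ON a.attribute_id = v.attribute_id "
        "WHERE v.entity_id = ?",
        (entity_id,),
    )
    return dict(rows)


# Hypothetical sample record, using the trade name / synonym pair from slide 6.
conn.execute("INSERT INTO entity (entity_id, name) VALUES (1, 'SDS-1')")
set_value(conn, 1, "TradeName", "ethanol")
set_value(conn, 1, "Synonym", "ethyl alcohol")
sds = load_entity(conn, 1)
```

Note that every value is stored as text here, which is exactly the data type weakness the criticism on slide 10 points at.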
  9. 9. Classic EAV Model Relational Data EAV Tables Entity Entity Attribute Value
  10. 10. The Criticism Says … Top Design Mistake beginners make Difficult to rely on Attribute Names Difficult to enforce Data Type integrity Difficult to enforce NOT NULL attributes Difficult to enforce Referential Integrity Difficult to Reconstruct Entity Complex Queries for trivial tasks Solution ? Do it all in application logic ?
  11. 11. The Anti-Pattern Strikes Back Attribute Metadata Data type IsRequired Multiplicity Validation expression Strong Type columns Read Object-at-Time Write Value-at-Time Schema Caching Own Search Plan
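One way to read "Attribute Metadata" and "Strong Type columns" from slide 11 is sketched below: the attribute table carries a data type, a required flag and a validation expression, and the value table has one column per type. This is a hedged sketch with Python's `sqlite3`, not the presenter's schema; the column names, the attribute names `RiskPhraseCode` and `FlashPointC`, and the helpers `write_value` and `missing_required` are all assumptions.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (
        attribute_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE,
        data_type    TEXT NOT NULL CHECK (data_type IN ('text', 'number')),
        is_required  INTEGER NOT NULL DEFAULT 0,
        validation   TEXT              -- optional regular expression for text values
    );
    CREATE TABLE eav_value (
        entity_id    INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
        text_value   TEXT,             -- strongly typed columns: one per data type
        number_value REAL,
        PRIMARY KEY (entity_id, attribute_id)
    );
""")
conn.executemany(
    "INSERT INTO attribute (name, data_type, is_required, validation) VALUES (?, ?, ?, ?)",
    [
        ("TradeName", "text", 1, None),
        ("RiskPhraseCode", "text", 0, r"R\d+"),  # hypothetical attribute
        ("FlashPointC", "number", 0, None),      # hypothetical attribute
    ],
)


def write_value(conn, entity_id, attr_name, value):
    """Write a single value after checking it against the attribute metadata."""
    row = conn.execute(
        "SELECT attribute_id, data_type, validation FROM attribute WHERE name = ?",
        (attr_name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown attribute: {attr_name}")
    attribute_id, data_type, validation = row
    if data_type == "number":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{attr_name} expects a number")
        text_value, number_value = None, float(value)
    else:
        if not isinstance(value, str):
            raise TypeError(f"{attr_name} expects text")
        if validation and not re.fullmatch(validation, value):
            raise ValueError(f"{attr_name} does not match {validation}")
        text_value, number_value = value, None
    conn.execute(
        "INSERT OR REPLACE INTO eav_value VALUES (?, ?, ?, ?)",
        (entity_id, attribute_id, text_value, number_value),
    )


def missing_required(conn, entity_id):
    """Emulate NOT NULL: list required attributes that have no value row yet."""
    rows = conn.execute(
        "SELECT a.name FROM attribute a WHERE a.is_required = 1 AND NOT EXISTS ("
        "  SELECT 1 FROM eav_value v"
        "  WHERE v.attribute_id = a.attribute_id AND v.entity_id = ?)"
        " ORDER BY a.name",
        (entity_id,),
    )
    return [name for (name,) in rows]


write_value(conn, 1, "RiskPhraseCode", "R45")
write_value(conn, 1, "FlashPointC", 13)
before = missing_required(conn, 1)   # TradeName has not been written yet
write_value(conn, 1, "TradeName", "ethanol")
after = missing_required(conn, 1)
```

The checks live in application code, so the database itself still cannot guarantee them; the metadata only makes the rules explicit and reusable.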
  12. 12. To Limit EAV Drawbacks Preconditions Define: Slow Not for simple and static schemas Numerous sparse data No EAV - system data relations Identify attributes used in code Schema versioning and mapping (3.5 to 4.0)
  13. 13. Enhanced EAV Model EAV Tables Entity Attribute Value
  14. 14. Why EAV Pros Less time for requirements Less time to design, develop Easy to add new entities Generic interface components Easy to change schema Require less understanding Cons Complex SQL for simple reports Complex reports are … Poor performance for large data Learning curve Difficult to debug Hybrid Model the “meaningful” attributes – business logic and search Custom attributes are used primarily for visualization
  15. 15. Relational DB Model Pros RDBMS maintains data integrity and consistency RDBMS used for other modules Relational solution already exists Cons Extremely complex schema and naming Requires deep schema understanding Difficult changes in schema over time Specific user interface (1800 fields)
  16. 16. - MongoDB Pros MongoDB – open, BSON document based Flexible schema, flexible data types Stores the whole object and its relations JavaScript Shell commands Designed for bigness and scalability C# and LINQ Driver exist (free) Cons Command line admin tools!!! No multi-doc transactions Relationships maintained by developers Redundant data (names reflect storage) Language – phrases Country – national fields
  17. 17. XML and SQL Server SQL Server native XML support Typed vs. Untyped XML XQuery XML Data Type Methods query(‘XQuery’) value(‘XQuery’, SQLType) exist(‘XQuery’) modify(XML DML) nodes(‘XQuery’) as Table(Col)
  18. 18. XML SQL Data Type Pros Similar to EAV with much out of the box Solution already exists (Oracle) Functionally close to NoSQL Document-XML export / import Cons Same problems as NoSQL Poor performance Query returns XML Still no XML schema for the SDS object Language phrases are out of the XML Company went bankrupt and was sold
  19. 19. Proving it Will Work Proof of Concept Identified 250+ reusable attribute classes Unit (Name, Value) Phrase (PhraseID, PhraseText, SubPhrase [0:N]) Edit • View and Edit controls with jQuery and real AJAX • Edit controls save JSON • DB generates missing parent nodes Search • 40’000’000 property values • 25’000’000 phrases • 25 filters Lazy Loading
  20. 20. Advanced Document Search 60+ search fields One field - Many properties PIVOT Data Redundant search Duplicate relational data Works for 30% of fields Complex maintenance Substances, national fields Values in different languages
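The "PIVOT Data" step on slide 20 turns value rows back into one row per document with one column per searched attribute. The sketch below shows the idea with conditional aggregation in SQLite through Python's `sqlite3`, since SQLite has no `PIVOT` operator; on SQL Server the `PIVOT` clause would play this role. The table layout, the sample rows and the helper names `pivot_sql` and `pivot_rows` are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eav_value (entity_id INTEGER, attribute_name TEXT, value_text TEXT)"
)
conn.executemany(
    "INSERT INTO eav_value VALUES (?, ?, ?)",
    [
        (1, "TradeName", "ethanol"),
        (1, "Synonym", "ethyl alcohol"),
        (2, "TradeName", "acetone"),  # hypothetical second document, no synonym
    ],
)


def pivot_sql(attribute_count):
    """Build a query with one MAX(CASE ...) column per requested attribute.

    Attribute names are bound as parameters; only generated aliases
    (col0, col1, ...) are interpolated into the SQL text.
    """
    columns = ",\n       ".join(
        f"MAX(CASE WHEN attribute_name = ? THEN value_text END) AS col{i}"
        for i in range(attribute_count)
    )
    return (
        f"SELECT entity_id,\n       {columns}\n"
        "FROM eav_value GROUP BY entity_id ORDER BY entity_id"
    )


def pivot_rows(conn, attribute_names):
    """Return one tuple per entity: (entity_id, value of each requested attribute)."""
    sql = pivot_sql(len(attribute_names))
    return conn.execute(sql, list(attribute_names)).fetchall()


rows = pivot_rows(conn, ["TradeName", "Synonym"])
```

Attributes an entity does not have come back as `None`, which is how sparse data shows up after pivoting.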
  21. 21. Is EAV Search Slow ? Likely to happen… Search in own organization (1000 documents) Search only in last schema version Search on up to 5 filters The worst case… Multiple joins (30+) Cannot use statistics (no optimal query plan) Optimization Prioritized queries Report generator ready Queries ordered by speed Average on last 20 executions Applies on daily basis Interrupt if no data returned Nightly index rebuild
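The optimization bullets on slide 21 (queries ordered by speed, average over the last 20 executions, interrupt if no data is returned) can be read as a small scheduling policy. The sketch below is one possible interpretation in Python, not the project's implementation: each filter is a callable returning a set of document IDs, filters run fastest-first, and the search stops as soon as the intersection is empty. The class name `PrioritizedSearch` and its methods are hypothetical, and the slide's daily refresh of the ordering is not modelled.

```python
import time
from collections import deque
from statistics import mean


class PrioritizedSearch:
    """Run search filters fastest-first, based on recent execution times."""

    def __init__(self, filters, window=20):
        # filters: mapping of filter name -> callable returning an iterable of IDs
        self.filters = dict(filters)
        # Keep only the last `window` timings per filter (20 on the slide).
        self.timings = {name: deque(maxlen=window) for name in self.filters}

    def record(self, name, seconds):
        self.timings[name].append(seconds)

    def average(self, name):
        samples = self.timings[name]
        # Filters without history sort first so they get measured.
        return mean(samples) if samples else 0.0

    def plan(self):
        """Filter names ordered by average duration, fastest first."""
        return sorted(self.filters, key=self.average)

    def run(self):
        """Intersect filter results; stop early once nothing can match."""
        result = None
        executed = []
        for name in self.plan():
            start = time.perf_counter()
            ids = set(self.filters[name]())
            self.record(name, time.perf_counter() - start)
            executed.append(name)
            result = ids if result is None else result & ids
            if not result:
                break  # "interrupt if no data returned"
        return (result or set()), executed


# Hypothetical filters with pre-recorded timings (in seconds).
search = PrioritizedSearch({
    "by_substance": lambda: {1, 2, 3},
    "by_country": lambda: set(),
    "by_phrase": lambda: {2},
})
search.record("by_substance", 0.5)
search.record("by_country", 0.1)
search.record("by_phrase", 0.3)
order = search.plan()
matches, executed = search.run()
```

Running the cheapest filter first means an empty result is detected before the expensive multi-join queries are ever issued.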
  22. 22. Our Lessons Learned Schema must be flexible Numerous classes Sparse data Cost Queries are slow user waits extra 0.5s Queries are complex We can handle them It is fun No row-at-a-time (Not for other reason) Benefits Save 75% of time Flexible architecture Dynamic UI Support future schemas
  23. 23. DEMO Advanced Search PIVOT XML DataType
  24. 24. SQL Saturday #152 Sponsors