Data Modelers Save Their Careers:
Surviving and Thriving with NoSQL
Joe Maguire
Data Quality Strategies, LLC
http://www.DataQualityStrategies.com/
© 2013 Data Quality Strategies, LLC
Thesis
• Relational DBMSs have dominated,
• ...so relational modeling subsumed other forms, including conceptual modeling.
• As R-DBMS dominance wanes, so does relational modeling and, sadly, whatever it subsumed.
• Conceptual modeling must be saved.
• Relational modelers can step in to save it...
• ...with some significant effort.
25 June 2013 © 2013 Data Quality Strategies, LLC 2
My Perspective
• Over three decades in industry
• Career is a three-legged stool
– Product development for software vendors
– Solution design for enterprises
– Author, Industry Analyst, Thought Leader
• Specialize in
– Modeling
– Requirements analysis
– Data architecture
– Data quality
• Joe.Maguire@DataQualityStrategies.com
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Q&A
A Big-Picture Framework
Meta-model Data Perspective
Conceptual • Entities
• Attributes
• Relationships
• Identifiers
Logical • Tables
• Columns
• Primary and foreign keys
Physical • Indexes
• Table spaces
• Vertical and horizontal partitioning
• Denormalizations
Good Ideas in the Framework
• Information Hiding
– e.g., conceptual excludes implementation details
• The Type/Instance distinction
– Models describe categories, data describes members
• Application/Data Independence
– Data modeling is separate from process modeling
• User Requirements ≠ System Requirements
– Users should not participate in logical and physical modeling
• Model-Driven Development
– Forward and reverse engineering across model levels
A Big-Picture Framework, distorted
Meta-model Data Perspective
Relational • Entities / Tables
• Attributes / Columns
• Relationships / FKs
• Identifiers / PKs
Physical • Indexes
• Table spaces
• Vertical and horizontal partitioning
• Denormalizations
How the Distortion Happens
• Tool Vendors Dismiss Conceptual Modeling
– Because their tools cannot support it anyway
• Info Mgmt Specialists Confuse Models with Reality
– E.g., believing the relational model suffices to describe the universe
• Institutionalized Expediency
– We know about conceptual modeling, but to save time, we combine it with relational modeling...
– ...then we formalize that into our dev processes...
– ...and eventually, that becomes the “best practices.”
Distortions, Revisited
• Summary of Distortions:
– Distortion: Conceptual means vague
– Distortion: Logical implies relational
• ...rather than XML, OO, KV store, array database, or graph database
• Results of Distortions:
– Two levels only: relational and physical
– Relational modeling used for user requirements
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Q&A
Current Events: NoSQL
• The “Just Say No” Interpretation
Meta-model Data Perspective
Logical (Relational) • Entities / Tables
• Attributes / Columns
• Relationships / FKs
• Identifiers / PKs
Physical NO LONGER RELATIONAL:
• Schemas Based on Big Table Implementations
• Alien DDL
• Limited Support from Modeling Tools
Current Events: NoSQL
• The “Not Only SQL” Interpretation
– Okay, so there might be some work for you
– But you’re at risk of being marginalized
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Summary
• Q&A
Your Future as a Modeler
• Remaining Relevant
– Selfishly: Saving your career
– Nobly: Serving your client / company / customer
• What You Can Do:
– Wait for relational projects
– Become a NoSQL database designer
– Help your client choose data platforms
• That starts with understanding the problems
– which starts with CONCEPTUAL MODELING.
A New (?) Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-model
• Logical Modeling
• Physical Modeling
• Tool Support?
Conceptual Modeling
• Compared with relational modeling's behaviors and constructs, we will:
– Keep some
– Discard some
– Stress some
– Change some
Conceptual Data Model Example
Keep Some
• Keep Entities
• Keep Attributes
• Keep Relationships
• Keep Identifiers
• Keep Maximum Cardinality of Relationships
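A minimal sketch of what the kept constructs amount to, written as Python data structures. All class and field names are hypothetical, not taken from any modeling tool. The sketch records entities, attributes, relationships, identifiers, and maximum cardinality, and deliberately has no slot for data types, foreign keys, or minimum cardinality:

```python
from dataclasses import dataclass, field
from typing import Literal

# Maximum cardinality only; there is deliberately no minimum-cardinality field.
Max = Literal["one", "many"]


@dataclass(frozen=True)
class Attribute:
    name: str   # the users' word for it, not an IT column name
    scale: str  # e.g. "nominal", "ordinal", "cyclic"; no data type at this level


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    # Names of the attributes (or related entities) that tell instances apart.
    identifier: tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    name: str
    a: str      # entity on one end
    b: str      # entity on the other end
    max_a: Max  # how many a-instances one b-instance can relate to
    max_b: Max  # how many b-instances one a-instance can relate to


@dataclass
class ConceptualModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, rel: Relationship) -> None:
        # Both ends must be known entities; no foreign-key attribute is added to either.
        for end in (rel.a, rel.b):
            if end not in self.entities:
                raise KeyError(f"unknown entity: {end}")
        self.relationships.append(rel)


# Usage with hypothetical entities
model = ConceptualModel()
model.add(Entity("Person", [Attribute("name", "nominal")], identifier=("name",)))
model.add(Entity("Meeting", [Attribute("name", "nominal"), Attribute("location", "nominal")],
                 identifier=("name",)))
model.relate(Relationship("attends", "Person", "Meeting", max_a="many", max_b="many"))
```

Leaving minimum cardinality and data types out of the structure, rather than making them optional, is a design choice: the model simply cannot express what the conceptual level is meant to hide.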
Keep Entities
• Minimum Expressiveness
• Entities, Not Tables
– Don’t express horizontal or vertical partitioning for performance
• But do express it if it is motivated by privacy, security, or risk
• Entity names, not table names
– Honor user vocabulary, not IT naming standards
Keep Attributes
• Honor the User Phenomenon
– Attributes are part of user discourse
• Attributes, Not Columns
– Worry about scale (nominal, numeric, ordinal, Boolean, cyclic), not data type
– Attribute names, not column names
• Support In-Progress Models
– During which attributes can become entities
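The point about scale versus data type can be made concrete. In the sketch below, the `Scale` enum and `distance` function are illustrative assumptions, and month numbers are plain Python ints. Two attributes can share an integer data type yet need different arithmetic because their scales differ:

```python
from enum import Enum
from typing import Optional


class Scale(Enum):
    NOMINAL = "nominal"  # only sameness matters (e.g. team name)
    NUMERIC = "numeric"  # differences are meaningful (e.g. weight)
    ORDINAL = "ordinal"  # order matters, differences do not (e.g. small < medium < large)
    BOOLEAN = "Boolean"  # two values
    CYCLIC = "cyclic"    # values wrap around (e.g. month of year)


def distance(scale: Scale, a, b, period: Optional[int] = None):
    """How far apart two values are, for scales where that question makes sense."""
    if scale is Scale.NUMERIC:
        return abs(a - b)
    if scale is Scale.CYCLIC:
        if not period:
            raise ValueError("a cyclic attribute needs a period, e.g. 12 for months")
        d = abs(a - b) % period
        return min(d, period - d)  # the short way around may cross the wrap point
    if scale in (Scale.NOMINAL, Scale.BOOLEAN):
        return 0 if a == b else 1  # same or different, nothing more
    raise ValueError(f"distance is not meaningful on the {scale.value} scale")


# December (12) and January (1): the same integer data type, very different answers.
print(distance(Scale.NUMERIC, 12, 1))            # 11
print(distance(Scale.CYCLIC, 12, 1, period=12))  # 1
```

Treating ordinal differences as an error is one defensible reading of "ordinal". The broader point is that users can tell you the scale during conversation long before anyone picks a data type.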
Keep Relationships
• Minimum Expressiveness
– Relationships are part of user discourse
• Allow Many-Many and Collection Entities
– If the latter seem illegal, you’ve been in IT too long
• Relationships, not FKs
Keep Relationships
• Relationships, not Foreign Keys
– (achievement DOES NOT have code or creatureID)
Keep Relationships
• Many-Many Allowed
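To illustrate both relationship slides under stated assumptions: at the conceptual level, a many-many relationship such as Person attends Meeting can be treated as nothing more than a set of pairs, and neither entity carries the other's identifier. Foreign keys and an associative table appear only when forward-engineering to a relational logical model. The people, meetings, and surrogate key values below are hypothetical:

```python
# Conceptual level: "Person attends Meeting" is many-many. The relationship is just a set
# of pairs; neither Person nor Meeting gains a personID or meetingID attribute.
attends: set[tuple[str, str]] = {
    ("Ann", "Summit"),
    ("Ann", "Workshop"),
    ("Raj", "Summit"),
}


def meetings_of(person: str) -> set[str]:
    return {m for p, m in attends if p == person}


def attendees_of(meeting: str) -> set[str]:
    return {p for p, m in attends if m == meeting}


# Logical (relational) level: forward engineering introduces surrogate keys and an
# associative table whose rows are pairs of foreign keys.
def to_associative_table(pairs: set[tuple[str, str]],
                         person_ids: dict[str, int],
                         meeting_ids: dict[str, int]) -> list[tuple[int, int]]:
    return sorted((person_ids[p], meeting_ids[m]) for p, m in pairs)


print(sorted(meetings_of("Ann")))     # ['Summit', 'Workshop']
print(sorted(attendees_of("Summit")))  # ['Ann', 'Raj']
print(to_associative_table(attends, {"Ann": 1, "Raj": 2}, {"Summit": 10, "Workshop": 11}))
# [(1, 10), (1, 11), (2, 10)]
```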
Keep Identifiers
• Identifiers, Not PKs
– IDs are not motivated by computerization, but by typography
– IDs predate the information revolution
• and the automotive revolution, for that matter
– Allow collection entities
• Support In-Progress Modeling
– IDs help the modeler ferret out the homonym problem
Keep Identifiers
• Identifiers, not PKs (e.g., for collection entities):
– (Each squad is identified by the skaters on it.)
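A collection entity's identifier can be sketched directly when the identifier is allowed to be a set. Here, where the `Squad` class and skater names are hypothetical, a squad is identified by its skaters, with no squad number invented for the purpose. A relational primary key is a fixed list of atomic columns and cannot hold a set, which is one reason forward engineering tends to introduce a surrogate key at the logical level:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Squad:
    # The identifier IS the membership: two squads with the same skaters are the same squad.
    # frozenset ignores order and duplicates and is hashable, so it can act as an identifier.
    skaters: frozenset[str]


a = Squad(frozenset({"Kim", "Lee", "Ortiz"}))
b = Squad(frozenset({"Ortiz", "Kim", "Lee"}))  # same skaters, listed differently
c = Squad(frozenset({"Kim", "Lee"}))

squads = {a, b, c}
print(a == b)       # True
print(len(squads))  # 2
```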
Discard Some
• Discard Foreign Keys
– They’re relational
• Discard Minimum Cardinality
– A function of process or policy, not data
– Over-reported by users
• Discard Most Constraints
– A function of process or policy, not data
– Over-reported by users
Discard Minimum Cardinality
• Must EVERY instance of meeting have a person?
– No. E.g., Cassandra Summit 2014 already has a date and location but has zero persons associated with it.
• More generally: Should the DBMS refuse to store incomplete data?
– People get interrupted and want to save their partial work.
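The contrast can be sketched in a few lines. The sketch assumes a hypothetical `Meeting` record and two hypothetical save functions: one enforces a minimum cardinality of one person per meeting, while the other stores the incomplete record and leaves completeness to a later process check:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Meeting:
    name: str
    date: Optional[str] = None
    location: Optional[str] = None
    # No minimum cardinality: a meeting may have zero persons so far.
    persons: set[str] = field(default_factory=set)


def save_strict(store: list, m: Meeting) -> None:
    # What a "minimum cardinality of one" rule would do: refuse the whole record.
    if not m.persons:
        raise ValueError(f"{m.name}: every meeting must have at least one person")
    store.append(m)


def save(store: list, m: Meeting) -> None:
    # Accept partial work; completeness becomes a process or policy check instead.
    store.append(m)


def incomplete(store: list) -> list[str]:
    # A follow-up report the process owner can run later.
    return [m.name for m in store if not m.persons]


saved: list[Meeting] = []
review = Meeting("Quarterly Review", date="2014-03-01", location="Room 4")  # hypothetical
try:
    save_strict(saved, review)
except ValueError as err:
    print("rejected:", err)  # the user's partial work is lost
save(saved, review)
print(incomplete(saved))  # ['Quarterly Review']
```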
Keep/Discard Rule of Thumb
• Keep
– Anything that helps you and the users together discover and name the user categories
• Discard
– Anything else
Conceptual Data Model Examples
Stress Some
• Stress Consistency Requirements
– Relational modelers (of non-distributed databases) have not been asking about these.
• Stress Data Volume / Velocity Requirements
– Can lead or force you to relax application-data independence
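Here is one illustration of why the consistency question matters once data is distributed. Consider a Dynamo-style store such as Cassandra that keeps N replicas of each row. The set of R replicas a read consults is guaranteed to overlap the set of W replicas that acknowledged a write when R + W > N. So the users' answer to "must every reader see the latest update?" translates fairly directly into replica counts. The requirement record and the majority-quorum policy below are assumptions for illustration, not a prescription:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyRequirement:
    # Captured during conceptual modeling, in the users' terms.
    data: str
    must_read_latest: bool  # "Must every reader see the most recent update?"


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read replica set must share a replica with every write set."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def choose_replica_counts(req: ConsistencyRequirement, n: int = 3) -> tuple[int, int]:
    """Hypothetical policy: majority reads and writes when required, else one replica each."""
    if req.must_read_latest:
        majority = n // 2 + 1
        return majority, majority
    return 1, 1  # faster and more available, but a read may miss a recent write


for req in (ConsistencyRequirement("order status", True),      # hypothetical answers
            ConsistencyRequirement("product reviews", False)):
    r, w = choose_replica_counts(req)
    print(req.data, r, w, quorums_overlap(3, r, w))
```

Majority reads and writes are only one way to satisfy R + W > N. For example, R = 1 with W = N also satisfies it. Each option trades latency and availability differently, which is why the requirement is worth asking about rather than assuming.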
Change Some
• Change Your Process
– From math-y normalization to English-y conversation with users
– Rigor is very difficult to achieve conversationally
• More help:
– Mastering Data Modeling: A User-Driven Approach, by Carlis & Maguire
A New Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-Model
• Logical Modeling
• Physical Modeling
• Tool Support?
Choosing a Logical Meta-Model
• Don’t Assume Relational (Duh...)
• Don’t Assume Big Table, KV-Store, Cassandra
• Lots of Choices
– Relational
– Key-Value Store
– XML/Document Database
– Graph database
– Array database
– ...
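To make "lots of choices" concrete, here is one hypothetical conceptual fact set (Person attends Meeting) rendered in four of the listed logical meta-models. Plain Python structures stand in for each meta-model; none of this is any real product's API. Each version can answer "who attends the Summit?", but each shape makes different questions cheap or expensive. That is why the conceptual model should come first and the meta-model choice second:

```python
# One conceptual fact set (hypothetical): Person attends Meeting, many-many.
facts = {("Ann", "Summit"), ("Raj", "Summit"), ("Ann", "Workshop")}

# Relational: entity tables plus an associative table of key pairs.
person = {1: "Ann", 2: "Raj"}
meeting = {10: "Summit", 11: "Workshop"}
attends_rows = [(1, 10), (2, 10), (1, 11)]

# Key-value store: the application composes the key; the store sees an opaque value.
kv = {"meeting:Summit:attendees": ["Ann", "Raj"],
      "meeting:Workshop:attendees": ["Ann"]}

# Document database: attendees nested inside each meeting document.
docs = [{"meeting": "Summit", "attendees": ["Ann", "Raj"]},
        {"meeting": "Workshop", "attendees": ["Ann"]}]

# Graph database: people and meetings are nodes; "attends" is an edge.
edges = [("Ann", "attends", "Summit"), ("Raj", "attends", "Summit"),
         ("Ann", "attends", "Workshop")]


def attendees_relational(name: str) -> set[str]:
    ids = {k for k, v in meeting.items() if v == name}
    return {person[p] for p, m in attends_rows if m in ids}


def attendees_kv(name: str) -> set[str]:
    return set(kv.get(f"meeting:{name}:attendees", []))


def attendees_document(name: str) -> set[str]:
    return {a for d in docs if d["meeting"] == name for a in d["attendees"]}


def attendees_graph(name: str) -> set[str]:
    return {s for s, rel, o in edges if rel == "attends" and o == name}


for fn in (attendees_relational, attendees_kv, attendees_document, attendees_graph):
    print(fn.__name__, sorted(fn("Summit")))  # each prints ['Ann', 'Raj']
```

Asking the reverse question ("which meetings does Ann attend?") exposes the differences. In a real key-value store, values are opaque, so answering it would need a second key per person or a full scan.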
A New Modeling Framework
• Conceptual Modeling
• Choosing a Logical Meta-Model
• Logical Modeling
• Physical Modeling
• Tool Support?
Logical, Physical, and Tool Support
• Minimal Support From Modeling Tools
– Because few tools support conceptual modeling
– Because vendors have not caught up to NoSQL yet
• Community Needs to Develop Shapes
– And the attendant transformations from conceptual shapes to Big-Table shapes
• During Logical NoSQL Modeling, Process Requirements Will Infiltrate
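One way to picture a transformation from conceptual shapes to Big-Table shapes is to model a wide row as a row key mapped to a sorted list of column values. Under that assumption, the same conceptual relationship yields different row layouts depending on the query. The function and query names are hypothetical:

```python
from collections import defaultdict


def to_wide_rows(pairs: set[tuple[str, str]], query: str) -> dict[str, list[str]]:
    """Hypothetical transformation of a conceptual many-many relationship, given as
    (person, meeting) pairs, into Big-Table-style wide rows.

    The row key depends on which query the application will run; that is where
    process requirements infiltrate the logical model."""
    # (index of the row-key element, index of the column element) within each pair
    layouts = {"meetings_by_person": (0, 1), "attendees_by_meeting": (1, 0)}
    if query not in layouts:
        raise ValueError(f"no layout defined for query {query!r}")
    key_i, col_i = layouts[query]

    rows: dict[str, list[str]] = defaultdict(list)
    for pair in pairs:
        rows[pair[key_i]].append(pair[col_i])
    # Sort the columns within each row, mimicking a clustering order.
    return {key: sorted(cols) for key, cols in rows.items()}


attends = {("Ann", "Summit"), ("Ann", "Workshop"), ("Raj", "Summit")}
print(to_wide_rows(attends, "meetings_by_person"))
print(to_wide_rows(attends, "attendees_by_meeting"))
```

If the application needs both queries, a common NoSQL practice is to maintain both layouts and write every new pair to each, trading storage and write effort for fast reads. Whether that trade is acceptable is itself a requirement to confirm with users.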
Agenda
• History
• Current Events
• Your Future as a Data Modeler
• Summary
• Q&A
Summary
• Recommit to Conceptual Modeling for Requirements Analysis
– Some but not all relational-modeling skills will apply
– Must learn to focus on user communication, not nerdy stuff like intermediate normal forms
Summary
• Remember the fundamentals, so that you can make informed decisions about relaxing them
– Application-data independence (relax knowingly)
– Distinguish problems from solutions (relax at your own peril)
– Consistency level as a user requirement (as you ask, you’ll find immediate consistency is often negotiable)
Summary
• Additional Benefits
– Users will like you better
– Agile developers will like you better
– This framework works in traditional, all-SQL environments
Q&A
• Joe.Maguire@DataQualityStrategies.com
• www.DataQualityStrategies.com


Editor's Notes

  • #6 Point of having a merged cell for physical: it’s all coming together – it’s increasingly difficult to distinguish the underlying physical model services… Here again, hypertext is not 1:1 with HTML – it’s beyond-the-basics hypertext as manifested, e.g., in Web publishing and collaboration-oriented systems/servers. XQuery is not mainstream today, but it is exceptionally powerful and was co-developed with XPath 2.0.
  • #18 Point of this slide: reinforce the ability to discern major similarities/differences between two tools/services focused on a similar domain, by comparing/contrasting model diagrams.
    Non-technical people can easily learn how to read/use this type of model – not the case with most logical and physical model diagramming techniques.
    Evernote conceptual model fragment example from http://www.quepublishing.com/articles/article.aspx?p=1684320
    Incomplete – a full conceptual model includes accompanying documentation, e.g., with entity definitions and examples.
    Microsoft OneNote 2010 conceptual model fragment example from http://www.quepublishing.com/articles/article.aspx?p=1684320
    Reason for including it: compared with the Evernote conceptual model fragment, it shows how easy it is to understand domains when using conceptual models – e.g., OneNote has a more elaborate info item containment structure and supports tags at the item/paragraph level, while Evernote tagging is at the note/page level. That’s not meant to be a judgment call; the extent to which Evernote or OneNote is more useful is a function of your info item/note-taking needs.