I’ve Always Wanted To
Data Model
Ian Varley, Salesforce.com
Data Week, 2013-10-02
Lightning Talk (10 minutes)
Who am I?
Ian Varley
Austin, TX
Salesforce.com
Big Data Team
@thefutureian
What’s Data Modeling?
The act of taking the intelligible
structure of the world around us, and
making it concrete enough for
computers to act on it.
(More specifically, data modeling usually
has to do with storing it in a database.)
Traditionally, data modeling has meant
Entity Attribute Relationship
modeling techniques.

There are variants that are more “OO” (like UML) but they
share most of the same core assumptions.
Many a project was sunk
due to shitty data modeling.
It’s a difficult occupation.
You have to be part engineer, part
psychologist, and part philosopher.
If you’re doing it, you’re not alone.
Lots of smart folks think about this stuff.
(David Hay, Steve Hoberman, Joe Celko, many more.)
But.
The expressive power of our
conceptual modeling techniques hasn’t
improved much since the 1970s.

We mostly look at the world in the
same static way we did 40 years ago.
Partly, this is because our discipline is
wedded to relational (SQL) DBs.

When the only tool you have
is a hammer ...
A book that opened my eyes ...

(He said a lot of the stuff I’m about to say back in 1978!)
I don’t have a lot of answers.
But I want to raise some questions.
And hopefully, start a conversation.
Here are 5 observations about the
tools of traditional data modeling.
#1: nobody actually knows
what an “entity” really is.
“Entity” is another word for Category,
in linguistics terms.
And an important property of linguistic
categories is that they are slippery.
See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things
part: an abstract definition of
a connected set of physical
materials that serve some
purpose, and that people are
willing to buy

part: one instance of a part
type, which arrives on the QA
line at a specific time and
either does or doesn't meet
quality standards
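
A minimal sketch of the two meanings of "part" above, kept apart as two record types. The names `PartType`, `PartUnit`, `sku`, and the sample values are assumptions made up for illustration, not from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartType:
    """'Part' as a catalog definition: something people are willing to buy."""
    sku: str            # hypothetical identifier
    description: str


@dataclass
class PartUnit:
    """'Part' as one physical instance arriving on the QA line."""
    part_type: PartType
    arrived_at: datetime
    passed_qa: bool


bolt = PartType(sku="B-100", description="10 mm steel bolt")
units = [
    PartUnit(bolt, datetime(2013, 10, 2, 9, 0), passed_qa=True),
    PartUnit(bolt, datetime(2013, 10, 2, 9, 5), passed_qa=False),
]

# Sales counts "parts" as types; QA counts "parts" as units.
parts_for_sales = {u.part_type for u in units}
parts_for_qa = units
```

Same word, same data, two different answers to "how many parts do we have?"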
And if you think you can “solve” the
problem, I’ve got some World Trade
Center insurance policies to sell you.
That said, there are a couple tools we
could adopt that would help:
● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing
(Not that there aren’t ways to do this in ERD models, but
they’re unobvious and not widely used.)
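
One possible reading of these two tools, sketched in code: subtyping as a plain class hierarchy, and scoping/aliasing as a per-scope lookup from a word to a type. Everything here (`Party`, `Person`, `Organization`, the scope names, `resolve`) is a hypothetical illustration, not an established notation.

```python
class Party:
    """Supertype: anyone or anything we do business with (hypothetical)."""

    def __init__(self, name):
        self.name = name


class Person(Party):
    """Subtype of Party."""


class Organization(Party):
    """Subtype of Party."""


# Scoped aliases: the same word can name a different type in each scope.
ALIASES = {
    "sales": {"customer": Party},
    "legal": {"customer": Organization},
    "hr": {"employee": Person},
}


def resolve(scope, term):
    """Look up what a term means inside one scope."""
    return ALIASES[scope][term]
```

With this, "customer" in the sales scope accepts any `Party`, while "customer" in the legal scope means only an `Organization`.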
#2: entities, attributes, and
relationships are really the
same thing, maaaan ...

http://the-hippie-portfolio.tumblr.com/
Say I’ve got a “parent” in my model.
Is it:
● A “parent” entity?
● A “person” entity with
an “isParent” attribute?
● Two “person” entities in
a “parent” relationship?
It’s all of them; the distinction is
arbitrary.
The real structure is just a graph … but
none of our modeling tools are that
flexible, nor is it helpful to think that
abstractly about most software.
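
To make the "it's all just a graph" point concrete, here is a small sketch that writes the three "parent" choices as sets of (subject, predicate, object) edges. The predicate names (`is_a`, `isParent`, `parent_of`) and the `is_parent` helper are assumptions for illustration only.

```python
# Choice 1: a "parent" entity.
as_entity = {("ann", "is_a", "parent")}

# Choice 2: a "person" entity with an "isParent" attribute.
as_attribute = {("ann", "is_a", "person"), ("ann", "isParent", True)}

# Choice 3: two "person" entities in a "parent" relationship.
as_relationship = {
    ("ann", "is_a", "person"),
    ("bob", "is_a", "person"),
    ("ann", "parent_of", "bob"),
}


def is_parent(graph, who):
    """Answer the same question regardless of which shape was chosen."""
    return (
        (who, "is_a", "parent") in graph
        or (who, "isParent", True) in graph
        or any(s == who and p == "parent_of" for s, p, _ in graph)
    )
```

All three graphs answer "is Ann a parent?" the same way; only the third can also say whose parent she is.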
Normally, we make the choice based
on our experience and gut feeling, and
pretend there’s a science to it.
But the whole way of thinking is a
convenience based on “records”.
I have no idea what to do about this.
Tools that allow you to view any part of
your model in any of those ways?
This isn’t realistic with today’s tools, so
this is just idle speculation.
#3: prescriptive models
encourage black & white
thinking in a gray world
You have to make decisions (about
entities, attributes, relationships, types)
up front. But sometimes that’s not right.
This is a strength of (some) NoSQL
databases: you can do data first, and
surface structure later.
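
A rough sketch of "data first, structure later": store schemaless documents as they come, then inspect them to see what structure is actually there. The sample documents and the `surface_structure` function are made up for illustration; no particular NoSQL database's API is implied.

```python
from collections import defaultdict

# Documents stored with no schema declared up front (hypothetical data).
documents = [
    {"name": "Ann", "dept": "Eng"},
    {"name": "Bob", "dept": ["Eng", "Ops"], "start": 2012},
    {"name": "Cy"},
]


def surface_structure(docs):
    """Return {field: set of type names observed} across all documents."""
    seen = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)


structure = surface_structure(documents)
```

Here `dept` turns out to be sometimes a string and sometimes a list, which is exactly the kind of thing an up-front model would have forced a decision on.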
Sometimes the deep structure is
actually ambiguous.
This can apply broadly.
(What if an employee isn’t really “in” a department, but has
flexible membership based on where she spends her time?)
You can represent that in a traditional
data model, sure.
But you’re not encouraged to.
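
For instance, the flexible-membership case could be sketched as a weighted many-to-many link instead of a single "department" column. The `Membership` record, the `share` field, and the sample rows are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:
    employee: str
    department: str
    share: float  # fraction of her time spent there (hypothetical measure)


memberships = [
    Membership("ann", "Engineering", 0.6),
    Membership("ann", "Support", 0.4),
    Membership("bob", "Engineering", 1.0),
]


def departments_of(employee, rows):
    """All departments an employee belongs to, with time shares."""
    return {m.department: m.share for m in rows if m.employee == employee}


def primary_department(employee, rows):
    """Collapse back to the black & white answer when something demands one."""
    shares = departments_of(employee, rows)
    return max(shares, key=shares.get)
```

The gray answer is the stored fact; the single "department" is just one derived view of it.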
#4: static models make the
time dimension unwieldy
Entity models are generally silent on
the ways data changes.
Many modern databases can keep
older versions of objects.
But should they? For which entities?
How many versions? etc.
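
One way to make those questions explicit is to state the retention policy per entity type. This is a toy in-memory sketch, not any real database's versioning feature; `VersionedStore` and its methods are hypothetical names.

```python
from collections import deque


class VersionedStore:
    """Keeps the last N versions of each object, with N set per entity type."""

    def __init__(self, max_versions):
        # max_versions: {entity_type: versions to keep}; unlisted types keep 1.
        self._max = max_versions
        self._data = {}

    def put(self, entity_type, key, value):
        limit = self._max.get(entity_type, 1)
        history = self._data.setdefault((entity_type, key), deque(maxlen=limit))
        history.append(value)  # the oldest version drops off once full

    def latest(self, entity_type, key):
        return self._data[(entity_type, key)][-1]

    def history(self, entity_type, key):
        return list(self._data[(entity_type, key)])


store = VersionedStore({"price": 3})
for amount in (1.0, 1.1, 1.2, 1.3):
    store.put("price", "B-100", amount)
store.put("color", "B-100", "red")
store.put("color", "B-100", "blue")
```

Prices keep their last three versions; colors keep only the current one, because nobody decided otherwise.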
Worse, what about when the model
changes at runtime, and you need to
also retain knowledge of what the old
model was?
As in #3, there are ways to model this
in entity models, but it’s not easy, so
most people just don’t think about it.
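
A small sketch of one way to retain knowledge of the old model: tag every record with the model version it was written under, and keep the old definitions around. The `_schema` field, the `SCHEMAS` registry, and both helper functions are assumptions for illustration.

```python
# Every version of the model stays on file (hypothetical registry).
SCHEMAS = {
    1: {"name", "department"},   # one department per employee
    2: {"name", "departments"},  # later: a list of departments
}

records = [
    {"_schema": 1, "name": "Ann", "department": "Eng"},
    {"_schema": 2, "name": "Bob", "departments": ["Eng", "Ops"]},
]


def conforms(record):
    """Check a record against the model version it was written under."""
    fields = set(record) - {"_schema"}
    return fields == SCHEMAS[record["_schema"]]


def departments(record):
    """Read old and new records through one interface."""
    version = record["_schema"]
    if version == 1:
        return [record["department"]]
    if version == 2:
        return list(record["departments"])
    raise ValueError(f"unknown schema version: {version}")
```

Old records stay interpretable because the reader knows which model they were written against.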
#5: boxes & lines aren’t
how we actually think
Our spatial processing of diagrams
doesn’t map well to our temporal,
spatial, and causal comprehension of
data structure.
What do people really do?
Skip making models when their
models look too complicated.
F*** THAT NOISE.
Is there an alternative? Not yet.
What could move the needle?
● Prototype based modeling
● Proper scoping
● Semantic zooming
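
As a guess at what prototype based modeling might look like in practice: start from a concrete example object and derive others from it, instead of declaring a type first. The `Proto` class and the sample fields are hypothetical; this is one interpretation, not a defined technique from the talk.

```python
class Proto:
    """An object that delegates missing fields to its prototype."""

    def __init__(self, parent=None, **fields):
        self.parent = parent
        self.fields = dict(fields)

    def get(self, name):
        # Walk up the prototype chain until some object has the field.
        node = self
        while node is not None:
            if name in node.fields:
                return node.fields[name]
            node = node.parent
        raise KeyError(name)

    def derive(self, **overrides):
        """Make a new object that differs from this one only where stated."""
        return Proto(parent=self, **overrides)


typical_employee = Proto(department="Engineering", remote=False)
ann = typical_employee.derive(name="Ann")
bob = typical_employee.derive(name="Bob", remote=True)
```

Structure is stated by example and by exception, so there is no up-front decision about what an "employee" must always have.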
The map is not the territory.
In conclusion …
if you dig this stuff, let’s talk!
@thefutureian
