I've Always Wanted To Data Model - Data Week 2013

One of the tenets of Big Data is that it allows developers to work with "unstructured" data. But unless you're piping /dev/random, there's no such thing as *truly* unstructured data; only data whose structure you don't understand yet. In this lightning talk, we'll take a tour of the fundamentals of deep data structure modeling, and see how the rigid tools and techniques of the past have failed us in the modern world of agile software and big data. We'll delve into what hope there is for understanding the semantics and structure of data that doesn't play by the rules of an RDBMS.

Published in: Technology, Business
1 Comment
  • Your straw man ERD is the same deception Kimball pulled in his first DW book. I am not impressed.

I've Always Wanted To Data Model - Data Week 2013

  1. 1. I’ve Always Wanted To Data Model Ian Varley, Salesforce.com Data Week, 2013-10-02 Lightning Talk (10 minutes)
  2. 2. Who am I? Ian Varley Austin, TX Salesforce.com Big Data Team @thefutureian
  3. 3. What’s Data Modeling?
  4. 4. The act of taking the intelligible structure of the world around us, and making it concrete enough for computers to act on it. (More specifically, data modeling usually has to do with storing it in a database.)
  5. 5. Traditionally, data modeling has meant Entity Attribute Relationship modeling techniques. There are variants that are more “OO” (like UML) but they share most of the same core assumptions.
  6. 6. Many a project was sunk due to shitty data modeling.
  7. 7. It’s a difficult occupation. You have to be part engineer, part psychologist, and part philosopher.
  8. 8. If you’re doing it, you’re not alone. Lots of smart folks think about this stuff. (David Hay, Steve Hoberman, Joe Celko, many more.)
  9. 9. But.
  10. 10. The expressive power of our conceptual modeling techniques hasn’t improved much since the 1970s. We mostly look at the world in the same static way we did 40 years ago.
  11. 11. Partly, this is because our discipline is wedded to relational (SQL) DBs. When the only tool you have is a hammer ...
  12. 12. A book that opened my eyes ... (He said a lot of the stuff I’m about to say back in 1978!)
  13. 13. I don’t have a lot of answers. But I want to raise some questions. And hopefully, start a conversation.
  14. 14. Here are 5 observations about the tools of traditional data modeling.
  15. 15. #1: nobody actually knows what an “entity” really is.
  16. 16. “Entity” is another word for Category, in linguistics terms. And an important property of linguistic categories is that they are slippery. See: ● Steven Pinker: The Stuff Of Thought ● Douglas Hofstadter: Surfaces & Essences ● George Lakoff: Women, Fire, and Dangerous Things
  17. 17. ● part: an abstract definition of a connected set of physical materials that serve some purpose, and that people are willing to buy ● part: one instance of a part type, which arrives on the QA line at a specific time and either does or doesn't meet quality standards
  18. 18. And if you think you can “solve” the problem, I’ve got some World Trade Center insurance policies to sell you.
  19. 19. That said, there are a couple tools we could adopt that would help: ● First-class Sub- / Super-Typing ● First-class Scoping and Aliasing (Not that there aren’t ways to do this in ERD models, but they’re unobvious and not widely used.)
  20. 20. #2: entities, attributes, and relationships are really the same thing, maaaan ... http://the-hippie-portfolio.tumblr.com/
  21. 21. Say I’ve got a “parent” in my model. Is it: ● A “parent” entity? ● A “person” entity with an “isParent” attribute? ● Two “person” entities in a “parent” relationship? It’s all of them; the distinction is arbitrary.
  22. 22. The real structure is just a graph … but none of our modeling tools are that flexible, nor is it helpful to think that abstractly about most software.
  23. 23. Normally, we make the choice based on our experience and gut feeling, and pretend there’s a science to it.
  24. 24. But the whole way of thinking is a convenience based on “records”.
  25. 25. I have no idea what to do about this. Tools that allow you to view any part of your model in any of those ways?
  28. 28. This isn’t realistic with today’s tools, so this is just idle speculation.
  29. 29. #3: prescriptive models encourage black & white thinking in a gray world
  30. 30. You have to make decisions (about entities, attributes, relationships, types) up front. But sometimes that’s not right.
  31. 31. This is a strength of (some) NoSQL databases: you can do data first, and surface structure later.
  32. 32. Sometimes the deep structure is actually ambiguous.
  33. 33. This can apply broadly. (What if an employee isn’t really “in” a department, but has flexible membership based on where she spends her time?)
  34. 34. You can represent that in a traditional data model, sure. But you’re not encouraged to.
  35. 35. #4: static models make the time dimension unwieldy
  36. 36. Entity models are generally silent on the ways data changes.
  37. 37. Many modern databases can keep older versions of objects. But should they? For which entities? How many versions? Etc.
  38. 38. Worse, what about when the model changes at runtime, and you need to also retain knowledge of what the old model was?
  39. 39. As in #3, there are ways to model this in entity models, but it’s not easy, so most people just don’t think about it.
  40. 40. #5: boxes & lines aren’t how we actually think
  41. 41. Our spatial processing of diagrams doesn’t map well to our temporal, spatial, and causal comprehension of data structure.
  42. 42. What do people really do? Skip making models when their models look too complicated.
  43. 43. F*** THAT NOISE.
  44. 44. Is there an alternative? Not yet.
  45. 45. What could move the needle? ● Prototype-based modeling ● Proper scoping ● Semantic zooming
  46. 46. The map is not the territory.
  47. 47. In conclusion … if you dig this stuff, let’s talk! @thefutureian
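
Slides 21 and 22 claim that a "parent" can be modeled as an entity, as an attribute, or as a relationship, and that the real structure underneath is just a graph. Here is a minimal Python sketch of that idea, not something from the talk itself: every name in it (`Parent`, `Person`, `edges`, `parent_of`, and the sample people) is a hypothetical chosen for illustration. It stores the facts once as graph edges and derives the other two views from them.

```python
from dataclasses import dataclass

# All names below are hypothetical; none come from the talk.


@dataclass(frozen=True)
class Parent:
    """View 1: "parent" as its own entity."""
    name: str
    child_names: tuple


@dataclass(frozen=True)
class Person:
    """View 2: a "person" entity with an isParent attribute."""
    name: str
    is_parent: bool


# View 3: "person" entities joined by a "parent" relationship,
# stored as (subject, predicate, object) edges of a graph.
edges = {
    ("Ann", "parent_of", "Bob"),
    ("Ann", "parent_of", "Cy"),
}


def people(edges):
    """Every name that appears at either end of an edge."""
    return {s for s, _, _ in edges} | {o for _, _, o in edges}


def as_parent_entities(edges):
    """Derive view 1 (Parent entities) from the graph."""
    children = {}
    for subject, predicate, obj in edges:
        if predicate == "parent_of":
            children.setdefault(subject, []).append(obj)
    return [Parent(name, tuple(sorted(kids)))
            for name, kids in sorted(children.items())]


def as_person_attributes(edges):
    """Derive view 2 (Person with is_parent) from the graph."""
    parents = {s for s, p, _ in edges if p == "parent_of"}
    return [Person(name, name in parents) for name in sorted(people(edges))]


parent_view = as_parent_entities(edges)
person_view = as_person_attributes(edges)
```

Both derived views lose something the edges keep: the `Person` view no longer says whose parent Ann is, and the `Parent` view drops people who have no children. That is one way to read the slides' point that the choice among the three is a matter of convenience.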
