How to Survive as a Data Architect in a Polyglot Database World

Karen Lopez talks to data architects and data modelers about how they can best deliver value on modern data-driven projects that go beyond relational database technologies. She covers NoSQL databases and datastores, which data stories they fit best, and which ones they don't. She ends with 10 tips for adding more value to polyschematic database solutions.

  1. Surviving as a Data Architect in a Polyglot Database World • Karen Lopez (@datachick)
  2.
  3. Karen López: Karen has 20+ years of data and information architecture experience on large, multi-project programs. She is a frequent speaker on data modeling, data-driven methodologies, and pattern data models. She wants you to love your data.
  4. POLL: Audience. Who are you?
  5. Outcomes. Leave here understanding: • Why multi-model designs are important • Database/datastore types • Multi-model thinking • How to future-proof your data architecture career • How to learn more
  6. A Good Data Architect • Cost, benefit, risk • Data protection • Business requirements & models • Design models • Fit to needs
  7. Polyglot • Polyschematic
  8. What do you mean, SURVIVE? • Continue to be involved in data-related architectural decisions • Data models wanted and appreciated • Data-model-driven development still a thing • Valued for bringing business needs and database opportunities together • Team player • Knowledgeable of database features
  9. What’s new in the database world? Hybrid. Hybrid. Hybrid.
  10. Hybrid: 01 Purely relational (SQL) databases don’t exist any longer: • Columnstore features • XML • JSON • Other NoSQL features. 02 Applications make use of multiple database and datastore technologies. 03 Schemas live in a variety of places.
  11. Typical Characteristics of Traditional Apps • Real-time request/response • Synchronous • Always connected • Transactional • Scale-up • Failover
  12. Modern Cloud App Characteristics • Asynchronous/queued • Eventual consistency • Microservices • Command & Query pattern • Polyglot persistence
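Several of slide 12's characteristics (asynchronous/queued work, eventual consistency, the Command & Query pattern, often called CQRS) fit together roughly like this: writes go to one model, a queue carries change events, and a separate read model catches up later. Below is a minimal Python sketch under those assumptions; the class and method names (`OrderService`, `place_order`, `project_pending`) are hypothetical, and a real system would use a message broker rather than an in-memory `deque`.

```python
from collections import deque


class OrderService:
    """Toy CQRS: commands update the write model; queries read a separate read model."""

    def __init__(self) -> None:
        self.write_model: dict[str, dict] = {}  # authoritative order records
        self.read_model: dict[str, int] = {}  # customer -> order count (query side)
        self.events: deque = deque()  # stands in for a message queue

    def place_order(self, order_id: str, customer: str) -> None:
        # Command: record the order and enqueue an event; the read model is not touched.
        self.write_model[order_id] = {"customer": customer}
        self.events.append(("OrderPlaced", customer))

    def orders_for(self, customer: str) -> int:
        # Query: served only from the read model, which may lag behind the writes.
        return self.read_model.get(customer, 0)

    def project_pending(self) -> None:
        # Asynchronous projector: drain queued events into the read model.
        while self.events:
            kind, customer = self.events.popleft()
            if kind == "OrderPlaced":
                self.read_model[customer] = self.read_model.get(customer, 0) + 1


svc = OrderService()
svc.place_order("o-1", "karen")
print(svc.orders_for("karen"))  # 0: the read model has not caught up yet
svc.project_pending()
print(svc.orders_for("karen"))  # 1: eventually consistent
```

The gap between the two queries is the "eventual" in eventual consistency: the write succeeded immediately, but the query side only reflects it after the projector runs.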
  13. Do you support/model/do stuff in relational technologies? In pre-relational tech? In post-relational (NoSQL) tech? How’s that going for ya?
  14. A very fast review of NoSQL & SQL. I have other recorded webinars at DATAVERSITY that go into more detail.
  15. BASE vs. ACID • BASE: Basically Available, Soft state, Eventually consistent • ACID: Atomic, Consistent, Isolated, Durable
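To make the "Atomic" (and, via a constraint, "Consistent") in ACID concrete: either every statement in a transaction takes effect, or none does. Here is a small sketch using Python's built-in `sqlite3` module; the `accounts` table and the `transfer` and `balances` helpers are hypothetical examples, not from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("bob", 100), ("sam", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move an amount atomically; if the CHECK constraint fails, both updates roll back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances() -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer("bob", "sam", 30), balances())   # True {'bob': 70, 'sam': 30}
print(transfer("bob", "sam", 500), balances())  # False {'bob': 70, 'sam': 30}
```

In the failed transfer the credit to `sam` had already been applied when the debit violated the constraint; the rollback undoes it, which is exactly the guarantee a BASE-style store relaxes in exchange for availability and scale.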
  16. This used to be an either/or decision: BASE or ACID
  17. Polyglot persistence • Optimized for data • Optimized for workload. Not all new: • EAV • XML • Architecture paradigm: OLAP/DW and OLTP. But now…
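Slide 17's point that some of this is "not all new" is easy to see with EAV (entity-attribute-value), a pattern many relational teams have already built by hand: one narrow table holds arbitrary attributes per entity, much like a key-value or document store living inside a relational engine. A minimal sketch with Python's `sqlite3`; the `eav` table, its rows, and the `load_entity` helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute) pair: schema-flexible, but typing and constraints are weak.
conn.execute(
    "CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT, PRIMARY KEY (entity, attribute))"
)
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("shirt-1", "title", "Polo Long Sleeve Shirt"),
        ("shirt-1", "color", "red"),
        ("shirt-1", "material", "cotton"),
        ("book-1", "title", "NoSQL Distilled"),
        ("book-1", "pages", "192"),  # stored as text: the EAV table cannot type it
    ],
)


def load_entity(entity: str) -> dict[str, str]:
    """Pivot an entity's attribute rows back into a document-like dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY attribute", (entity,)
    )
    return dict(rows)


print(load_entity("book-1"))  # {'pages': '192', 'title': 'NoSQL Distilled'}
```

The trade-off is visible in the output: the table accepts any attribute without a schema change, but `pages` comes back as a string because the design gives up column-level data types.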
  18. NoSQL, Not Only SQL, SQL* • Relational • Key Value • Columnar • Column Family • Document • Hadoop • Graph. Plus all the hybrid versions of these…
  19. Relational
  20. Key-Value Databases • Simple map • Semi-structured • De-normalized

      | Partition Key | Row Key    | Value                  |
      |---------------|------------|------------------------|
      | Marketing     | 00001      | Name: Bob, Age: 35     |
      | Marketing     | 00002      | Name: Karen, Age: 42   |
      | Marketing     | Department | Name: MKTG, Count: 104 |
      | Sales         | 00010      | Name: Sam, Age: 29     |
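The key-value layout on slide 20 can be mimicked with a plain Python dictionary whose key is a (partition key, row key) pair and whose value is an unstructured map; items in one partition need not share attributes (the "Department" row has `Count` instead of `Age`). This is a sketch only; the `get` and `scan_partition` helpers are hypothetical and not the API of any particular key-value product.

```python
from typing import Any, Optional

# (partition key, row key) -> value; values are schemaless maps, as in the slide's table.
store: dict[tuple[str, str], dict[str, Any]] = {
    ("Marketing", "00001"): {"Name": "Bob", "Age": 35},
    ("Marketing", "00002"): {"Name": "Karen", "Age": 42},
    ("Marketing", "Department"): {"Name": "MKTG", "Count": 104},
    ("Sales", "00010"): {"Name": "Sam", "Age": 29},
}


def get(partition: str, row: str) -> Optional[dict[str, Any]]:
    """Point lookup by full key: the access path key-value stores are built for."""
    return store.get((partition, row))


def scan_partition(partition: str) -> list[dict[str, Any]]:
    """Return every value in one partition, ordered by row key."""
    return [value for (p, _row), value in sorted(store.items()) if p == partition]


print(get("Sales", "00010"))  # {'Name': 'Sam', 'Age': 29}
print(len(scan_partition("Marketing")))  # 3
```

Lookups by full key or by partition are cheap; a question such as "everyone older than 30" has no index to use and needs a full scan, which is why the key design has to follow the expected queries.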
  21. Key-Value: Cassandra • Redis • Oracle NoSQL • CosmosDB
  22. Document databases • No schema, no relationships • Collections of documents • Documents are JSON objects. Examples: { "title": "Polo Long Sleeve Shirt", "color": "red", "material": "cotton" } { "title": "NoSQL Distilled", "author": [ "Martin Fowler", "P. Sadalage" ], "isbn": "978-0321826626", "pages": 192 }
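Slide 22's two documents sit in the same collection yet have different fields. A minimal Python sketch of that idea using the standard `json` module; the `find` helper loosely imitates a match-on-fields query and is hypothetical, not the MongoDB or Cosmos DB API.

```python
import json

# A "collection" of JSON documents with no shared schema (the two examples from the slide).
raw = [
    '{"title": "Polo Long Sleeve Shirt", "color": "red", "material": "cotton"}',
    '{"title": "NoSQL Distilled", "author": ["Martin Fowler", "P. Sadalage"], '
    '"isbn": "978-0321826626", "pages": 192}',
]
collection = [json.loads(doc) for doc in raw]


def find(coll: list[dict], **criteria) -> list[dict]:
    """Return documents whose fields equal every criterion; absent fields never match."""
    missing = object()  # sentinel, so an absent field is not confused with a null value
    return [d for d in coll if all(d.get(k, missing) == v for k, v in criteria.items())]


print([d["title"] for d in find(collection, color="red")])  # ['Polo Long Sleeve Shirt']
print(find(collection, pages=192)[0]["author"])  # ['Martin Fowler', 'P. Sadalage']
```

Note that the original slide text was missing a closing quote after `NoSQL Distilled`; a real JSON parser rejects that, so the sample above uses the corrected document.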
  23. Document: CosmosDB • MongoDB. BSON & JSON; databases, documents, collections
  24. Graph databases • Map-based • Relationships of edges and nodes
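Slide 24's "relationships of edges and nodes" means connections are stored as first-class data and queried by traversal rather than by joins. A small Python sketch with an adjacency list and a breadth-first search; the node names and the `KNOWS`/`WORKS_IN` relationship types are hypothetical sample data.

```python
from collections import defaultdict, deque

# Directed, labelled edges: (from_node, relationship, to_node).
edges = [
    ("Karen", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Sam"),
    ("Sam", "KNOWS", "Alex"),
    ("Karen", "WORKS_IN", "Marketing"),
]

# Adjacency list: node -> outgoing (relationship, neighbour) pairs.
adjacency: dict[str, list[tuple[str, str]]] = defaultdict(list)
for src, rel, dst in edges:
    adjacency[src].append((rel, dst))


def reachable(start: str, rel: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal along `rel` edges; returns node -> hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for r, nxt in adjacency[node]:
            if r == rel and nxt not in hops:  # `not in hops` also guards against cycles
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]
    return hops


print(reachable("Karen", "KNOWS", 2))  # {'Bob': 1, 'Sam': 2}
```

A "friends of friends" question like this one would need one self-join per hop in a relational schema; in a graph model the hop limit is just a parameter.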
  25. Graph: Neo4j • DataStax Enterprise Graph • SQL Server • CosmosDB
  26. Wait… What? Graph database and processing inside a relational database engine
  27. Column Family
  28. Columnar: HP Vertica • SAP IQ
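The difference between row-oriented storage and the columnar engines on slide 28 is about physical layout: a columnstore keeps each column's values together, so an aggregate over one column reads only that column. A toy Python sketch of the two layouts over the same rows (names and ages reused from slide 20, department values assumed); real columnar engines add compression, encoding, and vectorized execution that this ignores.

```python
# Row-oriented: each record's fields are stored together.
rows = [
    {"dept": "Marketing", "name": "Bob", "age": 35},
    {"dept": "Marketing", "name": "Karen", "age": 42},
    {"dept": "Sales", "name": "Sam", "age": 29},
]

# Column-oriented: each column's values are stored together, in the same row order.
columns = {key: [r[key] for r in rows] for key in rows[0]}


def avg_age_row_store() -> float:
    # Must walk every full record even though only "age" is needed.
    return sum(r["age"] for r in rows) / len(rows)


def avg_age_column_store() -> float:
    # Touches only the contiguous "age" column.
    ages = columns["age"]
    return sum(ages) / len(ages)


print(columns["dept"])  # ['Marketing', 'Marketing', 'Sales']
print(avg_age_row_store() == avg_age_column_store())  # True
```

Both layouts give the same answer; the columnar one wins on wide tables where analytic queries read a few columns out of many, while the row layout suits fetching or updating whole records.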
  29. This means… • SQL vs. NoSQL isn’t a thing any longer • Other RDBMS vendors will be adding non-relational functions and structures • Data architects who specialize in relational-only modeling and design will be overly specialized • We don’t have the option of ignoring these new database models
  30. Most relational databases are adding NoSQL support: • Columnstore • Graph • XML • JSON. You’ve probably even developed your own as a table.
  31. Hybrid is the future, though • Column Family + Key Value • Relational + Graph + Columnstore • Relational + Columnstore • Graph + Column Family + Key Value + Document …all in the same engine. This is a major change from how things worked a decade ago.
  32. Did you notice? CosmosDB • Database as a Service from Microsoft • Supports (for now): Graph, Key Value, Column Family, Document • Globally distributed data • Scalable • Tunable consistency
  33. But what does this all mean? • Data modeling process impacts • Data modeler impacts
  34. Quotes from team NoSQL • No need for a data model • Can’t be mapped to a data model • SQL is not flexible • SQL databases don’t scale • SQL databases die after about 1 GB in size • SQL databases use old technology
  35. Quotes from team SQL (relational) • Eventual consistency? WTH? • What do you mean data quality just doesn’t matter? • My tools won’t work with these databases • It’s all a fad; there will be a new fad tomorrow • Bbbbbut…but..but DATA QUALITY!!!!! INTEGRITY!!!
  36. What is the role of existing data models?
  37. Data Models – Traditional Process [diagram: Conceptual (Data) Model → Logical Data Model → Physical Data Model(s) for many OLTP systems and data marts]
  38. What about the modeling tasks & methods? • Schema • Schemaless • Schema on read • Polyschematic
  39. Schemas • In the database as physical structure • In a separate location, such as a DSD or a schema layer on top of the physical structure • In the application code • In the data itself
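Slides 38 and 39 note that with schema-on-read, the schema often lives in application code rather than in the database. A minimal Python sketch of that idea: raw records are stored as-is, and a reader applies an expected shape (type coercion, defaults, dropping unknown fields) only when the data is read. The `CUSTOMER_READ_SCHEMA` mapping, the sample records, and `read_customer` are all hypothetical.

```python
import json
from typing import Any, Callable

# The "schema" lives in application code: field -> (type converter, default).
CUSTOMER_READ_SCHEMA: dict[str, tuple[Callable[[Any], Any], Any]] = {
    "name": (str, ""),
    "age": (int, None),
    "vip": (bool, False),
}

# Stored data: nothing was enforced on write, so shapes and types drift.
stored = [
    '{"name": "Bob", "age": "35"}',
    '{"name": "Karen", "age": 42, "vip": true, "notes": "extra field"}',
]


def read_customer(raw: str) -> dict[str, Any]:
    """Apply the read schema: coerce known fields, fill defaults, drop unknown fields."""
    doc = json.loads(raw)
    result = {}
    for field, (convert, default) in CUSTOMER_READ_SCHEMA.items():
        value = doc.get(field)
        result[field] = default if value is None else convert(value)
    return result


for raw in stored:
    print(read_customer(raw))
# {'name': 'Bob', 'age': 35, 'vip': False}
# {'name': 'Karen', 'age': 42, 'vip': True}
```

The schema still exists; it has moved into code, which is exactly the kind of structure a data architect can document and model even when the database does not enforce it.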
  40. New Process • Apply: apply logical data model understandings to the physical structure, using data modeling and/or tools, a whiteboard, or code • Understand the data • Understand the underlying architecture
  41. Traditional Data Modeler Involvement • Project Initiation • Architecture and Infrastructure • Design • SW Requirements • Development • Deployment
  42. Modern Data Modeler Involvement • Project Initiation • Architecture and Infrastructure • Design • SW Requirements • Development • Deployment
  43. What do we really mean by scale? • Rapidly add more compute & data power • Massively parallel processing • Cheap, commodity hardware, but lots of it • Optimized for queries/reads/questions/telling stories
  44. Most NoSQL • Was developed with scale in mind • Has tunable consistency • Scales out, rather than up • Uses distributed data (multiple copies), globally • Understanding workloads and workload trends is important. This metadata isn’t often collected and modeled by data architects.
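"Tunable consistency" (slide 44) is commonly explained with replica quorums, as in Dynamo-style replicated stores such as Cassandra: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica holding the latest acknowledged write. This quorum rule is general background, not something stated in the talk; the small Python check below compares the rule with a brute-force overlap test (function names are illustrative).

```python
from itertools import combinations


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum rule: read and write replica sets are guaranteed to overlap when R + W > N."""
    return r + w > n


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set intersect every possible write set?"""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: rule={is_strongly_consistent(3, r, w)}, "
          f"brute force={quorums_always_overlap(3, r, w)}")
```

For N=3 the two checks agree: R=1, W=1 is fast but may return stale data, while the other three settings guarantee overlap. Tuning R and W per operation is how these stores trade latency against consistency.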
  45. Where Data Models Can Help • Create a physical data model of key data sources: reverse engineer where possible; one entity per tab in a spreadsheet; some normalized (master data + combinations) • Model the metadata: definitions, gotchas, expected domains, metadata • Think differently: describe data consistently; don’t prescribe structure; naming standards; datatype-ish
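Slide 45's advice to reverse engineer key data sources and to "describe data consistently, don't prescribe structure" can start with a simple profile of which fields actually occur in schemaless data. The hypothetical `profile` function below reports, for each top-level field, how often it appears, which general ("datatype-ish") Python types it holds, and whether its name matches an assumed snake_case naming standard. The sample documents are made up, and the function describes rather than enforces.

```python
import re
from collections import Counter, defaultdict
from typing import Any

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")  # assumed naming standard


def profile(docs: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarise top-level fields: presence ratio, observed types, naming-standard fit."""
    presence: Counter = Counter()
    types: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            presence[field] += 1
            types[field].add(type(value).__name__)
    return {
        field: {
            "present_in": presence[field] / len(docs),
            "types": sorted(types[field]),
            "follows_naming_standard": bool(SNAKE_CASE.match(field)),
        }
        for field in sorted(presence)
    }


# Hypothetical sample documents with drifting shapes and types.
docs = [
    {"title": "Polo Long Sleeve Shirt", "color": "red"},
    {"title": "NoSQL Distilled", "pages": 192},
    {"title": "Sample Book", "pages": "250", "ISBN": "unknown"},
]
for field, info in profile(docs).items():
    print(field, info)
```

A profile like this is raw material for the metadata model on the slide: it surfaces gotchas (`pages` holds both integers and strings) and naming-standard violations (`ISBN`) without requiring the source to change.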
  46. Think Differently • Data modeling tools have to catch up • New tools are being developed • Data models can’t be prescriptions all the time • Naming standards over constraints • Data types are general, higher level • Tunable data quality
  47. How are you transitioning to “different”?
  48. Entity-Relationship Modeling & IDEF1X: Is it enough? Should we extend it? Should we create new notations for each type of database? Scrap it all and start again?
  49. What about all those very sexy data modelers?
  50. “Every design decision should include cost, benefit and risk” - Karen Lopez
  51. Great Enterprise Data Modelers: Characteristics • Patient • Zen balance/non-attachment • Get-’er-done • Collaborators, not sentries/cops • Broad business domain knowledge • Broad, strategic view • Deep project empathy • Respected • Good project experience • Good negotiators
  52. So let’s summarize: 1. The more SQL-like features a NoSQL database offers, the more likely a data modeling tool is to support it. 2. Modeling tool vendors will support the features that users ask for, because those features help them win deals. This is not a bad thing. 3. Serious NoSQL vendors* understand that hybrid is the enterprise data story. They want us to find a way. 4. Our data models have value, even if the NoSQL solution doesn’t require a lot of constraints.
  53. 10 Tips for Data Modelers: 1. Learn about these methods – don’t avoid them 2. Get hands-on training. Get certified, even 3. Learn the lingo 4. Use the lingo 5. Be able to describe data modeling and data governance in the context of these database technologies and their use cases
  54. 10 Tips for Data Modelers (continued): 6. Bring data models (and other models) to the team 7. Stay ahead of the curve on new NoSQL and SQL features in your DBMSs 8. Understand the use cases for each type of database technology 9. Let go, a little, when it makes sense 10. Enjoy the new database smell!
  55. What to do • Learn • Get hands-on experience with these new data technologies • Talk to your tool vendors • Bring the models you have
  56. Resources
  57. Making Sense of NoSQL clearly and concisely explains the concepts, features, benefits, potential, and limitations of NoSQL technologies. Using examples and use cases, illustrations, and plain, jargon-free writing, this guide shows how you can effectively assemble a NoSQL solution to replace or augment the traditional RDBMS you have now.
  58. And it’s FREE!
  59. This book is written for anyone who is working with, or will be working with, MongoDB, including business analysts, data modelers, database administrators, developers, project managers, and data scientists.
  60. PostgreSQL • Riak • HBase • MongoDB • Neo4j • CouchDB • Redis
  61. Questions?
  62. #TEAMDATA. Thank you, you were great.