Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra

Data modeling is one of the most important steps in ensuring the performance and scalability of Cassandra-powered applications. The existing Chebotko data modeling methodology lays out important principles, rules, and patterns for designing conceptual, logical, and physical data models. While this approach enables rigorous and sound schema design, it requires specialized training and experience. To dramatically reduce design time and to simplify and streamline the Cassandra database design process, we have developed an online tool that automates the most complex, error-prone, and time-consuming data modeling tasks: conceptual-to-logical mapping, logical-to-physical mapping, and CQL generation.
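To make the conceptual-to-logical mapping step more concrete, here is a minimal Python sketch that applies the five mapping steps named in the slide transcript (entity and relationship types, equality search attributes, inequality search attributes, ordering attributes, key attributes) to the sensor example from slides 7 to 12. It is an illustration under stated assumptions, not KDM's implementation: the names `AccessPattern`, `LogicalTable`, and `map_to_logical` are hypothetical, and the key attributes passed in are an assumption, since the transcript does not show which attributes are keys.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPattern:
    """One application query, described by the attributes it uses (hypothetical type)."""
    equality: list[str]                                      # attributes compared with '='
    inequality: list[str] = field(default_factory=list)      # attributes compared with '>', '<', ...
    ordering: dict[str, str] = field(default_factory=dict)   # attribute -> "ASC" or "DESC"


@dataclass
class LogicalTable:
    """Result of the mapping: one table designed for one query (hypothetical type)."""
    name: str
    partition_key: list[str]
    clustering: dict[str, str]   # column -> order, kept in clustering order
    regular: list[str]


def map_to_logical(name, attributes, key, query):
    # Step 1 (entity and relationship types): start from all attributes
    # of the entities and relationship the query touches.
    # Step 2 (equality search attributes): they become partition key columns.
    partition_key = list(query.equality)
    # Step 3 (inequality search attributes): they become clustering columns.
    clustering = {col: "ASC" for col in query.inequality}
    # Step 4 (ordering attributes): they become clustering columns with the requested order.
    for col, order in query.ordering.items():
        clustering[col] = order
    # Step 5 (key attributes): add any key attribute not yet in the primary key
    # so that rows stay unique.
    for col in key:
        if col not in partition_key and col not in clustering:
            clustering[col] = "ASC"
    regular = [a for a in attributes if a not in partition_key and a not in clustering]
    return LogicalTable(name, partition_key, clustering, regular)


# Sensor example from the slides. The key attributes are an assumption.
query = AccessPattern(
    equality=["location", "parameter"],
    inequality=["timestamp"],
    ordering={"timestamp": "DESC"},
)
table = map_to_logical(
    "sensor_data",
    attributes=["id", "location", "timestamp", "parameter", "value"],
    key=["id", "timestamp", "parameter"],
    query=query,
)
print(table)
```

Under these assumptions the sketch arrives at the same layout as the `sensor_data` table in the slides: `location` and `parameter` as partition key columns, `timestamp` (descending) and `id` (ascending) as clustering columns, and `value` as a regular column.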

In this talk, using real-life examples from the IoT domain, we demonstrate how to design correct and efficient database schemas for Cassandra. First, we use our tool, called KDM, to design a conceptual data model and specify application access patterns. Second, we demonstrate how KDM generates a logical data model that is visualized using Chebotko diagram notation. Third, we explain how to configure a logical data model and automatically generate a physical data model. Fourth, we showcase how KDM generates a CQL script for instantiating a physical data model in Cassandra. Finally, we discuss best practices for Cassandra data modeling with KDM.
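As a rough illustration of what the final CQL generation step might produce, the Python sketch below renders a `CREATE TABLE` statement for the `sensor_data` table from the slide transcript. This is not KDM's code: the function `generate_cql`, the keyspace name `iot`, and the column data types are assumptions made for the example, since the transcript gives only column names and key roles.

```python
def generate_cql(keyspace, table, columns, partition_key, clustering):
    """Render a CQL CREATE TABLE statement (hypothetical helper).

    columns:       dict of column name -> CQL type, in declaration order
    partition_key: list of partition key column names
    clustering:    list of (column, "ASC" or "DESC") pairs, in clustering order
    """
    col_lines = [f"  {name} {cql_type}," for name, cql_type in columns.items()]
    pk = "(" + ", ".join(partition_key) + ")"
    ck = "".join(f", {name}" for name, _ in clustering)
    lines = [
        f"CREATE TABLE {keyspace}.{table} (",
        *col_lines,
        f"  PRIMARY KEY ({pk}{ck})",
        ")",
    ]
    if clustering:
        order = ", ".join(f"{name} {direction}" for name, direction in clustering)
        lines[-1] += f" WITH CLUSTERING ORDER BY ({order})"
    return "\n".join(lines) + ";"


# Column types and the keyspace name are assumptions for illustration.
cql = generate_cql(
    keyspace="iot",
    table="sensor_data",
    columns={
        "location": "text",
        "parameter": "text",
        "timestamp": "timestamp",
        "id": "uuid",
        "value": "double",
    },
    partition_key=["location", "parameter"],
    clustering=[("timestamp", "DESC"), ("id", "ASC")],
)
print(cql)
```

The composite partition key `(location, parameter)` serves the equality predicates of the example query, and the descending `timestamp` clustering column serves both the range predicate and the `ORDER BY timestamp DESC` clause.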

The KDM tool is available for free at kdm.dataview.org and is used by many in industry and academia.

Andrey Kashlev - Wayne State University
Andrey Kashlev is a PhD candidate in big data, working in the Department of Computer Science at Wayne State University. His research focuses on big data, including data modeling for NoSQL, big data workflows, and provenance management. He has published numerous research articles in peer-reviewed international journals and conferences, including IEEE Transactions on Services Computing, Data and Knowledge Engineering, International Journal of Computers and Their Applications, and the IEEE International Congress on Big Data.

Published in: Technology

  1. World’s Best Data Modeling Tool for Apache Cassandra. Artem Chebotko, Andrey Kashlev. © 2015. All Rights Reserved.
  2. (1) Cassandra Data Modeling Methodology (2) The KDM Tool (3) Live Demo: IoT (4) Live Demo: Media Cataloguing (5) Future Work
  3. Data Modeling Process • Data requirements • Application requirements • Schema design • Optimization
  4. Cassandra Data Modeling Methodology • Conceptual Data Model + Application Workflow → (Mapping) → Logical Data Model → (Optimization) → Physical Data Model
  5. Methodology Models (model: representation) • Conceptual Data Model: ERD • Application Workflow Model: graph • Logical Data Model: Chebotko Diagram • Physical Data Model: Chebotko Diagram, CQL
  6. Methodology Protocols • Conceptual-to-logical mapping – Mapping rules – Mapping patterns • Physical optimizations – Partition size analysis – Duplication factor analysis – Keys, aggregation, transactions, …
  7. Example • Conceptual model: Sensor (id, location) records Measurement (timestamp, parameter, value), 1:n • Query: SELECT timestamp, value FROM … WHERE location = ? AND parameter = ? AND timestamp > ? ORDER BY timestamp DESC
  8. Example, step 1: Mapping Entity and Relationship Types • Table sensor_data: location K, parameter K, timestamp C↓, id C↑, value
  9. Example, step 2: Mapping Equality Search Attributes • Table sensor_data: location K, parameter K, timestamp C↓, id C↑, value
  10. Example, step 3: Mapping Inequality Search Attributes • Table sensor_data: location K, parameter K, timestamp C↑, id C↑, value
  11. Example, step 4: Mapping Ordering Attributes • Table sensor_data: location K, parameter K, timestamp C↓, id C↑, value
  12. Example, step 5: Mapping Key Attributes • Table sensor_data: location K, parameter K, timestamp C↓, id C↑, value
  13. Methodology Pros and Cons • Pros: correctness, completeness • Cons: complexity, time investment
  14. Human Errors Happen …
  15. Automation • Complexity • Time investment • Human error
  16. (1) Cassandra Data Modeling Methodology (2) The KDM Tool (3) Live Demo: IoT (4) Live Demo: Media Cataloguing (5) Future Work
  17. The KDM Tool • Streamlines the methodology • Guides the user • Automates data modeling tasks: – Conceptual-to-logical mapping – Physical optimization – CQL generation
  18. KDM Automation Workflow
  19. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect)
  20. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect)
  21. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM)
  22. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM) → Step 3: Select Logical Data Model (solution architect)
  23. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM) → Step 3: Select Logical Data Model (solution architect) → Automated: Generate Physical Data Model (KDM)
  24. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM) → Step 3: Select Logical Data Model (solution architect) → Automated: Generate Physical Data Model (KDM) → Step 4: Configure Physical Data Model (solution architect)
  25. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM) → Step 3: Select Logical Data Model (solution architect) → Automated: Generate Physical Data Model (KDM) → Step 4: Configure Physical Data Model (solution architect) → Automated: Generate Physical Schema (KDM)
  26. KDM Automation Workflow • Step 1: Design Conceptual Data Model (solution architect) → Step 2: Specify Access Patterns (solution architect) → Automated: Generate Logical Data Models (KDM) → Step 3: Select Logical Data Model (solution architect) → Automated: Generate Physical Data Model (KDM) → Step 4: Configure Physical Data Model (solution architect) → Automated: Generate Physical Schema (KDM) → Step 5: Download CQL Script (solution architect)
  27. (1) Cassandra Data Modeling Methodology (2) The KDM Tool (3) Live Demo: IoT (4) Live Demo: Media Cataloguing (5) Future Work
  28. [No slide text]
  29. (1) Cassandra Data Modeling Methodology (2) The KDM Tool (3) Live Demo: IoT (4) Live Demo: Media Cataloguing (5) Future Work
  30. [No slide text]
  31. Summary • KDM: – automates the most complex tasks – eliminates human error – simplifies data modeling – guides the user – is a general-purpose tool
  32. How Can KDM Help You? • Build new data models • Verify existing data models • Teach/learn data modeling
  33. (1) Cassandra Data Modeling Methodology (2) The KDM Tool (3) Live Demo: IoT (4) Live Demo: Media Cataloguing (5) Future Work
  34. Future Work • Materialized views
  35. Future Work • Materialized views • User Defined Types
  36. Future Work • Materialized views • User Defined Types • Analysis and physical optimization
  37. Future Work • Materialized views • User Defined Types • Analysis and physical optimization • Support for application workflow design
  38. Future Work • Materialized views • User Defined Types • Analysis and physical optimization • Support for application workflow design • Support for Chebotko Diagrams
  39. Sign up for KDM – it’s FREE! • KDM: kdm.dataview.org • Methodology: academy.datastax.com • Planet Cassandra blog posts: – KDM: An Automated Data Modeling Tool for Apache Cassandra, Pt. 1, Pt. 2 • Artem Chebotko, Andrey Kashlev, Shiyong Lu, “A Big Data Modeling Methodology for Apache Cassandra”, IEEE International Congress on Big Data, 2015.
  40. Acknowledgements • Andrey Kashlev would like to thank: – Dr. Shiyong Lu – Anthony Piazza • Artem Chebotko would like to thank: – Anthony Piazza – Patrick McFadin – Jonathan Ellis – Tim Berglund
  41. Thank you