The Science of database CICD - UKOUG Breakthrough


  1. 1. The Science of Database CI/CD What do we know about it? Jasmin Fluri
  2. 2. Agenda
  3. 3. About me Jasmin Fluri Schaltstelle GmbH Database Development Development Process Automation and Tooling Continuous Integration Continuous Delivery Software Engineering @jasminfluri @jasminfluri jasminfluri Pipelining
  4. 4. What am I going to talk about?
  5. 5. Research Project – Introducing CI/CD into database development projects! What we know about CI/CD and why it is harder for database development!
  6. 6. At the end of this talk you will know: - why you should adopt CI/CD in DB development - how you start - and how your team will profit from it!
  7. 7. What do we know about releasing software?
  8. 8. DevOps Goal: We want to ship changes continuously to get fast and continuous feedback!
  9. 9. High-performing teams deploy changes at least once a day! Source: DevOps Report DORA 2021
  10. 10. High-performing teams have a low change failure rate! Source: DevOps Report DORA 2021
  11. 11. Why do they recover so much better from incidents?
  12. 12. Small changes = small risk = small possible damage = small feedback loops. Source: NYTimes – The best path to long term success is slow, simple and boring!
  13. 13. What positive effect do small changes have? Small changes have small impact! It’s easy to see all the elements that are affected! Small changes only introduce a small risk of change! Small changes can more easily be reverted or fixed when they are faulty.
  14. 14. … so deploying small changes often is good practice!
  15. 15. … but it’s not common in database development!
  16. 16. What is CI/CD?
  17. 17. Software Delivery Lifecycle: Plan, Develop, Build, Test, Release, Deploy, Feedback, Monitor. Diagram labels: Continuous Delivery; Continuous Integration; Development; Operations & Customer Involvement. Phases: Developing Small Increments; Continuous (automated) Integration and Testing; Continuous Releasing and automated Installation; Monitoring and Feedback.
  18. 18. Continuous Delivery != Continuous Deployment
  19. 19. CI/CD != DevOps; CI/CD != Agile
  20. 20. Agile • Developing small increments • Always having a runnable version of the product DevOps CI/CD • Interconnecting tasks and processes • Whole app lifecycle managed by a single team • Service-Infrastructure • Automation • Hands-off integration and deployment • Fast feedback • Automated quality assurance
  21. 21. What does CI/CD for applications look like?
  22. 22. Continuous Integration: Developer, Version Control System, Continuous Integration Server, Continuous Integration Pipeline. Flow: pushing changes into version control; trigger pipeline on commit or on merge request; execute: 1 - Checkout source code, 2 - Build and test of application, 3 - Run system tests, static code analysis and metrics, 4 - Generate reports and notifications, 5 - Notify. Characteristics: File-based development; packages can replace old versions; no state; no order of changes!
  23. 23. What is special about database CI/CD?
  24. 24. Continuous Integration and Delivery for DB is more than using a Database Migration Tool!
  25. 25. Continuous Integration: Developer, Version Control System, Continuous Integration Server, Continuous Integration Pipeline. Flow: pushing changes into version control; trigger pipeline on commit or on merge request; execute: 1 - Checkout source code, 2 - Build and test of database, 3 - Run system tests, static code analysis and metrics, 4 - Generate reports and notifications, 5 - Notify. Characteristics: File-based development and generated code; changes have an order! Builds are deployments! We depend upon state!
  26. 26. What do we know about the preconditions for good database CI/CD?
  27. 27. Before you start building a database CI/CD pipeline you need: Static Code Analysis, Automated Tests, Version Control of Everything, a Database Migration Tool, and a Decoupled Codebase.
  28. 28. Version Control
  29. 29. … without Version Control there’s no single source of truth!
  30. 30. You need to store EVERYTHING in version control! The scary 36.6%
  31. 31. There’s no best practice or standard for how to version SQL code! – But there are some ways that work…
  32. 32. Pure file-based development isn’t common in database development. Problem: you often want not only your migration scripts in VCS but also the DDLs! DDLs allow you to build a system from scratch fast! DMLs allow you to migrate to the next version!
  33. 33. Database Source Code in Version Control. Sample Project Structure:
      myproject/
      ├── …
      ├── docs/
      │   ├──
      ├── db/                  (DDL code)
      │   ├── 0_sequences/
      │   ├── 1_tables/
      │   ├── 2_mviews/
      │   ├── 3_views/
      │   ├── 4_packages/
      │   └── 5_utils/
      ├── migrations/          (Migration Scripts)
      │   ├── scripts/
      └── db-tests/            (Test code and test data)
          ├── packages/
          └── data/
      • DDLs contain the definition of our database objects • Migration scripts allow us to migrate to the next version! • Test code tests our database logic and behaviour.
  34. 34. The majority of people use a mix of generating and writing migration scripts!
  35. 35. To be able to do file-based development for database migration scripts we need repeatability!
  36. 36. What happens if your database scripts are not repeatable? Non-repeatable migration script: if it fails, we need a new one with the remaining changes (and corrections). Repeatable migration script: if it fails, we alter it and run it again – it will continue where it failed.
  37. 37. Passive version control (generating code from the database) introduces the risk that we miss exporting things we changed.
  38. 38. Architecture of your VCS: the repository structure and Git workflow
  39. 39. The Problem with Feature Branches: Tests on branches are pointless if they miss changes made by others!
  40. 40. You cannot properly test changes to database applications when you are using feature branches! … the same applies to infrastructure code!
  41. 41. Trunk-based development and Continuous Integration Integration Pipeline Delivery Pipeline triggers
  42. 42. if (system == database) { build = deployment; }
  43. 43. Merging Application Development with Database Development. Working trunk-based: APP & DB Repository (Trunk Based). Submodules – mixing workflows: APP Repository (Gitflow) with the DB Repository (Trunk Based) included as a submodule.
  44. 44. How do you review code when you do trunk-based development?
  45. 45. Daily struggle to wait for code reviews of your feature branches…
  46. 46. «But we don’t have the time to do pair programming»
  47. 47. Static Code Analysis
  48. 48. To be able to do code reviews, you need standards! Those standards should be enforced automatically!
  49. 49. Database Migration Tool
  50. 50. Database Schema Evolution: Initial DDL builds Database Version 1; DDL & DML lead to Database Version 2; DDL & DML lead to Database Version 3; …
  51. 51. Database Schema Evolution - Migration Based Approach. Application Repository and Database. Migrations: V00__baseline.sql, V01__revision01.sql, V02__revision02.sql, V03__revision03.sql, V04__revision04.sql, V05__revision05.sql, V06__revision06.sql, V07__revision07.sql, V08__revision08.sql, V09__revision09.sql. Utilities: R_synonyms.sql, R_grants.sql, R_checks.sql.
  52. 52. Database Migration Tools. The scary 14.7%
  53. 53. Changeset formats – just use SQL! The scary 43.7%
  54. 54. Automated Database Tests
  55. 55. Our research showed that projects without automated testing spend around 25% of their team's capacity on manual testing.
  56. 56.
  57. 57. Unit Testing in your Application - API Tests - Integration tests - Unit Tests of the application backend Testing Tools Unit Testing in your Database - Unit Tests - Integration tests
  58. 58. Decoupled Codebase
  59. 59. What happens when release cycles are coupled to other teams (or components)? Team A, Team B, Team C; Release 1, Release 2, Release 3, Release 4. This is not Continuous Delivery!
  60. 60. Database Access – Access Layer
  61. 61. Why is a versioned API important?
  62. 62. Application A, Application B, Application C (consume services); Service Layer (Database) and Infrastructure (provide services). Databases are Services! Data Services! Because… when the database changes… … applications need to change as well if we don’t have versioned access!
  63. 63. Coupling of release activities must be eliminated! Application A (Team A) consumes services; Service XY (Team B) and Infrastructure provide services: Version 1, Version 2, Version 3, Version 4. Teams want to work independently and continuously!
  64. 64.
  65. 65. Application A, Application B, Application C (consume services); Service XY (Database) and Infrastructure (provide services); Service Delivery Organisation: Version 1, Version 2, Version 3. This is even more important when we are shipping small changes continuously!
  66. 66. Now we have all preconditions to build a CI Pipeline!
  67. 67. Continuous Integration & Release Pipeline: Development Environment Version Control Pipelines Infrastructure New Functionality develops pushes CI Server CI Database triggers starts Static Code Analysis Check out Code Deploy SQL Code Unit Tests System Tests Set Release Number End of Release Backup Developer Artefact Repository Push Artefact Deployment Artefact New Commit on Main Branch Release Pipeline Static Code Analysis Tool Integration Pipeline End of Integration
  68. 68. Environment - Setup: Version Control Server Continuous Integration Server Development Environment Test Environment Production Environment Developer Localhost DDL Export Scripts / Functionality Artefact Repository Database Schema Migration Tool CI Environment Static Code Analysis Tool Production Pipeline Test Pipeline Integration Pipeline Push Pull Export Push Migration Scripts and DDLs Trigger Clone Trigger Execute
  69. 69. But why do we need CI/CD?
  70. 70. Not automating integration and deployments is a change preventer!
  71. 71. What are the results of our research?
  72. 72. Introducing CI/CD into a database development project… … increased the number of deployments by over 5x! More deployments mean more testing! More deployments also mean less risk!
  73. 73. Introducing CI/CD into a database development project… … decreased the number of failed deployments by over 75%! If it hurts, do it more often! Automation introduces a repeatable process, without human intervention!
  74. 74. Introducing CI/CD into a database development project… … decreased the cognitive load of the developers! Focus now lies on the development of functionality, not on manually deploying changes and troubleshooting!
  75. 75. Introducing CI/CD into a database development project… … did not change the lead time for changes! All participating projects used fixed release windows for deploying changes into production! Fixed release windows = Fixed lead time!
  76. 76. @jasminfluri @jasminfluri jasminfluri Thank you for your time! What questions do you have?
  77. 77. Resources / References Accelerate Book : DORA 2022 : devops-report-now-out DORA 2021 : The best path to long term success is slow : best-path-to-long-term-change-is-slow-simple-and-boring.html Martin Fowler : Continuous Integration
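
Slide 36 contrasts repeatable and non-repeatable migration scripts. The following is a minimal sketch of what "repeatable" can mean in practice, assuming an in-memory SQLite database as a stand-in for the real system. The table, column, and index names are hypothetical and not taken from the talk.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` already has a column named `column`."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def migrate(conn):
    """A repeatable migration: every step first checks whether it is still needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # SQLite has no ADD COLUMN IF NOT EXISTS, so the guard is written by hand.
    if not column_exists(conn, "customer", "email"):
        conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS customer_email_idx ON customer (email)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # the second run changes nothing instead of failing on existing objects
```

Because each step is guarded, a script that failed halfway can be corrected and run again, and it only performs the steps that are still missing.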

Editor's Notes

  • What are we going to see today?
    First, we will look at the preconditions for database schema evolution, or database migrations.
    Second, we will explore how database deployments or schema migrations are conducted.
    And third, we will have time for questions and then proceed with a hands-on lab.
  • If you have a version control system, you need to think about the directory structure of your repository.
    The main thing when it comes to structuring is storing production code separately from test code and test data.

    There are a couple of things to consider:
    Store the DDL of all of your objects per object type for easier navigation
    Store migration scripts separately
    Store tests and test data separately from production code
    Use the same naming for test packages and production packages to simplify navigation
    Use one test package per production package that is tested
    Store test data and tests separately from each other so the test data can be reused in different test contexts

    Now that we have our directory structure, we can have a look at our source code workflow.
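
The naming rule from this note (one test package per production package, with the same name) can be checked automatically. A minimal sketch, assuming the sample layout from slide 33 (`db/4_packages/` and `db-tests/packages/`) and assuming that "same naming" means identical `.sql` file names; the function name is hypothetical.

```python
from pathlib import Path


def missing_test_packages(project_root):
    """List production package files that have no test package with the same file name."""
    root = Path(project_root)
    production = {p.name for p in (root / "db" / "4_packages").glob("*.sql")}
    tests = {p.name for p in (root / "db-tests" / "packages").glob("*.sql")}
    return sorted(production - tests)
```

A pipeline step could call `missing_test_packages(".")` and fail the build when the returned list is not empty.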
  • Let’s have a look at the first precondition: the structure of our version control repository (for example in Git) and the workflow that we need to set up in our development process.
  • The problem with long-running feature branches is that you can't test the commits in the right order. In both database and infrastructure development, the result often depends on which patch or change is applied first. If you have long-running feature branches, you don't know what changes have been made to your application in the meantime. Therefore, testing is not very efficient or safe.
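
Why the order of changes matters can be shown with two small changes that depend on each other. This is a sketch under assumptions: SQLite stands in for the real database, and the `orders` table and both statements are hypothetical examples.

```python
import sqlite3

ADD_STATUS = "ALTER TABLE orders ADD COLUMN status TEXT"
BACKFILL_STATUS = "UPDATE orders SET status = 'open' WHERE status IS NULL"


def apply_in_order(statements):
    """Apply statements to a fresh database; return None on success or the error text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    try:
        for statement in statements:
            conn.execute(statement)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


print(apply_in_order([ADD_STATUS, BACKFILL_STATUS]))  # prints None: this order works
print(apply_in_order([BACKFILL_STATUS, ADD_STATUS]))  # prints an error: the column is missing
```

A test run on a branch that contains only the backfill would fail or pass depending on what the other branch has already changed, which is the ordering problem described above.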
  • Continuous Integration works best for databases and for infrastructure code when version control is trunk-based, because many changes to database applications cannot be tested properly if feature branches are used! The same is true for infrastructure code.
    So if you are doing either infrastructure provisioning or database engineering, you should develop trunk-based.

    Let’s have a look at why!
  • If we look at how trunk-based development and continuous integration work, the continuous integration pipeline is triggered with each commit on the mainline.
    CI is therefore executed when a new commit is made, and it runs the following steps:
    Migration scripts are executed
    Post-migration scripts are executed, such as recompiling, restoring synonyms, creating permissions
    Tests are executed
    Metrics are calculated
    Static code analysis is performed

    The delivery process can then be started manually and only performs migrations and post-migration scripts.
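
The difference between the two pipelines can be sketched as two lists of stages that share the migration steps. This is an illustrative sketch only: the stage functions are placeholders that always succeed, and `run_stages` is a hypothetical helper, not part of any CI server.

```python
def run_stages(stages):
    """Run stages in order and stop at the first failure; return the names that ran."""
    executed = []
    for name, stage in stages:
        executed.append(name)
        if not stage():
            break
    return executed


def ok():
    """Placeholder stage that always succeeds; a real stage would call a tool."""
    return True


MIGRATE = [("migration scripts", ok), ("post-migration scripts", ok)]

# The integration pipeline runs on every commit to the mainline.
INTEGRATION_PIPELINE = MIGRATE + [
    ("tests", ok),
    ("metrics", ok),
    ("static code analysis", ok),
]

# The delivery pipeline is started manually and only migrates.
DELIVERY_PIPELINE = MIGRATE
```

Sharing the `MIGRATE` steps between both pipelines means the scripts deployed to production are the same ones that the integration pipeline has already exercised.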
  • There are two different approaches to implementing trunk-based development.
    Either you start completely trunk-based, as shown here on the left, so that the whole application including the database is developed trunk-based. This requires a lot of pair programming and putting only small changes into production at a time.

    Or you work in Git with submodules and split the application and the database into two repositories.
    The database repository is then included as a submodule in the application repository so that the CI/CD pipeline has access to both. The application part can then be developed, for example, with a Gitflow workflow and the database trunk-based. Database changes would then be deployed when a new release of the application is deployed. This is easier to implement when a versioned API above the database exists that decouples the two parts.
  • If I tell people that they should do pair programming, I often hear that they don’t have time for it because they have lots of stuff to do.
    The thinking is that they are faster if they don’t do pair or mob programming.
    But you usually have dependencies in development. Those can be knowledge gaps where you need support from a colleague, or access dependencies.

    As an example, the slide shows the lead time of the yellow task, which needs both developers. It is a lot longer than if those two just sat together and did it in pair programming. As a side effect, pair programming also means they share knowledge about the feature at the same time. So they would be faster and also smarter afterwards.
  • A database migration tool helps us migrate the database schema from one state to another.
    This means we start with an initial DDL script that builds database version 1.

    From here on we will always have migrations that transform our database into the next version. And our database migration tool tracks those migrations, checks if a migration script was already run against a database and if not, installs the migration.
    This way we can apply changes to our database like adding tables, adding foreign keys, removing data or refactoring the database with new migration scripts.
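
The tracking behaviour described in this note can be sketched in a few lines. This is a simplified illustration and not how any particular tool is implemented: SQLite stands in for the database, and the `schema_history` table, the `apply_pending` function, and the two migrations are hypothetical.

```python
import sqlite3


def apply_pending(conn, migrations):
    """Apply every migration not yet recorded in schema_history; return applied versions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history (version TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    applied = []
    # Zero-padded version strings sort correctly as plain text.
    for version, sql in sorted(migrations.items()):
        if version in done:
            continue  # already run against this database
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


MIGRATIONS = {
    "V00": "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);",
    "V01": "ALTER TABLE customer ADD COLUMN email TEXT;",
}

conn = sqlite3.connect(":memory:")
print(apply_pending(conn, MIGRATIONS))  # first run applies V00 and V01
print(apply_pending(conn, MIGRATIONS))  # second run finds nothing left to apply
```

The history table is what makes the same command safe to run against development, test, and production: each database only receives the migrations it has not seen yet.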
  • The tool that we are going to see today is Flyway, which follows a migration-based approach.
    Database migrations are stored as a series of files with continuous enumeration, telling the tool in which order the scripts need to be run to succeed.
    Usually there are migration scripts that run only once against a database instance, and so-called repeatable scripts that run after the versioned migrations (Flyway re-applies them whenever their content changes). Those scripts can, for example, set synonyms or grants that might get lost during migrations, or create database views and procedures that don’t include persistence or state. Recompilation scripts are also quite common repeatable scripts; they check whether all the database objects are still in a valid state and can be executed.
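
The ordering rule (versioned scripts by version number, repeatable scripts afterwards) can be sketched as a small function over file names. This is an illustration, not Flyway's implementation. The patterns are assumptions that accept the names shown on slide 51; Flyway's own default convention uses a double underscore for repeatable scripts as well (`R__name.sql`), which the second pattern also matches.

```python
import re

VERSIONED = re.compile(r"^V(\d+)__.+\.sql$")
REPEATABLE = re.compile(r"^R_.+\.sql$")


def execution_order(filenames):
    """Versioned scripts sorted by numeric version, then repeatable scripts by name."""
    versioned = []
    repeatable = []
    for name in filenames:
        match = VERSIONED.match(name)
        if match:
            versioned.append((int(match.group(1)), name))
        elif REPEATABLE.match(name):
            repeatable.append(name)
        # Anything else (docs, notes) is ignored.
    return [name for _, name in sorted(versioned)] + sorted(repeatable)
```

Comparing versions as integers rather than as text keeps a hypothetical `V10__...` after `V9__...`, which a plain alphabetical sort would get wrong.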
  • The third precondition is that we need to test our database.
  • If you have a Java application, you can use JUnit and the JUnit extension DbUnit to write database integration tests or database unit tests. The problem here is that you need a running database instance to run the tests against. They are therefore more integration tests than unit tests, because every test also exercises the interaction of the two systems, and the tests cannot run without the database present.
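
The same idea can be illustrated outside of Java. The sketch below uses Python's `unittest` with an in-memory SQLite database as a stand-in for the running instance, so it is not a JUnit or DbUnit example; the `orders` table and the `open_orders` view are hypothetical.

```python
import sqlite3
import unittest


class OpenOrdersViewTest(unittest.TestCase):
    def setUp(self):
        # Every test gets its own fresh database, so tests cannot affect each other.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            """
            CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
            CREATE VIEW open_orders AS SELECT id FROM orders WHERE status = 'open';
            """
        )

    def tearDown(self):
        self.conn.close()

    def test_view_only_returns_open_orders(self):
        self.conn.executemany(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            [(1, "open"), (2, "closed"), (3, "open")],
        )
        rows = self.conn.execute("SELECT id FROM open_orders ORDER BY id").fetchall()
        self.assertEqual(rows, [(1,), (3,)])
```

Saved in a file, the test can be run with `python -m unittest`. Even here the test needs a database engine to execute against, which is the point made above: it verifies the database logic together with the system that runs it.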
  • We have primarily two kinds of teams: teams that provide services to others and teams that consume services from other teams.
    For application teams or software delivery teams it is especially important to have a service layer abstracting the infrastructure. This makes them independent and allows them to build and run their application independently of any other team.
    That’s the main goal of DevOps – You build it, you run it. So everything a team builds, they can also independently run in production. There’s no hand-off to an operations team that takes over production; the application team does everything on its own.

    A service layer is especially essential above the infrastructure layer, so that application teams can use it. These interfaces should be made available in different versions whenever possible.
  • Versioning looks something like this.

    Our service XY provides several versions of its API.
    This allows the consuming applications to change the version individually, either because they want the benefit of new functionality or because they want to invest time in the lifecycle of the application and switch to the latest service version.

    A very important aspect is that applications should be allowed to decide for themselves when they switch to newer versions of an API. Forcing applications to do this synchronously undermines the prioritization within an application team of what is important to them. It may be enough for a team to use functionality from version 1, because they don’t require the new functionality of newer versions.
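
One way to offer several API versions on top of a database is a set of versioned views over the same tables. This is a minimal sketch of that idea, assuming SQLite as a stand-in; the table, the view names `api_v1_customer` and `api_v2_customer`, and the helper `fetch_customer` are hypothetical and not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'Lovelace');

    -- Version 1 keeps the original contract: a single name column.
    CREATE VIEW api_v1_customer AS
        SELECT id, first_name || ' ' || last_name AS name FROM customer;

    -- Version 2 exposes the split columns for consumers that opt in.
    CREATE VIEW api_v2_customer AS
        SELECT id, first_name, last_name FROM customer;
    """
)


def fetch_customer(conn, api_version, customer_id):
    """Read one customer through the view that belongs to the requested API version."""
    view = {1: "api_v1_customer", 2: "api_v2_customer"}[api_version]
    return conn.execute(
        f"SELECT * FROM {view} WHERE id = ?", (customer_id,)
    ).fetchone()


print(fetch_customer(conn, 1, 1))  # (1, 'Ada Lovelace')
print(fetch_customer(conn, 2, 1))  # (1, 'Ada', 'Lovelace')
```

The underlying table was changed to split the name, yet a consumer on version 1 still sees the old shape and can move to version 2 on its own schedule.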