
Pass 2018 Testing in the enterprise data architecture



Testing options for an Enterprise Data Architecture, including tSQLt, ApprovalTests, and QueryApprovals

Published in: Software


  1. 1. Testing Challenges and How To Overcome Them Ike Ellis, MVP, Crafting Bytes Testing in an Enterprise Data Architecture
  2. 2. Please silence cell phones
  3. 3. Free online webinar events Free 1-day local training events Local user groups around the world Online special interest user groups Business analytics training Get involved Explore everything PASS has to offer Free Online Resources Newsletters
  4. 4. Download the GuideBook App and search: PASS Summit 2018 Follow the QR code link displayed on session signage throughout the conference venue and in the program guide Session evaluations Your feedback is important and valuable. Go to 3 Ways to Access: Submit by 5pm Friday, November 16th to win prizes.
  5. 5. Ike Ellis MVP & Partner, Crafting Bytes Author of Developing Azure Solutions MVP for 8 years Speaker at PASS Summit, SQL Bits, Redgate SQL in the City, and many other events @ike_ellis Teach courses on SQL Server, SQL Development, Business Intelligence, Azure Cloud, and Power BI
  6. 6. Agenda • Why do we test? What’s the purpose? • What is an Enterprise Data Architecture? • What are the challenges of testing in an EDA? • What solutions exist for those challenges? • What challenges will likely always be there? • What are some general best practices for working within the constraints we have?
  7. 7. Purpose of automatic testing • To verify that changes to a production system do not break earlier expected work • To preserve intent and self-document functionality • To automatically deploy a system once all tests successfully pass • To increase our rate of change and match the pace the business expects from us • To test things unattended so the QA department is not a bottleneck for product delivery 7
  8. 8. Testing: The epicenter of a deployment pipeline 8
  9. 9. What’s better at preserving intent? Comments, Source Control Comments, or Tests? • Source control comments are done AFTER the changes, not before or with the changes • Comments are often outdated, ignored, and can’t be reviewed at compilation • Tests fail when intent is violated, pointing to the code that failed, making it inviolate 9
  10. 10. Testing: Rate of change (chart: time it takes to make a change, plotted against time since the project began)
  11. 11. QA Dept as Bottleneck • Over time, developers can outpace a QA team. As more application surface area is built, there is more to test completely to see the impact of each change. • Even the smallest changes begin having the largest impact
  12. 12. Enterprise Data Architecture • Data Mart/Data Warehouse ODS • Data Lake Architecture • Lambda Architecture • Kappa Architecture • IoT Architecture 12
  13. 13. Data Mart/Data Warehouse/ODS 13
  14. 14. Data Lake Architecture on Azure 14
  15. 15. Lambda Architecture 15
  16. 16. Kappa Architecture 16
  17. 17. IoT Architecture 17
  18. 18. What do all of these architectures have in common? • Some move data more than others, but there is always an element of moving data from one place to another • They have processes that change the data so drastically that it no longer resembles what it originally looked like • They store the exact same data several times. I’ve seen company EDAs that store the same source data seven times before it arrives at the final destination • I’ve seen companies that store different views of the same aggregation: Weekly, Monthly, Quarterly, MTD, YTD, YOY
  19. 19. What are the common challenges of testing these environments? Why do so many companies fail at automated testing and give up? 19
  20. 20. Well, it’s not their fault, and it’s not yours either • EDAs have some insane complexity • Most EDAs have all the things that software developers avoid when writing code 20
  21. 21. Problem #1: Testing the weather 21
  22. 22. Problem #2: Side-Effects. Stored procedures always violate the DRY principle (Don’t Repeat Yourself). They do this for a variety of reasons: • It is difficult to pass tables of data between procedures • It is easy to load up a temp table and perform 100s of operations against it • Stored procedures don’t have any of the components necessary to avoid repetition, like inheritance, encapsulation, polymorphism, arrays, foreach loops, etc. All of this means we end up with very large stored procedures, 1000s of lines long, that make up a lot of our ETL
  23. 23. Problem #3: Queries take a long time to run • If a suite of tests takes hours to run, then we can’t run it very often. • We will never have good test coverage of our architecture if we can’t run the tests in between changes to the system • Queries time out • Queries against source systems are a bad idea, which means we can’t verify with 100% accuracy that we got all the data
  24. 24. Problem #4: Source systems don’t really want us there If we’re pulling data from transactional systems, testing against the OLTP database creates a load. Create too much, and we’ll get kicked off the source, pronto.
  25. 25. Problem #5: All of these EDAs have a lot of repetitive data. Where do we test? • Do we test data from source to staging? From staging to ODS? From source to ODS? From source to DW? • With all the repetition, we might have the manufacturing problem that Taiichi Ohno railed against while inventing lean manufacturing and Just-in-Time
  26. 26. Problem #6: Accuracy is deadly important • Test data is not always available • Production data must be 100% reconciled and accurate • Cash flow statements, reports to the board, SEC filings, auditing records, etc. • No mistakes means no mistakes
  27. 27. Remember our goals with testing • Did a change break the system? • Are we preserving existing, needed functionality? • Can we deploy automatically when our tests run? 27
  28. 28. OK, that’s the bad news, now what can we do about it? • Testing Strategies • Testing Tools • Testing Best Practices 28
  29. 29. Testing Strategies • Are we all looking at the same requirements? • Write a test plan, execute it! • 100% test coverage is impossible, so what will we test? • Where will we test? • Where does it tend to break? Are we recording our outages and bugs? • Stay focused 29
  30. 30. It’s not a perfect world, but here are some things we can do • tSQLt • Approval Tests • Except Query Replacement Testing • QueryApprovals 30
  31. 31. 31
  32. 32. Demo: tSQLt 32
  33. 33. Demo: ApprovalTests 33
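The approval-testing idea behind the demo above can be sketched without any library: run a query, render the result as text, and compare it with a file that a human has previously approved. This is a hand-rolled illustration using only the Python standard library and an in-memory SQLite database, not the ApprovalTests or QueryApprovals API. The `render` and `verify` helpers, the `sales` table, and the file naming are all assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def render(rows):
    """Turn query results into stable, diff-friendly text."""
    return "\n".join("|".join(str(v) for v in row) for row in rows) + "\n"


def verify(name, received_text, approved_dir):
    """Compare received output with the approved file for this test.

    Returns True on a match. On a mismatch (or a first run with no
    approved file) writes a .received.txt file for review and returns False.
    """
    approved = Path(approved_dir) / f"{name}.approved.txt"
    received = Path(approved_dir) / f"{name}.received.txt"
    if approved.exists() and approved.read_text() == received_text:
        received.unlink(missing_ok=True)  # clean up any stale received file
        return True
    received.write_text(received_text)
    return False


# Hypothetical sample data standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('West', 100), ('West', 50), ('East', 70);
""")
# ORDER BY keeps the rendered text deterministic between runs.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
workdir = tempfile.mkdtemp()


def run_test():
    return verify("sales_by_region", render(conn.execute(query).fetchall()), workdir)


first_run = run_test()  # nothing approved yet, so this fails

# A human reviews the received file and approves it by renaming it.
Path(workdir, "sales_by_region.received.txt").rename(
    Path(workdir, "sales_by_region.approved.txt")
)
second_run = run_test()  # output matches the approved file

# A later change alters the data, so the test fails until re-approved.
conn.execute("UPDATE sales SET amount = 999 WHERE region = 'East'")
third_run = run_test()
```

The approved file doubles as documentation of intent: a failing run leaves a received file next to it, and an ordinary diff tool shows exactly which rows changed.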
  34. 34. Demo: QueryApprovals No logo yet….we need one…send it over and we’ll put it on our GitHub and give you credit! 34
  35. 35. Demo: Except 35
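One reading of EXCEPT-based query replacement testing is: when a query is rewritten, run the old and new versions side by side and use `EXCEPT` in both directions to confirm they return the same rows. The sketch below assumes that interpretation and uses an in-memory SQLite database so it runs anywhere. The helper names, the `sales` table, and the three queries are hypothetical.

```python
import sqlite3


def except_diff(conn, left_query, right_query):
    """Return rows produced by left_query that right_query does not produce."""
    sql = f"SELECT * FROM ({left_query}) EXCEPT SELECT * FROM ({right_query})"
    return conn.execute(sql).fetchall()


def queries_match(conn, old_query, new_query):
    """True when neither query returns a row the other lacks.

    EXCEPT is one-directional, so both directions are checked. It also
    compares distinct rows only, so a difference that is purely in
    duplicate counts is not detected by this check.
    """
    return (not except_diff(conn, old_query, new_query)
            and not except_diff(conn, new_query, old_query))


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('West', 100), ('West', 50), ('East', 70);
""")

old_query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
# A rewrite that should be equivalent on this data.
new_query = ("SELECT region, SUM(amount) AS total FROM sales "
             "WHERE amount > 0 GROUP BY region")
# A rewrite with a bug: MAX instead of SUM.
broken_query = "SELECT region, MAX(amount) AS total FROM sales GROUP BY region"

same = queries_match(conn, old_query, new_query)
missing_from_broken = except_diff(conn, old_query, broken_query)
```

Because the comparison happens inside the database engine, only the differing rows come back to the test, which helps keep the check fast on large result sets.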
  36. 36. Demo: CI/CD tests 36
  37. 37. 3rd Party Products (you need $$$) 37
  38. 38. Best practices for data testing • Test row counts • Test random and sampled data elements • Start with thorough manual testing and then slowly automate that • Alert in your audit log • Test that records are streaming • Make testing fast! • Deleting tests is essential! • It’s absolutely OK to delete tests that aren’t working for you. It is not OK to give up on testing.
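The first two practices above, row counts and sampled data elements, can be sketched as a pair of small checks. This is a minimal illustration that assumes a source table and a warehouse table with the same columns in the same order and a shared key column. The table names, helper names, and data are hypothetical, and the table and column names are interpolated into the SQL, so they must be trusted identifiers.

```python
import random
import sqlite3


def row_counts_match(conn, source_table, target_table):
    """Cheap first check: did every source row arrive?"""
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src == tgt


def sample_mismatches(conn, source_table, target_table, key, sample_size, seed=0):
    """Compare a random sample of rows by key; return the keys that differ."""
    keys = [r[0] for r in conn.execute(
        f"SELECT {key} FROM {source_table} ORDER BY {key}")]
    rng = random.Random(seed)  # fixed seed keeps a failing sample reproducible
    sampled = rng.sample(keys, min(sample_size, len(keys)))
    mismatches = []
    for k in sampled:
        src = conn.execute(
            f"SELECT * FROM {source_table} WHERE {key} = ?", (k,)).fetchone()
        tgt = conn.execute(
            f"SELECT * FROM {target_table} WHERE {key} = ?", (k,)).fetchone()
        if src != tgt:
            mismatches.append(k)
    return sorted(mismatches)


# Hypothetical source and warehouse tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE dw_orders  (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
    INSERT INTO dw_orders SELECT * FROM src_orders;
""")

counts_ok = row_counts_match(conn, "src_orders", "dw_orders")
clean = sample_mismatches(conn, "src_orders", "dw_orders", "id", sample_size=5)

# Simulate a bad load that corrupts one value without changing the row count.
conn.execute("UPDATE dw_orders SET amount = 31 WHERE id = 3")
counts_still_ok = row_counts_match(conn, "src_orders", "dw_orders")
after_bad_load = sample_mismatches(conn, "src_orders", "dw_orders", "id", sample_size=5)
```

The corrupted row leaves the counts equal, which is why the two checks are worth running together: counts catch dropped or duplicated rows, and sampling catches values that changed in flight.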
  39. 39. Report Testing • SSRS Testing • Power BI Testing • Push logic into the database • Push coloring into the database • Test at the database • JPG/PDF testing is still not mature enough to use • Test report data sizes and subscription dates 39
  40. 40. Develop a culture of testing. There are teams that talk about testing in every sentence. There are teams that treat testing as a burden or someone else’s problem. Guess which teams write better products, have happy users, and enjoy making changes to their own systems? Remember change lock. It will make you want to quit your job. Write tests to avoid change lock.
  41. 41. Resources • QueryApproval Source Code • tSQLt • Approval Test Demo Code
  42. 42. Contact Me! YouTube San Diego Tech Immersion Group Twitter: @ike_ellis